|
R represents a categorical variable as a factor. For example, let’s say we have
a variable for occupation like: occup={"doctor","engineer","software
programmer"}, so the variable occup has three categories. R handles it as
a factor (character data is converted to a factor when a model is fitted).
When used in a regression model, R will make dummy variables out of it
automatically, taking the first level as the reference (in this instance
"doctor", because levels are sorted alphabetically by default). Sometimes,
however, a categorical variable may contain numerical
values. For example, let’s say grades for a student carry numbers like
1,2,3, and so on. We can represent grades as grades={1,2,3,4}. Here,
"grades" is a categorical variable with levels 1,2,3, and 4. Unless
specifically stated as factor(grades), R will treat grades as a numerical
continuous variable in a regression equation and estimate a single slope
for it. This distinction between factors and numeric regressors (continuous
variables that enter a linear regression equation directly) needs to be
emphasized. For factors, R creates the dummy variables automatically; you
need not specify them in the equation. |
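
The difference between a factor and a numeric regressor can be seen by counting the coefficients that `lm()` estimates. The following is a minimal sketch, assuming a made-up data frame `d` with a random response `income`; the data, the variable `income`, and the object names are hypothetical and serve only to illustrate the point.

```r
# Hypothetical toy data: values are invented for illustration only.
set.seed(1)
d <- data.frame(
  occup  = rep(c("doctor", "engineer", "software programmer"), each = 4),
  grades = rep(1:4, times = 3)
)
d$income <- rnorm(nrow(d))  # arbitrary response, no real meaning

# Categorical predictor: lm() builds one dummy per non-reference level.
# "doctor" is the first level alphabetically, so it becomes the reference.
fit_occup <- lm(income ~ occup, data = d)
names(coef(fit_occup))

# grades left as numeric: treated as continuous, one intercept and one slope.
fit_num <- lm(income ~ grades, data = d)
length(coef(fit_num))

# grades declared as a factor: one intercept and three dummies (levels 2, 3, 4).
fit_fac <- lm(income ~ factor(grades), data = d)
length(coef(fit_fac))
```

If a different reference category is wanted, `relevel()` on a factor (for example `relevel(factor(d$occup), ref = "engineer")`) changes which level is absorbed into the intercept.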