How to Create Dummy Variables Using the ifelse() Function in R with an Example
A dummy variable is a numeric variable that encodes one category of a categorical variable: it takes the value 1 when an observation belongs to that category and 0 otherwise. In this tutorial, I’ll show step by step how to create dummy variables using the ifelse() function in R, with an example.
Let’s create a sample data frame with four variables: id, age, marital_status, and salary. Suppose we want to predict salary from age and marital_status. I’ll use the following sample data frame throughout the tutorial:
>df <- data.frame(
id = c("001", "002", "003"),
age = c(35, 20, 40),
marital_status = c("Married", "Single", "Divorced"),
salary = c(41000, 21000, 42000)
)
Now, let’s run the code below to see the variable types in the data frame:
>str(df)
'data.frame': 3 obs. of 4 variables:
$ id : chr "001" "002" "003"
$ age : num 35 20 40
$ marital_status: chr "Married" "Single" "Divorced"
$ salary : num 41000 21000 42000
To use marital_status as a predictor variable in a regression model, we have to convert it into dummy variables, since it is currently a character variable that can take on three different values (“Single”, “Married”, or “Divorced”).
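Before writing any ifelse() calls, it can help to list the distinct categories, since each one gets its own dummy variable. The following is a minimal sketch that re-creates the marital_status column as a standalone vector so it runs on its own; the name categories is only illustrative.

```r
# Standalone copy of the marital_status column from the sample data
marital_status <- c("Married", "Single", "Divorced")

# unique() returns the distinct values in order of first appearance
categories <- unique(marital_status)
print(categories)

# Number of distinct categories, i.e. how many dummy variables can be built
print(length(categories))
```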
Now we can use the ifelse() function in R to define the dummy variables, and then define the final data frame we’d like to use to build the regression model:
>marital_status_single <- ifelse(df$marital_status == 'Single', 1, 0)
>marital_status_married <- ifelse(df$marital_status == 'Married', 1, 0)
>marital_status_divorced <- ifelse(df$marital_status == 'Divorced', 1, 0)
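The calls above work on the whole column at once because ifelse() is vectorized: it tests every element and returns a vector of the same length. Here is a small sketch of that behavior on a standalone vector. The names status, single_flag, and single_flag_int are illustrative, and the as.integer() variant is an alternative shown for comparison, not part of the tutorial's method.

```r
# Standalone vector with the same values as the marital_status column
status <- c("Married", "Single", "Divorced")

# The comparison yields a logical vector, one element per observation
is_single <- status == "Single"
print(is_single)

# ifelse() maps TRUE to 1 and FALSE to 0, element by element
single_flag <- ifelse(is_single, 1, 0)
print(single_flag)

# Alternative: coerce the logical vector directly to integers (0L/1L)
single_flag_int <- as.integer(is_single)
print(single_flag_int)
```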
Next, re-create the data frame with the newly created dummy variables:
>ddf <- data.frame(id = df$id,
age = df$age,
salary = df$salary,
marital_status_Single = marital_status_single,
marital_status_Married = marital_status_married,
marital_status_Divorced = marital_status_divorced
)
To inspect the final data frame, run the following command:
>str(ddf)
'data.frame': 3 obs. of 6 variables:
$ id : chr "001" "002" "003"
$ age : num 35 20 40
$ salary : num 41000 21000 42000
$ marital_status_Single : num 0 1 0
$ marital_status_Married : num 1 0 0
$ marital_status_Divorced: num 0 0 1
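When a variable has many categories, writing one ifelse() line per category gets repetitive. The sketch below is one possible way to generate the same dummy columns in a loop. It re-creates the sample data so it runs on its own, and the names category, col_name, and dummy_cols are illustrative. The columns come out in order of first appearance (Married, Single, Divorced) rather than the order used above.

```r
# Sample data re-created from the tutorial so the sketch runs on its own
df <- data.frame(
  id = c("001", "002", "003"),
  age = c(35, 20, 40),
  marital_status = c("Married", "Single", "Divorced"),
  salary = c(41000, 21000, 42000)
)

# Start from the non-categorical columns
ddf <- df[, c("id", "age", "salary")]

# Add one 0/1 column per distinct category of marital_status
for (category in unique(df$marital_status)) {
  col_name <- paste0("marital_status_", category)
  ddf[[col_name]] <- ifelse(df$marital_status == category, 1, 0)
}

str(ddf)

# Each row belongs to exactly one category, so its dummies sum to 1
dummy_cols <- grep("^marital_status_", names(ddf), value = TRUE)
print(rowSums(ddf[, dummy_cols]))
```

Because the three dummy columns always sum to 1, they are perfectly collinear with an intercept term. In a regression model with an intercept, one of them is usually left out and serves as the reference category.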
In this tutorial, I showed how to create dummy variables using the ifelse() function in R, with an example. I hope you enjoyed it. For updates, like the Facebook page https://www.facebook.com/LearningBigDataAnalytics and stay connected.