How to Create Dummy Variables Using get_dummies in R with Examples

A dummy variable represents one category of a categorical variable as a numeric variable that takes one of two values: 1 or 0. In this tutorial, I’ll show step by step how to create dummy variables using get_dummies.() in R, with some examples.
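To make the idea concrete before bringing in any package, here is a minimal base R sketch that builds the 0/1 columns by hand. The vector status and the is_* column names are made up for this illustration only.

```r
# A tiny vector of categories (hypothetical values, for illustration only)
status <- c("Married", "Single", "Divorced")

# One 0/1 indicator per category: 1 where the value matches, 0 elsewhere
is_married  <- as.integer(status == "Married")
is_single   <- as.integer(status == "Single")
is_divorced <- as.integer(status == "Divorced")

# Put the original values and their dummy variables side by side
manual_dummies <- data.frame(status, is_married, is_single, is_divorced)
print(manual_dummies)
```

Each row has exactly one 1 across the three indicator columns, which is the pattern get_dummies.() produces automatically in the examples that follow.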

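Let’s create a sample dataframe with five variables (id, age, marital_status, city, salary) to check the functionality of get_dummies.(). The function comes from the tidytable package, so that package is loaded first. In recent tidytable releases the same function is spelled get_dummies(), without the trailing dot.

# tidytable provides get_dummies.() and the %>% pipe used below
library(tidytable)

df <- data.frame(
  id = c(1, 2, 3),
  age = c(35, 20, 40),
  marital_status = c("Married", "Single", "Divorced"),
  city = c("Dhaka", "Gazipur", NA),
  salary = c(41000, 21000, 42000)
)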

Now, let’s run the code below to see the variable types of the dataframe.

> str(df)

'data.frame':	3 obs. of  5 variables:
 $ id            : num  1 2 3
 $ age           : num  35 20 40
 $ marital_status: chr  "Married" "Single" "Divorced"
 $ city          : chr  "Dhaka" "Gazipur" NA
 $ salary        : num  41000 21000 42000
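Since get_dummies.() works on character and factor variables, it can help to list those columns up front. The following is a small base R sketch of that check; the names is_categorical and categorical_cols are my own, chosen for this illustration.

```r
# Same sample data as in this tutorial
df <- data.frame(
  id = c(1, 2, 3),
  age = c(35, 20, 40),
  marital_status = c("Married", "Single", "Divorced"),
  city = c("Dhaka", "Gazipur", NA),
  salary = c(41000, 21000, 42000)
)

# TRUE for columns stored as character or factor
is_categorical <- vapply(
  df,
  function(col) is.character(col) || is.factor(col),
  logical(1)
)

# Names of the columns a "character and factor" selection would cover
categorical_cols <- names(df)[is_categorical]
print(categorical_cols)
```

For this dataframe the check picks out marital_status and city, the two columns that get converted in the examples.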

Here is the basic structure of the get_dummies.() function, followed by an explanation of each parameter.

get_dummies.(
  .df,
  cols = NULL,
  prefix = TRUE,
  prefix_sep = "_",
  drop_first = FALSE,
  dummify_na = TRUE
)

Parameter     Value / Description

.df           A data.frame or data.table

cols          NULL: makes dummy variables for all character and factor variables
              Single variable name: converts that one variable to dummy variables
              Multiple variable names: converts a vector of unquoted variable names, e.g. c(marital_status, city), to dummy variables

prefix        TRUE: the original variable name is added as a prefix to the new variable(s)
              FALSE: no prefix is added to the new variable(s)

prefix_sep    "": no separator between the prefix and the category name
              "_", "." etc.: used as the separator between the prefix and the category name
              NULL: the default separator "_" is used

drop_first    TRUE: the first dummy variable of each converted variable is dropped
              FALSE: the first dummy variable is not dropped; this is the default

dummify_na    TRUE: NAs also get their own dummy variable; this is the default in the signature above
              FALSE: no dummy variable is created for NAs
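To see what prefix_sep, drop_first and dummify_na do to the result, here is a rough base R sketch. Note that sketch_dummies() is a hypothetical helper written for this tutorial and is not how tidytable implements get_dummies.(). It assumes categories are taken in order of first appearance, so the real function may order or name columns differently.

```r
# Hypothetical helper (not part of tidytable): 0/1 columns for one vector
sketch_dummies <- function(x, name, prefix_sep = "_",
                           drop_first = FALSE, dummify_na = TRUE) {
  # Distinct non-missing categories, in order of first appearance (assumption)
  vals <- unique(x[!is.na(x)])
  if (drop_first) vals <- vals[-1]

  # One indicator column per remaining category; NA entries count as 0
  out <- list()
  for (v in vals) {
    out[[paste(name, v, sep = prefix_sep)]] <- as.integer(!is.na(x) & x == v)
  }

  # Optional extra column that flags missing values
  if (dummify_na && anyNA(x)) {
    out[[paste(name, "NA", sep = prefix_sep)]] <- as.integer(is.na(x))
  }
  as.data.frame(out, check.names = FALSE)
}

city <- c("Dhaka", "Gazipur", NA)

all_dummies <- sketch_dummies(city, "city")
dotted_drop <- sketch_dummies(city, "city", prefix_sep = ".", drop_first = TRUE)
without_na  <- sketch_dummies(city, "city", dummify_na = FALSE)

print(names(all_dummies))
print(names(dotted_drop))
print(names(without_na))
```

Comparing the three sets of column names shows the effect of each switch: the separator changes the names, drop_first removes the first category's column, and dummify_na controls whether an NA column appears.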

Example-1: To convert all character/factor variables into dummy variables, write the following code:

df %>%
  get_dummies.()

   id age marital_status    city salary marital_status_Married marital_status_Single marital_status_Divorced city_Dhaka city_Gazipur city_NA
1:  1  35        Married   Dhaka  41000                      1                     0                       0          1            0       0
2:  2  20         Single Gazipur  21000                      0                     1                       0          0            1       0
3:  3  40       Divorced    <NA>  42000                      0                     0                       1          0            0       1

Example-2: To convert one specific variable into dummy variables, write the following code:

df %>%
  get_dummies.(marital_status)

   id age marital_status    city salary marital_status_Married marital_status_Single marital_status_Divorced
1:  1  35        Married   Dhaka  41000                      1                     0                       0
2:  2  20         Single Gazipur  21000                      0                     1                       0
3:  3  40       Divorced    <NA>  42000                      0                     0                       1

Example-3: To convert several specific variables into dummy variables, write the following code:

df %>%
  get_dummies.(c(marital_status, city))

   id age marital_status    city salary marital_status_Married marital_status_Single marital_status_Divorced city_Dhaka city_Gazipur city_NA
1:  1  35        Married   Dhaka  41000                      1                     0                       0          1            0       0
2:  2  20         Single Gazipur  21000                      0                     1                       0          0            1       0
3:  3  40       Divorced    <NA>  42000                      0                     0                       1          0            0       1

Example-4: To use ‘.’ as the separator in the dummy variable names and drop the first dummy variable of each converted variable, write the following code:

df %>%
  get_dummies.(prefix_sep = ".", drop_first = TRUE)

   id age marital_status    city salary marital_status.Single marital_status.Divorced city.Gazipur city.NA
1:  1  35        Married   Dhaka  41000                     0                       0            0       0
2:  2  20         Single Gazipur  21000                     1                       0            1       0
3:  3  40       Divorced    <NA>  42000                     0                       1            0       1

Example-5: If you don’t want NA values to get their own dummy variable, write the following code:

df %>%
  get_dummies.(c(marital_status, city), dummify_na = FALSE)

   id age marital_status    city salary marital_status_Married marital_status_Single marital_status_Divorced city_Dhaka city_Gazipur
1:  1  35        Married   Dhaka  41000                      1                     0                       0          1            0
2:  2  20         Single Gazipur  21000                      0                     1                       0          0            1
3:  3  40       Divorced    <NA>  42000                      0                     0                       1          0            0

In this tutorial, I showed how to create dummy variables using the get_dummies.() function in R, with examples. I hope you enjoyed the tutorial. To get updates, like the Facebook page https://www.facebook.com/LearningBigDataAnalytics and stay connected.
