How to Subset a Data Frame in R with Examples
Sometimes you may need to create a subset of a data frame, and R offers many ways to do it. In this tutorial, I’ll show you different ways to subset a data frame in R, with examples.
Let’s create a sample data frame to try out the different subsetting methods:
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)
  id   name avg_marks category
1  2 Minhaj        99    Green
2  3 Rahman        32      Red
3  4   Khan        91    Green
4  6  Fahim        81   Orange
5  7 Rahman        50    White
6  8 Fayeza        31      Red
7  9  Lamia        65   Yellow
8 10  Akash       100    Green
To work with a data frame, you need to know its basic indexing syntax, which is as follows:
df[rows, columns]
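As a small sketch of this syntax (the sample data frame is recreated here so the snippet runs on its own, and the variable names first_row, name_col and one_cell are just illustrative), leaving the rows or columns position empty selects everything along that dimension:

```r
# Recreate the sample data frame so this snippet is self-contained
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

first_row <- df[1, ]             # row 1, all columns
name_col  <- df[ , "name"]       # all rows, one column (comes back as a vector)
one_cell  <- df[2, "avg_marks"]  # row 2, column avg_marks
```

Here first_row is still a data frame, while name_col and one_cell are plain vectors, because selecting a single column drops the data frame structure by default.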
Example 1: Subset Data Frame by Selecting Columns
In this example, I’ll show you how to subset a data frame by column name. The following code selects all rows for the columns ‘id’ and ‘avg_marks’:
df[ , c('id', 'avg_marks')]
  id avg_marks
1  2        99
2  3        32
3  4        91
4  6        81
5  7        50
6  8        31
7  9        65
8 10       100
You can also subset a data frame by column index. The following code selects all rows for columns 1 and 3:
df[ , c(1, 3)]
  id avg_marks
1  2        99
2  3        32
3  4        91
4  6        81
5  7        50
6  8        31
7  9        65
8 10       100
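One detail worth knowing when you select a single column: by default R returns a plain vector instead of a data frame. The sketch below (with the sample data recreated so it runs on its own, and illustrative variable names marks_vec and marks_df) shows how the drop = FALSE argument keeps the result as a one-column data frame:

```r
# Recreate the sample data frame so this snippet is self-contained
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

marks_vec <- df[ , "avg_marks"]                # numeric vector of length 8
marks_df  <- df[ , "avg_marks", drop = FALSE]  # data frame with 8 rows and 1 column
```

Which form you want depends on what comes next: a vector is handy for functions like mean(), while a data frame is safer when later code expects columns and row names.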
Example 2: Subset Data Frame by Excluding Columns
In this example, I’ll show you how to subset a data frame by excluding columns by name. The following code selects all rows and every column except ‘name’ and ‘category’:
cols <- names(df) %in% c('name','category')
df[,!cols]
  id avg_marks
1  2        99
2  3        32
3  4        91
4  6        81
5  7        50
6  8        31
7  9        65
8 10       100
You can also exclude columns by index. The following code selects all rows and every column except columns 2 and 4:
df[ , -c(2, 4)]
  id avg_marks
1  2        99
2  3        32
3  4        91
4  6        81
5  7        50
6  8        31
7  9        65
8 10       100
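Note that the minus sign only works with numeric indexes; something like df[ , -c('name', 'category')] raises an error. As a rough sketch of two other ways to exclude columns by name (the sample data is recreated so the snippet runs on its own, and keep, by_setdiff and by_subset are just illustrative names), you could use setdiff() or the select argument of subset():

```r
# Recreate the sample data frame so this snippet is self-contained
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

# Option 1: compute the names to keep, then select them
keep <- setdiff(names(df), c("name", "category"))
by_setdiff <- df[ , keep]

# Option 2: subset() accepts negated, unquoted column names in select
by_subset <- subset(df, select = -c(name, category))
```

Both results contain only the ‘id’ and ‘avg_marks’ columns.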
Example 3: Subset Data Frame by Selecting Specific Rows
In this example, I’ll show you how to subset a data frame by row index. The following code selects rows 1, 3, and 5:
df[c(1, 3, 5), ]
  id   name avg_marks category
1  2 Minhaj        99    Green
3  4   Khan        91    Green
5  7 Rahman        50    White
You can also select a range of rows. The following code selects rows 1 through 3:
df[1:3, ]
  id   name avg_marks category
1  2 Minhaj        99    Green
2  3 Rahman        32      Red
3  4   Khan        91    Green
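Row indexes can also be negative, which drops rows instead of selecting them, and head() gives the first few rows without writing a range. A minimal sketch, with the sample data recreated so it runs on its own and with illustrative variable names:

```r
# Recreate the sample data frame so this snippet is self-contained
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

without_first_two <- df[-c(1, 2), ]  # every row except rows 1 and 2
first_three <- head(df, 3)           # same rows as df[1:3, ]
```

The remaining rows keep their original row names, so without_first_two starts at row name 3.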
Example 4: Subset Data Frame by Using the subset() Function with Conditions
In this example, I’ll show you how to subset a data frame by a condition. The following code uses the subset() function to select the rows where avg_marks is greater than 80:
subset(df, avg_marks > 80)
  id   name avg_marks category
1  2 Minhaj        99    Green
3  4   Khan        91    Green
4  6  Fahim        81   Orange
8 10  Akash       100    Green
You can combine multiple conditions with the OR (|) operator. The following code uses the subset() function to select the rows where avg_marks is either greater than 80 or less than 33:
subset(df, avg_marks > 80 | avg_marks < 33)
  id   name avg_marks category
1  2 Minhaj        99    Green
2  3 Rahman        32      Red
3  4   Khan        91    Green
4  6  Fahim        81   Orange
6  8 Fayeza        31      Red
8 10  Akash       100    Green
You can also combine multiple conditions with the AND (&) operator. The following code uses the subset() function to select the rows where avg_marks is greater than 95 and category is ‘Green’:
subset(df, avg_marks > 95 & category=='Green')
  id   name avg_marks category
1  2 Minhaj        99    Green
8 10  Akash       100    Green
Finally, you can filter rows and pick columns in the same call. The following code uses the subset() function to select the rows that meet both conditions and, through the select argument, returns only the ‘id’ and ‘avg_marks’ columns:
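When a condition needs to match one of several values in the same column, chaining many == comparisons with | gets long. One possible shortcut is the %in% operator, sketched below with the sample data recreated so the snippet runs on its own (red_or_yellow is just an illustrative name):

```r
# Recreate the sample data frame so this snippet is self-contained
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

# Same as: category == "Red" | category == "Yellow"
red_or_yellow <- subset(df, category %in% c("Red", "Yellow"))
```

This keeps the three rows whose category is Red or Yellow (ids 3, 8 and 9).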
subset(df, avg_marks > 95 & category=='Green', select=c('id','avg_marks'))
  id avg_marks
1  2        99
8 10       100
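The same result can also be produced with the df[rows, columns] syntax by putting a logical condition in the rows position. This is only a sketch of the equivalence, with the sample data recreated so it runs on its own and with illustrative names with_subset and with_brackets:

```r
# Recreate the sample data frame so this snippet is self-contained
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

# subset(): column names can be used directly inside the condition
with_subset <- subset(df, avg_marks > 95 & category == "Green",
                      select = c(id, avg_marks))

# Brackets: columns must be referenced through df$ in the condition
with_brackets <- df[df$avg_marks > 95 & df$category == "Green",
                    c("id", "avg_marks")]
```

The two approaches differ when the condition evaluates to NA: subset() treats missing values as FALSE and drops those rows, while bracket indexing returns them as rows of NA. The sample data has no missing values, so both give the same two rows here.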
In this tutorial, I have shown how to subset a data frame in R in several different ways. To get updates on new posts, please subscribe to the Facebook page https://www.facebook.com/LearningBigDataAnalytics.