How to Subset a Data Frame in R with Examples

Sometimes you may need to create a subset of a data frame, and there are many ways to do it in R. In this tutorial, I’ll show you different methods to subset a data frame in R, with examples.

Let’s create a sample data frame to demonstrate these methods in practice:

df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)
# Print the data frame
df

  id   name avg_marks category
1  2 Minhaj        99    Green
2  3 Rahman        32      Red
3  4   Khan        91    Green
4  6  Fahim        81   Orange
5  7 Rahman        50    White
6  8 Fayeza        31      Red
7  9  Lamia        65   Yellow
8 10  Akash       100    Green

Most of the methods below use R’s bracket notation for indexing a data frame. Its basic syntax is:

df[rows, columns]

Leaving the rows position empty selects all rows, and leaving the columns position empty selects all columns.
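Both positions can be filled in the same call. The sketch below recreates the sample data frame so it runs on its own, then uses a logical condition for the rows and a vector of names for the columns. The threshold of 90 is an arbitrary choice for illustration:

```r
# Recreate the sample data frame so this snippet runs on its own
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

# rows: a logical vector, TRUE where avg_marks is above 90
# columns: a character vector of column names
top <- df[df$avg_marks > 90, c("name", "avg_marks")]
top
```

This returns the rows for Minhaj, Khan and Akash with only the two requested columns.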

Example 1: Subset Data Frame by Selecting Columns

In this example, I’ll show you how to subset a data frame by column names. The following code selects all rows for the columns ‘id’ and ‘avg_marks’:

df[ , c('id', 'avg_marks')]

  id avg_marks
1  2        99
2  3        32
3  4        91
4  6        81
5  7        50
6  8        31
7  9        65
8 10       100
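One caveat with selecting columns this way: when the columns position names a single column, [ simplifies the result to a plain vector by default. Passing drop = FALSE keeps it as a one-column data frame, which is usually what you want if the result is passed on to code that expects a data frame. A minimal sketch, again recreating the sample data:

```r
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

# Default (drop = TRUE): a single selected column comes back as a vector
marks_vec <- df[, "avg_marks"]

# drop = FALSE keeps the data frame structure
marks_df <- df[, "avg_marks", drop = FALSE]

class(marks_vec)  # "numeric"
class(marks_df)   # "data.frame"
```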

In this example, I’ll show you how to subset a data frame by column index. The following code selects all rows for columns 1 and 3:

df[ , c(1, 3)]

  id avg_marks
1  2        99
2  3        32
3  4        91
4  6        81
5  7        50
6  8        31
7  9        65
8 10       100
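Hard-coded indices such as 1 and 3 silently pick the wrong columns if the column order ever changes. As a hedged suggestion, you can look the indices up from the names with base R’s match() and then subset by index:

```r
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

# Positions of the wanted columns within names(df)
idx <- match(c("id", "avg_marks"), names(df))
idx  # [1] 1 3

# Subset by the looked-up indices
sub <- df[, idx]
sub
```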

Example 2: Subset Data Frame by Excluding Columns

In this example, I’ll show you how to subset a data frame by excluding specific columns by name. The following code selects all rows and every column except ‘name’ and ‘category’:

# Logical vector: TRUE for the columns to exclude
cols <- names(df) %in% c('name', 'category')
# Negate it to keep every other column
df[, !cols]

  id avg_marks
1  2        99
2  3        32
3  4        91
4  6        81
5  7        50
6  8        31
7  9        65
8 10       100
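An alternative to the %in% approach is base R’s setdiff(), which returns the names of the columns to keep directly and preserves their original order. A minimal sketch with the same sample data:

```r
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

# Every column name except the two to exclude, in their original order
keep <- setdiff(names(df), c("name", "category"))

sub <- df[, keep]
sub
```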

You can also exclude columns by index using negative indices. The following code selects all rows and every column except columns 2 and 4:

df[ , -c(2, 4)]

  id avg_marks
1  2        99
2  3        32
3  4        91
4  6        81
5  7        50
6  8        31
7  9        65
8 10       100
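A common pitfall when building negative indices with which(): if no column matches, which() returns an empty integer vector, and negating an empty vector is still empty, so instead of dropping nothing the subset keeps no columns at all. A logical index avoids this. In the sketch below, "grade" is a hypothetical column name that deliberately does not exist in the sample data:

```r
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

# "grade" is a hypothetical, non-existent column: which() finds nothing
drop_idx <- which(names(df) == "grade")   # integer(0)

bad  <- df[, -drop_idx]                   # -integer(0) selects zero columns
good <- df[, !(names(df) %in% "grade")]   # logical index keeps all columns

ncol(bad)   # 0
ncol(good)  # 4
```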

Example 3: Subset Data Frame by Selecting Specific Rows

In this example, I’ll show you how to subset a data frame by selecting specific rows by index. The following code selects rows 1, 3 and 5 with all columns:

df[c(1, 3, 5), ]

  id   name avg_marks category
1  2 Minhaj        99    Green
3  4   Khan        91    Green
5  7 Rahman        50    White
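Negative row indices work the same way as negative column indices: they drop the listed rows and keep the rest. A short sketch that returns every row except 1, 3 and 5:

```r
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

# Drop rows 1, 3 and 5; the original row names are kept
rest <- df[-c(1, 3, 5), ]
rest
```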

You can also select a contiguous range of rows. The following code selects rows 1 to 3:

df[1:3, ]

  id   name avg_marks category
1  2 Minhaj        99    Green
2  3 Rahman        32      Red
3  4   Khan        91    Green
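For ranges at the start or end of a data frame, base R’s head() and tail() are convenient shorthands. The sketch below compares them with explicit index ranges; note that tail() keeps the original row names:

```r
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

first3 <- head(df, 3)   # same rows as df[1:3, ]
last3  <- tail(df, 3)   # the last three rows

# Bracket equivalent of tail(df, 3), built from nrow()
n <- nrow(df)
last3_idx <- df[(n - 2):n, ]

rownames(last3)  # [1] "6" "7" "8"
```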

Example 4: Subset Data Frame Using the subset() Function with Conditions

In this example, I’ll show you how to subset a data frame using a condition. The following code uses the subset() function to keep only the rows where avg_marks is greater than 80:

subset(df, avg_marks > 80)

  id   name avg_marks category
1  2 Minhaj        99    Green
3  4   Khan        91    Green
4  6  Fahim        81   Orange
8 10  Akash       100    Green
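subset() is closely related to logical indexing with brackets, but the two differ in how they treat missing values: R’s ?subset help page states that missing values in the condition are taken as false, whereas a bare logical index returns a row of NAs for each NA. The sketch below sets one avg_marks value to NA purely for illustration (the sample data itself has no missing values):

```r
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

df_na <- df
df_na$avg_marks[2] <- NA   # hypothetical missing value for illustration

by_subset  <- subset(df_na, avg_marks > 80)          # NA treated as FALSE
by_bracket <- df_na[df_na$avg_marks > 80, ]          # NA yields an all-NA row
by_which   <- df_na[which(df_na$avg_marks > 80), ]   # which() skips NA

nrow(by_subset)   # 4
nrow(by_bracket)  # 5
nrow(by_which)    # 4
```

Wrapping the condition in which() is one way to get subset()-like behavior with brackets.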

In this example, I’ll show you how to subset a data frame using multiple conditions combined with the OR (|) operator. The following code keeps the rows where avg_marks is greater than 80 or less than 33:

subset(df, avg_marks > 80 | avg_marks < 33)

  id   name avg_marks category
1  2 Minhaj        99    Green
2  3 Rahman        32      Red
3  4   Khan        91    Green
4  6  Fahim        81   Orange
6  8 Fayeza        31      Red
8 10  Akash       100    Green
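When an OR condition compares one column against several values, the %in% operator is a compact alternative. For example, this sketch keeps the rows whose category is either ‘Red’ or ‘Yellow’, first with | and then with %in%:

```r
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

# Two equality tests joined with OR
by_or <- subset(df, category == "Red" | category == "Yellow")

# The same filter expressed with %in%
by_in <- subset(df, category %in% c("Red", "Yellow"))

identical(by_or, by_in)  # TRUE
```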

In this example, I’ll show you how to subset a data frame using multiple conditions combined with the AND (&) operator. The following code keeps the rows where avg_marks is greater than 95 and category is ‘Green’:

subset(df, avg_marks > 95 & category == 'Green')

  id   name avg_marks category
1  2 Minhaj        99    Green
8 10  Akash       100    Green

In this example, I’ll show you how to combine multiple conditions with a selection of specific columns. The select argument of subset() restricts the result to the ‘id’ and ‘avg_marks’ columns for the rows that meet both conditions:

subset(df, avg_marks > 95 & category == 'Green', select = c('id', 'avg_marks'))

  id avg_marks
1  2        99
8 10       100
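The select argument also accepts a range of adjacent columns (such as id:name) or a negated column name (such as -category), as in the examples on R’s ?subset help page. A brief sketch using a single condition on avg_marks:

```r
df <- data.frame(
  id = c(2, 3, 4, 6, 7, 8, 9, 10),
  name = c("Minhaj", "Rahman", "Khan", "Fahim", "Rahman", "Fayeza", "Lamia", "Akash"),
  avg_marks = c(99, 32, 91, 81, 50, 31, 65, 100),
  category = c("Green", "Red", "Green", "Orange", "White", "Red", "Yellow", "Green")
)

# A range of adjacent columns, from id through name
by_range <- subset(df, avg_marks > 95, select = id:name)

# Every column except category
by_drop <- subset(df, avg_marks > 95, select = -category)

names(by_range)  # "id" "name"
names(by_drop)   # "id" "name" "avg_marks"
```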

In this tutorial, I have shown several ways to subset a data frame in R: selecting or excluding columns by name or index, selecting rows by index, and filtering rows with the subset() function. To get updates on new posts, please subscribe to the Facebook page https://www.facebook.com/LearningBigDataAnalytics.
