if ifelse() had more if’s

Problem

The ifelse() function tests only one condition, so it gives you only two cases. You could nest ifelse() calls, but that’s just a pain, especially when the three or more conditions you want to use are all on the same level, conceptually. Is there a way to specify multiple conditions at the same time?

Context

I was recently given some survey data to clean up. It looked something like this (but obviously much larger):

[Image: a sample of the survey data (tabletest.png)]

I needed to classify people in this data set based on whether they had passed or failed certain tests.

I wanted to separate the people into three groups:

  • People who passed both tests: Group A
  • People who passed one test: Group B
  • People who passed neither test: Group C

I thought about using a nested ifelse statement, and I certainly could have done that. But that approach didn’t make sense to me. The tests are equivalent and weren’t given in any particular order; I simply wanted to sort people into three groups on an equal footing. Any nesting of “if” statements would imply a hierarchy that doesn’t really exist in the data. Not to mention that I hate nesting functions: it’s confusing and hard to read.
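For comparison, here is roughly what the nested version would look like. This is a minimal base R sketch on a few made-up pass/fail values; test1 and test2 are hypothetical logical vectors (TRUE = passed), not the real survey data:

```r
# Hypothetical pass/fail results for five people (TRUE = passed)
test1 <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
test2 <- c(TRUE, FALSE, TRUE, FALSE, FALSE)

# Nested ifelse(): the inner call only decides the cases where the outer
# condition is FALSE, which imposes an order on two equivalent tests
group <- ifelse(test1 & test2, "A",
                ifelse(test1 | test2, "B", "C"))

group
#> [1] "A" "B" "B" "C" "B"
```

It works, but the structure reads as “check both tests first, then look at the rest,” which is exactly the kind of implied ordering I wanted to avoid.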

Solution

Once again, dplyr to the rescue! I’m becoming more and more of a tidyverse fan with each passing day. 

Turns out, dplyr has a function for exactly this purpose: case_when(). Its documentation describes it as “a general vectorised if,” but I like to think of it as “if ifelse() had more if’s.”

Here’s the syntax:

library(dplyr)

df <- df %>%
  mutate(group = case_when(
    test1 & test2     ~ "A",  # passed both tests: group A
    xor(test1, test2) ~ "B",  # passed one test: group B
    !test1 & !test2   ~ "C"   # passed neither test: group C
  ))

Output:

[Image: the data frame after adding the group column (tabletest2.PNG)]

Let me translate the above into English. After loading the package, I overwrite df, my data frame, with a modified copy of itself. The pipe (%>%) passes df into the mutate() function, which adds a new column called group. The contents of that column are defined by case_when().
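If you want to try this yourself, here is a self-contained sketch. The data frame is made up: the columns id, test1 and test2 are assumptions standing in for the real survey data. It also shows that the pipe is just another way of writing the same call, since x %>% f(y) means f(x, y):

```r
library(dplyr)

# Hypothetical toy data standing in for the real survey (TRUE = passed)
df <- data.frame(
  id    = 1:4,
  test1 = c(TRUE, TRUE, FALSE, FALSE),
  test2 = c(TRUE, FALSE, TRUE, FALSE)
)

# Piped version, as in the post: df becomes mutate()'s first argument
piped <- df %>%
  mutate(group = case_when(
    test1 & test2     ~ "A",
    xor(test1, test2) ~ "B",
    !test1 & !test2   ~ "C"
  ))

# The same call written without the pipe
unpiped <- mutate(df, group = case_when(
  test1 & test2     ~ "A",
  xor(test1, test2) ~ "B",
  !test1 & !test2   ~ "C"
))

identical(piped, unpiped)
#> [1] TRUE
piped$group
#> [1] "A" "B" "B" "C"
```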

In this example, case_when() takes three conditions, which I’ve lined up so you can read them more easily. Each condition is on the left side of the ~, and the resulting category (A, B, or C) is on the right. I used logical operators for my conditions. The newest one to me was the xor() function, which is an exclusive or: it returns TRUE when exactly one of its two arguments is TRUE, and FALSE when both or neither are.
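A quick way to get a feel for xor() is to run it on all four combinations of TRUE and FALSE. This base R sketch uses two hypothetical vectors, a and b, and also spells out the same logic with & and |:

```r
# All four combinations of two logical inputs
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

# xor() is TRUE only where exactly one input is TRUE
xor(a, b)
#> [1] FALSE  TRUE  TRUE FALSE

# The same thing written out: "either one, but not both"
(a | b) & !(a & b)
#> [1] FALSE  TRUE  TRUE FALSE
```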

Outcome

Easily make conditional assignments within a data frame. This function is a little less succinct than ifelse(), so I’m probably not going to use it when there are only two cases and ifelse() works fine. But for three or more cases, it can’t be beat. I could have added any number of conditions to my case_when() statement. The one thing to keep in mind is that the conditions are checked in order and the first one that is TRUE wins; any row that matches none of them gets NA.
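Because the first matching condition wins, you can also lean on the order of the conditions and finish with TRUE as a catch-all. This is an alternative way to write the same grouping, sketched on hypothetical vectors; it’s not the version I used above:

```r
library(dplyr)

# Hypothetical pass/fail results (TRUE = passed)
test1 <- c(TRUE, TRUE, FALSE, FALSE)
test2 <- c(TRUE, FALSE, TRUE, FALSE)

# Conditions are checked top to bottom; each row takes the first match
group <- case_when(
  test1 & test2 ~ "A",  # passed both
  test1 | test2 ~ "B",  # passed at least one; "both" was already taken by A
  TRUE          ~ "C"   # everyone left over: passed neither
)

group
#> [1] "A" "B" "B" "C"

# Without a catch-all, rows that match no condition come back as NA
case_when(test1 & test2 ~ "A")
```

Recent dplyr versions (1.1.0 and later) also accept a .default argument that does the same job as the final TRUE line.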

I love this function, and I think we should all be using it.

9 thoughts on “if ifelse() had more if’s”

  1. Hi kaijagahm,
    in the given example, wouldn’t it be easier to use rowSums() on the two columns?
    LETTERS[1:3][rowSums(df[ , 2:3])+1]
    It’s a one-liner, no need to use any add-on library, surely much faster, and easy to extend to more columns.


    1. Thanks for the suggestion! Yes, that would definitely have worked. I like that case_when can be extended to cases that don’t involve logicals, and that it integrates with other dplyr commands, since I use dplyr for a lot of data cleaning. In the actual data, too, there were lots of columns interspersed with the ones I needed to refer to, and I think the indexing would have become hard to follow. It’s a good idea to keep both alternatives in mind!

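Here is the rowSums() idea from the comment as a runnable base R sketch, on a hypothetical four-person data frame. One thing to watch: rowSums() gives 0 for someone who passed neither test, so LETTERS[1:3][rowSums(...) + 1] labels that person “A” and someone who passed both “C”, the reverse of the groups in the post. Listing the labels in reverse order fixes that. Selecting the columns by name rather than by position (2:3) also helps with the interspersed-columns problem mentioned in the reply.

```r
# Hypothetical data in the shape the commenter assumed:
# an id column followed by two logical test columns (TRUE = passed)
df <- data.frame(
  id    = 1:4,
  test1 = c(TRUE, TRUE, FALSE, FALSE),
  test2 = c(TRUE, FALSE, TRUE, FALSE)
)

# rowSums() counts TRUE values per row (TRUE counts as 1): 0, 1 or 2
passed <- rowSums(df[, c("test1", "test2")])

# Labels run C, B, A so that 0 passes -> C and 2 passes -> A
group <- c("C", "B", "A")[passed + 1]

group
#> [1] "A" "B" "B" "C"
```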
