Control Flow#

R is a programming language like any other, and here we introduce some common programming constructs that let you write code efficiently and control the order in which it runs.

Download Rmd Version#

If you wish to engage with this course content via Rmd, then please click the link below to download the Rmd file.

Download control_flow.Rmd

Learning Objectives#

  • Learn the fundamental concepts of control flow in R programming, including loops and conditional statements

  • Understand how to use for loops to automate repetitive tasks in R

  • Learn to implement if statements for decision-making processes in R

  • Recognize the importance of NA values and how to handle missing data in R

  • Learn to integrate loops with conditional statements for complex data processing

For Loops#

Code is often focused around highly repetitive tasks. Suppose we want to print each word in a sentence. One way is to use six print statements:

best_practice <- c("Let", "the", "computer", "do", "the", "work")

print(best_practice[1])
print(best_practice[2])
print(best_practice[3])
print(best_practice[4])
print(best_practice[5])
print(best_practice[6])
[1] "Let"
[1] "the"
[1] "computer"
[1] "do"
[1] "the"
[1] "work"

but that’s a bad approach for two reasons:

  1. It doesn’t scale: if we want to print the elements in a vector that’s hundreds long, we’d be better off just typing them in.

  2. It’s fragile: if we give it a longer vector, it only prints part of the data, and if we give it a shorter input, it returns NA values because we’re asking for elements that don’t exist!
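The second point can be checked directly. The sketch below uses a made-up three-word vector, short_sentence (not part of the lesson), and asks for an element past its end; R returns NA rather than raising an error, and is.na() is one way to test for that.

```r
# A shorter vector than the six hard-coded print statements expect
# (short_sentence is a hypothetical example, not from the lesson)
short_sentence <- c("Keep", "it", "simple")

third_word <- short_sentence[3]   # a valid index
fifth_word <- short_sentence[5]   # an index past the end of the vector

print(third_word)         # [1] "simple"
print(fifth_word)         # [1] NA
print(is.na(fifth_word))  # [1] TRUE
```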

Here’s a better approach:

We can loop through best_practice to print our sentence more easily. While we are at it, we can count the number of words in the sentence using a variable called len.

len <- 0
for (word in best_practice) {
  print(word)
  len <- len + 1
}
# Number of words
len
[1] "Let"
[1] "the"
[1] "computer"
[1] "do"
[1] "the"
[1] "work"
6

The improved version of printing uses a for loop to repeat an operation—in this case, printing—once for each thing in a collection. The general form of a loop is:

for (variable in collection) {
  do things with variable
}

We can name the loop variable (here, word) anything we like, with a few restrictions described in the Introduction to R (CRAN Manual); for example, a variable name cannot start with a digit. The keyword in is part of the for syntax. Note that the body of the loop is enclosed in curly braces { }. When the loop body is a single statement the braces aren’t needed, but it is good practice to include them, as we did.

It’s worth tracing the execution of this little program step by step. Since there are six elements in the vector best_practice, the statements inside the loop will be executed six times. The first time around, len is zero (the value assigned to it on line 1) and word is "Let". The loop prints word, then adds 1 to the old value of len, producing 1, and updates len to refer to that new value. The next time around, word is "the" and len is 1, so len is updated to be 2. After four more updates, len is 6; since there is nothing left in the vector best_practice for R to process, the loop finishes.
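One way to follow this trace on screen is to print the counter and the loop variable on every pass. This is a minimal sketch that reuses the lesson's best_practice vector; the wording of the printed message is only an illustration.

```r
best_practice <- c("Let", "the", "computer", "do", "the", "work")

len <- 0
for (word in best_practice) {
  len <- len + 1
  # cat() prints its arguments on one line, separated by spaces
  cat("pass", len, "word:", word, "\n")
}

# After the loop, len holds the number of passes made
len
```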

# after the loop, the value of 'word' is
word
'work'

Note that a loop variable is just a variable that’s being used to record progress in a loop. It still exists after the loop is over, and we can re-use variables previously defined as loop variables as well.

Note also that finding the length of a vector is such a common operation that R actually has a built-in function to do it called length:

length(best_practice)
6

length is much faster than any R function we could write ourselves, and much easier to read than a two-line loop; it will also give us the length of many other things that we haven’t met yet, so we should always use it when we can.
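As a brief illustration of length applied to other objects, the sketch below tries it on a sequence of numbers, a list, and an empty vector. The variable names are made up for this example.

```r
n_numbers <- length(1:10)                # a sequence of ten integers
n_items   <- length(list("a", 2, TRUE))  # a list with three elements
n_empty   <- length(character(0))        # an empty character vector

print(n_numbers)  # [1] 10
print(n_items)    # [1] 3
print(n_empty)    # [1] 0
```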

Activity#

Can you edit the for loop to print the sentence backwards?

if Statements#

As well as repeating tasks, it’s possible we want R to only perform certain tasks in certain situations. To do this we need to write code that automatically decides between multiple options. The tool R gives us for doing this is called a conditional statement, and looks like this:

num <- 37
if (num > 100) {
  print("greater")
} else {
  print("not greater")
}
print("done")
[1] "not greater"
[1] "done"

The second line of this code uses an if statement to tell R that we want to make a choice. If the test that follows is true, the body of the if (i.e., the lines in the curly braces underneath it) is executed. If the test is false, the body of the else is executed instead. Only one or the other is ever executed:

[Figure: flowchart of a conditional statement. A decision diamond labelled "x > 0?" leads to a box labelled "print('Positive')" when the test is true and to a box labelled "print('Non-positive')" when it is false, showing the basic structure of an if-else statement.]

In the example above, the test num > 100 returns the value FALSE, which is why the code inside the if block was skipped and the code inside the else statement was run instead.

num > 100
FALSE

And as you likely guessed, the opposite of FALSE is TRUE.

num < 100
TRUE

Conditional statements don’t have to include an else. If there isn’t one, R simply does nothing if the test is false:

num <- 53
if (num > 100) {
  print("num is greater than 100")
}

We can also chain several tests together when there are more than two options. This makes it simple to write code that reports whether a number is greater than, equal to, or less than 100:

num <- 100 # set num and see which statement you end up in

if (num > 100) {
    print("num is greater than 100")
} else if (num == 100) {
    print("num is 100")
} else {
    print("num is less than 100")
}

Note that in an else if clause (similar to elif in Python), the if part still needs its own condition in parentheses. A plain else never takes a condition; its body runs only when none of the preceding conditions is satisfied. Note also that the test for equality uses two equal signs, ==.
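The same if / else if / else pattern can classify the sign of a number. This is a minimal sketch; the variable sign_label and the starting value of -3 are chosen only for illustration.

```r
num <- -3  # try other values, such as 0 or 8

if (num > 0) {
  sign_label <- "positive"
} else if (num == 0) {
  sign_label <- "zero"
} else {
  # reached only when both tests above are FALSE
  sign_label <- "negative"
}

print(sign_label)  # [1] "negative"
```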

Other Comparisons#

Other tests include greater than or equal to (>=), less than or equal to (<=), and not equal to (!=).
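Each of these operators returns TRUE or FALSE, just like > and ==. A short sketch, reusing num <- 37 from earlier (the result variable names are made up for this example):

```r
num <- 37

at_least_37 <- num >= 37   # greater than or equal to
at_most_36  <- num <= 36   # less than or equal to
not_100     <- num != 100  # not equal to

print(at_least_37)  # [1] TRUE
print(at_most_36)   # [1] FALSE
print(not_100)      # [1] TRUE
```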

We can also combine tests. An ampersand, &, symbolizes “and”. A vertical bar, |, symbolizes “or”. & is only true if both parts are true:

if (1 > 0 & -1 > 0) {
  print("both parts are true")
} else {
  print("at least one part is not true")
}
[1] "at least one part is not true"

while | is true if either part is true:

if (1 > 0 | -1 > 0) {
  print("at least one part is true")
} else {
  print("neither part is true")
}
[1] "at least one part is true"

In this case, “either” means “either or both”, not “either one or the other but not both”.
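To see the difference, the sketch below compares | with base R's xor() function, which implements the "one or the other but not both" meaning. The variable names are chosen only for this example.

```r
both_or   <- TRUE | TRUE        # inclusive "or": TRUE when both parts are TRUE
either_or <- TRUE | FALSE       # TRUE when only one part is TRUE
both_xor  <- xor(TRUE, TRUE)    # exclusive "or": FALSE when both parts are TRUE
one_xor   <- xor(TRUE, FALSE)   # TRUE when exactly one part is TRUE

print(both_or)    # [1] TRUE
print(either_or)  # [1] TRUE
print(both_xor)   # [1] FALSE
print(one_xor)    # [1] TRUE
```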

Activity: Combining for loops and control flow#

Write a for loop that iterates through numbers 1 to 10 but only prints numbers greater than 3 and less than 7.
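As a related pattern (not a solution to the activity), a conditional inside a loop can also be used to handle missing data. The sketch below assumes a made-up vector, readings, that contains one NA, and uses is.na() to decide what to do with each element.

```r
# readings is a hypothetical example vector with one missing value
readings <- c(4, NA, 9)

n_missing <- 0
for (value in readings) {
  if (is.na(value)) {
    # NA marks a missing value, so report it and count it
    print("missing value")
    n_missing <- n_missing + 1
  } else {
    print(value)
  }
}

n_missing  # number of NA values found
```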

Summary Quiz#

In R, what value is used to represent missing data?

What would the following R code do? for (i in 1:5) { print(i) }

Which of the following is the correct syntax for a for loop in R?