Control Flow#
R is a programming language like any other, and here we introduce some common programming constructs that let you write code efficiently and control the order in which it runs.
Learning Objectives#
Learn the fundamental concepts of control flow in R programming, including loops and conditional statements
Understand how to use for loops to automate repetitive tasks in R
Learn to implement if statements for decision-making processes in R
Recognize the importance of NA values and how to handle missing data in R
Learn to integrate loops with conditional statements for complex data processing
For Loops#
Code is often focused around highly repetitive tasks.
Suppose we want to print each word in a sentence.
One way is to use six `print` statements:
%%R
best_practice <- c("Let", "the", "computer", "do", "the", "work")
print_words <- function(sentence) {
  print(sentence[1])
  print(sentence[2])
  print(sentence[3])
  print(sentence[4])
  print(sentence[5])
  print(sentence[6])
}
print_words(best_practice)
[1] "Let"
[1] "the"
[1] "computer"
[1] "do"
[1] "the"
[1] "work"
but that’s a bad approach for two reasons:
It doesn’t scale: if we want to print the elements in a vector that’s hundreds long, we’d be better off just typing them in.
It’s fragile: if we give it a longer vector, it only prints part of the data, and if we give it a shorter input, it prints `NA` values because we’re asking for elements that don’t exist!
%%R
best_practice[-6]
[1] "Let" "the" "computer" "do" "the"
%%R
print_words(best_practice[-6])
[1] "Let"
[1] "the"
[1] "computer"
[1] "do"
[1] "the"
[1] NA
Not Available#
R has a special value, `NA`, for designating missing values that are Not Available in a data set. See `?NA` and [An Introduction to R](http://cran.r-project.org/doc/manuals/r-release/R-intro.html#Missing-values) for more details.
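As a small sketch of how missing values behave (the vector `words` and the names `is_missing` and `present` are made up for illustration), `is.na()` reports which elements are missing, and a logical index can drop them:

```r
# Hypothetical vector with one missing element
words <- c("Let", "the", NA, "work")

# is.na() returns TRUE for each missing element
is_missing <- is.na(words)
print(is_missing)

# Keep only the elements that are not missing
present <- words[!is_missing]
print(present)

# Indexing past the end of a vector yields NA rather than an error
print(words[10])
```

Filtering like this only treats the symptom; the fragile `print_words` function itself still asks for elements that may not exist.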
Here’s a better approach:
%%R
print_words <- function(sentence) {
  for (word in sentence) {
    print(word)
  }
}
print_words(best_practice)
[1] "Let"
[1] "the"
[1] "computer"
[1] "do"
[1] "the"
[1] "work"
This is shorter (certainly shorter than a version that prints every element of a hundred-element vector with its own statement) and more robust as well.
The improved version of `print_words` uses a for loop to repeat an operation (in this case, printing) once for each thing in a collection.
The general form of a loop is:
for (variable in collection) {
  do things with variable
}
We can name the loop variable anything we like (with a few restrictions, e.g. the name of the variable cannot start with a digit). `in` is part of the `for` syntax. Note that the body of the loop is enclosed in curly braces `{ }`. For a single-line loop body, as here, the braces aren’t needed, but it is good practice to include them, as we did.
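When the position of each element matters as well as its value, one common pattern is to loop over indices instead of values. The sketch below redefines the `best_practice` vector so that it runs by itself, and uses `seq_along()`, which generates the positions 1 up to the length of its argument:

```r
best_practice <- c("Let", "the", "computer", "do", "the", "work")

# Loop over positions rather than values
for (i in seq_along(best_practice)) {
  # paste() joins the index and the word into one string
  print(paste(i, best_practice[i]))
}
```

For an empty vector `seq_along()` produces an empty sequence, so the loop body never runs. Writing `1:length(x)` instead would count down from 1 to 0 and ask for elements that don't exist.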
Here’s another loop that repeatedly updates a variable:
%%R
len <- 0
vowels <- c("a", "e", "i", "o", "u")
for (v in vowels) {
  len <- len + 1
}
# Number of vowels
len
[1] 5
It’s worth tracing the execution of this little program step by step. Since there are five elements in the vector `vowels`, the statement inside the loop will be executed five times. The first time around, `len` is zero (the value assigned to it on line 1) and `v` is `"a"`. The statement adds 1 to the old value of `len`, producing 1, and updates `len` to refer to that new value. The next time around, `v` is `"e"` and `len` is 1, so `len` is updated to be 2. After three more updates, `len` is 5; since there is nothing left in the vector `vowels` for R to process, the loop finishes.
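One way to check this trace is to have the loop report its own progress. The following sketch is the same loop with a `print` call added to the body (the wording of the message is arbitrary):

```r
len <- 0
vowels <- c("a", "e", "i", "o", "u")
for (v in vowels) {
  len <- len + 1
  # Show the loop variable and the running count on each pass
  print(paste("v is", v, "and len is now", len))
}
```

Running it prints one line per pass, five in total, with `len` rising from 1 to 5.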
Note that a loop variable is just a variable that’s being used to record progress in a loop. It still exists after the loop is over, and we can re-use variables previously defined as loop variables as well:
%%R
letter <- "z"
for (letter in c("a", "b", "c")) {
  print(letter)
}
[1] "a"
[1] "b"
[1] "c"
%%R
# after the loop, letter is
letter
[1] "c"
Note also that finding the length of a vector is such a common operation that R actually has a built-in function to do it called `length`:
%%R
length(vowels)
[1] 5
`length` is much faster than any R function we could write ourselves, and much easier to read than a two-line loop; it will also give us the length of many other things that we haven’t met yet, so we should always use it when we can.
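As a brief illustration of `length` on a few other objects (the values are arbitrary), note in particular that a single string counts as a vector of length 1; counting its characters is the job of a different function, `nchar`:

```r
# length() counts elements, whatever the container
length(c(10, 20, 30))        # a numeric vector
length(list("a", 1, TRUE))   # a list, which can mix types
length(character(0))         # an empty character vector

# A single string is one element; nchar() counts its characters
length("computer")
nchar("computer")
```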
Activity#
Can you edit the for loop to print the sentence backwards?
if Statements#
As well as repeating tasks, it’s possible we want R to only perform certain tasks in certain situations. To do this we need to write code that automatically decides between multiple options. The tool R gives us for doing this is called a conditional statement, and looks like this:
%%R
num <- 37
if (num > 100) {
  print("greater")
} else {
  print("not greater")
}
print("done")
[1] "not greater"
[1] "done"
The second line of this code uses an `if` statement to tell R that we want to make a choice. If the following test is true, the body of the `if` (i.e., the lines in the curly braces underneath it) is executed. If the test is false, the body of the `else` is executed instead. Only one or the other is ever executed.
In the example above, the test `num > 100` returns the value `FALSE`, which is why the code inside the `if` block was skipped and the code inside the `else` block was run instead.
%%R
num > 100
[1] FALSE
And as you likely guessed, the opposite of `FALSE` is `TRUE`.
%%R
num < 100
[1] TRUE
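Because a test is just an expression that evaluates to `TRUE` or `FALSE`, its result can be stored in a variable and used later. In this sketch `is_large` is a name made up for illustration:

```r
num <- 37

# Store the outcome of the test; its type is "logical"
is_large <- num > 100
print(is_large)
print(class(is_large))

# The stored value can be used directly as the test of an if statement
if (is_large) {
  print("greater")
} else {
  print("not greater")
}
```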
Conditional statements don’t have to include an `else`. If there isn’t one, R simply does nothing if the test is false:
%%R
num <- 53
if (num > 100) {
  print("num is greater than 100")
}
We can also chain several tests together when there are more than two options. This makes it simple to write a function that returns the sign of a number:
%%R
# Note: defining sign() here masks R's built-in sign() function
sign <- function(num) {
  if (num > 0) {
    return(1)
  } else if (num == 0) {
    return(0)
  } else {
    return(-1)
  }
}
sign(-3)
[1] -1
%%R
sign(0)
[1] 0
%%R
sign(2/3)
[1] 1
Note that when combining `else` and `if` in an `else if` statement (similar to `elif` in Python), the `if` portion still requires its own condition. A plain `else` never takes a condition; it is executed only if all the preceding conditions are false.
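The order of the tests in a chain matters, because R tries them from top to bottom and runs only the first branch whose test is true. The sketch below uses a hypothetical helper, `describe_number`, that checks for a missing value before anything else; a comparison such as `NA > 0` evaluates to `NA`, which `if` cannot use as a test:

```r
# Hypothetical helper: only the first branch whose test is TRUE runs
describe_number <- function(num) {
  if (is.na(num)) {
    # Checked first, so the comparisons below never see NA
    return("missing")
  } else if (num > 0) {
    return("positive")
  } else if (num == 0) {
    return("zero")
  } else {
    return("negative")
  }
}

describe_number(NA)
describe_number(5)
describe_number(0)
describe_number(-2)
```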
Note that the test for equality uses two equal signs, `==`.
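One caveat worth knowing when testing numbers for equality: decimal arithmetic on a computer is not exact, so `==` can give a surprising answer for values that look equal. A minimal sketch, using base R's `all.equal`, which compares within a small tolerance:

```r
# Floating-point arithmetic is not exact, so == can be surprising
0.1 + 0.2 == 0.3                    # FALSE

# all.equal() compares within a small tolerance;
# isTRUE() turns its result into a single TRUE or FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE
```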
Other Comparisons#
Other tests include greater than or equal to (`>=`), less than or equal to (`<=`), and not equal to (`!=`).
We can also combine tests.
An ampersand, `&`, symbolizes “and”.
A vertical bar, `|`, symbolizes “or”.
`&` is only true if both parts are true:
%%R
if (1 > 0 & -1 > 0) {
  print("both parts are true")
} else {
  print("at least one part is not true")
}
[1] "at least one part is not true"
while `|` is true if either part is true:
%%R
if (1 > 0 | -1 > 0) {
  print("at least one part is true")
} else {
  print("neither part is true")
}
[1] "at least one part is true"
In this case, “either” means “either or both”, not “either one or the other but not both”.
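For the "one or the other but not both" meaning, base R provides the `xor` function. The sketch below also shows that `&` and `|` work element by element on logical vectors, and that R has the related operators `&&` and `||`, which expect single values and are a common choice inside `if` tests:

```r
# & and | combine logical vectors element by element
c(TRUE, FALSE) & c(TRUE, TRUE)
c(TRUE, FALSE) | c(FALSE, FALSE)

# xor() is TRUE when exactly one side is TRUE
xor(TRUE, FALSE)
xor(TRUE, TRUE)

# && and || work on single values and stop as soon as the answer is known
if (1 > 0 && -1 > 0) {
  print("both parts are true")
} else {
  print("at least one part is not true")
}
```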
Activity: Combining for loops and control flow#
Write a for loop that iterates through numbers 1 to 10 but only prints numbers greater than 3 and less than 7.
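To leave the activity open, here is the same pattern on a different problem: a loop that visits every word in a vector, with an `if` statement inside the body that acts only on some of them. The counter name `long_words` and the threshold of three characters are arbitrary choices for this sketch:

```r
best_practice <- c("Let", "the", "computer", "do", "the", "work")

# Count and print only the words with more than three characters
long_words <- 0
for (word in best_practice) {
  if (nchar(word) > 3) {
    print(word)
    long_words <- long_words + 1
  }
}

# Number of words that passed the test
long_words
```

The `if` has no `else`, so words that fail the test are simply skipped and the loop moves on to the next one.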