Loading Data#

Download Rmd Version#

If you wish to engage with this course content via Rmd, then please click the link below to download the Rmd file.

Download load_data.Rmd

Learning Objectives#

  • Learn how to load data from CSV files into R using the read.csv function

  • Recognize and understand the arguments for the read.csv function, particularly file and header

  • Execute data loading operations and assign the data to a variable for further use

  • Understand the significance of the header argument and its default value in the read.csv function

  • Gain awareness of other useful arguments and methods for importing data using read.csv

Loading Data From Files#

Most analyses in R start by loading data from a file. We are going to use the dataset in the file “data/worms.csv”, which you can download from the course webpage. The file is in comma-separated values (CSV) format: each line holds one row of data, and commas separate the values in that row into columns.

We need to tell R where the file that contains the data is. If the path we give is wrong, we’ll get an error message when trying to read the file. We can load the data into R using read.csv.
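Before reading the file, it can help to confirm that R can actually find it. The following is a minimal sketch using the base R functions getwd and file.exists; the variable names worms_path and found are illustrative, and the path assumes the file sits in a folder named data inside the working directory.

```r
# Show the directory R is currently working in
getwd()

# Path used in this section, relative to the working directory
worms_path <- "data/worms.csv"

# TRUE if R can find a file at that path, FALSE otherwise
found <- file.exists(worms_path)
print(found)
```

If this prints FALSE, either move the file or change the working directory (for example with setwd) before calling read.csv.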

Assuming you have downloaded the file into a folder named data inside your current working directory, you can execute the following:

%%R
read.csv(file = "data/worms.csv", header = TRUE)
          Field.Name Area Slope Vegetation Soil.pH  Damp Worm.density
1        Nashs Field  3.6    11  Grassland     4.1 FALSE            4
2     Silwood Bottom  5.1     2     Arable     5.2 FALSE            7
3      Nursery Field  2.8     3  Grassland     4.3 FALSE            2
4        Rush Meadow  2.4     5     Meadow     4.9  TRUE            5
5    Gunness Thicket  3.8     0      Scrub     4.2 FALSE            6
6           Oak Mead  3.1     2  Grassland     3.9 FALSE            2
7       Church Field  3.5     3  Grassland     4.2 FALSE            3
8            Ashurst  2.1     0     Arable     4.8 FALSE            4
9        The Orchard  1.9     0    Orchard     5.7 FALSE            9
10     Rookery Slope  1.5     4  Grassland     5.0  TRUE            7
11       Garden Wood  2.9    10      Scrub     5.2 FALSE            8
12      North Gravel  3.3     1  Grassland     4.1 FALSE            1
13      South Gravel  3.7     2  Grassland     4.0 FALSE            2
14 Observatory Ridge  1.8     6  Grassland     3.8 FALSE            0
15        Pond Field  4.1     0     Meadow     5.0  TRUE            6
16      Water Meadow  3.9     0     Meadow     4.9  TRUE            8
17         Cheapside  2.2     8      Scrub     4.7  TRUE            4
18        Pound Hill  4.4     2     Arable     4.5 FALSE            5
19        Gravel Pit  2.9     1  Grassland     3.5 FALSE            1
20         Farm Wood  0.8    10      Scrub     5.1  TRUE            3

We have provided two arguments to this function:

  1. file - the name of the file we want to read,

  2. header - whether the first line of the file contains names for the columns of data.

The filename needs to be a character string (or string for short), so we put it in quotes. The header argument needs to be a logical value; we have set it to TRUE, indicating that the data file does have column headers.

Since we didn’t tell R to do anything else with the function’s output, the console will display the full contents of the file worms.csv. Try it out.

read.csv reads the file, but we can’t easily use the data unless we assign it to a variable. Let’s re-run read.csv and save the result:

%%R
df <- read.csv(file = "data/worms.csv", header = TRUE)

Some of the functions we introduced earlier can be used to summarise the properties of the worms dataset.
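This page does not list which functions were introduced earlier, so the sketch below assumes a few common base R inspection functions (dim, names, str, head and summary). To stay self-contained it builds a small stand-in data frame, worms_sample, from the first three rows of the output shown above; in practice you would call the same functions on df.

```r
# Stand-in for the worms data: the first three rows of the output shown above.
# In practice, use the data frame returned by
# read.csv(file = "data/worms.csv", header = TRUE).
worms_text <- "Field.Name,Area,Slope,Vegetation,Soil.pH,Damp,Worm.density
Nashs Field,3.6,11,Grassland,4.1,FALSE,4
Silwood Bottom,5.1,2,Arable,5.2,FALSE,7
Nursery Field,2.8,3,Grassland,4.3,FALSE,2"

# The text argument lets read.csv parse a string instead of a file
worms_sample <- read.csv(text = worms_text, header = TRUE)

dim(worms_sample)      # number of rows and columns
names(worms_sample)    # column names
str(worms_sample)      # type of each column
head(worms_sample, 2)  # first two rows
summary(worms_sample)  # per-column summaries
```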

Other Options for Reading CSV Files#

read.csv actually has many more arguments that you may find useful when importing your own data in the future. You can learn more about these options here.
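As an illustration, here is a hedged sketch of three of those arguments: sep, na.strings and stringsAsFactors. The data are a made-up toy example (not part of the worms dataset), supplied as a string through the text argument so that the code runs without any file.

```r
# Hypothetical toy data: semicolon-separated, with the word "missing"
# standing in for an absent value
toy_text <- "site;habitat;ph
A;Grassland;4.1
B;Arable;missing
C;Grassland;4.3"

toy <- read.csv(
  text = toy_text,
  sep = ";",                # columns are separated by semicolons, not commas
  na.strings = "missing",   # treat the word "missing" as NA
  stringsAsFactors = TRUE   # convert text columns to factors
)

str(toy)
```

Which arguments you need depends on how your own file was written, so it is worth opening the file in a text editor first to check the separator and how missing values are recorded.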

Loading Data With Header#

Suppose you are reading a file that has no header row. What happens if you forget to put header = FALSE? The default value is header = TRUE, which you can check with ?read.csv or help(read.csv). What do you expect will happen if you leave the default value? Before you run any code, think about what will happen to the first few rows of your data frame, and its overall size.
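Once you have made your prediction, one way to check it is sketched below. It writes the first three data rows of the worms output shown earlier to a temporary file with no header row, then reads that file twice. The names no_header_file, with_default and with_false are illustrative.

```r
# Write three data rows, with no header line, to a temporary file
no_header_file <- tempfile(fileext = ".csv")
writeLines(
  c("Nashs Field,3.6,11,Grassland,4.1,FALSE,4",
    "Silwood Bottom,5.1,2,Arable,5.2,FALSE,7",
    "Nursery Field,2.8,3,Grassland,4.3,FALSE,2"),
  no_header_file
)

# Read the same file with the default header = TRUE and with header = FALSE
with_default <- read.csv(file = no_header_file)
with_false   <- read.csv(file = no_header_file, header = FALSE)

# Compare the sizes and the column names of the two results
dim(with_default)
dim(with_false)
names(with_default)
names(with_false)

# Remove the temporary file
unlink(no_header_file)
```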

Summary Quiz#