Data Types and Structures#

Learning Objectives#

  • Understand and identify the five core data types used in R (Numeric, Integer, Complex, Character, Logical)

  • Learn how to use functions like typeof(), class(), mode(), length(), and attributes() to examine the features of variables or objects in R

  • Learn methods to address columns in data frames using the $ operator or square brackets

  • Practice defining variables and using R functions to profile their characteristics

  • Understand and perform conversions between different data types in R

Inbuilt Datasets#

R has a number of datasets included for you to practice with. We can get a list of these by running the command data():

%%R
data()
Data sets in package ‘datasets’:

AirPassengers           Monthly Airline Passenger Numbers 1949-1960
BJsales                 Sales Data with Leading Indicator
BJsales.lead (BJsales)
                        Sales Data with Leading Indicator
BOD                     Biochemical Oxygen Demand
CO2                     Carbon Dioxide Uptake in Grass Plants
ChickWeight             Weight versus age of chicks on different diets
DNase                   Elisa assay of DNase
EuStockMarkets          Daily Closing Prices of Major European Stock
                        Indices, 1991-1998
Formaldehyde            Determination of Formaldehyde
HairEyeColor            Hair and Eye Color of Statistics Students
Harman23.cor            Harman Example 2.3
Harman74.cor            Harman Example 7.4
Indometh                Pharmacokinetics of Indomethacin
InsectSprays            Effectiveness of Insect Sprays
JohnsonJohnson          Quarterly Earnings per Johnson & Johnson Share
LakeHuron               Level of Lake Huron 1875-1972
LifeCycleSavings        Intercountry Life-Cycle Savings Data
Loblolly                Growth of Loblolly pine trees
Nile                    Flow of the River Nile
Orange                  Growth of Orange Trees
OrchardSprays           Potency of Orchard Sprays
PlantGrowth             Results from an Experiment on Plant Growth
Puromycin               Reaction Velocity of an Enzymatic Reaction
Seatbelts               Road Casualties in Great Britain 1969-84
Theoph                  Pharmacokinetics of Theophylline
Titanic                 Survival of passengers on the Titanic
ToothGrowth             The Effect of Vitamin C on Tooth Growth in
                        Guinea Pigs
UCBAdmissions           Student Admissions at UC Berkeley
UKDriverDeaths          Road Casualties in Great Britain 1969-84
UKgas                   UK Quarterly Gas Consumption
USAccDeaths             Accidental Deaths in the US 1973-1978
USArrests               Violent Crime Rates by US State
USJudgeRatings          Lawyers' Ratings of State Judges in the US
                        Superior Court
USPersonalExpenditure   Personal Expenditure Data
UScitiesD               Distances Between European Cities and Between
                        US Cities
VADeaths                Death Rates in Virginia (1940)
WWWusage                Internet Usage per Minute
WorldPhones             The World's Telephones
ability.cov             Ability and Intelligence Tests
airmiles                Passenger Miles on Commercial US Airlines,
                        1937-1960
airquality              New York Air Quality Measurements
anscombe                Anscombe's Quartet of 'Identical' Simple Linear
                        Regressions
attenu                  The Joyner-Boore Attenuation Data
attitude                The Chatterjee-Price Attitude Data
austres                 Quarterly Time Series of the Number of
                        Australian Residents
beaver1 (beavers)       Body Temperature Series of Two Beavers
beaver2 (beavers)       Body Temperature Series of Two Beavers
cars                    Speed and Stopping Distances of Cars
chickwts                Chicken Weights by Feed Type
co2                     Mauna Loa Atmospheric CO2 Concentration
crimtab                 Student's 3000 Criminals Data
discoveries             Yearly Numbers of Important Discoveries
esoph                   Smoking, Alcohol and (O)esophageal Cancer
euro                    Conversion Rates of Euro Currencies
euro.cross (euro)       Conversion Rates of Euro Currencies
eurodist                Distances Between European Cities and Between
                        US Cities
faithful                Old Faithful Geyser Data
fdeaths (UKLungDeaths)
                        Monthly Deaths from Lung Diseases in the UK
freeny                  Freeny's Revenue Data
freeny.x (freeny)       Freeny's Revenue Data
freeny.y (freeny)       Freeny's Revenue Data
infert                  Infertility after Spontaneous and Induced
                        Abortion
iris                    Edgar Anderson's Iris Data
iris3                   Edgar Anderson's Iris Data
islands                 Areas of the World's Major Landmasses
ldeaths (UKLungDeaths)
                        Monthly Deaths from Lung Diseases in the UK
lh                      Luteinizing Hormone in Blood Samples
longley                 Longley's Economic Regression Data
lynx                    Annual Canadian Lynx trappings 1821-1934
mdeaths (UKLungDeaths)
                        Monthly Deaths from Lung Diseases in the UK
morley                  Michelson Speed of Light Data
mtcars                  Motor Trend Car Road Tests
nhtemp                  Average Yearly Temperatures in New Haven
nottem                  Average Monthly Temperatures at Nottingham,
                        1920-1939
npk                     Classical N, P, K Factorial Experiment
occupationalStatus      Occupational Status of Fathers and their Sons
precip                  Annual Precipitation in US Cities
presidents              Quarterly Approval Ratings of US Presidents
pressure                Vapor Pressure of Mercury as a Function of
                        Temperature
quakes                  Locations of Earthquakes off Fiji
randu                   Random Numbers from Congruential Generator
                        RANDU
rivers                  Lengths of Major North American Rivers
rock                    Measurements on Petroleum Rock Samples
sleep                   Student's Sleep Data
stack.loss (stackloss)
                        Brownlee's Stack Loss Plant Data
stack.x (stackloss)     Brownlee's Stack Loss Plant Data
stackloss               Brownlee's Stack Loss Plant Data
state.abb (state)       US State Facts and Figures
state.area (state)      US State Facts and Figures
state.center (state)    US State Facts and Figures
state.division (state)
                        US State Facts and Figures
state.name (state)      US State Facts and Figures
state.region (state)    US State Facts and Figures
state.x77 (state)       US State Facts and Figures
sunspot.month           Monthly Sunspot Data, from 1749 to "Present"
sunspot.year            Yearly Sunspot Data, 1700-1988
sunspots                Monthly Sunspot Numbers, 1749-1983
swiss                   Swiss Fertility and Socioeconomic Indicators
                        (1888) Data
treering                Yearly Treering Data, -6000-1979
trees                   Diameter, Height and Volume for Black Cherry
                        Trees
uspop                   Populations Recorded by the US Census
volcano                 Topographic Information on Auckland's Maunga
                        Whau Volcano
warpbreaks              The Number of Breaks in Yarn during Weaving
women                   Average Heights and Weights for American Women

Use ‘data(package = .packages(all.available = TRUE))’
to list the data sets in all *available* packages.


In RStudio, this command does not print the list to the console. Instead, it opens a tab in the script pane listing the available datasets (in this notebook, the list is printed as output instead). In this session, we will use the built-in datasets stored in the variables iris and mtcars. Having our data stored in a variable means we can easily pass it to R functions and start to manipulate or process it. Note that we use the word data loosely: it can refer to any form of information we want to store.
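Because data() opens a viewer rather than returning something we can work with directly, it can help to query the list from code. The following is a minimal sketch. It assumes that the object returned by data() stores its table of datasets in a results matrix with an "Item" column, which holds in current R versions (check with str() on your own installation). The name my_iris is an arbitrary choice for this example.

```r
# data() returns an object of class "packageIQR"; its results matrix
# holds one row per dataset, with the dataset name in the "Item" column
available <- data(package = "datasets")$results[, "Item"]

# Check whether the two datasets used in this session are listed
c("iris", "mtcars") %in% available

# Built-in datasets behave like ordinary variables, so they can be copied
my_iris <- iris
class(my_iris)
```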

Exploring datasets#

Let’s take a closer look at the iris data.

First, let’s use the function class() to ask what type of object iris is:

%%R
class(iris)
[1] "data.frame"

The output tells us that it is a data frame. A data frame is one kind of R object and is defined by certain properties, such as its dimensions and column names. A data frame is comparable to a spreadsheet in MS Excel, as it is a two-dimensional object (i.e. it has rows and columns). Data frames are very useful for storing data, especially if you are used to working with your data in tables. A typical data frame of experimental data contains individual observations in rows and variables in columns. We can see the shape, or dimensions, of the data frame with the function dim():

%%R
dim(iris)
[1] 150   5

This tells us that our data frame, iris, has 150 rows and 5 columns.

To explore data frames, there are a number of relevant functions:

  • head() - shows first 6 rows

  • tail() - shows last 6 rows

  • dim() - returns the dimensions of data frame (i.e. number of rows and number of columns)

  • nrow() - number of rows

  • ncol() - number of columns

  • str() - structure of data frame - name, type and preview of data in each column

  • names() or colnames() - both show the names attribute for a data frame
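As a quick sketch, the functions listed above can be run one after another on iris. The second argument of head() and tail() (here passed by position) changes how many rows are shown.

```r
# iris ships with base R, so no packages need to be loaded
head(iris)        # first 6 rows
tail(iris, 3)     # last 3 rows; the second argument changes the count
nrow(iris)        # 150
ncol(iris)        # 5
str(iris)         # column names, types and a preview of each column
names(iris)       # same result as colnames(iris) for a data frame
```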

Subsetting Data#

There are many occasions when we want to look at only part of the data. Extracting a subset is known as slicing. To get a single value from the data frame, we index a specific position using square brackets, giving the row first and then the column. If you are familiar with matrices, indexing works in the same way. For example, to get the element in the top-left corner, i.e. in the first row and first column, we run:

%%R
iris[1, 1]
[1] 5.1

We can use this principle to extract any entry of the data frame. For example, the entry in the 30th row and 3rd column:

%%R
iris[30,3]
[1] 1.6

An index like [30, 3] selects a single element of a data frame, but we can select larger sections as well. For example, we can select the first three columns of values for the first four rows like this:

%%R
iris[1:4, 1:3]
  Sepal.Length Sepal.Width Petal.Length
1          5.1         3.5          1.4
2          4.9         3.0          1.4
3          4.7         3.2          1.3
4          4.6         3.1          1.5

The slice 1:4 means, “Start at index 1 and go to index 4.” The slice does not need to start at 1, e.g. the line below selects rows 5 through 10:
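A slice is not special indexing syntax. 1:4 is an ordinary vector of whole numbers, so writing the positions out in full selects the same subset. Here is a short check, where by_slice and by_vector are names chosen for this example:

```r
# 1:4 is shorthand for the integer vector 1, 2, 3, 4
1:4

# so these two subsets select exactly the same rows and columns
by_slice  <- iris[1:4, 1:3]
by_vector <- iris[c(1, 2, 3, 4), c(1, 2, 3)]
all.equal(by_slice, by_vector)   # TRUE
```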

%%R
iris[5:10, 1:3]
   Sepal.Length Sepal.Width Petal.Length
5           5.0         3.6          1.4
6           5.4         3.9          1.7
7           4.6         3.4          1.4
8           5.0         3.4          1.5
9           4.4         2.9          1.4
10          4.9         3.1          1.5

We can use the function c(), which stands for combine, to select non-contiguous entries:

%%R
iris[c(3, 8, 37, 56), c(1,3)]
   Sepal.Length Petal.Length
3           4.7          1.3
8           5.0          1.5
37          5.5          1.3
56          5.7          4.5

We can also leave out the slice for the rows or for the columns. If we don’t include a slice for the rows, R returns all the rows; if we don’t include a slice for the columns, R returns all the columns. If we provide neither, e.g. iris[, ], R returns the full data frame.

%%R
iris[5, ]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5            5         3.6          1.4         0.2  setosa
%%R
iris[, 2]
  [1] 3.5 3.0 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 3.7 3.4 3.0 3.0 4.0 4.4 3.9 3.5
 [19] 3.8 3.8 3.4 3.7 3.6 3.3 3.4 3.0 3.4 3.5 3.4 3.2 3.1 3.4 4.1 4.2 3.1 3.2
 [37] 3.5 3.6 3.0 3.4 3.5 2.3 3.2 3.5 3.8 3.0 3.8 3.2 3.7 3.3 3.2 3.2 3.1 2.3
 [55] 2.8 2.8 3.3 2.4 2.9 2.7 2.0 3.0 2.2 2.9 2.9 3.1 3.0 2.7 2.2 2.5 3.2 2.8
 [73] 2.5 2.8 2.9 3.0 2.8 3.0 2.9 2.6 2.4 2.4 2.7 2.7 3.0 3.4 3.1 2.3 3.0 2.5
 [91] 2.6 3.0 2.6 2.3 2.7 3.0 2.9 2.9 2.5 2.8 3.3 2.7 3.0 2.9 3.0 3.0 2.5 2.9
[109] 2.5 3.6 3.2 2.7 3.0 2.5 2.8 3.2 3.0 3.8 2.6 2.2 3.2 2.8 2.8 2.7 3.3 3.2
[127] 2.8 3.0 2.8 3.0 2.8 3.8 2.8 2.8 2.6 3.0 3.4 3.1 3.0 3.1 3.1 3.1 2.7 3.2
[145] 3.3 3.0 2.5 3.0 3.4 3.0

Addressing Columns by name#

Columns can also be addressed by name, with either the $ operator (e.g. iris$Petal.Length) or square brackets (e.g. iris[, 'Petal.Length']).
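A minimal sketch showing that selecting by name with $, by name in square brackets, and by position all return the same column. The variable names are arbitrary choices for this example.

```r
# Three equivalent ways to pull out the Petal.Length column as a vector
by_dollar <- iris$Petal.Length
by_name   <- iris[, "Petal.Length"]
by_number <- iris[, 3]   # Petal.Length is the 3rd column

identical(by_dollar, by_name)    # TRUE
identical(by_dollar, by_number)  # TRUE
head(by_dollar)                  # 1.4 1.4 1.3 1.5 1.4 1.7
```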

Data Types#

One key feature of a data frame is that each column has a single, specific data type.

Data types, or modes, define what the values are and how they can be used.

There are 5 core data types:

  • Numeric - real numbers, e.g. 7.5

  • Integer - whole numbers, e.g. 2L (the L suffix tells R to store the value as an integer; a bare 2 is numeric)

  • Complex - numbers with real and imaginary parts e.g. 1+4i

  • Character - consists of letters or numbers or symbols or a combination of these e.g. “a”, “f5”, “datatypes”, “Learning R is fun”.

  • Logical - only takes TRUE or FALSE
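One quick way to see the five types is to ask typeof() about one value of each. Note that typeof() calls the numeric type "double", after the double-precision format used to store it. A whole number only becomes an integer when written with an L suffix.

```r
# typeof() reports how R stores each value
typeof(7.5)      # "double"    (numeric)
typeof(2)        # "double"    (a bare 2 is numeric, not integer)
typeof(2L)       # "integer"
typeof(1 + 4i)   # "complex"
typeof("f5")     # "character"
typeof(TRUE)     # "logical"
```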

These data types are also used to characterise other one dimensional R objects such as individual values or vectors (more on these later). R provides a number of functions to examine the features of variables or objects.

Some examples:

  • typeof() - what is the object’s data type at the storage level (what the computer sees)?

  • class() - what is the object’s type at the abstract level (what R sees)?

  • mode() - an older, S-compatible view of the storage type; for example, integer and double values both have mode "numeric"

  • length() - how long is it?

  • attributes() - does it have any metadata?

  • str() - display the internal structure of an object

  • is.numeric(), is.character(), is.complex(), is.logical() - return TRUE when an object is of the data type queried, FALSE if not
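To see how these functions differ, here is a minimal sketch applying them to a small numeric vector (x_num is a name chosen for this example) and to the iris data frame. The contrast is clearest for the data frame, which R stores as a list of columns.

```r
x_num <- c(1.5, 2)

typeof(x_num)       # "double"
class(x_num)        # "numeric"
mode(x_num)         # "numeric"
is.numeric(x_num)   # TRUE
is.character(x_num) # FALSE

# For a data frame the three functions disagree more noticeably
typeof(iris)        # "list": a data frame is stored as a list of columns
class(iris)         # "data.frame"
mode(iris)          # "list"
length(iris)        # 5: the number of columns, i.e. elements of that list
names(attributes(iris))  # the metadata: names, class and row.names
```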

Let’s define some variables and use R to profile their characteristics:

%%R
x <- "dataset"
typeof(x)
[1] "character"
%%R
y <- 1:10
y
 [1]  1  2  3  4  5  6  7  8  9 10
%%R
typeof(y)
[1] "integer"
%%R
class(y)
[1] "integer"
%%R
length(y)
[1] 10

Converting Between Datatypes#

It is important that R has assigned the right data type to your variable: if it has not, you may run into errors when processing it. You may therefore need to convert between data types. This can be done with the family of as. functions, such as as.numeric(), as.integer(), as.character() and as.logical().

For example, let’s convert our integer variable y into a character vector.

%%R
y <- as.character(y)
typeof(y)
[1] "character"
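Conversions do not always behave as you might expect. The short sketch below shows a few common cases. Text that is not a number becomes NA, and R normally warns "NAs introduced by coercion"; suppressWarnings() hides that warning here only to keep the output tidy.

```r
as.numeric("3.14")           # 3.14: text that looks like a number converts cleanly
as.integer(7.9)              # 7: conversion to integer truncates, it does not round
as.numeric(c(TRUE, FALSE))   # 1 0: TRUE becomes 1 and FALSE becomes 0
as.logical("TRUE")           # TRUE
suppressWarnings(as.numeric("abc"))  # NA: this text cannot be read as a number
```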

Activity: Determining Data Types#

Create a variable containing the numbers 9, 2, 200 and 14. What class do you predict this variable to be? Use an R function to confirm your answer. If it is not the data type you expected, can you force R to convert it to an integer? Divide each element of the variable by 2. Does this change the type of the variable?

Data Structures#

R has a number of built-in structures that can be used to store data. We have already encountered one of these, the data.frame. The main structures are:

  • strings

  • vectors

  • data frames

  • matrices

  • arrays

  • lists

  • factors

Strings#

A string is a sequence of characters enclosed in quotes, e.g. “hello” or “a189jde2mjo”. In R, a string is stored as a single element of a character vector.
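A small sketch of working with a string. The variable name greeting is a choice made for this example. length() counts elements, while nchar() counts the characters inside a string.

```r
greeting <- "hello"
typeof(greeting)   # "character"
length(greeting)   # 1: the whole string is one element
nchar(greeting)    # 5: the number of characters in the string

# Single and double quotes create the same string
identical('hello', "hello")     # TRUE

# paste() joins strings, separated by a space by default
paste("Learning R", "is fun")   # "Learning R is fun"
```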

Vectors#

A vector is the most common and basic data structure in R. It is an ordered, one-dimensional collection of elements of a given length, all sharing the same basic data type. We can think of each column of a data frame as a vector.

The concatenate or combine c() function will explicitly construct a vector.

%%R
v_num <- c(9,20,12)
v_log <- c(TRUE, FALSE, FALSE, TRUE)

If you are working in RStudio, you should see these vectors listed in the Environment pane as you create them. We can then type the name of a variable to see its current value:

%%R
v_num 
[1]  9 20 12
%%R
v_log
[1]  TRUE FALSE FALSE  TRUE

The function c() can also be used to add elements to a vector.

%%R
v_name <- c("Sarah", "Tracy", "Jon")
v_name <- c(v_name, "Annette")
v_name
[1] "Sarah"   "Tracy"   "Jon"     "Annette"

If we want to create a sequence of consecutive whole numbers, such as 1 to 4, we can bypass the c() constructor and use the colon operator:

%%R
(v_int <- 1:4)
[1] 1 2 3 4
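The colon operator and c() can produce vectors that print the same but are stored differently. As a quick check, 1:4 gives integers, while numbers typed into c() are numeric (double):

```r
typeof(1:4)             # "integer"
typeof(c(1, 2, 3, 4))   # "double"

identical(1:4, c(1, 2, 3, 4))   # FALSE: same values, different storage type
all(1:4 == c(1, 2, 3, 4))       # TRUE: the values compare equal
```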

When we call a vector atomic, we mean that the vector only holds data of a single data type.
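Because an atomic vector can hold only one type, combining different types with c() does not raise an error. Instead, R silently converts (coerces) every element to the most flexible type present, in the order logical, integer, double, complex, character. A minimal sketch:

```r
# Mixing types in c() makes R coerce everything to one common type
mixed <- c(1, "a", TRUE)
mixed           # "1" "a" "TRUE"
typeof(mixed)   # "character"

nums <- c(1, TRUE, FALSE)
nums            # 1 1 0: logicals are coerced to numbers
typeof(nums)    # "double"
```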

Data Frame#

A data frame is a two dimensional structure consisting of rows and columns.

We can create a data frame using the function data.frame(). Here we use the predefined vector letters to get the first ten letters of the alphabet:

%%R
dat <- data.frame(id = letters[1:10], x = 1:10, y = 11:20)
dat
   id  x  y
1   a  1 11
2   b  2 12
3   c  3 13
4   d  4 14
5   e  5 15
6   f  6 16
7   g  7 17
8   h  8 18
9   i  9 19
10  j 10 20

Columns are variables: each column has a specified data type, which all of its entries must adhere to. Rows are observations. Rows and columns have names (see rownames() and colnames()) that you can use to extract specific rows or columns, respectively.
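A short sketch using the dat data frame from above, recreated so the example runs on its own. It assumes R 4.0.0 or later, where data.frame() keeps strings as characters rather than converting them to factors.

```r
dat <- data.frame(id = letters[1:10], x = 1:10, y = 11:20)

colnames(dat)        # "id" "x" "y"
rownames(dat)        # "1" "2" ... "10"
sapply(dat, class)   # id: "character"; x and y: "integer"

dat["3", ]           # the row whose name is "3"
dat[, "y"]           # the y column as a vector
```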

If you read in a table of data from a file, it will typically be represented by a data.frame.
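As a sketch, read.csv() is one common way to read a table. It normally takes a file path, but its text argument accepts the file contents directly, so this example needs no file on disk. The column names and values are made up for illustration.

```r
# Hypothetical CSV contents supplied inline instead of from a file
csv_text <- "name,height
Sarah,165
Jon,180"

people <- read.csv(text = csv_text)
class(people)   # "data.frame"
str(people)
```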

Matrices#

A matrix is another two-dimensional object, but it differs from a data.frame in that all entries must be of the same type. It is more memory-efficient than a data.frame, but it cannot substitute for every data.frame (for example, one that mixes numeric and character columns).
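The single-type rule shows up when a data frame with mixed columns is converted with as.matrix(). In this minimal sketch (df_mixed and m_mixed are names chosen for the example), every entry becomes character:

```r
# A data frame can mix column types...
df_mixed <- data.frame(n = 1:2, s = c("a", "b"))
sapply(df_mixed, typeof)   # n: "integer", s: "character"

# ...but converting it to a matrix forces a single type on every entry
m_mixed <- as.matrix(df_mixed)
typeof(m_mixed)            # "character": the numbers are now text
```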

We can construct a matrix as follows:

%%R
m <- matrix(1:6, nrow = 2, ncol = 3)
m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

When creating a matrix, it is important to remember that matrices are filled column-wise. If that is not what you want, you can use the byrow argument (a logical: TRUE or FALSE) to specify how the matrix is filled.
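A minimal sketch comparing the default column-wise filling with byrow = TRUE, using the same numbers 1 to 6:

```r
m_cols <- matrix(1:6, nrow = 2, ncol = 3)                # filled column by column
m_rows <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled row by row
m_rows
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```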

We can confirm that m is a matrix with the function class(). (From R 4.0.0 onwards, a matrix also has the class "array".)

%%R
class(m)
[1] "matrix" "array" 

Arrays#

Arrays are n-dimensional storage structures. A one-dimensional array is much like a vector, and a two-dimensional array is a matrix.

We will not be using this type of object, but it is included for completeness.
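Although arrays are not used further in this session, this minimal sketch shows how the dim argument of array() sets the number of dimensions. The name arr is a choice made for the example.

```r
# A three-dimensional array: 2 rows, 3 columns and 4 "layers"
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)         # 2 3 4
arr[2, 3, 4]     # 24: row 2, column 3 of the 4th layer

# An array with exactly two dimensions is a matrix
class(array(1:6, dim = c(2, 3)))   # "matrix" "array"
```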

Lists#

A list in R is a collection of elements that can be a heterogeneous mix of other objects, including vectors, matrices, data.frames, functions, strings and numbers. Lists tend to be used to collect related objects of different types.

We can create a list from the vectors and data frames we have already used as follows:

%%R
l <- list(v_log, v_num, dat)
l
[[1]]
[1]  TRUE FALSE FALSE  TRUE

[[2]]
[1]  9 20 12

[[3]]
   id  x  y
1   a  1 11
2   b  2 12
3   c  3 13
4   d  4 14
5   e  5 15
6   f  6 16
7   g  7 17
8   h  8 18
9   i  9 19
10  j 10 20
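Here is a short sketch of how to get elements back out of a list. The element names flags, values and flowers are arbitrary choices for this example. Double brackets return an element itself, while single brackets return a smaller list.

```r
v_log <- c(TRUE, FALSE, FALSE, TRUE)
v_num <- c(9, 20, 12)

# Naming the elements makes a list easier to work with
l_named <- list(flags = v_log, values = v_num, flowers = iris)

length(l_named)        # 3 elements
l_named[[2]]           # 9 20 12: double brackets return the element itself
l_named$values         # the same element, selected by name
class(l_named[2])      # "list": single brackets return a smaller list
dim(l_named$flowers)   # 150 5
```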

Factors#

A factor is sometimes considered a data type because, confusingly, "factor" is a possible result of the function class(), and factors are a valid option for columns in a data frame. A factor is designed for categorical variables. It has a finite number of “levels”, which are the values that any element of the vector can take. Factors are actually stored as integers, which can make them quite powerful for subsetting.

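It is very easy for character vectors to be inappropriately stored as factors and vice versa. In versions of R before 4.0.0, the default when loading data (e.g. with read.csv() or data.frame()) was to store strings as factors; from R 4.0.0 onwards, strings are kept as characters by default. Conversely, if we define a character vector ourselves, we have to explicitly convert it to a factor.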

For example, if we define a vector of months, the default is to class it as a vector of characters. We have to actively coerce it into a factor.

%%R
a <- c("March","February","February","November","February","March","March","March","February","November")
class(a)
[1] "character"
%%R
fact <- as.factor(a)
class(fact)
[1] "factor"
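To see what the factor holds, this sketch recreates it and inspects its levels and the integer codes underneath. The levels are sorted alphabetically by default, which for these three months happens to match calendar order.

```r
a <- c("March", "February", "February", "November", "February",
       "March", "March", "March", "February", "November")
fact <- as.factor(a)

levels(fact)      # "February" "March" "November"
nlevels(fact)     # 3
typeof(fact)      # "integer": each element is stored as a level number
as.integer(fact)  # 2 1 1 3 1 2 2 2 1 3
table(fact)       # counts per level: February 4, March 4, November 2
```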

Activity: Data Exploration#

Use the functions introduced above to explore the mtcars dataset:

  • How large is the dataset?

  • What type is the object?

  • What value is in the 6th row of the 4th column?

Summary Quiz#