Data Types and Structures#
Learning Objectives#
Understand and identify the five core data types used in R (Numeric, Integer, Complex, Character, Logical)
Learn how to use functions like typeof(), class(), mode(), length(), and attributes() to examine the features of variables or objects in R
Learn methods to address columns in data frames using the $ operator or square brackets
Practice defining variables and using R functions to profile their characteristics
Understand and perform conversions between different data types in R
Inbuilt Datasets#
R has a number of datasets included for you to practice with. We can get a list of these by running the command data():
%%R
data()
Data sets in package ‘.’:
bp_dataset_session4
code_session2
plotting_arguments
r_training_session_2_script
r_training_session_3_script
r_training_session_4
worms
Data sets in package ‘datasets’:
AirPassengers Monthly Airline Passenger Numbers 1949-1960
BJsales Sales Data with Leading Indicator
BJsales.lead (BJsales)
Sales Data with Leading Indicator
BOD Biochemical Oxygen Demand
CO2 Carbon Dioxide Uptake in Grass Plants
ChickWeight Weight versus age of chicks on different diets
DNase Elisa assay of DNase
EuStockMarkets Daily Closing Prices of Major European Stock
Indices, 1991-1998
Formaldehyde Determination of Formaldehyde
HairEyeColor Hair and Eye Color of Statistics Students
Harman23.cor Harman Example 2.3
Harman74.cor Harman Example 7.4
Indometh Pharmacokinetics of Indomethacin
InsectSprays Effectiveness of Insect Sprays
JohnsonJohnson Quarterly Earnings per Johnson & Johnson Share
LakeHuron Level of Lake Huron 1875-1972
LifeCycleSavings Intercountry Life-Cycle Savings Data
Loblolly Growth of Loblolly pine trees
Nile Flow of the River Nile
Orange Growth of Orange Trees
OrchardSprays Potency of Orchard Sprays
PlantGrowth Results from an Experiment on Plant Growth
Puromycin Reaction Velocity of an Enzymatic Reaction
Seatbelts Road Casualties in Great Britain 1969-84
Theoph Pharmacokinetics of Theophylline
Titanic Survival of passengers on the Titanic
ToothGrowth The Effect of Vitamin C on Tooth Growth in
Guinea Pigs
UCBAdmissions Student Admissions at UC Berkeley
UKDriverDeaths Road Casualties in Great Britain 1969-84
UKgas UK Quarterly Gas Consumption
USAccDeaths Accidental Deaths in the US 1973-1978
USArrests Violent Crime Rates by US State
USJudgeRatings Lawyers' Ratings of State Judges in the US
Superior Court
USPersonalExpenditure Personal Expenditure Data
UScitiesD Distances Between European Cities and Between
US Cities
VADeaths Death Rates in Virginia (1940)
WWWusage Internet Usage per Minute
WorldPhones The World's Telephones
ability.cov Ability and Intelligence Tests
airmiles Passenger Miles on Commercial US Airlines,
1937-1960
airquality New York Air Quality Measurements
anscombe Anscombe's Quartet of 'Identical' Simple Linear
Regressions
attenu The Joyner-Boore Attenuation Data
attitude The Chatterjee-Price Attitude Data
austres Quarterly Time Series of the Number of
Australian Residents
beaver1 (beavers) Body Temperature Series of Two Beavers
beaver2 (beavers) Body Temperature Series of Two Beavers
cars Speed and Stopping Distances of Cars
chickwts Chicken Weights by Feed Type
co2 Mauna Loa Atmospheric CO2 Concentration
crimtab Student's 3000 Criminals Data
discoveries Yearly Numbers of Important Discoveries
esoph Smoking, Alcohol and (O)esophageal Cancer
euro Conversion Rates of Euro Currencies
euro.cross (euro) Conversion Rates of Euro Currencies
eurodist Distances Between European Cities and Between
US Cities
faithful Old Faithful Geyser Data
fdeaths (UKLungDeaths)
Monthly Deaths from Lung Diseases in the UK
freeny Freeny's Revenue Data
freeny.x (freeny) Freeny's Revenue Data
freeny.y (freeny) Freeny's Revenue Data
infert Infertility after Spontaneous and Induced
Abortion
iris Edgar Anderson's Iris Data
iris3 Edgar Anderson's Iris Data
islands Areas of the World's Major Landmasses
ldeaths (UKLungDeaths)
Monthly Deaths from Lung Diseases in the UK
lh Luteinizing Hormone in Blood Samples
longley Longley's Economic Regression Data
lynx Annual Canadian Lynx trappings 1821-1934
mdeaths (UKLungDeaths)
Monthly Deaths from Lung Diseases in the UK
morley Michelson Speed of Light Data
mtcars Motor Trend Car Road Tests
nhtemp Average Yearly Temperatures in New Haven
nottem Average Monthly Temperatures at Nottingham,
1920-1939
npk Classical N, P, K Factorial Experiment
occupationalStatus Occupational Status of Fathers and their Sons
precip Annual Precipitation in US Cities
presidents Quarterly Approval Ratings of US Presidents
pressure Vapor Pressure of Mercury as a Function of
Temperature
quakes Locations of Earthquakes off Fiji
randu Random Numbers from Congruential Generator
RANDU
rivers Lengths of Major North American Rivers
rock Measurements on Petroleum Rock Samples
sleep Student's Sleep Data
stack.loss (stackloss)
Brownlee's Stack Loss Plant Data
stack.x (stackloss) Brownlee's Stack Loss Plant Data
stackloss Brownlee's Stack Loss Plant Data
state.abb (state) US State Facts and Figures
state.area (state) US State Facts and Figures
state.center (state) US State Facts and Figures
state.division (state)
US State Facts and Figures
state.name (state) US State Facts and Figures
state.region (state) US State Facts and Figures
state.x77 (state) US State Facts and Figures
sunspot.month Monthly Sunspot Data, from 1749 to "Present"
sunspot.year Yearly Sunspot Data, 1700-1988
sunspots Monthly Sunspot Numbers, 1749-1983
swiss Swiss Fertility and Socioeconomic Indicators
(1888) Data
treering Yearly Treering Data, -6000-1979
trees Diameter, Height and Volume for Black Cherry
Trees
uspop Populations Recorded by the US Census
volcano Topographic Information on Auckland's Maunga
Whau Volcano
warpbreaks The Number of Breaks in Yarn during Weaving
women Average Heights and Weights for American Women
Use ‘data(package = .packages(all.available = TRUE))’
to list the data sets in all *available* packages.
---
This command won’t produce any output in the console; instead it opens a file in the scripts pane that lists the available datasets.
In this session, we will be using the built-in R data sets stored in the variables iris and mtcars.
Having our data stored in a variable means we can easily use it with R functions and start to manipulate or process it. Note that we are using the word data very loosely; it can refer to any form of information we want to store.
Exploring datasets#
Let’s take a closer look at the iris data.
First, let’s ask what type of thing iris is, using the function class():
%%R
class(iris)
[1] "data.frame"
The output tells us that it is a data frame. A data frame is an example of an R object, and can be defined by certain properties. A data frame is comparable to a spreadsheet in MS Excel as it is a 2 dimensional object (i.e. has rows and columns). Data frames are very useful for storing data, especially if you are used to working with your data in tables. A typical data frame of experimental data contains individual observations in rows and variables in columns.
We can see the shape, or dimensions, of the data frame with the function dim():
%%R
dim(iris)
[1] 150 5
This tells us that our data frame, iris, has 150 rows and 5 columns.
To explore data frames, there are a number of relevant functions:
head() - shows first 6 rows
tail() - shows last 6 rows
dim() - returns the dimensions of data frame (i.e. number of rows and number of columns)
nrow() - number of rows
ncol() - number of columns
str() - structure of data frame - name, type and preview of data in each column
names() or colnames() - both show the names attribute for a data frame
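As a quick illustration, the functions listed above can be tried on iris. This is only a sketch: the variable names n_rows, n_cols and col_names are illustrative and not part of the session material.

```r
# Peek at the top and bottom of the data frame
head(iris, 3)   # first 3 rows (the default is 6)
tail(iris, 3)   # last 3 rows

# Size of the data frame
n_rows <- nrow(iris)   # 150
n_cols <- ncol(iris)   # 5

# Column names: names() and colnames() agree for a data frame
col_names <- names(iris)
identical(col_names, colnames(iris))   # TRUE

# Name, type and a preview of each column
str(iris)
```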
Subsetting Data#
There are many occasions when we want to “look” at some part of the data. Extracting a subset is known as slicing. If we want to get a single value from the data frame, we can index a specific position using square brackets. If you are familiar with matrices, data frames are indexed in the same way. For example, to get the element in the top left corner, i.e. in the first row and first column, we run:
%%R
iris[1, 1]
[1] 5.1
We can use this principle to extract any entry of the data frame. For example, the 30th row and 3rd column:
%%R
iris[30,3]
[1] 1.6
An index like [30, 3] selects a single element of a data frame, but we can select larger sections as well. For example, we can select the first three columns of values for the first four rows like this:
%%R
iris[1:4, 1:3]
Sepal.Length Sepal.Width Petal.Length
1 5.1 3.5 1.4
2 4.9 3.0 1.4
3 4.7 3.2 1.3
4 4.6 3.1 1.5
The slice 1:4 means, “Start at index 1 and go to index 4.” The slice does not need to start at 1, e.g. the line below selects rows 5 through 10:
%%R
iris[5:10, 1:3]
Sepal.Length Sepal.Width Petal.Length
5 5.0 3.6 1.4
6 5.4 3.9 1.7
7 4.6 3.4 1.4
8 5.0 3.4 1.5
9 4.4 2.9 1.4
10 4.9 3.1 1.5
We can use the function c(), which stands for combine, to select non-contiguous entries:
%%R
iris[c(3, 8, 37, 56), c(1,3)]
Sepal.Length Petal.Length
3 4.7 1.3
8 5.0 1.5
37 5.5 1.3
56 5.7 4.5
We also don’t have to provide a slice for either the rows or the columns. If we don’t include a slice for the rows, R returns all the rows; if we don’t include a slice for the columns, R returns all the columns. If we don’t provide a slice for either rows or columns, e.g. iris[, ], R returns the full data frame.
%%R
iris[5, ]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
5 5 3.6 1.4 0.2 setosa
%%R
iris[, 2]
[1] 3.5 3.0 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 3.7 3.4 3.0 3.0 4.0 4.4 3.9 3.5
[19] 3.8 3.8 3.4 3.7 3.6 3.3 3.4 3.0 3.4 3.5 3.4 3.2 3.1 3.4 4.1 4.2 3.1 3.2
[37] 3.5 3.6 3.0 3.4 3.5 2.3 3.2 3.5 3.8 3.0 3.8 3.2 3.7 3.3 3.2 3.2 3.1 2.3
[55] 2.8 2.8 3.3 2.4 2.9 2.7 2.0 3.0 2.2 2.9 2.9 3.1 3.0 2.7 2.2 2.5 3.2 2.8
[73] 2.5 2.8 2.9 3.0 2.8 3.0 2.9 2.6 2.4 2.4 2.7 2.7 3.0 3.4 3.1 2.3 3.0 2.5
[91] 2.6 3.0 2.6 2.3 2.7 3.0 2.9 2.9 2.5 2.8 3.3 2.7 3.0 2.9 3.0 3.0 2.5 2.9
[109] 2.5 3.6 3.2 2.7 3.0 2.5 2.8 3.2 3.0 3.8 2.6 2.2 3.2 2.8 2.8 2.7 3.3 3.2
[127] 2.8 3.0 2.8 3.0 2.8 3.8 2.8 2.8 2.6 3.0 3.4 3.1 3.0 3.1 3.1 3.1 2.7 3.2
[145] 3.3 3.0 2.5 3.0 3.4 3.0
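Notice that selecting a single column returns a plain vector rather than a data frame. A minimal sketch of how to keep the data frame structure, using the drop argument of the square-bracket operator (the names col_vec and col_df are illustrative):

```r
# Selecting one column simplifies the result to a vector by default
col_vec <- iris[, 2]
class(col_vec)   # "numeric"

# drop = FALSE keeps the result as a one-column data frame
col_df <- iris[, 2, drop = FALSE]
class(col_df)    # "data.frame"
dim(col_df)      # 150 1
```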
Addressing Columns by name#
Columns can also be addressed by name, with either the $ operator (i.e. iris$Petal.Length) or square brackets (i.e. iris[, 'Petal.Length']).
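The two ways of addressing a column by name can be compared directly. This is a small sketch; by_dollar and by_bracket are illustrative names.

```r
# Address the same column in two ways
by_dollar  <- iris$Petal.Length
by_bracket <- iris[, "Petal.Length"]

# Both return the same numeric vector
identical(by_dollar, by_bracket)   # TRUE
head(by_dollar)                    # 1.4 1.4 1.3 1.5 1.4 1.7

# Square brackets also accept several column names at once
head(iris[, c("Species", "Petal.Length")], 3)
```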
Data Types#
One key feature of a data frame is that each column is classed as a specific data type.
Data types, or modes, define what the values are and how they can be used.
There are 5 core data types:
Numeric - all real numbers e.g. 7.5
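Integer - whole numbers e.g. 2L (the L suffix tells R to store the value as an integer; a plain 2 is stored as numeric)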
Complex - numbers with real and imaginary parts e.g. 1+4i
Character - consists of letters or numbers or symbols or a combination of these e.g. “a”, “f5”, “datatypes”, “Learning R is fun”.
Logical - only takes TRUE or FALSE
These data types are also used to characterise other one dimensional R objects such as individual values or vectors (more on these later). R provides a number of functions to examine the features of variables or objects.
Some examples:
typeof() - what is the object’s data type (on the data storage level (“what the computer sees”))?
class() - what is the object’s data type (on the abstract type level (“what R sees”))?
mode() - what is the object’s storage mode (similar to typeof(), but integer and double values are both reported as “numeric”)?
length() - how long is it?
attributes() - does it have any metadata?
str() - display the internal structure of an object.
is.numeric(), is.character(), is.complex(), is.logical() - returns TRUE when an object is the datatype queried, FALSE if not
Let’s define some variables and use R to profile their characteristics.
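To see how these functions differ, the sketch below profiles one example value of each core data type. The names values and profile are illustrative only.

```r
# One example value per core data type
values <- list(7.5, 2L, 1+4i, "f5", TRUE)

# Collect the answers of typeof(), class() and mode() side by side
profile <- data.frame(
  typeof = sapply(values, typeof),
  class  = sapply(values, class),
  mode   = sapply(values, mode)
)
profile

# A plain 2 (without the L suffix) is stored as a double, not an integer
typeof(2)        # "double"
is.numeric(2L)   # TRUE: integers also count as numeric
```

Note that typeof() reports "double" for 7.5 while class() and mode() report "numeric".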
%%R
x <- "dataset"
typeof(x)
[1] "character"
%%R
y <- 1:10
y
[1] 1 2 3 4 5 6 7 8 9 10
%%R
typeof(y)
[1] "integer"
%%R
class(y)
[1] "integer"
%%R
length(y)
[1] 10
Converting Between Datatypes#
It can be critical that R has assigned the right data type to your variable. If it has not, you may run into errors when processing it. You therefore may want to convert between different data types. This can be done with the series of functions as.numeric(), as.character(), etc.
For example, let’s convert our integer variable y into a character vector.
%%R
y <- as.character(y)
typeof(y)
[1] "character"
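Conversions can also go the other way, and they can fail. The sketch below uses illustrative variables (num_text, nums, bad) to show a character vector converted to numbers, and what happens when the text is not a number.

```r
# Text that looks like numbers converts cleanly
num_text <- c("1", "2", "3.5")
nums <- as.numeric(num_text)
nums           # 1.0 2.0 3.5
typeof(nums)   # "double"

# Text that is not a number becomes NA (R also issues a warning,
# suppressed here to keep the output clean)
bad <- suppressWarnings(as.numeric("ten"))
is.na(bad)     # TRUE

# Logical values can be converted to text
as.character(TRUE)   # "TRUE"
```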
Activity: Determining Data Types#
Create a variable with the numbers 9,2,200, and 14. What class do you predict this variable to be? Use an R function to confirm your answer. If it is not the data type you expected, can you force R to convert it to an integer? Divide each element of the variable in half. Does this change the type of variable?
Data Structures#
R has a number of inbuilt structures that can be used to store datasets. We have encountered one of these already, the data.frame. Others include:
strings
vectors
data frames
matrices
arrays
lists
factors
Strings#
A string is a run of characters, e.g. “hello” or “a189jde2mjo”. Strings are enclosed in quotes.
Vectors#
A vector is the most common and basic data structure in R. It is an ordered collection of values of a basic data type, with a given length. Vectors are one-dimensional. We can think of each column of a data frame as a vector.
The concatenate or combine function c() will explicitly construct a vector.
%%R
v_num <- c(9,20,12)
v_log <- c(TRUE, FALSE, FALSE, TRUE)
As we create these vectors you should see them listed in your environment pane. We can then call the name of the variable to see what the value is at that point in time.
%%R
v_num
[1] 9 20 12
%%R
v_log
[1] TRUE FALSE FALSE TRUE
The function c() can also be used to add elements to a vector.
%%R
v_name <- c("Sarah", "Tracy", "Jon")
v_name <- c(v_name, "Annette")
v_name
[1] "Sarah" "Tracy" "Jon" "Annette"
If we want to create a series of numbers, like 1 to 4, we can bypass the c() constructor and just write:
%%R
(v_int <- 1:4)
[1] 1 2 3 4
When we call a vector atomic, we mean that the vector only holds data of a single data type.
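Because a vector is atomic, R silently converts the elements to a common type when we mix types inside c(). A small sketch with illustrative names (mixed, num_log):

```r
# Mixing a number, a string and a logical: everything becomes character
mixed <- c(1, "a", TRUE)
mixed           # "1" "a" "TRUE"
typeof(mixed)   # "character"

# Mixing numbers and logicals: TRUE becomes 1 and FALSE becomes 0
num_log <- c(1.5, TRUE, FALSE)
num_log           # 1.5 1.0 0.0
typeof(num_log)   # "double"
```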
Data Frame#
A data frame is a two dimensional structure consisting of rows and columns.
We can create a data frame using the function data.frame(). We will use the predefined vector letters to get the first ten letters in the alphabet.
%%R
dat <- data.frame(id = letters[1:10], x = 1:10, y = 11:20)
dat
id x y
1 a 1 11
2 b 2 12
3 c 3 13
4 d 4 14
5 e 5 15
6 f 6 16
7 g 7 17
8 h 8 18
9 i 9 19
10 j 10 20
Columns are variables, and each column will have a specified data type, which all entries must adhere to.
Rows are observations. Rows and columns will have rownames and colnames that you can use to extract specific rows or columns respectively.
If you read in a table of data from a file, it will typically be represented by a data.frame.
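As a sketch of how row and column names can be used for extraction, the example below rebuilds the small data frame from above so that it is self-contained. Using the id column as row names is an illustrative choice, not something the session requires.

```r
# Rebuild the example data frame
dat <- data.frame(id = letters[1:10], x = 1:10, y = 11:20)

colnames(dat)   # "id" "x" "y"
rownames(dat)   # "1" "2" ... "10" (default row names)

# Extract columns by name
dat[, c("id", "y")]

# Use the id column as row names, then extract by row and column name
rownames(dat) <- dat$id
dat["c", "y"]   # 13
```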
Functions to explore data frames#
head() - shows first 6 rows
tail() - shows last 6 rows
dim() - returns the dimensions of data frame (i.e. number of rows and number of columns)
nrow() - number of rows
ncol() - number of columns
str() - structure of data frame - name, type and preview of data in each column
names() or colnames() - both show the names attribute for a data frame
Matrices#
A matrix is another two dimensional object, but it differs from a data.frame in that all entries must be of the same type. It is more memory-efficient than a data.frame, but cannot be used as a substitute for every data.frame.
We can construct a matrix as follows:
%%R
m<-matrix(1:6, nrow=2, ncol=3)
m
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
When creating a matrix, it is important to remember that matrices are filled column-wise. If that is not what you want, you can use the byrow argument (a logical: can be TRUE or FALSE) to specify how the matrix is filled.
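The effect of byrow can be seen by building the same matrix both ways. This is a short sketch; m_col and m_row are illustrative names.

```r
# Default: values fill down each column in turn
m_col <- matrix(1:6, nrow = 2, ncol = 3)

# byrow = TRUE: values fill across each row in turn
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
m_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# The same position holds different values in the two matrices
m_col[2, 1]   # 2
m_row[2, 1]   # 4
```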
We can confirm the class of a matrix with the function class().
%%R
class(m)
[1] "matrix" "array"
Arrays#
Arrays are n dimensional storage structures. A one dimensional array is a vector, a two dimensional array is a matrix.
We will not be using this type of object, but it is included for completeness.
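For completeness, here is a minimal sketch of a three dimensional array. The name arr and the dimensions chosen are illustrative.

```r
# 24 values arranged as 2 rows x 3 columns x 4 layers
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)       # 2 3 4

# Index with one position per dimension
arr[1, 1, 2]   # 7: the first value of the second layer
arr[2, 3, 4]   # 24: the very last value

class(arr)     # "array"
```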
Lists#
A list in R is a collection of objects and elements, which themselves can be a heterogeneous mix of other objects including vectors, matrices, data.frames, functions, strings, numbers. They tend to be used to collate different data types connected in some way.
We can create a list of the vectors and data frames we have already constructed as follows.
%%R
l <- list(v_log, v_num, iris)
l
[[1]]
[1] TRUE FALSE FALSE TRUE
[[2]]
[1] 9 20 12
[[3]]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
11 5.4 3.7 1.5 0.2 setosa
12 4.8 3.4 1.6 0.2 setosa
13 4.8 3.0 1.4 0.1 setosa
14 4.3 3.0 1.1 0.1 setosa
15 5.8 4.0 1.2 0.2 setosa
16 5.7 4.4 1.5 0.4 setosa
17 5.4 3.9 1.3 0.4 setosa
18 5.1 3.5 1.4 0.3 setosa
19 5.7 3.8 1.7 0.3 setosa
20 5.1 3.8 1.5 0.3 setosa
21 5.4 3.4 1.7 0.2 setosa
22 5.1 3.7 1.5 0.4 setosa
23 4.6 3.6 1.0 0.2 setosa
24 5.1 3.3 1.7 0.5 setosa
25 4.8 3.4 1.9 0.2 setosa
26 5.0 3.0 1.6 0.2 setosa
27 5.0 3.4 1.6 0.4 setosa
28 5.2 3.5 1.5 0.2 setosa
29 5.2 3.4 1.4 0.2 setosa
30 4.7 3.2 1.6 0.2 setosa
31 4.8 3.1 1.6 0.2 setosa
32 5.4 3.4 1.5 0.4 setosa
33 5.2 4.1 1.5 0.1 setosa
34 5.5 4.2 1.4 0.2 setosa
35 4.9 3.1 1.5 0.2 setosa
36 5.0 3.2 1.2 0.2 setosa
37 5.5 3.5 1.3 0.2 setosa
38 4.9 3.6 1.4 0.1 setosa
39 4.4 3.0 1.3 0.2 setosa
40 5.1 3.4 1.5 0.2 setosa
41 5.0 3.5 1.3 0.3 setosa
42 4.5 2.3 1.3 0.3 setosa
43 4.4 3.2 1.3 0.2 setosa
44 5.0 3.5 1.6 0.6 setosa
45 5.1 3.8 1.9 0.4 setosa
46 4.8 3.0 1.4 0.3 setosa
47 5.1 3.8 1.6 0.2 setosa
48 4.6 3.2 1.4 0.2 setosa
49 5.3 3.7 1.5 0.2 setosa
50 5.0 3.3 1.4 0.2 setosa
51 7.0 3.2 4.7 1.4 versicolor
52 6.4 3.2 4.5 1.5 versicolor
53 6.9 3.1 4.9 1.5 versicolor
54 5.5 2.3 4.0 1.3 versicolor
55 6.5 2.8 4.6 1.5 versicolor
56 5.7 2.8 4.5 1.3 versicolor
57 6.3 3.3 4.7 1.6 versicolor
58 4.9 2.4 3.3 1.0 versicolor
59 6.6 2.9 4.6 1.3 versicolor
60 5.2 2.7 3.9 1.4 versicolor
61 5.0 2.0 3.5 1.0 versicolor
62 5.9 3.0 4.2 1.5 versicolor
63 6.0 2.2 4.0 1.0 versicolor
64 6.1 2.9 4.7 1.4 versicolor
65 5.6 2.9 3.6 1.3 versicolor
66 6.7 3.1 4.4 1.4 versicolor
67 5.6 3.0 4.5 1.5 versicolor
68 5.8 2.7 4.1 1.0 versicolor
69 6.2 2.2 4.5 1.5 versicolor
70 5.6 2.5 3.9 1.1 versicolor
71 5.9 3.2 4.8 1.8 versicolor
72 6.1 2.8 4.0 1.3 versicolor
73 6.3 2.5 4.9 1.5 versicolor
74 6.1 2.8 4.7 1.2 versicolor
75 6.4 2.9 4.3 1.3 versicolor
76 6.6 3.0 4.4 1.4 versicolor
77 6.8 2.8 4.8 1.4 versicolor
78 6.7 3.0 5.0 1.7 versicolor
79 6.0 2.9 4.5 1.5 versicolor
80 5.7 2.6 3.5 1.0 versicolor
81 5.5 2.4 3.8 1.1 versicolor
82 5.5 2.4 3.7 1.0 versicolor
83 5.8 2.7 3.9 1.2 versicolor
84 6.0 2.7 5.1 1.6 versicolor
85 5.4 3.0 4.5 1.5 versicolor
86 6.0 3.4 4.5 1.6 versicolor
87 6.7 3.1 4.7 1.5 versicolor
88 6.3 2.3 4.4 1.3 versicolor
89 5.6 3.0 4.1 1.3 versicolor
90 5.5 2.5 4.0 1.3 versicolor
91 5.5 2.6 4.4 1.2 versicolor
92 6.1 3.0 4.6 1.4 versicolor
93 5.8 2.6 4.0 1.2 versicolor
94 5.0 2.3 3.3 1.0 versicolor
95 5.6 2.7 4.2 1.3 versicolor
96 5.7 3.0 4.2 1.2 versicolor
97 5.7 2.9 4.2 1.3 versicolor
98 6.2 2.9 4.3 1.3 versicolor
99 5.1 2.5 3.0 1.1 versicolor
100 5.7 2.8 4.1 1.3 versicolor
101 6.3 3.3 6.0 2.5 virginica
102 5.8 2.7 5.1 1.9 virginica
103 7.1 3.0 5.9 2.1 virginica
104 6.3 2.9 5.6 1.8 virginica
105 6.5 3.0 5.8 2.2 virginica
106 7.6 3.0 6.6 2.1 virginica
107 4.9 2.5 4.5 1.7 virginica
108 7.3 2.9 6.3 1.8 virginica
109 6.7 2.5 5.8 1.8 virginica
110 7.2 3.6 6.1 2.5 virginica
111 6.5 3.2 5.1 2.0 virginica
112 6.4 2.7 5.3 1.9 virginica
113 6.8 3.0 5.5 2.1 virginica
114 5.7 2.5 5.0 2.0 virginica
115 5.8 2.8 5.1 2.4 virginica
116 6.4 3.2 5.3 2.3 virginica
117 6.5 3.0 5.5 1.8 virginica
118 7.7 3.8 6.7 2.2 virginica
119 7.7 2.6 6.9 2.3 virginica
120 6.0 2.2 5.0 1.5 virginica
121 6.9 3.2 5.7 2.3 virginica
122 5.6 2.8 4.9 2.0 virginica
123 7.7 2.8 6.7 2.0 virginica
124 6.3 2.7 4.9 1.8 virginica
125 6.7 3.3 5.7 2.1 virginica
126 7.2 3.2 6.0 1.8 virginica
127 6.2 2.8 4.8 1.8 virginica
128 6.1 3.0 4.9 1.8 virginica
129 6.4 2.8 5.6 2.1 virginica
130 7.2 3.0 5.8 1.6 virginica
131 7.4 2.8 6.1 1.9 virginica
132 7.9 3.8 6.4 2.0 virginica
133 6.4 2.8 5.6 2.2 virginica
134 6.3 2.8 5.1 1.5 virginica
135 6.1 2.6 5.6 1.4 virginica
136 7.7 3.0 6.1 2.3 virginica
137 6.3 3.4 5.6 2.4 virginica
138 6.4 3.1 5.5 1.8 virginica
139 6.0 3.0 4.8 1.8 virginica
140 6.9 3.1 5.4 2.1 virginica
141 6.7 3.1 5.6 2.4 virginica
142 6.9 3.1 5.1 2.3 virginica
143 5.8 2.7 5.1 1.9 virginica
144 6.8 3.2 5.9 2.3 virginica
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
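Individual elements of a list can be pulled out with double square brackets or, if the elements are named, with the $ operator. The sketch below redefines the two vectors so that it is self-contained, and uses head(iris) to keep the output short. The list name l_named and its element names are illustrative.

```r
v_log <- c(TRUE, FALSE, FALSE, TRUE)
v_num <- c(9, 20, 12)

# A list with named elements
l_named <- list(flags = v_log, numbers = v_num, flowers = head(iris))
names(l_named)        # "flags" "numbers" "flowers"

# Double brackets (or $) return the element itself
l_named[[2]]          # 9 20 12
l_named$numbers       # 9 20 12
class(l_named[[2]])   # "numeric"

# Single brackets return a shorter list
class(l_named[2])     # "list"

# Elements keep their own structure, so we can index further
l_named$flowers[1, "Sepal.Length"]   # 5.1
```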
Factors#
A factor is sometimes considered a data type because, confusingly, “factor” is a possible response from the function class(), and it is therefore a valid option for columns in a data frame. A factor is designed for categorical variables. Factors have a finite number of “levels”, which are the options that any element of that vector can take. They are actually stored as integers, which can make them quite powerful for subsetting.
It is very easy for character vectors to be inappropriately stored as factors and vice versa. In versions of R before 4.0.0, the default when loading data was to store strings as factors; since R 4.0.0, strings are kept as characters unless you set stringsAsFactors = TRUE. Conversely, if we define a vector ourselves we have to explicitly convert it to a factor.
For example, if we define a vector of months, the default is to class it as a vector of characters. We have to actively coerce it into a factor.
%%R
a <- c("March","February","February","November","February","March","March","March","February","November")
class(a)
[1] "character"
%%R
fact <- as.factor(a)
class(fact)
[1] "factor"
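To look inside a factor, the sketch below prints its levels and the integer codes stored underneath. It rebuilds the months vector so that it is self-contained. The second factor, fact_cal, is an illustrative extra that uses R's built-in month.name constant to put the levels in calendar order.

```r
a <- c("March", "February", "February", "November", "February",
       "March", "March", "March", "February", "November")
fact <- as.factor(a)

# Levels are sorted alphabetically by default
levels(fact)       # "February" "March" "November"

# Underneath, each element is an integer code pointing at a level
typeof(fact)       # "integer"
as.integer(fact)   # 2 1 1 3 1 2 2 2 1 3

# Count how often each level occurs
table(fact)

# Supplying the levels explicitly controls their order
fact_cal <- factor(a, levels = month.name)
as.integer(fact_cal)[1]   # 3, because March is the third month
```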
Activity: Data Exploration#
Use these functions to explore the mtcars dataset
How large is the dataset?
What type is the object?
What value is in the 6th row of the 4th column?