Summary#
Download Rmd Version#
If you wish to engage with this course content via Rmd, then please click the link below to download the Rmd file.
Review#
This course has provided an introduction to working with data using packages in the Tidyverse. We looked at
Using
readr
to read csv data with explicit column specifications, and touched on other packages for reading other kinds of files e.g. Excel workbooks.Using
dplyr
to manipulate data (selecting columns, filtering, summarising, sorting, etc.) and for joining multiple dataframes together.Using
tidyr
to reshape data e.g. from ‘matrix’ format to tidy format.Working with strings and dates using the
stringr
andlubridate
packages.
We also discussed some key philosophical underpinnings of the Tidyverse:
Adopting the convention of putting data into tidy format.
The idea that the functions in Tidyverse packages are designed with very consistent ‘interfaces’ e.g. taking in a dataframe and returning a new dataframe.
The ‘functional’ style of programming, where we can set up pipelines for performing successive transformations of data.
Next steps / further resources#
This course has given you a grounding in using the Tidyverse and you now feel confident using it in your own work. To take your knowledge further, the following extra resources are available:
Cheatsheets for several Tidyverse packages, including the ones we’ve covered in this course, are available at https://posit.co/resources/cheatsheets/. These are a great way to quickly look up function(s) that help you perform some concrete task. Often it’s a good idea to look at the cheatsheet to know what functions(s) you need, then use the help system in R to read the documentation, or search the web for further advice.
The R for Data Science book (2nd ed.) by Hadley Wickham, Mine Çetinkaya-Rundel and Garrett Grolemund is a freely available, online book which covers lots of aspects of working with and visualising data using the Tidyverse in a way that’s approachable for non-expert R programmers.
A natural follow up to this course would be to learn about plotting and visualising data in R with the
ggplot2
package, which is part of the Tidyverse. The R for Data Science book above is a great resource to start with for learning how to plot withggplot2
.