Summary#

Review#

This course has provided an introduction to working with data using packages in the Tidyverse. We looked at

  • Using readr to reading csv data with explicit column specifications, and touched on other packages for reading other kinds of files e.g. Excel workbooks.

  • Using dplyr to manipulate data (selecting columns, filtering, summarising, sorting, etc.) and for joining multiple dataframes together.

  • Using tidyr to reshape data e.g. from ‘matrix’ format to tidy format.

  • Working with strings and dates using the stringr and lubridate packages.

We also discussed some key philosophical underpinnings of the Tidyverse:

  • Adopting the convention of putting data into tidy format.

  • The idea that the functions in Tidyverse packages are designed with very consistent ‘interfaces’ e.g. taking in a dataframe and returning a new dataframe.

  • The ‘functional’ style of programming, where we can set up pipelines for performing successive transformations of data.

Next steps / further resources#

This course has given you a grounding in using the Tidyverse and you now feel confident using it in your own work. To take your knowledge further, the following extra resources are available:

  • Cheatsheets for several Tidyverse packages, including the ones we’ve covered in this course, are available at https://posit.co/resources/cheatsheets/. These are a great way to quickly look up function(s) that help you perform some concrete task. Often it’s a good idea to look at the cheatsheet to know what functions(s) you need, then use the help system in R to read the documentation, or search the web for further advice.

  • The R for Data Science book (2nd ed.) by Hadley Wickham, Mine Çetinkaya-Rundel and Garrett Grolemund is a freely available, online book which covers lots of aspects of working with and visualising data using the Tidyverse in a way that’s approachable for non-expert R programmers.

  • A natural follow up to this course would be to learn about plotting and visualising data in R with the ggplot2 package, which is part of the Tidyverse. The R for Data Science book above is a great resource to start with for learning how to plot with ggplot2.