Python for Data Analysis

Course Description

This course provides an in-depth exploration of Python for data analysis, focusing on essential libraries and tools such as NumPy, Pandas, Matplotlib, and Plotly. Additionally, it covers critical software development practices, including testing, virtual environments, and version control, to ensure code reproducibility and collaboration in research projects. By the end of the course, participants will be adept at performing data manipulation, analysis, and visualisation tasks, and will have a solid understanding of maintaining and sharing their code efficiently.

Course Objectives

  • Grasp the fundamentals of Python programming, including data types, control structures, and functions

  • Learn how to load, clean, and manipulate data using Pandas for effective data analysis

  • Learn to use NumPy for numerical operations and for handling large datasets efficiently

  • Understand how to use Pandas to handle datasets drawn from research problems

  • Create a variety of static and interactive visualisations to represent data insights, covering Matplotlib and Plotly

  • Apply machine learning techniques using Scikit-Learn for predictive modelling

  • Implement a testing framework and manage dependencies with virtual environments

  • Learn methods to ensure that research and analyses can be reproduced and validated by others
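
As a rough illustration of how several of these objectives fit together, the sketch below loads a small dataset with Pandas, drops a row with a missing reading, summarises the values by group, and checks the result with a pytest-style test. The inline data, the column names, and the function names (`load_and_clean`, `mean_by_site`) are hypothetical and are not taken from the course material.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical inline data standing in for a research dataset file.
RAW_CSV = """site,temperature_c
A,12.5
A,
B,15.0
B,17.0
"""


def load_and_clean(csv_text: str) -> pd.DataFrame:
    """Load CSV text into a DataFrame and drop rows with missing readings."""
    df = pd.read_csv(io.StringIO(csv_text))
    return df.dropna(subset=["temperature_c"]).reset_index(drop=True)


def mean_by_site(df: pd.DataFrame) -> pd.Series:
    """Return the mean temperature for each site."""
    return df.groupby("site")["temperature_c"].mean()


def test_mean_by_site() -> None:
    """A pytest-style check against values computed by hand."""
    means = mean_by_site(load_and_clean(RAW_CSV))
    assert np.isclose(means["A"], 12.5)
    assert np.isclose(means["B"], 16.0)


clean = load_and_clean(RAW_CSV)
site_means = mean_by_site(clean)
test_mean_by_site()
```

Keeping the loading and summarising steps in small functions is one way to make an analysis easier to test and for others to rerun. In a real project the test would normally live in its own file and be run with a test runner such as pytest inside a virtual environment.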

Pre-requisite Knowledge

Attendees should have taken the Introduction to Python course described here.

Installation Guide

As this course builds on Introduction to Python, the installation instructions are the same and are available here.

Developers

The developers of this course are:

  • Michael Saunby

  • Simon Kirby

  • Liam Berrisford

Course Delivery Content

There is currently no additional content used to deliver this course beyond the self-study notes.

License Info

Instructional Material

The instructional material in this course is copyright © 2024 University of Exeter and is made available under the Creative Commons Attribution 4.0 International licence (https://creativecommons.org/licenses/by/4.0/). Instructional material consists of material that is contained within the “individual_modules/python_for_data_analysis” directory, and images folders in this directory, with the exception of code snippets and example programs found in files within these folders. Such code snippets and example programs are considered software for the purposes of this licence.

Software

Except where otherwise noted, software provided in this repository is made available under the MIT licence (https://opensource.org/licenses/MIT).

Copyright © 2024 University of Exeter

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.