Python for Data Analysis#

Course Description#

This course provides an in-depth exploration of Python for data analysis, focusing on essential libraries and tools such as NumPy, Pandas, Matplotlib, and Plotly. Additionally, it covers critical software development practices, including testing, virtual environments, and version control, to ensure code reproducibility and collaboration in research projects. By the end of the course, participants will be adept at performing data manipulation, analysis, and visualisation tasks, and will have a solid understanding of maintaining and sharing their code efficiently.

Course Objectives#

  • Grasp the fundamentals of Python programming, including data types, control structures, and functions.

  • Learn how to load, clean, and manipulate data using Pandas for effective data analysis.

  • Learn to use NumPy for numerical operations and handling large datasets efficiently.

  • Understand the use of Pandas for handling research problem datasets.

  • Create a variety of static and interactive visualisations to represent data insights, covering Matplotlib and Plotly.

  • Apply machine learning techniques using Scikit-Learn for predictive modelling.

  • Implement a testing framework and manage dependencies with virtual environments.

  • Learn methods to ensure that research and analyses can be reproduced and validated by others.

Pre-requisite Knowledge#

Attendees should have taken the Introduction to Python course, described in Intro To Python.

The interactive network visualisation below displays the prerequisite structure for this course within the training program. Each node represents a course that you may need to complete beforehand, and the arrows show the recommended order in which to take them, leading up to your selected course. You can click on any course node to view more information about that course. This interactive tool helps you clearly see the learning path required to access this course, making it easier to plan your progress with the Coding for Reproducible Research Training (CfRR) initiative.

Pre-Reqs Subnetwork

Sign-up#

To check for upcoming course dates and to register, please visit the Workshop Schedule and Sign-up page.

This course is currently accepting applications.

Installation Guide#

For this course, attendees will need:

  • The Python programming language

  • A way to edit and run code (we will use JupyterLab in a web browser)

  • An up-to-date web browser. We recommend a current version of Chrome, Safari, Firefox, or Edge, with the course team generally using Chrome.

This guide shows how to set everything up using the command line, with a focus on:

  • Installing Python

  • Creating and using a dedicated Python environment (so we avoid global installs)

  • Installing the course requirements

  • Starting JupyterLab from the command line

For more background on Python environments and why we use them, see the CfRR short course on the topic: CfRR Short Course on Python Environments

1. Overview of the tools we will use#

During the course you will primarily use:

  • A terminal / command line:

    • Windows: Command Prompt or PowerShell

    • macOS: Terminal

    • Linux: your preferred terminal

  • Python 3

  • A virtual environment in the course folder: .venv

  • JupyterLab, launched from the command line with jupyter lab

  • A modern web browser

Unlike the Introduction to Python course, we deliberately do not rely on Anaconda Navigator or other GUI launchers. This keeps the process consistent and reproducible across platforms.

2. Install Python#

2.1 Check if Python is already installed#

Open a terminal / command prompt and run:

python --version

If that doesn’t work, try:

python3 --version

  • If you see something like Python 3.10.x or higher, you already have a usable version.

  • If you get an error, you’ll need to install Python.
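The same check can also be made from inside Python once an interpreter starts. The sketch below is only an illustration: the helper name `meets_minimum` is hypothetical, and the 3.10 threshold simply mirrors the example version mentioned above rather than a stated course requirement.

```python
import sys

# Threshold used for illustration only; it mirrors the "3.10.x or higher" example.
MINIMUM = (3, 10)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
print("Meets minimum:", meets_minimum())
```

Saving this as a small script and running it with python (or python3) reports which interpreter version that command actually starts.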

2.2 Install Python on Windows#

  1. Visit https://www.python.org/downloads/.

  2. Download the latest Python 3 installer for Windows.

  3. Run the installer and tick:

    Add python.exe to PATH

  4. Choose “Install Now” and accept the defaults.

After installation, open Command Prompt or PowerShell and check:

python --version

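If python is not recognised, try python3 --version or py --version instead (the py launcher is normally included with the python.org installer). You should now see the version that you installed.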

2.3 Install Python on macOS#

  1. Visit https://www.python.org/downloads/.

  2. Download the latest Python 3 installer for macOS.

  3. Run the .pkg installer and follow the prompts.

Then open Terminal and run:

python3 --version

The python.org installer for macOS provides the python3 command. If python3 is not found, try python --version. You should now see the version that you installed.

2.4 Install Python on Linux#

On many Linux systems, Python 3 is already installed. Check with:

python3 --version

If it is missing, install Python 3 (and the venv module) using your package manager. For example, on Debian/Ubuntu:

sudo apt update
sudo apt install python3 python3-venv

3. Create a dedicated Python environment (.venv)#

To keep your system clean and make the course setup reproducible, we will create a Python virtual environment in the course folder. This environment will hold all the packages used in the course and avoid global installs.

3.1 Create the environment#

In the directory where you plan to conduct your work, run one of the following:

# If 'python' runs Python 3 on your system:
python -m venv .venv

# If you need to use 'python3':
python3 -m venv .venv

This creates a folder named .venv that contains its own Python interpreter and installed packages.
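To see what such a folder contains, the standard-library venv module can also be called from Python. The following is a minimal sketch, not a replacement for the commands above: it builds a throwaway environment in a temporary directory, the helper name `create_environment` is hypothetical, and pip is deliberately left out (`with_pip=False`) to keep the example fast, whereas `python -m venv` installs pip by default.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(target, with_pip=False):
    """Create a virtual environment at `target` and return its top-level entries."""
    venv.create(target, with_pip=with_pip)
    return sorted(entry.name for entry in Path(target).iterdir())


# Use a throwaway directory so the sketch does not touch a real project folder.
with tempfile.TemporaryDirectory() as scratch:
    contents = create_environment(Path(scratch) / ".venv")
    print(contents)
```

The listing should include a pyvenv.cfg file plus a bin folder (macOS / Linux) or a Scripts folder (Windows), which is where the environment's own interpreter lives.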

5. Activate the environment#

You must activate the environment each time you open a new terminal before working on the course.

When the environment is active, you will see (.venv) at the start of your prompt.
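If the prompt is ambiguous, Python itself can report whether it is running from a virtual environment. This is a small sketch that assumes an environment created with venv, as in this guide; the function name `in_virtual_environment` is hypothetical. It relies on the documented behaviour that `sys.prefix` and `sys.base_prefix` differ inside a virtual environment.

```python
import sys


def in_virtual_environment(prefix=None, base_prefix=None):
    """Return True when the interpreter's prefix differs from its base installation."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```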

5.1 macOS / Linux#

From the folder you created the environment in, run:

source .venv/bin/activate

Your prompt should now look something like:

(.venv) yourname@computer <directory> %

To deactivate the environment later:

deactivate

5.2 Windows – Command Prompt#

From the folder you created the environment in, run:

.\.venv\Scripts\activate.bat

The prompt will change to something like:

(.venv) C:\Users\you\folder>

To deactivate:

deactivate

5.3 Windows – PowerShell#

From the folder you created the environment in, run:

.\.venv\Scripts\Activate.ps1

If you see a message about execution policy, you may need to allow local scripts:

Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned

Then try activating again.


6. Install the course requirements inside .venv#

In this course we will manually install the core Python libraries we need, so you can clearly see what is being installed and can repeat the process for your own projects.

The key libraries we will use are:

  • NumPy – efficient numerical computing with arrays

  • pandas – working with tabular data (data frames)

  • Matplotlib – creating plots and figures

  • scikit-learn – basic machine learning and modelling

  • JupyterLab – running notebooks in your browser

From inside the folder you have created, with the environment activated, run:

pip install numpy pandas matplotlib scikit-learn jupyterlab

This command installs NumPy, pandas, Matplotlib, scikit-learn and JupyterLab into .venv only, keeping your system Python clean and making your setup reproducible.

If you are using additional plotting or analysis libraries later, you can install them in the same way, for example:

pip install seaborn

You can inspect what’s installed with:

pip list
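As a complement to pip list, the installed versions can be queried from Python using the standard-library importlib.metadata module. This is a hedged sketch: the helper `installed_versions` is a hypothetical name, and the package list simply repeats the names passed to pip install above.

```python
from importlib import metadata

# Distribution names as passed to `pip install` above.
COURSE_PACKAGES = ["numpy", "pandas", "matplotlib", "scikit-learn", "jupyterlab"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(COURSE_PACKAGES).items():
    print(f"{name}: {version or 'not installed'}")
```

Run this with the environment activated; any package reported as not installed is missing from the active interpreter's environment.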

7. Launch JupyterLab from the command line#

We will use JupyterLab as our main editor and environment for running Python code during the course. JupyterLab runs in your web browser but is started from the command line.

7.1 Starting JupyterLab#

  1. Open a terminal / command prompt.

  2. Go to your course folder:

    cd /path/to/folder
    
  3. Activate your environment (.venv) as described above.

  4. Once you see (.venv) in your prompt, run:

    jupyter lab
    

This will:

  • Start a local Jupyter server.

  • Open a tab in your web browser titled “JupyterLab”.

  • Show a launcher where you can create a new notebook, open a terminal, and perform many other actions.

You can close JupyterLab by:

  • Shutting down the server in the terminal with Ctrl+C and confirming, and

  • Closing the browser tab.

Key point: Always activate .venv before running jupyter lab so that Jupyter uses the correct Python and packages.

7.2 Jupyter Notebook vs JupyterLab#

  • JupyterLab is an editor / integrated development environment (IDE) where you can:

    • Browse files

    • Create and edit Jupyter notebooks

    • Open terminals

    • Work with multiple notebooks and scripts side by side

  • A Jupyter notebook (.ipynb file) is a document format that mixes:

    • Code cells (which you can run)

    • Text (Markdown) cells

    • Outputs (figures, tables, etc.)

Jupyter notebooks are widely used in research and teaching for explaining, demonstrating, and exploring methods. JupyterLab is the modern environment in which you interact with these notebooks.
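Under the hood, an .ipynb file is a JSON document. The sketch below builds a minimal notebook by hand to illustrate the cell types described above; it assumes the version 4 notebook format, and the cell contents are invented for illustration. In practice JupyterLab writes this structure for you.

```python
import json

# A minimal notebook in the nbformat 4 JSON layout: one Markdown cell and one code cell.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Example analysis\n", "Notes go in Markdown cells."],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # set once the cell has been run
            "metadata": {},
            "outputs": [],  # filled in when the cell is run
            "source": ["print(1 + 1)"],
        },
    ],
}

# Serialise to the text that would be stored in an .ipynb file, then read it back.
text = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(text)["cells"]]
print(cell_types)
```

Because outputs are stored alongside the code, a saved notebook can display figures and tables without being re-run.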

If you prefer the classic notebook interface, you can launch it instead with:

jupyter notebook

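(As before, activate .venv first. If the command is not recognised, the classic interface may need to be installed separately with pip install notebook.)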

Optional: Using VS Code instead of JupyterLab#

While this course uses JupyterLab as the primary environment for running and interacting with notebooks, Visual Studio Code (VS Code) is also a widely used and fully supported option for Python development. Participants are therefore welcome to use VS Code if it better fits their existing workflow.

VS Code can be used to:

  • Open and edit Python scripts (.py) and Jupyter notebooks (.ipynb)

  • Run code using the same virtual environment (.venv) created for this course

  • Work with notebooks directly inside the editor using the Jupyter extension

  • Combine notebooks, scripts, terminals, and file browsing in a single interface

Using VS Code with this course#

1. Install VS Code and required extensions#
  1. Download and install VS Code from:
    https://code.visualstudio.com/

  2. Open VS Code.

  3. Install the following extensions when prompted, or via the Extensions panel:

    • Python (by Microsoft)

    • Jupyter (by Microsoft)

These extensions enable Python execution, environment selection, and notebook support.

2. Open the course folder in VS Code#
  1. In VS Code, select File → Open Folder…

  2. Choose the folder where you created the .venv environment.

  3. VS Code will load the folder and detect the virtual environment automatically in many cases.

Important: The .venv folder must be inside the project directory for VS Code to detect it reliably.

3. Select the correct Python interpreter#

VS Code must be configured to use the Python interpreter from .venv.

  1. Open the Command Palette (Ctrl+Shift+P on Windows/Linux, Cmd+Shift+P on macOS).

  2. Search for Python: Select Interpreter.

  3. Choose the interpreter that points to:

    • .venv/bin/python (macOS/Linux), or

    • .venv\Scripts\python.exe (Windows).

Once selected, VS Code will use this interpreter for running scripts and notebooks.
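The two interpreter locations differ only by platform layout. As a small illustration, the sketch below builds both paths with pathlib; the helper name `venv_interpreter` is hypothetical and the default folder name `.venv` follows this guide.

```python
from pathlib import PurePosixPath, PureWindowsPath


def venv_interpreter(env_dir=".venv", windows=False):
    """Return the interpreter path inside a venv for the given platform family."""
    if windows:
        return PureWindowsPath(env_dir) / "Scripts" / "python.exe"
    return PurePosixPath(env_dir) / "bin" / "python"


print(venv_interpreter())              # macOS / Linux layout
print(venv_interpreter(windows=True))  # Windows layout
```

To pick the layout automatically on the current machine, `windows=(os.name == "nt")` could be passed after importing os.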

4. Working with Jupyter notebooks in VS Code#

You can open and run Jupyter notebooks directly inside VS Code.

  1. Open an existing .ipynb file, or create a new one via File → New File → Jupyter Notebook.

  2. At the top right of the notebook interface, locate the Kernel Picker.

  3. Select the kernel associated with the .venv environment (it should match the interpreter selected above).

  4. Run cells using the Run button or by pressing Shift+Enter.

Outputs such as plots and tables will appear inline, in the same way as in JupyterLab.

Key point: If your code cannot find installed packages, double-check that the selected kernel is the one associated with .venv.

5. Using the integrated terminal in VS Code#

VS Code includes an integrated terminal that can be used in place of a separate system terminal.

  1. Open the terminal via Terminal → New Terminal.

  2. Activate the environment if it is not already active:

    source .venv/bin/activate   # macOS / Linux
    .\.venv\Scripts\activate    # Windows
    
  3. Once activated, you can run:

    jupyter lab
    

    or execute Python scripts and commands as needed.

6. VS Code vs JupyterLab#
  • JupyterLab provides a browser-based environment designed specifically for notebooks and teaching workflows.

  • VS Code provides a general-purpose development environment that integrates notebooks, scripts, version control, and terminals in one place.

Both tools use the same underlying Python environment and packages. The course documentation defaults to JupyterLab to ensure a consistent, platform-independent experience, but the concepts, commands, and examples apply equally when using VS Code.

Recommendation: If you are new to Python or notebook-based workflows, we recommend following the course using JupyterLab. If you already use VS Code for Python development, you may continue to do so using the same .venv environment.

Developers#

The developers of this course are:

  • Michael Saunby

  • Simon Kirby

  • Liam Berrisford

Course Delivery Content#

There is currently no additional content that is used outside of the self-study notes to deliver this course.

License Info#

Instructional Material

The instructional material in this course is copyright © 2024 University of Exeter and is made available under the Creative Commons Attribution 4.0 International licence (https://creativecommons.org/licenses/by/4.0/). Instructional material consists of material that is contained within the “individual_modules/python_for_data_analysis” directory, and images folders in this directory, with the exception of code snippets and example programs found in files within these folders. Such code snippets and example programs are considered software for the purposes of this licence.

Software

Except where otherwise noted, software provided in this repository is made available under the MIT licence (https://opensource.org/licenses/MIT).

Copyright © 2024 University of Exeter

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.