Python for Data Analysis#
Course Description#
This course provides an in-depth exploration of Python for data analysis, focusing on essential libraries and tools such as NumPy, Pandas, Matplotlib, and Plotly. Additionally, it covers critical software development practices, including testing, virtual environments, and version control, to ensure code reproducibility and collaboration in research projects. By the end of the course, participants will be adept at performing data manipulation, analysis, and visualisation tasks, and will have a solid understanding of maintaining and sharing their code efficiently.
Course Objectives#
Grasp the fundamentals of Python programming, including data types, control structures, and functions.
Learn how to load, clean, and manipulate data using Pandas for effective data analysis.
Learn to use NumPy for numerical operations and handling large datasets efficiently.
Understand how to use Pandas to work with datasets drawn from research problems.
Create a variety of static and interactive visualisations to represent data insights, covering Matplotlib and Plotly.
Apply machine learning techniques using Scikit-Learn for predictive modelling.
Implement a testing framework and manage dependencies with virtual environments.
Learn methods to ensure that research and analyses can be reproduced and validated by others.
Pre-requisite Knowledge#
Attendees should have taken the Introduction to Python course described in Intro To Python.
The interactive network visualisation below displays the prerequisite structure for this course within the training program. Each node represents a course that you may need to complete beforehand, and the arrows show the recommended order in which to take them, leading up to your selected course. You can click on any course node to view more information about that course. This interactive tool helps you clearly see the learning path required to access this course, making it easier to plan your progress with the Coding for Reproducible Research Training (CfRR) initiative.
Pre-Reqs Subnetwork
Sign-up#
To check for upcoming course dates and to register, please visit the Workshop Schedule and Sign-up page.
Installation Guide#
For this course, attendees will need:
The Python programming language
A way to edit and run code (we will use JupyterLab in a web browser)
An up-to-date web browser. We recommend a current version of Chrome, Safari, Firefox, or Edge, with the course team generally using Chrome.
This guide shows how to set everything up using the command line, with a focus on:
Installing Python
Creating and using a dedicated Python environment (so we avoid global installs)
Installing the course requirements
Starting JupyterLab from the command line
For more background on Python environments and why we use them, see the CfRR short course on the topic: CfRR Short Course on Python Environments
1. Overview of the tools we will use#
During the course you will primarily use:
A terminal / command line:
Windows: Command Prompt or PowerShell
macOS: Terminal
Linux: your preferred terminal
Python 3
A virtual environment in the course folder: .venv
JupyterLab, launched from the command line with: jupyter lab
A modern web browser
We deliberately do not rely on Anaconda Navigator or other GUI launchers (as used in the Introduction to Python course), so that the process stays consistent and reproducible across platforms.
2. Install Python#
2.1 Check if Python is already installed#
Open a terminal / command prompt and run:
python --version
If that doesn’t work, try:
python3 --version
If you see something like Python 3.10.x or higher, you already have a usable version. If you get an error, you'll need to install Python.
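If you prefer to check from inside Python itself (for example after starting python or python3 with no arguments), the standard sys.version_info tuple gives the same information. The sketch below is a minimal example; the 3.10 threshold mirrors the guidance above, and the helper name python_is_recent_enough is our own.

```python
import sys

# Minimum version this guide treats as "usable" (3.10, as stated above).
MINIMUM = (3, 10)


def python_is_recent_enough(minimum=MINIMUM):
    """Return True if the running interpreter is at least `minimum`."""
    # Compare only (major, minor); micro/patch versions do not matter here.
    return sys.version_info[:2] >= minimum


print(sys.version)
print("Usable for the course:", python_is_recent_enough())
```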
2.2 Install Python on Windows#
Download the latest Python 3 installer for Windows.
Run the installer and tick:
✅ Add python.exe to PATH
Choose “Install Now” and accept the defaults.
After installation, open Command Prompt or PowerShell and check:
python --version
You may want to check python3 --version. You should now see the version that you installed.
2.3 Install Python on macOS#
Download the latest Python 3 installer for macOS.
Run the .pkg installer and follow the prompts.
Then open Terminal and run:
python3 --version
On macOS the python.org installer usually provides only the python3 command, so python --version may report that the command is not found. You should now see the version that you installed.
2.4 Install Python on Linux#
On many Linux systems, Python 3 is already installed. Check with:
python3 --version
If it is missing, install Python 3 (and the venv module) using your package manager. For example, on Debian/Ubuntu:
sudo apt update
sudo apt install python3 python3-venv
3. Create a dedicated Python environment (.venv)#
To keep your system clean and make the course setup reproducible, we will create a Python virtual environment in the course folder. This environment will hold all the packages used in the course and avoid global installs.
3.1 Create the environment#
In the directory where you plan to conduct your work, run one of the following:
# If 'python' runs Python 3 on your system:
python -m venv .venv
# If you need to use 'python3':
python3 -m venv .venv
This creates a folder named .venv that contains its own Python interpreter and a separate location for the packages you install.
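The same environment can also be created from Python using the standard-library venv module, which is the module that python -m venv runs. This is a minimal sketch; create_course_env is a hypothetical helper name, and with_pip=True mirrors the command-line default of installing pip into the new environment.

```python
import venv
from pathlib import Path


def create_course_env(env_dir=".venv", with_pip=True):
    """Create a virtual environment, like `python -m venv .venv`.

    Hypothetical helper for illustration; returns the path to pyvenv.cfg.
    """
    env_path = Path(env_dir)
    # with_pip=True bootstraps pip via ensurepip, as the CLI does by default.
    venv.create(env_path, with_pip=with_pip)
    # Every venv records its base interpreter ("home = ...") in pyvenv.cfg.
    return env_path / "pyvenv.cfg"


# Usage (from the course folder): create_course_env()
```

Either route produces an equivalent .venv folder, so the activation steps that follow apply unchanged.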
4. Activate the environment#
You must activate the environment each time you open a new terminal before working on the course.
When the environment is active, you will see (.venv) at the start of your prompt.
4.1 macOS / Linux#
From the folder you created the environment in, run:
source .venv/bin/activate
Your prompt should now look something like:
(.venv) yourname@computer <directory> %
To deactivate the environment later:
deactivate
4.2 Windows – Command Prompt#
From the folder you created the environment in, run:
.\.venv\Scripts\activate.bat
The prompt will change to something like:
(.venv) C:\Users\you\folder>
To deactivate:
deactivate
4.3 Windows – PowerShell#
From the folder you created the environment in, run:
.\.venv\Scripts\Activate.ps1
If you see a message about execution policy, you may need to allow local scripts:
Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned
Then try activating again.
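Activation works by putting the environment's interpreter first on your PATH, so python then runs the copy inside .venv. To confirm this from Python on any platform, you can compare sys.prefix with sys.base_prefix: inside a virtual environment they differ. A minimal sketch; in_virtual_environment is our own helper name.

```python
import sys


def in_virtual_environment():
    """True when this interpreter runs from a venv rather than the base install."""
    # Inside a venv, sys.prefix points at the environment folder, while
    # sys.base_prefix still points at the Python installation it was built from.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Running inside a virtual environment:", in_virtual_environment())
```

Run this with python after activating; if it prints False, the environment is not active in that terminal.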
5. Install the course requirements inside .venv#
In this course we will manually install the core Python libraries we need, so you can clearly see what is being installed and can repeat the process for your own projects.
The key libraries we will use are:
NumPy – efficient numerical computing with arrays
pandas – working with tabular data (data frames)
Matplotlib – creating plots and figures
scikit-learn – basic machine learning and modelling
JupyterLab – running notebooks in your browser
From inside the folder you have created, with the environment activated, run:
pip install numpy pandas matplotlib scikit-learn jupyterlab
This command installs NumPy, pandas, Matplotlib, scikit-learn and JupyterLab into .venv only, keeping your system Python clean and making your setup reproducible.
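A related form you may meet is python -m pip install …, which runs the pip belonging to one specific interpreter. With .venv active it behaves the same as pip install, but it avoids accidentally installing into a different Python when several are present. The sketch below builds that command from sys.executable; pip_install_command and install are hypothetical helper names.

```python
import subprocess
import sys

# The packages installed above, using the names passed to pip.
COURSE_PACKAGES = ["numpy", "pandas", "matplotlib", "scikit-learn", "jupyterlab"]


def pip_install_command(packages=COURSE_PACKAGES):
    """Build a pip command tied to the interpreter running this code."""
    # `python -m pip` runs the pip that belongs to sys.executable, so packages
    # land in that interpreter's environment (the .venv, if it is active).
    return [sys.executable, "-m", "pip", "install", *packages]


def install(packages=COURSE_PACKAGES):
    """Run the install; check=True raises CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(packages), check=True)
```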
If you are using additional plotting or analysis libraries later, you can install them in the same way, for example:
pip install seaborn
You can inspect what’s installed with:
pip list
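pip list shows everything in the environment. To check only the course packages from inside Python (for example in a notebook cell), the standard-library importlib.metadata module can look up installed versions. A minimal sketch; installed_versions is our own helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to pip (scikit-learn, not its import name sklearn).
COURSE_PACKAGES = ["numpy", "pandas", "matplotlib", "scikit-learn", "jupyterlab"]


def installed_versions(packages=COURSE_PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name:12} {ver or 'NOT INSTALLED'}")
```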
6. Launch JupyterLab from the command line#
We will use JupyterLab as our main editor and environment for running Python code during the course. JupyterLab runs in your web browser but is started from the command line.
6.1 Starting JupyterLab#
Open a terminal / command prompt.
Go to your course folder:
cd /path/to/folder
Activate your environment (.venv) as described above.
Once you see (.venv) in your prompt, run:
jupyter lab
This will:
Start a local Jupyter server.
Open a tab in your web browser titled “JupyterLab”.
Show a menu where you can create a new notebook, open a terminal, and many other actions.
You can close JupyterLab by:
Shutting down the server in the terminal with Ctrl+C and confirming, and
Closing the browser tab.
Key point: Always activate .venv before running jupyter lab so that Jupyter uses the correct Python and packages.
6.2 Jupyter Notebook vs JupyterLab#
JupyterLab is an editor / integrated development environment (IDE) where you can:
Browse files
Create and edit Jupyter notebooks
Open terminals
Work with multiple notebooks and scripts side by side
A Jupyter notebook (.ipynb file) is a document format that mixes:
Code cells (which you can run)
Text (Markdown) cells
Outputs (figures, tables, etc.)
Jupyter notebooks are widely used in research and teaching for explaining, demonstrating, and exploring methods. JupyterLab is the modern environment in which you interact with these notebooks.
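Under the hood, an .ipynb file is a JSON document: a list of cells (each with a cell_type, its source, and for code cells the stored outputs) plus notebook-level metadata and format version numbers. The sketch below builds a minimal notebook following the nbformat 4 layout using only the json module; minimal_notebook and the cell ids are our own illustrative choices, and notebooks saved by Jupyter carry more metadata (for example the kernel).

```python
import json


def minimal_notebook(markdown_text, code_text):
    """Build a minimal nbformat-4 notebook dict with one text and one code cell."""
    return {
        "cells": [
            {
                "cell_type": "markdown",
                "id": "intro",  # illustrative cell id
                "metadata": {},
                "source": markdown_text,
            },
            {
                "cell_type": "code",
                "id": "first-code",  # illustrative cell id
                "execution_count": None,  # the cell has not been run yet
                "metadata": {},
                "outputs": [],  # figures, tables, etc. are stored here after running
                "source": code_text,
            },
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


nb = minimal_notebook("# My analysis", "print('hello')")
print(json.dumps(nb, indent=1))
```

This is why notebooks keep their outputs when shared: the results live in the file alongside the code.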
If you prefer the classic notebook interface, you can launch it instead with:
jupyter notebook
(Make sure .venv is activated first.)
Optional: Using VS Code instead of JupyterLab#
While this course uses JupyterLab as the primary environment for running and interacting with notebooks, Visual Studio Code (VS Code) is also a widely used and fully supported option for Python development. Participants are therefore welcome to use VS Code if it better fits their existing workflow.
VS Code can be used to:
Open and edit Python scripts (.py) and Jupyter notebooks (.ipynb)
Run code using the same virtual environment (.venv) created for this course
Work with notebooks directly inside the editor using the Jupyter extension
Combine notebooks, scripts, terminals, and file browsing in a single interface
Using VS Code with this course#
1. Install VS Code and required extensions#
Download and install VS Code from: https://code.visualstudio.com/
Open VS Code.
Install the following extensions when prompted, or via the Extensions panel:
Python (by Microsoft)
Jupyter (by Microsoft)
These extensions enable Python execution, environment selection, and notebook support.
2. Open the course folder in VS Code#
In VS Code, select File → Open Folder…
Choose the folder where you created the .venv environment.
VS Code will load the folder and, in many cases, detect the virtual environment automatically.
Important: The .venv folder must be inside the project directory for VS Code to detect it reliably.
3. Select the correct Python interpreter#
VS Code must be configured to use the Python interpreter from .venv.
Open the Command Palette (Ctrl+Shift+P on Windows/Linux, Cmd+Shift+P on macOS).
Search for Python: Select Interpreter.
Choose the interpreter that points to:
.venv/bin/python (macOS/Linux), or
.venv\Scripts\python.exe (Windows).
Once selected, VS Code will use this interpreter for running scripts and notebooks.
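The two interpreter paths differ because a venv stores its executables in Scripts\ on Windows and in bin/ on macOS and Linux. If you need to type the full path by hand, a sketch like the one below prints it for the current platform; venv_python is a hypothetical helper, and it assumes the environment is called .venv and that you run it from the course folder.

```python
import os
from pathlib import Path


def venv_python(env_dir=".venv", windows=None):
    """Return the interpreter path inside a venv for the given platform."""
    if windows is None:
        windows = os.name == "nt"  # "nt" means Windows
    env = Path(env_dir)
    # Windows venvs keep executables in Scripts\, macOS/Linux venvs in bin/.
    if windows:
        return env / "Scripts" / "python.exe"
    return env / "bin" / "python"


# Absolute path, relative to the current working directory.
print(venv_python().resolve())
```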
4. Working with Jupyter notebooks in VS Code#
You can open and run Jupyter notebooks directly inside VS Code.
Open an existing .ipynb file, or create a new one via File → New File → Jupyter Notebook.
At the top right of the notebook interface, locate the Kernel Picker.
Select the kernel associated with the .venv environment (it should match the interpreter selected above).
Run cells using the Run button or by pressing Shift+Enter.
Outputs such as plots and tables will appear inline, in the same way as in JupyterLab.
Key point: If your code cannot find installed packages, double-check that the selected kernel is the one associated with .venv.
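A quick way to diagnose this is to run a cell that prints the kernel's interpreter and checks which course packages it can import. The sketch uses importlib.util.find_spec, which returns None for modules the current interpreter cannot find; check_kernel is our own helper name, and the list uses import names (scikit-learn is imported as sklearn).

```python
import importlib.util
import sys


def check_kernel(packages=("numpy", "pandas", "matplotlib", "sklearn")):
    """Report the kernel's interpreter and which packages it cannot import."""
    print("Kernel interpreter:", sys.executable)
    # find_spec locates a module without importing it; None means "not found".
    missing = [name for name in packages if importlib.util.find_spec(name) is None]
    if missing:
        print("Not importable from this kernel:", ", ".join(missing))
    else:
        print("All packages importable.")
    return missing


check_kernel()
```

If the printed interpreter path is not inside .venv, switch kernels with the Kernel Picker rather than reinstalling packages.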
5. Using the integrated terminal in VS Code#
VS Code includes an integrated terminal that can be used in place of a separate system terminal.
Open the terminal via Terminal → New Terminal.
Activate the environment if it is not already active:
source .venv/bin/activate      # macOS / Linux
.\.venv\Scripts\activate       # Windows
Once activated, you can run:
jupyter lab
or execute Python scripts and commands as needed.
6. VS Code vs JupyterLab#
JupyterLab provides a browser-based environment designed specifically for notebooks and teaching workflows.
VS Code provides a general-purpose development environment that integrates notebooks, scripts, version control, and terminals in one place.
Both tools use the same underlying Python environment and packages. The course documentation defaults to JupyterLab to ensure a consistent, platform-independent experience, but the concepts, commands, and examples apply equally when using VS Code.
Recommendation: If you are new to Python or notebook-based workflows, we recommend following the course using JupyterLab. If you already use VS Code for Python development, you may continue to do so using the same .venv environment.
Self Study Material Link#
The self-study material for this course is available on the Self-study notes: Python for Data Analysis page.
Developers#
The developers of this course are:
Michael Saunby
Simon Kirby
Liam Berrisford
Course Delivery Content#
There is currently no additional content that is used outside of the self-study notes to deliver this course.
License Info#
Instructional Material
The instructional material in this course is copyright © 2024 University of Exeter and is made available under the Creative Commons Attribution 4.0 International licence (https://creativecommons.org/licenses/by/4.0/). Instructional material consists of material that is contained within the “individual_modules/python_for_data_analysis” directory, and images folders in this directory, with the exception of code snippets and example programs found in files within these folders. Such code snippets and example programs are considered software for the purposes of this licence.
Software
Except where otherwise noted, software provided in this repository is made available under the MIT licence (https://opensource.org/licenses/MIT).
Copyright © 2024 University of Exeter
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.