Standard Project Structures for Reproducible Workflows

Authors: Ruxandra Neatu; Tom Hawes.

Outline

This guide is designed to help researchers, especially those in biology and related disciplines, organise their computational analyses using a standardised, structured workflow. Whether you’re adapting an existing project or starting a new one, using a clear structure that is standardised across different projects within a research group can significantly improve the reusability of your work.

This guide covers the following:

  1. Project structure templates

  • We introduce three versions of a template for structuring analytical workflows, starting with a lightweight option and gradually increasing in sophistication.

  • Choose the version that best suits your needs, or mix and match elements from different versions.

  • The aim is to help you customise a structure that works for your workflow and is still easy for others to follow.

  2. A README template

  3. What to store under version control

  4. How to start a new workflow using a standard structure

  • Guidance on how to set up a new project from the ground up using a project template, ensuring consistency and good organisation from the start.

  5. How to adapt your existing code into a standard structure

  • Guidance on how to take an unstructured workflow, or a collection of exploratory scripts, and reorganise it towards one of the template structures, without needing to rewrite everything from scratch.

  6. Maintaining healthy pipelines

  • Guidance on how to keep a pipeline organised, documented and working.

  7. Other resources

  • Links to other resources to support your workflow development.

Why use a standardised workflow structure?

Many biology projects involve code developed under time constraints, often resulting in scripts with names like final.R, new_final.R, or final2.R, and folders such as data2/ or results_last_try/.

This kind of setup:

  • Makes it hard to remember what you did.

  • Makes it even harder for someone else to reproduce or continue your work.

  • Is typically fragile and prone to errors and wasted time, for example through using the wrong data or running the wrong script.

A well-structured workflow that follows a standardised structure helps to address these issues. Using a well-structured workflow:

  • Brings logical clarity and organisation to the codebase.

  • Makes it easier to debug, share, and scale up your analysis.

  • Supports reproducibility, essential for trust in your research.

  • Saves the developer time in the long run, because it’s easier to find things and make changes.

Using a standardised workflow structure:

  • Makes it easier for colleagues to work with your workflow.

  • Saves needing to ‘reinvent the wheel’ each time a new project is started.

Remember! Good programming is a mix of creativity, coding, and clarity.