# Data Management
## Learning Objectives
- Understand the research data management lifecycle and its importance
- Create and maintain a Data Management Plan (DMP)
- Identify and comply with institutional and funding requirements for data management
- Implement strategies for data organization, versioning, and security during a research project
- Preserve and share data effectively after a research project, ensuring compliance and promoting reproducibility 

## Introduction to Research Data Management Lifecycle
Research data has its own management lifecycle, analogous to the well-known software development life cycle. Understanding this lifecycle is crucial because its stages correspond to the phases of a research project: before, during, and after the research activities.

## Before the Research Project: Planning Phase
### Data Management Plan (DMP)
**Definition and Purpose**: A Data Management Plan (DMP) is a detailed document that outlines how data will be acquired, managed, preserved, and shared throughout the research project. It serves as a blueprint for handling data efficiently and securely.

**Legal and Ethical Considerations**: The DMP must address any legal or ethical issues associated with the research data, including compliance with laws on data protection and privacy and the ethical use of data.

**Living Document**: A DMP is not static. Treat it as a living document, to be revised and updated as the project evolves and new needs or challenges arise.

**Benefits of a DMP**:
- **Efficiency and Safety**: Creating a DMP early on can save time and prevent data loss or breaches in security.
- **Data Sharing**: A well-prepared DMP facilitates the sharing of data, thereby supporting the principles of FAIR data management.
- **FAIR Principles**: These principles aim to enhance the Findability, Accessibility, Interoperability, and Reusability of digital assets, thus promoting research reproducibility.

![Data Management Plan](images/data_management_plan.png)

### Institutional and Funding Requirements
**University Requirements**: Many universities, including the University of Exeter, mandate the inclusion of a DMP in all research proposals.

**Funder Requirements**: Research funders may require a DMP in a specific format, with certain types of content included. It is also advisable to account for research data management costs in funding applications.

### Resources and Support
**DMPonline Tool**: [DMPonline](https://dmponline.exeter.ac.uk/) provides researchers with templates and guidance for writing DMPs tailored to major research funders.

**University Support Teams**:
- **Research Data Management Team**: Researchers can seek assistance and advice from their university's Research Data Management Team (rdm@exeter.ac.uk).
- **Research Ethics and Governance Team**: For projects involving personal or sensitive data, it is crucial to consult the university's [Research Ethics and Governance Team](http://www.exeter.ac.uk/cgr/researchethics/) to ensure compliance with ethical standards and regulations.

These are the essential components of the planning phase of the research data lifecycle. By adhering to these guidelines, researchers can ensure that their data management practices are robust, ethical, and aligned with both institutional and funder requirements, thereby paving the way for a successful research project.

## During the Research Project: Acquisition, Processing, and Analysis Phase
### Responsibilities and Data Handling
**Responsibility**: Each individual involved in handling the data is accountable for its integrity and security. This emphasizes the collective responsibility of the research team towards the data.

### Data Organization and Versioning
**File Structure and Naming**: Use a logical file structure with meaningful file names so that files are easy to locate and track throughout the project lifecycle.

**Read-only Settings**: Setting files to read-only prevents accidental modification of data.

**Manual Versioning**: If data files are versioned manually, name each version distinctly using dates or version numbers. A supplementary file that logs the details of each change improves traceability.

**Version Control Software**: For more systematic version control, use software tools such as Git. These tools help manage changes to documents, computer programs, large websites, and other collections of information.
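
The read-only and manual versioning practices above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `save_version`, the `name_date_vNN` naming pattern, and the `CHANGELOG.txt` log file are hypothetical choices, not a convention prescribed by these notes.

```python
import shutil
import stat
from datetime import date
from pathlib import Path


def save_version(source, archive_dir, version, note, today=None):
    """Copy `source` into `archive_dir` under a dated, versioned name,
    mark the copy read-only, and record the change in a log file."""
    source = Path(source)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming pattern: <name>_<YYYY-MM-DD>_v<NN><extension>
    stamp = (today or date.today()).isoformat()
    target = archive_dir / f"{source.stem}_{stamp}_v{version:02d}{source.suffix}"
    if target.exists():
        raise FileExistsError(f"{target} already exists; use a new version number")

    shutil.copy2(source, target)
    # Remove write permission so the archived copy is not edited by accident.
    target.chmod(stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)

    # Append one tab-separated line per version to the supplementary log.
    with open(archive_dir / "CHANGELOG.txt", "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{target.name}\t{note}\n")
    return target
```

For example, `save_version("survey.csv", "archive", 1, "initial export", today=date(2024, 3, 1))` would create `archive/survey_2024-03-01_v01.csv`. Refusing to overwrite an existing version keeps earlier versions intact, which is the point of versioning by name.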

### Data Processing Strategies
**Consistent Data Representation**: Strategies for data processing, including how to handle missing data consistently, should be decided at the project's onset. It is crucial that all collaborators agree on these strategies to ensure uniformity.
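
One way to make such an agreement concrete is to encode it once and have every collaborator load data through the same function. The sketch below assumes a hypothetical project convention in which a handful of tokens (`MISSING_TOKENS`) all mean "missing"; the actual tokens would be whatever the team agrees on.

```python
import csv
import io

# Hypothetical project-wide convention: each of these tokens means "missing".
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "-999"}


def normalize_missing(value):
    """Return None for any agreed missing-value token, else the stripped text."""
    if value is None:  # csv.DictReader uses None for absent trailing fields
        return None
    cleaned = value.strip()
    return None if cleaned.lower() in MISSING_TOKENS else cleaned


def read_rows(text):
    """Parse CSV text into dictionaries with missing values normalized."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: normalize_missing(val) for key, val in row.items()} for row in reader]


sample = "id,height_cm,weight_kg\n1,172,NA\n2,,68\n3,-999,n/a\n"
rows = read_rows(sample)
```

After this, every missing value in `rows` is represented by `None`, regardless of how it was originally recorded.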

### Storage and Security
**University Storage Options**: The university provides researchers with secure data storage options such as OneDrive, SharePoint, Research Data Storage (RDS), and Secure Data Research Hub (SDRH).

**Portable Storage Devices**: While convenient, portable storage devices such as USB sticks are vulnerable and less secure. If used, they should be encrypted to protect the data.

**Backup Strategy**: A robust backup strategy is vital for data security throughout the project. The recommended approach is the 3-2-1 principle: keep three copies of the data on two different media, with one copy stored in a separate physical location.
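
The 3-2-1 principle is simple enough to express as a check. The following is a minimal sketch, assuming each copy is described only by its storage medium and physical location; the `DataCopy` class and the example labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    medium: str    # e.g. "university_storage" or "encrypted_external_drive"
    location: str  # physical site where this copy is kept


def satisfies_3_2_1(copies, primary_location):
    """Return True if the copies meet the 3-2-1 principle."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    # At least one copy must be kept somewhere other than the primary site.
    has_offsite = any(c.location != primary_location for c in copies)
    return enough_copies and enough_media and has_offsite


plan = [
    DataCopy("laptop_disk", "office"),
    DataCopy("university_storage", "office"),
    DataCopy("university_storage", "remote_data_centre"),
]
plan_ok = satisfies_3_2_1(plan, primary_location="office")
```

Here `plan_ok` is `True`: there are three copies, on two kinds of media, and one copy is held away from the office.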

### Documentation and Metadata
**Importance of Documentation**: Data without its contextual information is often meaningless, so maintaining thorough documentation throughout the project is essential.

**Metadata**: Metadata, or data about data, gives context to the main data collected. Together, documentation and metadata should make the data comprehensible and usable independently of any publication.
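
As an illustration, metadata can be stored in a small machine-readable file kept next to the data it describes. The sketch below writes a JSON "sidecar" file; the function name `write_metadata`, the `.metadata.json` suffix, and the chosen fields are assumptions for this example rather than a required schema, and many repositories and disciplines define their own metadata standards.

```python
import json
from pathlib import Path


def write_metadata(data_file, title, creator, description, variables):
    """Write a JSON sidecar file describing `data_file` and return its path."""
    data_file = Path(data_file)
    record = {
        "file": data_file.name,
        "title": title,
        "creator": creator,
        "description": description,
        # Maps each column name to its meaning and unit.
        "variables": variables,
    }
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Calling `write_metadata("survey.csv", ...)` would produce `survey.csv.metadata.json` in the same folder, so the description travels with the data file.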

These are critical practices and strategies to be implemented during the acquisition, processing, and analysis phases of a research project. By adhering to these guidelines, researchers can ensure effective data management that safeguards the integrity, security, and usability of research data.

## After the Research Project: Preservation and Sharing Phase
### Expectations for Data Preservation
**Funder Requirements**: Many research funders expect that data deemed to have long-term value be preserved and made accessible for future research use. This expectation underscores the importance of maintaining good data management practices throughout the lifecycle of the project.

### Benefits of Data Sharing
- **Enhancing Reproducibility**: By sharing data, researchers enable others to fully reproduce and verify their study results.
- **Advancing Scientific Progress**: Sharing data helps prevent duplication of effort, thereby speeding up scientific advancements.
- **Increasing Impact and Quality**: Making data accessible can increase the impact and quality of research by exposing it to a broader audience.
- **Facilitating Collaboration**: Shared data fosters collaboration among researchers, which can lead to new insights and advancements.

### Steps for Effective Data Sharing
1. **Deciding What Data to Share**:<br>
Not all data can be openly shared due to ethical and commercial concerns. Researchers must carefully decide which data should be shared to enable others to reproduce their research.
2. **Choosing a Data Repository**:
- **Institutional Repository**: Universities often provide their own repositories for long-term data preservation, such as [Open Research Exeter (ORE)](https://ore.exeter.ac.uk/repository/).
- **External Repositories**: Depending on funder stipulations or research needs, researchers may opt for external data repositories. Tools like the [Registry of Research Data Repositories](https://www.re3data.org) can help in selecting an appropriate, subject-specific repository.
- **Registration of Data**: It's essential that all archived data is registered in systems like [Symplectic](https://researchpubs.exeter.ac.uk/login.html) to ensure it is accounted for and easily accessible.
3. **Licensing and Linking Research Outputs**:
- **License**: Choosing the right license is critical for data sharing. It is generally recommended to use a Creative Commons (CC) license to facilitate the legal sharing and reuse of datasets.
- **Data Access Statement**: A data access statement should be included in any publication to tell readers clearly how they can access the data.
4. **Uploading Data and Documentation**:<br>
The file formats used for data should be carefully considered, as some formats are more accessible and enduring than others. Proper documentation is equally vital to ensure that data can be understood and utilized by others.

These notes detail the essential steps and considerations for preserving and sharing research data effectively after the completion of a project. By following these guidelines, researchers not only comply with funder mandates but also contribute to the broader scientific community by making their data accessible and reusable.

**Adapted Content From:**
- [Research Data Management @ University of Exeter](https://www.exeter.ac.uk/research/researchdatamanagement/) (permission for use granted by Christopher Tibbs)
- [The Turing Way](https://github.com/alan-turing-institute/the-turing-way) (Copyright © The Turing Way Community) ([CC-BY licence](https://creativecommons.org/licenses/by/4.0/))

## Summary Quiz

```python
from jupyterquiz import display_quiz
display_quiz("questions/summary_dmp.json")
```
