Library Guides: Postgraduate Researcher Guide: Data Management

Developing a Data Management Plan

A data management plan (DMP) is a working document that sets out what data you will collect, how and when you will collect it, and how you will store, organise, and share it. Having a data management plan in place before you begin collecting your data will be useful to you at every stage of the research process.

Data management plans are often required when applying for research funding, and many funders will also have requirements for storing and sharing the data after collection (see our guidance on Open Research for further information).

You can work alongside your research supervisor to develop and update your DMP; remember that as your research progresses, your DMP may need to be modified to reflect this progress.

The University of Sunderland Data Management Plan template is linked below. Some funding bodies have their own DMP template; if the funding body for your research has one, you must use that instead.

University of Sunderland Data Management Plan Template

If your research is to be funded by a separate funding body and you need to use their DMP template, many of these templates are available through DMPonline, linked below. You will need to create an account to use it.

DMPonline

What a Data Management Plan Includes

You will begin your DMP with a brief description of the research you will be undertaking. If your research is being supported by a funding body, it is important to mention this here, as the funding body will have their own terms for the collection, storing, and sharing of the data. Remember that many funding bodies will have their own Data Management Plan templates which you should use in this case.

You do not need to collect all of the data yourself. Begin by identifying the primary sources, the secondary sources, or both that you will use for your research.

Any data that have been collected and released openly by another researcher or organisation, for example by the Office for National Statistics (ONS), will be your secondary data. You will need to give the storage and usage of this data the same consideration as your primary data. Before using this data, pay attention to any terms and licenses applied to it. Much secondary data released online is under a Creative Commons license, which allows reuse and sharing so long as attribution is given to the original creator. When using this type of data in your research, your DMP should include a DOI (or other persistent identifier) that links to the data, information about the terms of use and licenses applied to it, and how it will be used in your research.

Primary data is what you will collect yourself; the types of data and the methods used to collect it will depend on the topic of your research. For example, a researcher in psychology may collect data via qualitative interviews with research subjects, while a researcher in biosciences may collect data via analytical methods performed in a laboratory. A discussion of where this data fits in with the purpose and scope of your research might also be necessary. You will want to list the types of data you will collect and how much of each you expect to collect; this will also be relevant when you begin to plan for storage and sharing of the data.

Along with what data you will collect, you should include information about the methods you will use to collect it. This covers both the hardware and software required to gather it and the file format of the data created. Once you know this, you should be able to estimate how much storage capacity your project will need.
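As a rough illustration of such an estimate, the sketch below multiplies the expected number of files by an average file size and then allows for backup copies and some headroom. The data types, file counts, sizes, number of copies, and headroom are all hypothetical figures chosen for the example and should be replaced with your own.

```python
# Hypothetical figures for illustration only; replace them with your own estimates.
PLANNED_DATA = [
    # (description, number of files, average size per file in megabytes)
    ("Interview audio (WAV)", 30, 600.0),
    ("Interview transcripts (DOCX)", 30, 0.2),
    ("Survey export (CSV)", 2, 5.0),
]


def estimate_storage_gb(planned, copies=2, headroom=0.25):
    """Return a rough storage estimate in gigabytes.

    copies   -- total number of copies kept (working copy plus backups)
    headroom -- extra fraction allowed for derived files and growth
    """
    total_mb = sum(count * size_mb for _, count, size_mb in planned)
    return total_mb * copies * (1 + headroom) / 1000  # using 1 GB = 1000 MB


if __name__ == "__main__":
    print(f"Estimated storage needed: {estimate_storage_gb(PLANNED_DATA):.1f} GB")
```

The result is only as good as the figures fed into it, so it is worth revisiting the estimate once you have collected a first sample of real files.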

The content of the data must also be acknowledged. If you are collecting personally identifiable information you will need to make sure that you are aware of what policies may apply to it.

The system you use to organise your data matters both to you during the research process and potentially to other researchers once your data are shared.

Firstly, your data files should be organised within a filing system that is easy to follow; in digital storage this is the structure of folders within which the data are stored. The hierarchy of your folders should descend from quite general to more specific descriptors of the data files within them. Each filing system will be unique, depending on the nature of the research and the data it contains, and should not be overly complicated or layered. A user not involved in the research project should be able to work out where to find what they are looking for. Any confidential or sensitive data may be held in a separate folder with password protection or limited access if necessary.
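One way to set up such a hierarchy consistently is to create it with a short script. The sketch below is only an example: the folder names and numbering are hypothetical and should be adapted to your own project, and access restrictions on the restricted folder still need to be applied separately in your storage service.

```python
from pathlib import Path

# Hypothetical layout: general folders at the top, more specific ones below.
FOLDERS = [
    "01_documentation",
    "02_raw_data/interviews/audio",
    "02_raw_data/interviews/transcripts",
    "02_raw_data/survey",
    "03_processed_data",
    "04_analysis",
    "05_restricted",  # confidential or sensitive files; restrict access separately
]


def create_structure(project_root, folders=FOLDERS):
    """Create each folder under project_root and return the created paths."""
    root = Path(project_root)
    created = []
    for relative in folders:
        path = root / relative
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling create_structure("my_research_project") would build the folders inside a new my_research_project folder in the current working directory. Running it again does no harm, because existing folders are left as they are.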

Your data files will then need to follow a naming convention which uses some of the key properties of the data and their collection. Avoid using so many properties that file names become unnecessarily long, but use enough that identifying the data is straightforward. Examples of such properties include the data type, location, participant identifier, and collection date. Make sure the naming convention you intend to use is consistent, for example by writing dates as YYYY-MM-DD throughout. Slashes, as in DD/MM/YYYY, cannot be used in file names on most systems, and putting the year first means files sort in date order.
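A naming convention is easier to keep consistent if file names are built by one small function rather than typed by hand. The sketch below is one possible approach, not a required standard: the choice of properties, their order, the underscore separator, and the lower-casing are all assumptions made for the example. Dates are written as YYYY-MM-DD because slashes cannot appear in file names and this format sorts chronologically.

```python
import re
from datetime import date


def build_filename(data_type, location, participant_id, collected_on, extension):
    """Join key properties into one consistent file name.

    collected_on is a datetime.date and is written as YYYY-MM-DD.
    """
    parts = [data_type, location, participant_id, collected_on.isoformat()]
    # Keep only letters, digits and hyphens within each part, so that
    # underscores are reserved for separating the parts from one another.
    cleaned = [
        re.sub(r"[^A-Za-z0-9-]+", "-", part.strip()).strip("-").lower()
        for part in parts
    ]
    return "_".join(cleaned) + "." + extension.lower().lstrip(".")


if __name__ == "__main__":
    # Hypothetical example values.
    print(build_filename("Interview", "Sunderland", "P07", date(2024, 3, 5), "wav"))
    # prints: interview_sunderland_p07_2024-03-05.wav
```

Participant identifiers used in file names should be codes rather than names, so that the file name itself does not reveal who took part.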

How and where your data will be stored is of particular importance, independent of the sensitivity of the data collected.

Before considering where your data are stored, it is worth thinking about how much data you will be collecting and what level of storage is likely to be required. If the data are likely to be substantial, check both the total capacity and the file size limits of any cloud storage solution you intend to use (such as OneDrive), especially if you are already using it for previous work.
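Once you have some data on disk, a short script can compare it against the limits of your chosen storage. The sketch below adds up the size of every file in a folder and lists any file that exceeds a per-file limit. The function name and the limits passed to it are hypothetical; check the current limits published by your storage provider rather than relying on example values.

```python
from pathlib import Path


def storage_report(folder, max_file_bytes, capacity_bytes):
    """Summarise a folder's size against a per-file limit and a total capacity."""
    sizes = {p: p.stat().st_size for p in Path(folder).rglob("*") if p.is_file()}
    total = sum(sizes.values())
    return {
        "total_bytes": total,
        "fits_capacity": total <= capacity_bytes,
        # Files that would be rejected for being larger than the per-file limit.
        "too_large": sorted(p for p, size in sizes.items() if size > max_file_bytes),
    }
```

For example, storage_report("my_research_project", max_file_bytes=2 * 10**9, capacity_bytes=10**12) would check a project folder against a 2 GB per-file limit and 1 TB of total capacity. The capacity check does not account for anything else already stored in the same account.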

It is important to maintain a backup of your data at all points; reliance on one storage solution increases the impact of any incident of loss, theft, or data corruption. If you store a backup on a physical drive, make sure the drive is password protected to ensure that only you have access to the data it contains. This will protect the data in the event of theft.
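A backup is only useful if the copy is intact. As a minimal sketch, the script below copies every file from a source folder to a backup folder and compares SHA-256 checksums of the original and the copy to detect corruption. The function names are hypothetical, and the sketch assumes the backup folder sits outside the source folder. It does not encrypt or password protect anything, so that still needs to be set up on the drive itself.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source_dir, backup_dir):
    """Copy all files to backup_dir and return the source files whose copies differ."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    mismatches = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        target = backup_dir / source.relative_to(source_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also preserves file timestamps
        if sha256_of(source) != sha256_of(target):
            mismatches.append(source)
    return mismatches
```

An empty list returned by backup_and_verify means every copy matched its original at the time of the backup.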

The University of Sunderland provides staff and students with access to 1 Terabyte of storage through Microsoft OneDrive. This is a safe way to store research data, as it is encrypted and protected by a password and two-factor authentication. If your data, or parts of it, are being collected using tools such as Qualtrics, it is important to export this data to OneDrive as soon as possible.

Planning in advance how your data will be shared once your research project is complete may be mandatory, depending on your funder's requirements. The sharing of research data is also strongly encouraged through the University of Sunderland Open Research Statement.

Your data should be as open as possible, and as closed as necessary. So long as there is no commercial, legal, or ethical reason to restrict access to your research data, it should be deposited to a suitable data repository following the publication of your research.

The University currently does not have a data repository. SURE can be used to store some data using the data type. However, only files of up to 2GB can be uploaded. If you work with a funder, make sure you check their requirements for data sharing. They might require you to deposit data in a specific data repository.

There is a range of data repositories that you can explore. Some of the common cross-disciplinary repositories are:

Figshare: You can upload up to 20GB (Individual researchers can use Figshare+ to store a large amount of data linked to a research publication or project. This has a fee attached.)
Zenodo
Open Science Framework

There are also a number of discipline-specific data repositories. PLOS One created a list of some of these that you can view on their website.

Following the FAIR Principles of Open Research will allow your research data to potentially have a greater impact.

F - Findable. Your research data should contain appropriate metadata which allows it to be found easily by both humans and computers.
A - Accessible. Once a researcher finds your data they should be able to access it, or understand how access can be requested.
I - Interoperable. Upon access, the data should be compatible with other datasets and applications. This includes using industry-standard file formats.
R - Reusable. The data should be able to be reused for any future research, including reproducing your findings; having a Creative Commons license attached to the data simplifies this. Providing documentation or a 'readme' file alongside the data can also improve its reusability.
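As an illustration of the 'readme' file mentioned under Reusable, the sketch below builds a short plain-text description of a dataset. The fields shown are a hypothetical minimum chosen for the example, not a required standard; repositories and funders may ask for more or different metadata.

```python
def build_readme(title, creator, description, license_name, file_formats, identifier=None):
    """Return the text of a plain-text readme describing a dataset."""
    lines = [
        f"Title: {title}",
        f"Creator: {creator}",
        f"Description: {description}",
        f"License: {license_name}",
        f"File formats: {', '.join(file_formats)}",
    ]
    if identifier:
        # For example a DOI issued by the repository once the data are deposited.
        lines.append(f"Persistent identifier: {identifier}")
    return "\n".join(lines) + "\n"
```

The returned text can be saved as README.txt in the top-level folder of the dataset, so that it travels with the data when they are deposited or shared.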

University of Sunderland Library
