LibGuides: Data Services: Sharing and Reuse of Data

Sharing and Reuse of Data in DMPs

When writing a Data Management Plan, you will be expected to answer a number of questions about how you will share your data and/or make it easy to reuse.

Below are several examples of the types of questions you will be asked. Additional information about many of the topics is provided via links to our Best Data Handling Practices section.

What data will you be sharing and in what form?
(e.g. raw, processed, final)

How will you share the data?

How will potential users find out about your data?
With whom will you share the data, and under what conditions?
Will you share data via a repository, handle requests directly or use another mechanism?
When will you make the data available?
Will you pursue getting a persistent identifier for your data?
Will you provide users with accompanying information about how to use the data? (See Data Documentation.)

Have you considered what type of end-user license to include with your data?

Guidance:

Note: Sometimes it may not be necessary to share raw data, for reasons such as confidentiality or the size of the raw data. Think about how future researchers may use your data and decide what form of data to share. You should present a convincing case for any restrictions on data sharing.

Consider where, how, and to whom data with acknowledged long-term value should be made available.

Note: Using an internationally recognized license such
as Creative Commons Zero makes it easier to reuse
your data and increases the likelihood that your data
will be cited.

Note: If you need help managing intellectual property issues, contact your librarian for assistance.

Are any restrictions on data sharing required?

What action will you take to overcome or minimise restrictions?
For how long do you need exclusive use of the data and why?
Will a data sharing agreement (or equivalent) be required?

Guidance:

Outline any expected difficulties in sharing data with
acknowledged long-term value, along with causes and
possible measures to overcome these.

Restrictions may be due to confidentiality, lack of consent
agreements or IPR, for example.

Consider whether a non-disclosure agreement would give sufficient protection for confidential data.

Tri-Agency and Sharing

The Tri-Agency’s Research Data Management Policy states:

“Grant recipients are not required to share their data. However, the agencies expect researchers to provide appropriate access to the data where ethical, cultural, legal and commercial requirements allow, and in accordance with the FAIR principles and the standards of their disciplines. Whenever possible, these data, metadata and code should be linked to the publication with a persistent digital identifier”.

This statement does not remove the need to (1) create DMPs, (2) manage data appropriately and (3) preserve data.

It does indicate that, where possible (i.e., where appropriate access is allowed), data should be shared in the manner that makes the most sense.

Dataverse at Brandon University

When deciding How to Share your research, an excellent place to start is Brandon University’s institutional instance of Dataverse. It:

Assigns DOIs to your data sets.
Provides researchers with Licensing options for data sets.
Provides for the inclusion of useful Metadata.
Is indexed by the Federated Research Data Repository (FRDR) making the data locatable.
Has built-in mechanisms for Data Citation and Attribution of Credit.
Allows researchers to manage access to data and make them:
- Private.
- Available to only (1) certain IPs, (2) individual account(s), or (3) to specific groups.
Provides Data Analysis and Exploration Tools.

Dataverse Does NOT accept content that contains confidential or sensitive information.

Dataverse can be used to share de-identified and non-confidential data only.
Contributors are required to remove, replace, or redact such information from datasets prior to upload.

Dataverse at Brandon University Does Not accept file sets over 3 GB. For massive file sets, see FRDR.
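
The size limit above can be checked before you start an upload. The following is a minimal Python sketch, not an official tool: it adds up the size of every file in a folder and suggests Dataverse or FRDR based only on the 3 GB figure stated above. It assumes that "3 GB" means 3 x 10^9 bytes (the repository may count differently), and the function names are hypothetical.

```python
from pathlib import Path

# Assumption: "3 GB" is read as 3 * 10**9 bytes; the repository may count differently.
SIZE_LIMIT_BYTES = 3 * 10**9


def total_size_bytes(folder):
    """Return the combined size in bytes of every regular file under folder."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def suggest_repository(size_bytes, limit=SIZE_LIMIT_BYTES):
    """Suggest a deposit location based only on the size limit stated above."""
    return "Dataverse" if size_bytes <= limit else "FRDR"
```

For example, suggest_repository(total_size_bytes("my_dataset")) returns "Dataverse" or "FRDR" for a folder named my_dataset (a placeholder name). Size is only one consideration; the confidentiality rules above still apply.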

Scholars at Brandon University:

Can register for an account and read our Dataverse Policy.
Have access to useful Training Materials.
Can contact Scholarly Communications Library Services if they have any questions.

Federated Research Data Repository (FRDR)

FRDR provides many of the same data services as Dataverse. It does, however, have one major advantage over our institutional instance of Dataverse: it can accommodate massive data sets. Our institutional instance of Dataverse is limited to 3 GB.

Any Canadian researcher can get an account at FRDR and use it to do many of the same things as Dataverse. You will need to read their Terms of Use.

For additional information about using FRDR consult their Preparing Your Data site and view their Getting Started with FRDR: Upload your Dataset or Getting Started with FRDR: Download a Data Set videos on YouTube.

Additional Data Repositories

A variety of additional repositories that archive research data are available.

Researchers who wish to locate a specific repository are encouraged to start with Re3data.org, which provides a comprehensive listing of disciplinary and institutional repositories that host and share research data.

You can use Re3data to browse for data repositories by:

Subject

Content Type

Geographical Areas

Other places to find relevant repositories include a few publisher-recommended lists, such as:

Nature - Recommended data repositories
Open Access Directory (OAD) - List of repositories and databases for open data
PLOS One - Recommended data repositories

...as well as a few reputable general data repositories:

Figshare - a general-purpose repository often used in partnership with PLOS publications
Dryad - frequently used for scientific and medical publications
Zenodo - a general-purpose repository that integrates with GitHub for archiving and minting DOIs for GitHub repos
ICPSR - a repository commonly used for social science data
Qualitative Data Repository - for qualitative data, typically used for digital humanities and social sciences
Scholars Portal Dataverse - the Canadian high-level Dataverse that connects Canadian institutional Dataverses. The SP Dataverse may be a good solution for cross-institutional collaborations.
FRDR - The Federated Research Data Repository is a Canadian solution for archiving large/big data

Note: Funders or publishers may have specific recommendations about where you share your data. A good practice is to use our instance of Dataverse to house and back up your data AS WELL AS sharing it via a subject repository recommended by a funder or publisher. The latter will make your data highly visible to researchers in a discipline. The former ensures the data is protected.

Data Papers & Data Journals

The rise of the “data paper”

Datasets are increasingly being recognized as scholarly products in their own right, and as such, are now being submitted for standalone publication. In many cases, the greatest value of a dataset lies in sharing it, not necessarily in providing interpretation or analysis. For example, this paper presents a global database of the abundance, biomass, and nitrogen fixation rates of marine diazotrophs. This benchmark dataset, which will continue to evolve over time, is a valuable standalone research product that has intrinsic value. Under traditional publication models, this dataset would not be considered "publishable" because it doesn't present novel research or interpretation of results. Data papers facilitate the sharing of data in a standardized framework that provides value, impact, and recognition for authors. Data papers also provide much more thorough context and description than datasets that are simply deposited to a repository (which may have very minimal metadata requirements).

What is a data paper?

Data papers thoroughly describe datasets, and do not usually include any interpretation or discussion (one exception may be discussion of the different methods used to collect the data). Some data papers are published in a distinct “Data Papers” section of a well-established journal (see this article in Ecology, for example). It is becoming more common, however, to see journals that exclusively focus on the publication of datasets. The purpose of a data journal is to provide quick access to high-quality datasets that are of broad interest to the scientific community. They are intended to facilitate reuse of the dataset, which increases its original value and impact, and speeds the pace of research by avoiding unintentional duplication of effort.

Are data papers peer-reviewed?

Data papers typically go through a peer review process in the same manner as articles, but because they are new to scientific practice, the quality and scope of the process vary across publishers. A good example of a peer-reviewed data journal is Earth System Science Data (ESSD). Their review guidelines are well described and aren't all that different from the manuscript review guidelines that we are all already familiar with.

You might wonder, what is the difference between a 'data paper' and a 'regular article + dataset published in a public repository'? The answer to that isn’t always clear. Some data papers necessitate just as much preparation as, and are of equal quality to, ‘typical’ journal articles. Some data papers are brief, and only present enough metadata and descriptive content to make the dataset understandable and reusable. In most cases, however, the datasets or databases presented in data papers include much more description than datasets deposited to a repository, even if those datasets were deposited to support a manuscript. Common practices and standards are evolving in the realm of data papers and data journals, but for now, they are the Wild West of data sharing.

Where do the data from data papers live?

Data preservation is a corollary of data papers, not their main purpose. Most data journals do not archive data in-house. Instead, they generally require that authors submit the dataset to a repository. These repositories archive the data, provide persistent access, and assign the dataset a unique identifier (DOI). Repositories do not always require that the dataset(s) be linked with a publication (data paper or ‘typical’ paper; Dryad does require one), but if you’re going to the trouble of submitting a dataset to a repository, consider exploring the option of publishing a data paper to support it.

How can I find data journals?

The article by Candela et al. (2015) includes the dataset of data journals that they used for their analysis, with a list of more than 100 data journals.
Candela, L., Castelli, D., Manghi, P., & Tani, A. (2015). Data journals: A survey. Journal of the Association for Information Science and Technology, 66(9), 1747–1762. https://doi.org/10.1002/asi.23358
This blog post by Katherine Akers, from 2014, also has a long list of existing data journals.
This slightly dated list of data journals

Additional information on data journals

"How to publish your data in a data journal" (from the NatureJobs job blog)
"Data papers as a new form of knowledge organization in the field of research data" (Schöpfel et al. 2019)
"Data journals: incentivizing data access and documentation within the scholarly communication system" (Walters 2020)

Taken from

Oregon State Libraries. Research Data Services. Data Paper & Data Journals. Accessed March 1st, 2021.
https://guides.library.oregonstate.edu/research-data-services/data-management-data-papers-journals
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Persistent Identifiers (PIDs) for Data

A persistent identifier is a long-lasting reference to a digital resource. Unlike URLs, which may break, a persistent identifier reliably points to a digital entity.

An ORCID iD is an example of a persistent identifier for a person.
DOIs (digital object identifiers) are persistent identifiers for things or entities such as journal articles, books, and datasets.

It is important to have a persistent identifier for your data set(s), as persistent identifiers:

Provide a permanent link that points to your data, making the data findable.
Are machine-readable.
Connect the Persistent Identifier of a Dataset with the Persistent Identifier of a Journal Article.
- This occurs for any citation of the dataset - be it the original publication or a follow up study by another researcher - meaning researchers can track the impact of their research data.
Connect the Persistent Identifier of a Dataset with the Persistent Identifier of the Researcher.
- This means the data can be readily associated with a researcher if they have a researcher ID.
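
As an illustration of what "machine-readable" means in practice, here is a minimal Python sketch that turns a DOI written in any of several common forms into a single https://doi.org/ link. It uses the DOI of the Candela et al. article cited on this page as its example. The function names and the pattern are assumptions made for illustration; the pattern is a loose sanity check, not a full DOI validator, and the sketch does not contact the DOI resolver.

```python
import re

# Loose pattern: "10.", a 4-9 digit registrant code, "/", then a non-empty suffix.
# Assumption: a sanity check for illustration only, not a complete DOI validator.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes that people commonly write in front of a bare DOI.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(text):
    """Strip a resolver URL or 'doi:' prefix and return the bare DOI."""
    doi = text.strip()
    lowered = doi.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {text!r}")
    return doi


def doi_url(text):
    """Return the https://doi.org/ link for a DOI given in any accepted form."""
    return "https://doi.org/" + normalize_doi(text)
```

For example, doi_url("doi:10.1002/asi.23358") returns "https://doi.org/10.1002/asi.23358". DOIs whose suffix contains unusual characters may need URL encoding, which this sketch does not attempt.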

Checklist: Before Sharing Data...

Before uploading your data into your preferred repository, make sure of the following:

1. You have addressed all confidentiality and privacy issues pertaining to the data you intend to share, such as:

Modifications to data as required to ensure confidentiality / privacy
Having secured permissions to share research data (in a manner that does not reveal confidential information) as part of a Research Participant Agreement.

2. The data is in a useful format that others can easily make use of and that will remain useful over time.

3. You have created information about the data that will accompany the data in the repository. This should include:

Data Documentation
Metadata
Software - identifying software / versions used to analyze data in various forms / versions.

4. You have addressed all Intellectual Property issues such as:

Having selected the appropriate License(s) to accommodate data sets.
Addressing your ability to share data derived from other sources. Be sure you understand what rights you have with regard to data you are using. You may or may not have the right to share these data. Check the license associated with these data, or contact the data owners, to verify what you can and can't do with these data.
Co-researcher agreements over ownership of the data.

5. You have identified relevant permissions for accessing data in the repository if the data cannot be made openly available.

6. You have identified who to contact for permission to access data.

7. You have either assigned a DOI to your data - OR - have verified that the repository you are placing your data in will assign it a DOI.
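
For item 3 of the checklist (identifying the software and versions used to analyze the data), one possible approach for analyses done in Python is sketched below. It is only an illustration: the function name is hypothetical, the package names you pass in are your own, and researchers using other tools (R, SPSS, etc.) would record the equivalent information in another way.

```python
import platform
from importlib import metadata


def describe_software(package_names):
    """Return text lines recording the Python version and installed package versions."""
    lines = [
        f"Python {platform.python_version()}",
        f"Platform: {platform.platform()}",
    ]
    for name in package_names:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap rather than failing, so the list stays complete.
            version = "not installed"
        lines.append(f"{name} {version}")
    return lines


if __name__ == "__main__":
    # Hypothetical package list; replace with the packages your analysis used.
    print("\n".join(describe_software(["numpy", "pandas"])))
```

The resulting lines can be pasted into the Data Documentation that accompanies the data set in the repository.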

Can I Share My Data: Portage’s Decision Tool

Deciding whether you can share your data was recently made easier to understand with Portage’s Can I Share My Data Decision Tree. It will walk you through many of the issues discussed and help you make an informed decision.