LibGuides: Data Services: Documentation

DMPs and Data Documentation

Documenting the data collection process is an important part of good research data management practices. It allows for sharing and reuse of data by others, enables the replication of the research and serves as a reference for the research team to verify findings and compare the research project with others. Depending on the data collected and the research project, different types of documentation can be useful.

Because documenting data has many benefits, DMPs include questions about the practice such as:

What documentation and metadata will accompany the data?

What information is needed for the data to be read and interpreted in the future?
What metadata standards will you use and why?

How will you make sure that documentation is created or captured consistently throughout your project?

How will you capture / create this documentation and metadata?

Guidance:

Describe the types of documentation that will accompany
the data to help secondary users to understand and reuse it.
This should at least include basic details that will help people
to find the data, including who created or contributed to the data,
its title, date of creation and under what conditions it can be accessed.

Documentation may also include details on the methodology used,
analytical and procedural information, definitions of variables,
vocabularies, units of measurement, any assumptions made,
and the format and file type of the data.

Consider how you will capture this information and where it will
be recorded. Wherever possible you should identify and use existing community standards.

Consider including information that aids reproducibility, such as file types,
software accessibility, and all relevant software and versions used when
analyzing data. https://f1000research.com/articles/9-1257/v2
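
As an illustration of recording the software and versions used in an analysis, the following is a minimal Python sketch. The function name `describe_environment` and the package names passed to it are hypothetical; substitute the packages your own analysis imports. The output could be saved alongside the data as part of its documentation.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Return a dict recording the interpreter, OS, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap explicitly rather than silently skipping it.
            versions[name] = "not installed"
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# Hypothetical package list; replace with what the analysis actually uses.
environment = describe_environment(["numpy", "pandas"])
print(json.dumps(environment, indent=2))
```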

Note: This section is really about project management.
In larger projects, different team members may not interact regularly and may be working on discrete portions
of the project, so it is important to establish clear
and explicit procedures. This is also useful
in the event of turnover among the project team.

A Brief Introduction to Data Documentation

Data documentation explains how data were created or digitized, what data mean, what their content and structure are and any data manipulations that may have taken place. Documenting data should be considered best practice when creating, organising and managing data and is important for data preservation. Whenever data are used, sufficient contextual information is required to make sense of them.

Good data documentation includes information on:

The context of data collection: project history, aim, objectives and hypotheses.
Data collection methods: sampling, data collection process, instruments used, hardware and software used, scale and resolution, temporal and geographic coverage and secondary data sources used.
Dataset structure: data files, study cases and relationships between files.
Data validation, checking, proofing, cleaning and quality assurance procedures carried out.
Changes made to data over time since their original creation and identification of different versions of data files.
Information on access and use conditions or data confidentiality.
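
One possible way to identify different versions of data files, as mentioned in the list above, is to record a checksum for each file together with a short note on what changed. The sketch below is an assumption, not a method prescribed by this guide: the function names `file_checksum` and `version_entry` are hypothetical, and SHA-256 is just one common choice of checksum.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def version_entry(path, note):
    """Build one change-log record for a data file."""
    path = Path(path)
    return {"file": path.name, "sha256": file_checksum(path), "note": note}
```

Two files with the same checksum have identical contents, so a changed checksum in the log signals that a new version of the file exists and should be described.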

At the data-level, documentation may include:

Names, labels and descriptions for variables, records and their values.
Explanation or definition of codes and classification schemes used.
Definitions of specialist terminology or acronyms used.
Codes of, and reasons for, missing values.
Derived data created after collection, with code, algorithm or command file.
Weighting and grossing variables created.
Data listing of annotations for cases, individuals or items.

Data-level descriptions can be embedded within a data file itself. Many data analysis software packages have facilities for data annotation and description, as variable attributes (labels, codes, data type, missing values), data type definitions, table relationships, etc.
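
To make the idea of data-level documentation concrete, here is a minimal Python sketch in which variable labels, value codes, units and missing-value codes are stored in a plain dictionary. The variable names, codes and the helper `describe_value` are all hypothetical and only illustrate the kind of information listed above; statistical packages offer their own facilities for this.

```python
# Hypothetical data dictionary for a small survey file.
DATA_DICTIONARY = {
    "age": {
        "label": "Age of respondent at interview",
        "unit": "years",
        "missing_codes": {-9: "Refused", -8: "Don't know"},
    },
    "employ": {
        "label": "Employment status",
        "codes": {1: "Employed", 2: "Unemployed", 3: "Retired"},
        "missing_codes": {-9: "Refused"},
    },
}


def describe_value(variable, value, dictionary=DATA_DICTIONARY):
    """Translate a raw value into a readable description."""
    entry = dictionary[variable]
    missing = entry.get("missing_codes", {})
    if value in missing:
        # Report the documented reason the value is missing.
        return f"missing ({missing[value]})"
    codes = entry.get("codes")
    if codes is not None:
        # Coded variable: flag any value the dictionary does not explain.
        return codes.get(value, f"undocumented code {value}")
    unit = entry.get("unit")
    return f"{value} {unit}" if unit else str(value)
```

For example, `describe_value("age", -9)` reports the value as missing because the respondent refused, rather than letting -9 be read as an age.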

Other documentation may be contained in publications, final reports, working papers and lab books or created as a data collection user guide.

Taken From:

Veerle Van den Eynden, Louise Corti, Matthew Woollard, Libby Bishop and Laurence Horton. 2011. Managing and Sharing Data: Best Practices for Researchers. 3rd Edition. UK Data Archive. Accessed March 6th, 2021. https://ukdataservice.ac.uk/media/622417/managingsharing.pdf

How to Document Data

Approaches to documentation can include:

Readme Files: Provide information about a data file to ensure that it is correctly interpreted. Cornell University has an excellent guide on Readme files.
Codebooks: Describe the contents and structure of a data collection. They provide information about the study, variables, data files, etc. The DDI Alliance has a comprehensive guide on how to create a codebook.
Data Dictionaries: Make data understandable by explaining the meaning of the variables and values. The Open Science Framework has a guide on how to create a data dictionary.
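
As a rough illustration of the Readme approach, the Python sketch below renders a plain-text readme from the basic details a DMP asks for: title, creators, date of creation and access conditions, plus a short description of each file. The function `build_readme` and its field layout are hypothetical and do not follow any particular institution's template; consult the guides mentioned above for recommended contents.

```python
def build_readme(title, creators, created, access, files):
    """Render a plain-text readme from basic descriptive details."""
    lines = [
        f"Title: {title}",
        f"Creators: {', '.join(creators)}",
        f"Date created: {created}",
        f"Access conditions: {access}",
        "",
        "Files:",
    ]
    # One line per data file, with a short description of its contents.
    for name, description in files.items():
        lines.append(f"  {name}: {description}")
    return "\n".join(lines) + "\n"


# Hypothetical example values.
readme_text = build_readme(
    title="Example Household Survey",
    creators=["A. Researcher", "B. Analyst"],
    created="2021-03-06",
    access="Available on request",
    files={"survey.csv": "One row per respondent"},
)
print(readme_text)
```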