Data Documentation
Data documentation helps ensure that any user can understand and interpret your data. It explains how the data was created, the context in which it was collected, the structure and contents of the data, and any manipulations that have been applied to it.
What's important to document?
- Context of data collection
- Data collection methodology
- Structure and organization of data files
- Data validation and quality assurance
- Data manipulations applied to the raw data during analysis
- Data confidentiality, access and use conditions
Data-level documentation
- Variable names and descriptions
- Definition of codes and classification schemes
- Codes of, and reasons for, missing values
- Definitions of specialty terminology and acronyms
- Algorithms used to transform data
- File format and software used
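
The data-level items above (variable names, code definitions, and missing-value codes) can be recorded in a small machine-readable codebook. The following is a minimal sketch, not a prescribed format: the variable names, codes, and the `-9` missing-value convention are all hypothetical and chosen only for illustration.

```python
# Hypothetical codebook for a small survey file.
# Every variable name, code, and label below is illustrative.
CODEBOOK = {
    "resp_id": {
        "description": "Respondent identifier",
        "codes": {},
        "missing": [],
    },
    "employment": {
        "description": "Employment status at time of survey",
        "codes": {1: "Employed", 2: "Unemployed", 3: "Student"},
        "missing": [-9],
    },
}

# Reasons for each missing-value code (assumed convention).
MISSING_REASONS = {-9: "Question skipped by respondent"}


def render_codebook(codebook, missing_reasons):
    """Return a plain-text listing of variables, codes, and missing values."""
    lines = []
    for name, entry in codebook.items():
        lines.append(f"{name}: {entry['description']}")
        for code, label in entry["codes"].items():
            lines.append(f"  {code} = {label}")
        for code in entry["missing"]:
            lines.append(f"  {code} = MISSING ({missing_reasons[code]})")
    return "\n".join(lines)


def decode(variable, value, codebook):
    """Translate a stored value to its label; missing codes become None."""
    entry = codebook[variable]
    if value in entry["missing"]:
        return None
    # Values without a defined code (e.g. identifiers) pass through unchanged.
    return entry["codes"].get(value, value)


print(render_codebook(CODEBOOK, MISSING_REASONS))
```

The text produced by `render_codebook` could be pasted into a readme, while `decode` shows how the same definitions might be reused during analysis so that documentation and code stay in agreement.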
Example Readme Files
- Simple readme: Easy data management: add a README.txt to your project folders.
- Comprehensive readme: This readme is part of a survey dataset. It is exemplary in that it documents the data analysis process and explains each file and folder. In addition, the dataset includes license files. The dataset has also been published in Zenodo, capturing many of the metadata fields suggested in the following section.
- Simon Hettrick. (2018, February 23). softwaresaved/software_in_research_survey_2014: Software in research survey (Version 1.0). Zenodo. http://doi.org/10.5281/zenodo.1183562
- Readme template: This template is used for the UA Research Data Repository (ReDATA)
Metadata
Metadata explains the origin, purpose, time, geographic location, creator, access, and terms of use of the data. Information in the metadata is used for retrieving and indexing data in a repository or archive, and for citation. Metadata can be harvested for data sharing through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
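
To make the harvesting step concrete: an OAI-PMH harvester sends HTTP requests with a `verb` parameter to a repository's base URL, and `ListRecords` with `metadataPrefix=oai_dc` asks for records in simple Dublin Core. The sketch below only builds such a request URL. The base URL is a placeholder, not a real endpoint, and the function name is an assumption made for this example.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is the repository's OAI-PMH endpoint (placeholder in this sketch).
    from_date, if given, limits the harvest to records changed on or after it.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint used only for illustration.
url = build_list_records_url("https://repository.example.org/oai")
print(url)
```

Fetching the URL and parsing the XML response are left out here; a full harvest would also need to follow the resumption tokens that repositories return for large result sets.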
There are a variety of metadata standards, usually for a particular file format or discipline. Some examples include the following:
- Astronomy Visualization Metadata
- Darwin Core
- Data Documentation Initiative (DDI) to document numeric data files
- Dublin Core, a general purpose metadata standard
- ISO 19115 or FGDC's Content Standard for Digital Geospatial Metadata for geospatial data
- Ecological Metadata Language
Consult these directories for more comprehensive lists of disciplinary metadata standards and related tools.
- Digital Curation Centre metadata directory
- Research Data Alliance metadata directory (community maintained)
The UA Library can help you select the most appropriate metadata standard to use.
When creating metadata, a best practice is to use a controlled vocabulary, that is, standard terminology for your discipline. Using an accepted standard, controlled vocabulary, or authority list will help in retrieving and indexing your data.
Consider keeping metadata records in a spreadsheet, CSV file, or tab-delimited file. Additional information needed to interpret the metadata, such as explanations of variables, codes, acronyms or abbreviations, or algorithms used, should be included as accompanying documentation.
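
As one way to keep metadata records in a CSV or tab-delimited file, the sketch below writes one row per data file using Python's standard `csv` module. The column names and the sample record are hypothetical; choose fields that match the standard or elements you are following.

```python
import csv
import io

# Hypothetical column set; adapt to your chosen metadata standard.
FIELDS = ["filename", "title", "creator", "date", "keywords"]

# Illustrative record, not real data.
records = [
    {
        "filename": "site_a_2021.csv",
        "title": "Soil moisture readings, site A",
        "creator": "A. Researcher",
        "date": "2021-06-30",
        "keywords": "soil; moisture; field survey",
    },
]


def write_metadata(records, handle, delimiter=","):
    """Write metadata records to an open text handle, one row per data file."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDS, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)


# Write to an in-memory buffer here; pass an open file to save to disk instead.
buffer = io.StringIO()
write_metadata(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing `delimiter="\t"` produces a tab-delimited file from the same records. The `csv` module quotes any value that contains the delimiter, so titles with commas remain in a single column.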
Suggested Metadata Elements
In the absence of a standard in your discipline, the University of Arizona Libraries suggests the following metadata elements. In their simplest form, these can be included as part of a readme file. The UA Research Data Repository (ReDATA) readme template contains a minimal set of elements.
Element | Description |
---|---|
Title | Name of the project or collection of datasets |
Creator | Names and institutions of the people who created the data |
Date | Key dates associated with the data, such as dates covered by the data or date of creation |
Description | Description of the resource |
Keywords or Subjects | Keywords or subjects describing the content of the data |
Identifier | Unique number or alphanumeric string used to identify the data, such as a DOI. Many repositories provide DOIs for deposited datasets. |
Coverage (if applicable) | Geographic coverage |
Language | Language of the resource |
Publisher | Entity responsible for making the dataset available |
Funding Agencies | Organization or agency that funded the research |
Access restrictions | Where and how your data can be accessed by other researchers |
License | E.g., CC0, CC BY 4.0, or MIT. See the ReDATA license matrix for help selecting a license. |
Format | File format of the data |
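
Because these elements can be included in a readme file in their simplest form, a short script can render them as `Element: value` lines and report which ones are still empty. This is a rough sketch and not the ReDATA readme template: the layout, the function names, and the example values are assumptions, and "Coverage" is treated as optional because the table marks it "if applicable".

```python
# Element names taken from the table above, in the same order.
SUGGESTED_ELEMENTS = [
    "Title", "Creator", "Date", "Description", "Keywords or Subjects",
    "Identifier", "Coverage", "Language", "Publisher", "Funding Agencies",
    "Access restrictions", "License", "Format",
]


def render_readme(metadata):
    """Render the filled-in elements as 'Element: value' lines."""
    lines = []
    for element in SUGGESTED_ELEMENTS:
        value = metadata.get(element)
        if value:
            lines.append(f"{element}: {value}")
    return "\n".join(lines) + "\n"


def missing_elements(metadata, optional=("Coverage",)):
    """List suggested elements that have no value yet, skipping optional ones."""
    return [
        element
        for element in SUGGESTED_ELEMENTS
        if element not in optional and not metadata.get(element)
    ]


# Illustrative values only.
example = {
    "Title": "Soil moisture survey",
    "Creator": "A. Researcher, Example University",
    "License": "CC BY 4.0",
}

print(render_readme(example))
print("Still to document:", ", ".join(missing_elements(example)))
```

The output of `render_readme` could be saved as the metadata section of a README.txt, and `missing_elements` gives a quick checklist before depositing the dataset.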