Data Documentation and Metadata

Data Documentation

Data documentation helps ensure that other users can understand and interpret your data. It explains how your data was created, the context of the data, the structure and contents of the data, and any manipulations that have been applied to the data.

What's important to document?

Context of data collection
Data collection methodology
Structure and organization of data files
Data validation and quality assurance
Data manipulations performed during analysis, starting from the raw data
Data confidentiality, access and use conditions

Data-level documentation

Variable names and descriptions
Definition of codes and classification schemes
Codes of, and reasons for, missing values
Definitions of specialty terminology and acronyms
Algorithms used to transform data
File format and software used
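
Several of the items above (variable descriptions, code definitions, and missing-value codes) can be kept together in a small machine-readable data dictionary. The following is a minimal sketch of one possible layout, not a prescribed format. The variable names, the code values, and the `-99` and `-98` missing-value codes are hypothetical and were invented for illustration.

```python
# A minimal data-dictionary sketch. All variable names, codes, and
# missing-value codes below are hypothetical examples.
DATA_DICTIONARY = {
    "resp_id": {
        "description": "Respondent identifier",
        "codes": {},
        "missing": {},
    },
    "employed": {
        "description": "Employment status at time of survey",
        "codes": {1: "Employed", 2: "Not employed"},
        "missing": {-99: "Question not asked", -98: "Respondent declined"},
    },
}


def undocumented_variables(column_names, dictionary):
    """Return the columns that have no entry in the data dictionary."""
    return [name for name in column_names if name not in dictionary]


def describe_value(variable, value, dictionary):
    """Translate a raw value into its documented label, if one exists."""
    entry = dictionary[variable]
    if value in entry["missing"]:
        return "missing: " + entry["missing"][value]
    return entry["codes"].get(value, str(value))
```

Calling `undocumented_variables` with the header row of a data file lists any column that still needs a description, which is a quick check to run before sharing the data.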

Metadata

Metadata explains the origin, purpose, time, geographic location, creator, access, and terms of use of the data. Structured metadata follows a standard and is usually stored in a specific format. Information in structured metadata is used to retrieve and index data in a repository or archive, and to cite the data. Search engines can also harvest metadata, which makes the data easier to discover.

There are a variety of metadata standards, usually for a particular file format or discipline. Some examples include the following:

Astronomy Visualization Metadata
Darwin Core
Data Documentation Initiative (DDI) to document numeric data files
Dublin Core, a general purpose metadata standard
ISO 19115 or FGDC's Content Standard for Digital Geospatial Metadata for geospatial data
Ecological Metadata Language
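
To illustrate what "structured" means in practice, the sketch below writes a few Dublin Core elements as XML using Python's standard library. The `dc` namespace URI and the element names (`title`, `creator`, `date`, `language`) come from the Dublin Core Metadata Element Set. The `metadata` wrapper element, the `build_dublin_core` helper, and the dataset values are assumptions made for this example; a repository may expect a different wrapper or profile.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_dublin_core(fields):
    """Build an XML record with one dc:* element per (name, value) pair."""
    ET.register_namespace("dc", DC_NS)
    # The wrapper element name is an assumption, not part of Dublin Core.
    root = ET.Element("metadata")
    for name, value in fields:
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return root


# Hypothetical dataset description.
record = build_dublin_core([
    ("title", "Example stream temperature dataset"),
    ("creator", "Doe, Jane"),
    ("date", "2023-05-01"),
    ("language", "en"),
])
xml_text = ET.tostring(record, encoding="unicode")
```

Because every element carries a standard name, software that harvests the record can index the title, creator, and date without knowing anything else about the dataset.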

Consult these directories for more comprehensive lists of disciplinary metadata standards and related tools.

Digital Curation Centre metadata directory
Research Data Alliance metadata directory (community maintained)

The UA Library can help you select the most appropriate metadata standard to use.

When creating metadata, a best practice is to use a controlled vocabulary, that is, the standard terminology for your discipline. Using an accepted standard, a controlled vocabulary, or an authority list helps with retrieving and indexing your data.
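
One way to apply this is to check keywords against the vocabulary before recording them. The sketch below assumes a tiny in-memory term list; `CONTROLLED_VOCABULARY`, `normalize`, and `check_keywords` are hypothetical names, and a real project would load the terms published for its discipline.

```python
# Hypothetical authority list; a real project would load the terms
# published for its discipline instead.
CONTROLLED_VOCABULARY = {"hydrology", "stream temperature", "water quality"}


def normalize(term):
    """Lowercase a term and collapse repeated whitespace."""
    return " ".join(term.lower().split())


def check_keywords(keywords, vocabulary):
    """Split keywords into (accepted, rejected) lists against the vocabulary."""
    accepted, rejected = [], []
    for keyword in keywords:
        cleaned = normalize(keyword)
        if cleaned in vocabulary:
            accepted.append(cleaned)
        else:
            rejected.append(cleaned)
    return accepted, rejected
```

Rejected terms are candidates for replacement with the closest accepted term, so that different datasets on the same topic end up indexed under the same keyword.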

Consider keeping metadata records in a spreadsheet, CSV file, or tab-delimited file. Additional information needed to interpret the metadata, such as explanations of variables, codes, acronyms or abbreviations, or algorithms used, should be included as accompanying documentation.
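
As a rough sketch of the tab-delimited option, the code below writes one metadata record per data file and reads the records back with Python's `csv` module. The column names in `FIELDS` and the two records are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Hypothetical column names; choose the ones your project needs.
FIELDS = ["filename", "title", "creator", "date", "format"]

# Hypothetical records, one per data file.
RECORDS = [
    {"filename": "site_a.csv", "title": "Site A readings",
     "creator": "Doe, Jane", "date": "2023-05-01", "format": "CSV"},
    {"filename": "site_b.csv", "title": "Site B readings",
     "creator": "Doe, Jane", "date": "2023-05-02", "format": "CSV"},
]


def write_records(records, handle):
    """Write metadata records as tab-delimited text with a header row."""
    writer = csv.DictWriter(
        handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)


def read_records(handle):
    """Read tab-delimited metadata records back into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Round trip through an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_records(RECORDS, buffer)
buffer.seek(0)
loaded = read_records(buffer)
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, so that line endings are not translated twice.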

Suggested Metadata Elements

In the absence of a standard in your discipline, the University of Arizona Libraries suggests the following metadata elements. In their simplest form, these can be included as part of a readme file.

Element	Description
Title	Name of the project or collection of datasets
Creator	Names and institutions of the people who created the data
Date	Key dates associated with the data, such as dates covered by the data or date of creation
Description	Description of the resource
Keywords or Subjects	Keywords or subjects describing the content of the data
Identifier	Unique number or alphanumeric string used to identify the data, such as a DOI. Many repositories provide DOIs for deposited datasets.
Coverage (if applicable)	Geographic coverage
Language	Language of the resource
Publisher	Entity responsible for making the dataset available
Funding Agencies	Organization or agency that funded the research
Access restrictions	Where and how your data can be accessed by other researchers
License	E.g., CC0, CC BY 4.0, MIT, etc. See the ReDATA license matrix for help selecting a license.
Format	What format your data is in
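
The elements above can be turned into a plain-text readme with very little code. The sketch below is one possible approach, not the ReDATA template: the element names follow the table, while every value, the `EXPECTED_ELEMENTS` list, and the two helper functions are hypothetical. Coverage is omitted because the table marks it as applicable only in some cases.

```python
# Element names follow the suggested list above. Every value is a
# hypothetical placeholder; the empty Identifier stands for a DOI that
# has not been assigned yet.
METADATA = {
    "Title": "Example stream temperature dataset",
    "Creator": "Jane Doe (Example University)",
    "Date": "2023-05-01",
    "Description": "Hourly stream temperature readings from two sites.",
    "Keywords or Subjects": "hydrology; stream temperature",
    "Identifier": "",
    "Language": "English",
    "Publisher": "Example University",
    "Funding Agencies": "Example Funding Agency",
    "Access restrictions": "None",
    "License": "CC0",
    "Format": "CSV",
}

EXPECTED_ELEMENTS = [
    "Title", "Creator", "Date", "Description", "Keywords or Subjects",
    "Identifier", "Language", "Publisher", "Funding Agencies",
    "Access restrictions", "License", "Format",
]


def missing_elements(metadata, expected=EXPECTED_ELEMENTS):
    """Return expected elements that are absent or have an empty value."""
    return [name for name in expected if not metadata.get(name, "").strip()]


def render_readme(metadata):
    """Render non-empty elements as 'Element: value' lines."""
    lines = [
        f"{name}: {value}"
        for name, value in metadata.items()
        if value.strip()
    ]
    return "\n".join(lines) + "\n"
```

Running `missing_elements(METADATA)` before depositing flags any element that still needs a value, here the identifier.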

Example Readme Files

UA Research Data Repository Readme: This readme is used for all datasets deposited into ReDATA. The format is plain text but can be rendered as Markdown. Template, Example.
Comprehensive readme: This readme is part of a survey dataset. The readme is exemplary in that it documents the data analysis process and explains each file and folder. Example.
Additional templates/examples:
- Cornell University
- University of Michigan