Data documentation ensures that any user can understand and interpret your data. It explains how the data was created, the context for the data, its structure and contents, and any manipulations that have been applied to it.
Data documentation should start at the beginning of a project and continue throughout it. Documenting as you go is easier and makes it less likely that you will forget details later on.
What's important to document?
Data-level documentation
Metadata explains the origin, purpose, time, geographic location, creator, access, and terms of use of the data. It is typically used for resource discovery and as a bibliographic record for citation. Data catalogs and portals are often structured around different metadata standards. In addition, metadata can be harvested for data sharing through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
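To give a rough sense of what harvesting through OAI-PMH involves, the sketch below builds a `ListRecords` request URL. The verb `ListRecords` and the `oai_dc` (Dublin Core) metadata prefix are part of the protocol; the repository address and the helper function name are hypothetical and used only for illustration. The sketch only constructs the URL and does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for illustration; substitute a real repository's OAI-PMH base URL.
BASE_URL = "https://repository.example.edu/oai"


def build_list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper, not a library API)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Optional selective harvesting: only records changed on or after this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


url = build_list_records_url(BASE_URL, from_date="2020-01-01")
print(url)
```

A harvester would send this request and parse the XML response; that part depends on the repository and is left out here.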
Many metadata standards exist, usually tied to a particular file format or discipline.
The UA Library can help you select the most appropriate metadata standard to use. Contact Chris Kollen, Data Curation Librarian.
When creating metadata, a best practice is to use a controlled vocabulary, that is, the standard terminology for your discipline. Using an accepted standard, a controlled vocabulary, or an authority list (such as the Library of Congress Authorities) will help with retrieving and indexing your data.
Consider keeping metadata records in a spreadsheet, CSV file, or tab-delimited file. Additional information needed to interpret the metadata, such as explanations of variables, codes, acronyms or abbreviations, or algorithms used, should be included as accompanying documentation.
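One simple way to apply a controlled vocabulary is to check proposed keywords against the approved list before recording them. The following is a minimal sketch; the vocabulary terms and the function name are made up for illustration, and a real project would load the terms from its discipline's authority list.

```python
# Hypothetical controlled vocabulary; replace with your discipline's authority list.
CONTROLLED_VOCABULARY = {"hydrology", "precipitation", "soil moisture"}


def check_keywords(keywords, vocabulary):
    """Split keywords into terms found in the vocabulary and terms that are not."""
    # Normalize case and surrounding whitespace before comparing.
    normalized = [k.strip().lower() for k in keywords]
    accepted = [k for k in normalized if k in vocabulary]
    rejected = [k for k in normalized if k not in vocabulary]
    return accepted, rejected


accepted, rejected = check_keywords(["Hydrology", "rainfall "], CONTROLLED_VOCABULARY)
print("Accepted:", accepted)
print("Needs review:", rejected)
```

Terms flagged for review can then be mapped to the closest approved term rather than entered as free text.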
In general, the University of Arizona Libraries suggests the following metadata elements:
| Element | Description |
| --- | --- |
| Title | Name of the project or collection of datasets |
| Creator | Names and institutions of the people who created the data |
| Date | Key dates associated with the data, such as dates covered by the data or date of creation |
| Description | Description of the resource |
| Keywords or Subjects | Keywords or subjects describing the content of the data |
| Identifier | Unique number or alphanumeric string used to identify the data |
| Coverage (if applicable) | Geographic coverage |
| Language | Language of the resource |
| Publisher | Entity responsible for making the dataset available |
| Funding Agencies | Organization or agency that funded the research |
| Access restrictions | Where and how your data can be accessed by other researchers |
| Copyright | |
| Format | What format your data is in |
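
As a sketch of how the suggested elements could be kept in a CSV file, the example below writes one metadata record using Python's standard `csv` module. The column names follow the table above; the record values and the function name are placeholders invented for illustration, not real data.

```python
import csv
import io

# Column names taken from the suggested metadata elements.
FIELDS = [
    "Title", "Creator", "Date", "Description", "Keywords or Subjects",
    "Identifier", "Coverage", "Language", "Publisher", "Funding Agencies",
    "Access restrictions", "Copyright", "Format",
]


def write_metadata_csv(records, handle):
    """Write metadata records (dicts keyed by element name) as CSV to an open file handle."""
    # restval="" leaves a blank cell for any element a record does not supply.
    writer = csv.DictWriter(handle, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(records)


# Placeholder record for illustration only.
record = {
    "Title": "Example stream gauge readings",
    "Creator": "A. Researcher, Example University",
    "Date": "2019-01-01/2019-12-31",
    "Keywords or Subjects": "hydrology; precipitation",
    "Identifier": "example-0001",
    "Language": "English",
    "Format": "CSV",
}

# An in-memory buffer keeps the example self-contained; for a real file use
# open("metadata.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_metadata_csv([record], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping one row per dataset and one column per element makes the file easy to open in a spreadsheet and to convert to a formal metadata standard later.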