Citing Data & Code
Giving credit to data is just as important as giving credit to other types of publications. Providing attribution to research data makes the data easier to find and allows results to be verified and re-purposed for future study.
How Do I Cite Data and Software (Code)?
- If the data or code is part of a paper's supplementary material, cite the paper.
- If the data or code is associated with a paper but also exists separately (e.g., in a data repository or website), cite both the paper and the data/code separately.
- Always include enough information in the citation to identify a dataset or software with sufficient granularity that the work can be reproduced and credit properly assigned.
In all cases, check the source to see if the authors have indicated a preferred citation.
- Preferred citations are commonly placed in readme files or CITATION.cff files.
- If the data or code is archived in a data repository, the repository may be able to auto-generate a citation for you based on the information contained in the entry.
- Some programming environments can also auto-generate citations for you. For example, R has a citation() command that generates citations for packages.
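As an illustration of what a CITATION.cff file contains, the following minimal Python sketch assembles one as plain text. It assumes the Citation File Format 1.2.0 keys `cff-version`, `message`, `title`, `version`, `doi`, and `authors`; the helper name `build_citation_cff` is hypothetical, and the metadata is borrowed from the "Software available in a data archive" entry in the Examples table, with initials standing in for given names.

```python
# Minimal sketch: build the text of a CITATION.cff file by hand.
# build_citation_cff is a hypothetical helper, not part of any library.
# Values are wrapped in double quotes, so this sketch assumes none of the
# inputs contain a double quote character.

def build_citation_cff(title, authors, version, doi):
    """Return CITATION.cff text as a string.

    authors is a list of (family_name, given_name) tuples.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        f'doi: "{doi}"',
        "authors:",
    ]
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    return "\n".join(lines) + "\n"


# Metadata taken from the example citation; initials are used as given names.
cff_text = build_citation_cff(
    title="My Research Software",
    authors=[("Lisa", "M."), ("Bot", "H.")],
    version="2.0.4",
    doi="10.5281/zenodo.1234",
)
print(cff_text)
```

Saving the printed text as `CITATION.cff` in the top level of a code repository is the usual way to publish a preferred citation in this format.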
If none of these methods yields a citation, include the following information in a data or software citation where appropriate:
- Author(s) or creator(s)
- Title
- Publisher or data repository
- Publication Year (date dataset or software was released or published)
- Identifier (DOI or other unique identifier)
- Version
- Availability or access (URL, company that can provide data or software, etc.)
- Date accessed
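The elements listed above can be combined into a citation string. The following is a minimal Python sketch, not a standard citation style: the `CitationRecord` class, the `format_citation` function, and the ordering of elements are all assumptions made for illustration. Optional elements are skipped when they are not available.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CitationRecord:
    """Hypothetical container for the citation elements listed above."""
    authors: List[str]
    year: int
    title: str
    version: Optional[str] = None
    publisher: Optional[str] = None
    identifier: Optional[str] = None   # DOI or other unique identifier
    url: Optional[str] = None          # availability or access
    accessed: Optional[str] = None     # date accessed


def format_citation(rec: CitationRecord) -> str:
    """Join the available elements into one citation string."""
    parts = [f"{', '.join(rec.authors)} ({rec.year}). {rec.title}."]
    # Append each optional element only if it was supplied.
    if rec.version:
        parts.append(f"Version {rec.version}.")
    if rec.publisher:
        parts.append(f"{rec.publisher}.")
    if rec.identifier:
        parts.append(f"{rec.identifier}.")
    if rec.url:
        parts.append(f"{rec.url}.")
    if rec.accessed:
        parts.append(f"Accessed {rec.accessed}.")
    return " ".join(parts)


# Elements taken from the Dryad dataset example in this guide.
dataset = CitationRecord(
    authors=["Sidlauskas B"],
    year=2007,
    title="Data from: Testing for unequal rates of morphological "
          "diversification in the absence of a detailed phylogeny: "
          "a case study from characiform fishes",
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
    accessed="August 15, 2011",
)
citation = format_citation(dataset)
print(citation)
```

If a journal or repository prescribes a particular citation style, follow that style rather than the ordering used in this sketch.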
Examples
Type | Citation Example |
---|---|
Dataset | Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20. Accessed August 15, 2011. |
Tables, charts, graphs, maps or figures appearing in a publication | United States. Bureau of the Census. "Table 6. People with Income below Specified Ratios of their Poverty Thresholds by Selected Characteristics: 2009." Income, Poverty, and Health Insurance Coverage in the United States: 2009. http://www.census.gov/prod/2010pubs/p60-238.pdf. Accessed: 8/16/2011. |
Interactive database | U.S. Geological Survey. "Geology of Colorado". Parameters: Geologic Map, Quaternary Faults, Cities and Towns. Scale 1"=75 miles. Dataset: National Atlas of the United States http://nationalatlas.gov. Accessed August 15, 2011. |
Specific version of software w/DOI in a repository | Lewis John McGibbney, Omkar Reddy, Ibrahim Jarif, Noah Spahn, & Alex Goodman. (2018, November 30). nasa/podaacpy: Podaacpy v2.2.1 (Version 2.2.1). Zenodo. http://doi.org/10.5281/zenodo.1751973 |
Non-versioned software, citation date corresponds to commit date | Klimowsky, K. (2018). Datahog. Accessed May 5, 2019. |
A piece of software in general, no DOI available, w/online link | Boscher, D., Bourdarie, S., Brien, P., & Guild, T. (2008). IRBEM-LIB download. https://sourceforge.net/projects/irbem/. Accessed March 3, 2014. |
Software available in a data archive | Lisa, M., & Bot, H. (2017). My Research Software (Version 2.0.4) [Computer software]. https://doi.org/10.5281/zenodo.1234 |
Software not available for download | MATLAB (2018). version 9.4 (R2018a), The MathWorks Inc., Natick, Massachusetts. |
More information on Citing Data
- Inter-University Consortium for Political and Social Research (ICPSR)'s Data Citations
- How to Cite Roper Center Data (scroll to the iPOLL section)
- DataCite
Citation Tools
- EndNote Web (free for UA affiliates)
- Zotero
- Mendeley
- CiteAs: Generate citations for many kinds of non-traditional research outputs.