Citing Data & Code

It is just as important to give credit to data as it is to other types of publications. Providing attribution to research data promotes easier access and allows results to be verified and repurposed for future study.

How Do I Cite Data and Software (Code)?

  • If the data or code is part of a paper's supplementary material, cite the paper.
  • If the data or code is associated with a paper but also exists separately (e.g., in a data repository or website), cite both the paper and the data/code separately.
  • Always include enough information in the citation to identify a dataset or software with sufficient granularity that the work can be reproduced and credit properly assigned.

In all cases, check the source to see if the authors have indicated a preferred citation. 

  • Preferred citations are commonly placed in readme files or CITATION files.
  • If the data or code is archived in a data repository, the repository may be able to auto-generate a citation for you based on the information contained in the entry.
  • Some programming environments can also auto-generate citations for you. For example, R has a citation() command that generates citations for packages.
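
The check for a preferred citation can be partly automated once a project has been downloaded. The following is a minimal Python sketch that scans a project folder for files that commonly carry citation instructions. The file names it looks for (README and CITATION, with any extension) are assumed conventions rather than a complete list, and the function name find_citation_files is hypothetical.

```python
from pathlib import Path

# Assumed naming conventions; a project may use other file names.
CANDIDATE_STEMS = {"readme", "citation"}


def find_citation_files(project_dir):
    """Return top-level files whose name suggests citation guidance."""
    root = Path(project_dir)
    matches = []
    for path in sorted(root.iterdir()):
        # Compare the name without its extension, case-insensitively,
        # so README.md, readme.txt, and CITATION.cff all match.
        if path.is_file() and path.stem.lower() in CANDIDATE_STEMS:
            matches.append(path)
    return matches
```

Calling find_citation_files(".") in a project's root folder lists the files worth opening first. The sketch only locates the files; reading them for the authors' preferred wording is still a manual step.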

If none of these methods yields a citation, include the following information in a data or software citation where appropriate:

  • Author(s) or creator(s)
  • Title
  • Publisher or data repository
  • Publication year (date dataset or software was released or published)
  • Identifier (DOI or other unique identifier)
  • Version
  • Availability or access (URL, company that can provide data or software, etc.)
  • Date accessed
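
As an illustration of how these elements fit together, here is a minimal Python sketch that assembles a citation string from whichever elements are available. The class name DataCitation, its field names, and the ordering and punctuation of the output are assumptions made for this example; they do not follow any formal citation style, so adjust them to the style your publisher requires.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Hypothetical container for the citation elements listed above."""
    authors: List[str]
    title: str
    year: Optional[int] = None
    publisher: Optional[str] = None      # publisher or data repository
    version: Optional[str] = None
    identifier: Optional[str] = None     # DOI or other unique identifier
    availability: Optional[str] = None   # URL or provider
    accessed: Optional[str] = None       # date accessed

    def format(self):
        """Join the available elements, skipping any that are missing."""
        parts = [", ".join(self.authors)]
        if self.year is not None:
            parts[0] += f" ({self.year})"
        parts.append(self.title)
        if self.version:
            parts.append(f"Version {self.version}")
        for value in (self.publisher, self.identifier, self.availability):
            if value:
                parts.append(value)
        if self.accessed:
            parts.append(f"Accessed {self.accessed}")
        # Strip trailing periods so joining does not double them.
        return ". ".join(part.rstrip(".") for part in parts) + "."


# Values taken from the dataset example in this guide.
example = DataCitation(
    authors=["Sidlauskas B"],
    title="Data from: Testing for unequal rates of morphological "
          "diversification in the absence of a detailed phylogeny: "
          "a case study from characiform fishes",
    year=2007,
    publisher="Dryad Digital Repository",
    identifier="doi:10.5061/dryad.20",
    accessed="August 15, 2011",
)
print(example.format())
```

Elements that are left unset are simply omitted, which mirrors the "where appropriate" guidance: a citation for software without a DOI, for instance, would carry a URL in the availability field instead.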

Examples

Type: Citation Example

  • Dataset: Sidlauskas B (2007) Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20. Accessed August 15, 2011.
  • Tables, charts, graphs, maps or figures appearing in a publication: United States. Bureau of the Census. "Table 6. People with Income below Specified Ratios of their Poverty Thresholds by Selected Characteristics: 2009." Income, Poverty, and Health Insurance Coverage in the United States: 2009. http://www.census.gov/prod/2010pubs/p60-238.pdf. Accessed: 8/16/2011.
  • Interactive database: U.S. Geological Survey. "Geology of Colorado". Parameters: Geologic Map, Quaternary Faults, Cities and Towns. Scale 1"=75 miles. Dataset: National Atlas of the United States. http://nationalatlas.gov. Accessed August 15, 2011.
  • Specific version of software with a DOI in a repository: Lewis John McGibbney, Omkar Reddy, Ibrahim Jarif, Noah Spahn, & Alex Goodman. (2018, November 30). nasa/podaacpy: Podaacpy v2.2.1 (Version 2.2.1). Zenodo. http://doi.org/10.5281/zenodo.1751973
  • Non-versioned software, citation date corresponds to commit date: Klimowsky, K. (2018). Datahog. https://github.com/cyverse/datahog/tree/db190e3439cb4afe69460cea67c474669029b41f/. Accessed May 5, 2019.
  • A piece of software in general, no DOI available, with an online link: Boscher, D., Bourdarie, S., Brien, P., & Guild, T. (2008). IRBEM‐LIB download. https://sourceforge.net/projects/irbem/. Accessed March 3, 2014.
  • Software not available for download: MATLAB (2018). version 9.4 (R2018a), The MathWorks Inc., Natick, Massachusetts.

More information on Citing Data

Citation Tools