Data Repositories

Selecting a Data Repository

A data repository is a place where research datasets are archived and made publicly available. To select an appropriate repository, take the following steps:

| Step | Note |
| --- | --- |
| 1. Are you required to deposit in a certain repository? | Some funders and journals require or recommend datasets be deposited in their repositories. Check the specific requirements or contact us for assistance in making this determination. |
| 2. Is there a discipline-specific repository? | If you have a choice of where to deposit, look for commonly used repositories in your discipline. Some repositories are geared towards groups of disciplines while others are specific to a particular kind of research. |
| 3. If there is no discipline-specific repository, select a general repository | There are several general-purpose repositories that can fulfill funder and journal sharing requirements. The choice often comes down to personal preferences. |
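
The three steps above amount to a simple order of precedence: a mandated repository wins, then a discipline-specific one, then a general-purpose one. The following is a minimal Python sketch of that logic, not an official tool. The function name `choose_repository`, its parameters, and the reason strings are all hypothetical and chosen only for illustration.

```python
from typing import Optional, Tuple


def choose_repository(
    required: Optional[str],
    discipline_specific: Optional[str],
    general_purpose: str,
) -> Tuple[str, str]:
    """Return (repository, reason), applying the three steps in order.

    All names here are illustrative assumptions, not part of any real API.
    """
    # Step 1: a funder or journal requirement takes precedence.
    if required:
        return required, "required by funder or journal"
    # Step 2: otherwise prefer a repository commonly used in the discipline.
    if discipline_specific:
        return discipline_specific, "discipline-specific repository"
    # Step 3: fall back to a general-purpose repository.
    return general_purpose, "general-purpose repository"
```

For example, `choose_repository(None, None, "ReDATA")` falls through to step 3 and returns `("ReDATA", "general-purpose repository")`.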

For a one-stop shop that addresses all funder, journal, and University data archiving and sharing requirements, consider ReDATA, the University of Arizona's Research Data Repository.

For archiving open access manuscripts, theses/dissertations, monographs, etc., please visit the Campus Repository. Contact Kimberly Chapman, Director.

Desirable Characteristics of Data Repositories 

To help you identify suitable repositories, refer to the table of desirable characteristics of data repositories.

This table is adapted from the NIH's guidance for selecting a data repository, which follows the Desirable Characteristics of Data Repositories for Federally Funded Research guidelines set forth by the National Science and Technology Council (NSTC). The NSTC is part of the Office of Science and Technology Policy (OSTP). 

| Characteristic | Description | ReDATA |
| --- | --- | --- |
| Unique Persistent Identifiers | Assigns datasets a citable, unique persistent identifier, such as a digital object identifier (DOI) or accession number, to support data discovery, reporting, and research assessment. The identifier points to a persistent landing page that remains accessible even if the dataset is de-accessioned or no longer available. | Yes |
| Long-Term Sustainability | Has a plan for long-term management of data, including maintaining integrity, authenticity, and availability of datasets; building on a stable technical infrastructure and funding plans; and having contingency plans to ensure data are available and maintained during and after unforeseen events. | Yes |
| Metadata | Ensures datasets are accompanied by metadata to enable discovery, reuse, and citation of datasets, using schema that are appropriate to, and ideally widely used across, the community(ies) the repository serves. | Yes |
| Curation and Quality Assurance | Provides, or has a mechanism for others to provide, expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata. | Yes |
| Free and Easy Access | Provides broad, equitable, and maximally open access to datasets and their metadata free of charge in a timely manner after submission, consistent with legal and ethical limits required to maintain privacy and confidentiality, Tribal sovereignty, and protection of other sensitive data. | Yes |
| Broad and Measured Reuse | Makes datasets and their metadata available with the broadest possible terms of reuse; and provides the ability to measure attribution, citation, and reuse of data. | Yes |
| Clear Use Guidance | Provides accompanying documentation describing terms of dataset access and use. | Yes |
| Security and Integrity | Has documented measures in place to meet generally accepted criteria for preventing unauthorized access to, modification of, or release of data, with levels of security that are appropriate to the sensitivity of data. | Yes* |
| Confidentiality | Has documented capabilities for ensuring that administrative, technical, and physical safeguards are employed to comply with applicable confidentiality, risk management, and continuous monitoring requirements for sensitive data. | Yes* |
| Common Format | Allows datasets and metadata downloaded, accessed, or exported from the repository to be in widely used, preferably non-proprietary, formats consistent with those used in the community(ies) the repository serves. | Yes |
| Provenance | Has mechanisms in place to record the origin, chain of custody, and any modifications to submitted datasets and metadata. | Yes** |
| Retention Policy | Provides documentation on policies for data retention within the repository. | Yes |
| Fidelity to Consent | Uses documented procedures to restrict dataset access and use to those that are consistent with participant consent and changes in consent. | Yes*** |
| Restricted Use Compliant | Uses documented procedures to communicate and enforce data use restrictions, such as preventing reidentification or redistribution to unauthorized users. | Yes*** |
| Privacy | Implements and provides documentation of measures (for example, tiered access, credentialing of data users, security safeguards against potential breaches) to protect human subjects’ data from inappropriate access. | No^ |
| Plan for Breach | Has security measures that include a response plan for detected data breaches. | Yes |
| Download Control | Controls and audits access to and download of datasets (if download is permitted). | No^ |
| Violations | Has procedures for addressing violations of terms-of-use by users and data mismanagement by the repository. | Yes |
| Request Review | Makes use of an established and transparent process for reviewing data access requests. | No^ |

*Technical and administrative measures (e.g., NetID login, curatorial review prior to publication) help ensure data is not modified without authorization. Furthermore, administrative mechanisms help ensure sensitive data is not made public and, where applicable, that requirements for ethical data sharing are met.

**This information is retained internally.

*** ReDATA supports de-identified human data but requires documented consent. ReDATA's terms of use forbid users from attempting to reidentify participants.

^ ReDATA is intended for materials that are publicly releasable with unrestricted availability. It does not allow for restricting data downloads.

Tools for Finding Repositories

Data Indexers

| Name | Description |
| --- | --- |
| Re3Data | Registry of Research Data Repositories. A worldwide index of data repositories. |
| Fairsharing | A database of data repositories and related metadata standards and policies. Also useful for identifying metadata standards for writing a DMP. |
| Google Dataset Search | Search for data across many data repositories and government websites. |
| Data.world | Collection of community-contributed datasets. For-profit company; login required. |
| Awesome Datasets | Collection of community-contributed datasets. |

Other Resources
Examples of Well-known Data Repositories

We created this table to help with writing DMPs, but it's also useful for finding repositories at the publication stage.