NIH 2023 Data Management and Sharing Policy: Selecting a Repository

NIH Data Management and Sharing Policy Coming in 2023: Changes that may matter to you!

Selecting a repository is a personal decision, and no single repository will fit all needs. We recommend talking to colleagues in your field to see what they are using. The NIH has also provided two lists, one of generalist repositories and one of subject matter repositories, that you can use to help guide your decision.

The library is happy to help vet repositories and check policies, procedures, and prices. We will do our best to help your data find its perfect home.

From the Selecting a Repository Supplemental Information: While NIH supports many data repositories, it will not necessarily provide data repositories to preserve and share all data resulting from the research it funds. The broader repository ecosystem for biomedical data includes data repositories supported by other organizations, both public and private. NIH anticipates that the broader repository ecosystem will continue to evolve over time, providing different options for researchers as their data sharing needs continue to evolve.

Selecting a Data Repository:

Desirable Characteristics for All Data Repositories:

The NIH maintains a guide to selecting a data repository, including guidance on the characteristics of a quality repository. Please see the list below:

  • Unique Persistent Identifiers: Assigns datasets a citable, unique persistent identifier, such as a digital object identifier (DOI) or accession number, to support data discovery, reporting, and research assessment. The identifier points to a persistent landing page that remains accessible even if the dataset is de-accessioned or no longer available.
  • Long-Term Sustainability: Has a plan for long-term management of data, including maintaining integrity, authenticity, and availability of datasets; building on a stable technical infrastructure and funding plans; and having contingency plans to ensure data are available and maintained during and after unforeseen events.
  • Metadata: Ensures datasets are accompanied by metadata to enable discovery, reuse, and citation of datasets, using schema that are appropriate to, and ideally widely used across, the community(ies) the repository serves. Domain-specific repositories would generally have more detailed metadata than generalist repositories.
  • Curation and Quality Assurance: Provides, or has a mechanism for others to provide, expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata.
  • Free and Easy Access: Provides broad, equitable, and maximally open access to datasets and their metadata free of charge in a timely manner after submission, consistent with legal and ethical limits required to maintain privacy and confidentiality, Tribal sovereignty, and protection of other sensitive data.
  • Broad and Measured Reuse: Makes datasets and their metadata available with broadest possible terms of reuse; and provides the ability to measure attribution, citation, and reuse of data (i.e., through assignment of adequate metadata and unique PIDs).
  • Clear Use Guidance: Provides accompanying documentation describing terms of dataset access and use (e.g., particular licenses, need for approval by a data use committee).
  • Security and Integrity: Has documented measures in place to meet generally accepted criteria for preventing unauthorized access to, modification of, or release of data, with levels of security that are appropriate to the sensitivity of data.
  • Confidentiality: Has documented capabilities for ensuring that administrative, technical, and physical safeguards are employed to comply with applicable confidentiality, risk management, and continuous monitoring requirements for sensitive data.
  • Common Format: Allows datasets and metadata downloaded, accessed, or exported from the repository to be in widely used, preferably non-proprietary, formats consistent with those used in the community(ies) the repository serves.
  • Provenance: Has mechanisms in place to record the origin, chain of custody, and any modifications to submitted datasets and metadata.
  • Retention Policy: Provides documentation on policies for data retention within the repository.
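
When vetting a candidate repository, it can help to track which of the characteristics above you have confirmed in its documentation. The following is a minimal sketch in Python, not an NIH tool: the characteristic names are copied from the list above, while the function name missing_characteristics and the example notes are hypothetical and only illustrate the bookkeeping.

```python
# Characteristic names copied from the list above; everything else is illustrative.
DESIRABLE_CHARACTERISTICS = [
    "Unique Persistent Identifiers",
    "Long-Term Sustainability",
    "Metadata",
    "Curation and Quality Assurance",
    "Free and Easy Access",
    "Broad and Measured Reuse",
    "Clear Use Guidance",
    "Security and Integrity",
    "Confidentiality",
    "Common Format",
    "Provenance",
    "Retention Policy",
]


def missing_characteristics(documented):
    """Return the characteristics not yet confirmed for a repository, in list order."""
    confirmed = {name.strip().lower() for name in documented}
    return [c for c in DESIRABLE_CHARACTERISTICS if c.lower() not in confirmed]


# Hypothetical notes from reviewing one candidate repository's policy pages.
example_notes = ["Metadata", "Provenance", "Retention Policy", "Common Format"]
gaps = missing_characteristics(example_notes)

print(f"{len(gaps)} of {len(DESIRABLE_CHARACTERISTICS)} characteristics still to verify:")
for name in gaps:
    print(" -", name)
```

A list of remaining gaps like this is only a starting point for a conversation with the repository or the library; it does not weigh the characteristics or judge how well each one is met.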

Additional Considerations for Repositories Storing Human Data (even if de-identified):

“The additional characteristics outlined in this section are intended for repositories storing human data, which are also expected to exhibit the characteristics outlined in Section I, particularly with respect to confidentiality, security, and integrity. These characteristics also apply to repositories that store only de-identified human data, as preventing re-identification is often not possible, thus requiring additional considerations to protect privacy and security.” 

Please see the list below:

  • Fidelity of Consent: Employs documented procedures to restrict dataset access and use to those that are consistent with participant consent (such as for use only within the context of research on a specific disease or condition) and changes in consent.
  • Restricted Use Compliant: Employs documented procedures to communicate and enforce data use restrictions, such as preventing reidentification or redistribution to unauthorized users.
  • Privacy: Implements and provides documentation of appropriate approaches (e.g., tiered access, credentialing of data users, security safeguards against potential breaches) to protect human subjects’ data from inappropriate access.
  • Plan for Breach: Has security measures that include a response plan for detected data breaches.
  • Download Control: Controls and audits access to and download of datasets (if download is permitted).
  • Violations: Has procedures for addressing violations of terms-of-use by users and data mismanagement by the repository.
  • Request Review: Makes use of an established and transparent process for reviewing data access requests.

Subject Matter Repositories

Metabolomics Workbench (MetWB)
Cancer Nanotechnology Laboratory (caNanoLab)
Genomic Data Commons (GDC) 
Proteomic Data Commons (PDC)
The Cancer Imaging Archive (TCIA)
The Network Data Exchange (NDEx)
The Pediatric Genomic Data Inventory (PGDI)
NEI Data Commons
FlyBase: A Drosophila Genomic and Genetic Database
The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL)
The Zebrafish Model Organism Database (ZFIN)
WormBase
Mouse Genome Informatics (MGI)
The Universal Protein Resource (UniProt)
Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC)
National Sleep Research Resource
Rat Genome Database (RGD)
AD Knowledge Portal
National Archive of Computerized Data on Aging (NACDA)
NIDUS Delirium Research Hub
The National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS) 
Eukaryotic Pathogen Database Resources (EuPathDB)
Immune Epitope Database and Analysis Resource (IEDB)
Influenza Research Database (IRD)
ITN TrialShare
Pathosystems Resource Integration Center (PATRIC)
TB Portals
The Immunology Database and Analysis Portal (ImmPort)
VDJServer Community Data Portal
VectorBase
Virus Pathogen Research (ViPR)
LONI Database
Medical Imaging and Data Resource Center (MIDRC)
NeuroImaging Tools and Resources Collaboratory (NITRC)
Child Language Data Exchange System (CHILDES)
Data and Specimen Hub (DASH)
Data Sharing for Demographic Research (DSDR)
National Children’s Study (NCS) Archive
PhonBank
Xenbase
Mouse Phenome Database (MPD)
National Addiction & HIV Data Archive Program (NAHDAP)
Neuroscience Information Framework (NIF)
AphasiaBank
FluencyBank
FaceBase
NIDDK Central Repository
NIDDK Information Network (DKnet)
The AMP-T2D Knowledge Portal (T2DKP)
Chemical Effects in Biological Systems (CEBS)
Cell Image Library
Database of Interacting Proteins (DIP)
PhysioNet
Biological General Repository for Interaction Datasets (BioGRID)
Inter-university Consortium for Political and Social Research (ICPSR)
NIMH Repository and Genomics Resources (NRGR)
OpenNeuro
Archived Clinical Research Datasets
NeuroMorpho.org
ClinicalTrials.gov
ClinVar
database of Genotypes and Phenotypes (dbGaP)
dbSNP
dbVar
GenBank
Gene Expression Omnibus (GEO)
PubChem
Sequence Read Archive (SRA)