The CSHL Shared Resources generate data for researchers at the Lab. Under the new NIH Data Management and Sharing Policy, NIH encourages researchers to maximize data sharing. Here is a guide to NIH repositories, depending on the shared resource with which you generate your data.
Open access; Non-human organisms
"MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease."
Open access; Non-human organisms
"The IMSR is a searchable online database of mouse strains, stocks, and mutant ES cell lines available worldwide, including inbred, mutant, and genetically engineered strains. The goal of the IMSR is to assist the international scientific community in locating and obtaining mouse resources for research. Note that the data content found in the IMSR is as supplied by strain repository holders."
Controlled, registered, and open access; Human and non-human organisms
"The Cancer Imaging Archive (TCIA) is a service which de-identifies and publishes medical image datasets to study cancer. The data are organized into 'Collections' or 'Analysis Results', typically subjects related by a common disease (e.g. lung cancer), image modality (MRI, CT, digitized pathology images, etc.) or research focus. DICOM is the primary file format used by TCIA for radiology image storage, and many common formats are accepted for histopathology data. Supporting data related to the images such as patient outcomes, treatment details, genomics, proteomics and image analyses are also provided when available."
Registered access; Human and non-human organisms
"BindingDB is a public, central, web-accessible database of measured binding affinities, focusing chiefly on the interactions of proteins considered to be candidate drug-targets with ligands that are small, drug-like molecules. BindingDB also includes a small collection of host-guest binding data of interest to chemists studying supramolecular systems. BindingDB is a FAIRshare recommended resource, with about 2.1M binding data for about 8,000 proteins and 920,000 small molecules, which is used worldwide for a range of activities, including drug discovery, computational chemistry, systems biology, and education."
Open access; Human and non-human organisms
"The Electron Microscopy Data Bank (EMDB) is a public, worldwide repository that stores 3D density maps obtained using cryo-electron microscopy (cryo-EM) and other electron microscopy techniques."
Open access; Human and non-human organisms
"The mission of the RCSB Protein Data Bank (PDB) is to sustain a unique data resource of three-dimensional biomolecular structure information."
Controlled, registered, and open access; Human and non-human organisms
"ImmPort is a public data sharing repository funded by NIAID. It provides: (1) A centralized, secure, reliable and scalable immunology data infrastructure with templates, user manuals and tools for researchers to manage data submission and data sharing. (2) User-friendly web interfaces and tools for query, download, integration, and analyses of shared data. (3) Data and metadata standards and data Quality Control (QC) procedures to facilitate data integration and secondary analyses."
Open access; Human and non-human organisms
"BioGRID ORCS is an open repository of CRISPR screens compiled through comprehensive curation efforts."
Open access; Human and non-human organisms
"MassIVE is a community resource for Computational Mass Spectrometry to promote the global, free exchange of mass spectrometry data."
Open access; Human and non-human organisms
"The NIH Common Fund's National Metabolomics Data Repository (NMDR) is now accepting metabolomics data for small and large studies on cells, tissues and organisms via the Metabolomics Workbench. We can accommodate a variety of metabolite analyses, including, but not limited to, MS and NMR. In order to ensure reproducibility and interoperable use of data, we require experimental metadata (see tutorials) to be deposited along with the metabolite measurements. Processed data (measurements) may be in the form of quantitated metabolite concentrations, MS peak height/area values, LC retention times, NMR binned areas, etc. Raw data in the form of MS and NMR binary files and associated parameter files may also be uploaded. We accept data from both targeted and untargeted studies. The Metabolomics Workbench also provides a suite of tools for analysis and visualization of the data. Step-by-step instructions for the whole process are provided on our Upload and Manage Experimental Data and Metadata page."
Registered and open access; Human and non-human organisms
"Panorama is a server-based data repository application for targeted mass spectrometry assays that integrates into a Skyline mass spec workflow. It is implemented as a module within LabKey Server, an open-source bioinformatics data management platform with extensive support for proteomics and small molecule data, other assay data, and a security model rich enough to support clinical studies."
Open access; Human and non-human organisms
"PeptideAtlas is a multi-organism, publicly accessible compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments. Mass spectrometer output files are collected for human, mouse, yeast, and several other organisms, and searched using the latest search engines and protein sequences. All results of sequence and spectral library searching are subsequently processed through the Trans Proteomic Pipeline to derive a probability of correct identification for all results in a uniform manner to ensure a high-quality database, along with false discovery rates at the whole atlas level. Results may be queried and browsed at the PeptideAtlas web site. The raw data, search results, and full builds can also be downloaded for other uses. The PeptideAtlas SRM Experiment Library (PASSEL) is a component of the PeptideAtlas project that is designed to enable submission, dissemination, and reuse of SRM experimental results from analysis of biological samples."
Registered and open access; Human and non-human organisms
"The PRIDE PRoteomics IDEntifications (PRIDE) Archive database is a centralized, standards-compliant, public data repository for mass spectrometry proteomics data, including protein and peptide identifications and the corresponding expression values, post-translational modifications and supporting mass spectra evidence (both as raw data and peak list files). PRIDE is a core member in the ProteomeXchange (PX) consortium, which provides a standardized way for submitting mass spectrometry-based proteomics data to public-domain repositories. Datasets are submitted to ProteomeXchange via PRIDE and are handled by expert bio-curators. All PRIDE public datasets can also be searched in ProteomeCentral, the portal for all ProteomeXchange datasets."
Registered and open access; Human and non-human organisms
"The ProteomeXchange Consortium was established to provide globally coordinated standard data submission and dissemination pipelines involving the main proteomics repositories, and to encourage open data policies in the field."
Open access; Human data
"The Proteomic Data Commons hosts mass spectra and processed data from cancer proteomic experiments."
Open access; Human and non-human organisms
"PubChem is an open chemistry database at the National Institutes of Health (NIH). “Open” means that you can put your scientific data in PubChem and that others may use it. Since the launch in 2004, PubChem has become a key chemical information resource for researchers, data scientists, health & safety, and other professionals. PubChem is an archive for chemical substances and biological assay experiments. PubChem mostly contains small molecules, but also larger molecules such as nucleotides, carbohydrates, lipids, peptides, and chemically-modified macromolecules. We collect information on chemical substances, including associated: identifiers, chemical names, biological activities, publications, patents, and much more."
Controlled, registered, and open access; Human and non-human organisms
"The Cancer Imaging Archive (TCIA) is a service which de-identifies and publishes medical image datasets to study cancer. The data are organized into 'Collections' or 'Analysis Results', typically subjects related by a common disease (e.g. lung cancer), image modality (MRI, CT, digitized pathology images, etc.) or research focus. DICOM is the primary file format used by TCIA for radiology image storage, and many common formats are accepted for histopathology data. Supporting data related to the images such as patient outcomes, treatment details, genomics, proteomics and image analyses are also provided when available."
Open access; Human and non-human organisms
"A community repository for the ingestion, archival, preservation, management, distribution, and reuse of multi-scale microscopy data encompassing cellular networks, cellular and subcellular microdomains, and their macromolecular components. The Cell Image Library consists of software for researchers to upload, organize, process, and share project data prior to publication, as well as a searchable, public-facing website for users to disseminate their results and freely distribute their data for access and reuse by others. This website includes advanced cyberinfrastructure services and deep-learning based workflows, which make use of prominent high performance computing resources on the backend, allowing CIL users to automatically post-process and analyze public data at scale."
Open access; Human and non-human organisms
"The National Cancer Institute (NCI) Cancer Research Data Commons (CRDC) aims to establish a national cloud-based data science infrastructure. Imaging Data Commons (IDC) is a new data repository of CRDC supported by the Cancer Moonshot. The goal of IDC is to enable a broad spectrum of cancer researchers, with and without imaging expertise, to easily access and explore the value of de-identified imaging data and to support integrated analyses with non-imaging data utilizing CRDC Cloud Resources."
Controlled, registered, and open access; Human and non-human organisms
"The NeuroImaging Tools and Resources Collaboratory (NITRC) provides free access to data (MRI, EEG, MEG, CT, PET, etc.) and enables pay-per-use cloud-based access to unlimited computing power, enabling worldwide scientific collaboration with minimal startup and cost. With NITRC and its components—the Resources Registry (NITRC-R), Image Repository (NITRC-IR), and Computational Environment (NITRC-CE)—a researcher can obtain pilot or proof-of-concept data to validate a hypothesis for a few dollars."
Open access; Human and non-human organisms
"A free and open platform for validating and sharing BIDS-compliant MRI, PET, MEG, EEG, and iEEG data."
Open access; Human and non-human organisms
"The Brain Image Library (BIL) is a national public resource enabling researchers to deposit, analyze, mine, share and interact with large brain image datasets."
Open access; Human and non-human organisms
"BossDB is a volumetric database for 3D and 4D neuroscience data."
Open access; Human and non-human organisms
"The BRAIN Initiative archive for publishing and sharing neurophysiology data including electrophysiology, optophysiology, and behavioral time-series, and images from immunostaining experiments."
Controlled and open access; Human and non-human organisms
"The Neuroscience Multi-omic Archive (NeMO Archive) is a data repository specifically focused on the storage and dissemination of omic data generated from the BRAIN Initiative and related brain research projects."
Open access; Human and non-human organisms
"GEO is a public functional genomics data repository that archives open-access array- and sequence-based datasets. Tools are provided to help users query and download experiments and curated gene expression profiles."
Open access; Human and non-human organisms
"Sequence Read Archive (SRA) data, available through multiple cloud providers and NCBI servers, is a broad-scope, publicly available repository of high throughput sequencing data. The archive accepts data from all branches of life as well as metagenomic and environmental surveys. SRA stores raw sequencing data and alignment information to enhance reproducibility and facilitate new discoveries through data analysis."
Controlled and open access; Human data
"The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in Humans."
Controlled, registered, and open access; Human data
"The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) is a cloud-based genomic data sharing and analysis platform. AnVIL facilitates integration and computing on and across large datasets generated by NHGRI programs, as well as initiatives funded by National Institutes of Health (NIH), or by other agencies that support human genomics research. AnVIL is a component of the emerging federated data ecosystem and actively collaborates and integrates with other genomic data resources through the adoption of the FAIR (Findable, Accessible, Interoperable, Reusable) principles. AnVIL provides a collaborative environment and interfaces for consortia and researchers. AnVIL offers training and functionality for users that have limited computational expertise as well as sophisticated data scientist users."
Controlled and open access; Human data
"The Cancer Data Service (CDS) is a data repository under the NCI's Cancer Research Data Commons (CRDC) infrastructure for storing cancer research data generated by NCI funded programs. CDS provides secure and authorized storage and data sharing capabilities in the cloud for studies that do not have a repository specific for their data type or are waitlisted or not approved by that repository for storage. CDS hosts both controlled and open access data. Permission to access controlled data on CDS is obtained through the NCBI dbGaP system. CDS hosts data and offers analysis capabilities through the NCI Cloud Resources. Seven Bridges Cancer Genomics Cloud, one of the NCI Cloud Resources, can be used to search and analyze data. Seven Bridges-CGC is established on Amazon Web Services (AWS)."
Open access; Human data
"ClinVar is a freely accessible, public archive of submitted reports about the relationships among human variations and phenotypes, with supporting evidence."
Open access; Human data
"In addition to single nucleotide alterations that are common enough in a population to be referred to as polymorphic, dbSNP also includes rare variants, such as those with clinical assertions in ClinVar. Simple genetic variations including single-base nucleotide variations (SNVs), small multi-base deletions or insertions, and microsatellite repeats are all included in the dbSNP database. Allele frequency from the NCBI ALFA project (https://www.ncbi.nlm.nih.gov/snp/docs/gsr/alfa/), which compiles data from dbGaP studies, is also included in dbSNP."
Open access; Human data
"dbVar is NCBI's database of human genomic structural variation: large variants (>50 bp), including insertions, deletions, duplications, inversions, mobile elements, translocations, and complex variants."
Registered and open access; Human and non-human organisms
"The ENCODE project (Encyclopedia of DNA Elements) is a large-scale international research initiative aimed at identifying all functional elements in the human genome."
Open access; Non-human data
"FlyBase is an online bioinformatics database and the primary repository of genetic and molecular data for the insect family Drosophilidae. For the most extensively studied species and model organism, Drosophila melanogaster, a wide range of data are presented in different formats. Information in FlyBase originates from a variety of sources ranging from large-scale genome projects to the primary research literature. These data types include mutant phenotypes; molecular characterization of mutant alleles; and other deviations, cytological maps, wild-type expression patterns, anatomical images, transgenic constructs and insertions, sequence-level gene models, and molecular classification of gene product functions. Query tools allow navigation of FlyBase through DNA or protein sequence, by gene or mutant name, or through terms from the several ontologies used to capture functional, phenotypic, and anatomical data. The database offers several different query tools in order to provide efficient access to the data available and facilitate the discovery of significant relationships within the database. Links between FlyBase and external databases, such as BDGP or modENCODE, provide opportunities for further exploration into other model organism databases and other resources of biological and molecular information. The FlyBase project is carried out by a consortium of Drosophila researchers and computer scientists at Harvard University and Indiana University in the United States, and University of Cambridge in the United Kingdom."
Open access; Human and non-human organisms
"GenBank is the NIH genetic sequence database, an annotated collection of publicly available DNA sequences. GenBank captures, preserves, and presents comprehensive nucleotide sequence information and annotations from around the world and connects these data, where applicable, to associated scientific publications and the biological specimens from which the data were derived to preserve the scientific record and enable broad sharing of such data. GenBank preserves nucleotide sequence data to support science necessary for the study of human health, food safety and security, and biodiversity, and other important domains."
Controlled and open access; Human data
"The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in Humans."
Open access; Human and non-human organisms
"The Proteomic Data Commons hosts mass spectra and processed data from cancer proteomic experiments. Many datasets have corresponding genomic and/or imaging data available in other nodes of The National Cancer Institute (NCI) Cancer Research Data Commons (CRDC)."
Registered and open access; Human and non-human organisms
"RGD has expanded to include a large body of structured and standardized data for ten species (rat, mouse, human, chinchilla, bonobo, 13-lined ground squirrel, dog, pig, green monkey/vervet and naked mole-rat). Much of this data is the result of manual curation work by RGD curators. In other instances, it has been imported into RGD from other databases through custom ELT (Extract, Load and Transform) pipelines giving RGD users integrated access to a wide variety of data to support their research efforts. RGD also offers a growing suite of innovative tools for querying, analyzing and visualizing this data making it a valuable resource for researchers worldwide."
Open access; Non-human data
"WormBase is an international consortium of biologists and computer scientists dedicated to providing the research community with accurate, current, accessible information concerning the genetics, genomics and biology of C. elegans and related nematodes."
Open access; Non-human data
"The Zebrafish Information Network (ZFIN) is the database of genetic and genomic data for the zebrafish (Danio rerio) as a model organism. ZFIN provides a wide array of expertly curated, organized and cross-referenced zebrafish research data."