
Research Data Management at CSHL: Data Storage

Data Storage

It's important to have a plan for data backup, storage, and security, and for depositing data in one or more repositories, including the CSHL Institutional Repository.

Data Backup Best Practices
When backing up your data, make three copies using the Here, Near, and Far Rule:
  • Here: 1 local/working copy (e.g., on your lab computer)
  • Near: 1 external copy (e.g., on an external, portable hard drive, preferably kept offsite)
  • Far: 1 remote copy (e.g., with a cloud-based storage provider)

Regularly test your backups to confirm that your data can still be restored from them.
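One way to check that your copies are still usable is to compare checksums between the working copy and a backup. The Python sketch below is a minimal illustration, assuming both copies can be read as ordinary directories (for the Far copy, for example, a locally synced cloud folder); the function names are hypothetical and not part of any CSHL tool. It flags files that are missing, changed, or extra in the backup, but it does not replace occasionally doing a real test restore.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(working: Path, backup: Path) -> dict[str, list[str]]:
    """Report files missing from, changed in, or only present in the backup copy."""
    src = build_manifest(working)
    dst = build_manifest(backup)
    return {
        "missing": sorted(set(src) - set(dst)),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
        "extra": sorted(set(dst) - set(src)),
    }


if __name__ == "__main__":
    import sys

    # Usage: python compare_backups.py /path/to/working /path/to/backup
    report = compare_copies(Path(sys.argv[1]), Path(sys.argv[2]))
    for kind, names in report.items():
        for name in names:
            print(f"{kind}: {name}")
```

The script name `compare_backups.py` is also hypothetical. Hashing every file is slow for very large datasets, so you may prefer to save a manifest once and recompute only periodically.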

Data Backup Resources:

Data Storage

Formats for Data Storage

Store files in non-proprietary formats whenever possible (e.g., .txt, .csv, .asc, .html, .xml) to enable more open, longer-term access, storage, and preservation of your data.

  • Text files - use TXT, XML, PDF/A, HTML, ASCII

  • Databases and tabular data - use XML, CSV

  • Statistical data - use ASCII, DTA, POR, SAS, SAV

  • Movies - use AVI, MOV, MPEG, MXF

  • Images - use TIFF, JPEG 2000, PDF, PNG, GIF, BMP

Other Examples of Commonly Used File Formats: 

Proprietary | Non-proprietary/Preferred
Excel (.xls, .xlsx) | Comma- or tab-separated values (.csv, .tsv); ASCII
Word (.doc, .docx) | Plain text (.txt) or PDF/A (.pdf)
PowerPoint (.ppt, .pptx) | PDF/A (.pdf)
Photoshop (.psd) | TIFF (.tif, .tiff)
QuickTime (.mov) | MPEG-4 (.mp4)
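To find files that may be worth converting, a short script can walk a project folder and flag extensions listed in the Proprietary column of the table. This is a minimal sketch: the `PREFERRED` mapping is a hypothetical encoding of the table, not an official list, and the conversion itself is best done in the originating software (for example, saving an Excel sheet as CSV).

```python
from pathlib import Path

# Hypothetical mapping taken from the table: proprietary extension -> suggested alternative.
PREFERRED = {
    ".xls": "CSV (.csv) or TSV (.tsv)",
    ".xlsx": "CSV (.csv) or TSV (.tsv)",
    ".doc": "plain text (.txt) or PDF/A (.pdf)",
    ".docx": "plain text (.txt) or PDF/A (.pdf)",
    ".ppt": "PDF/A (.pdf)",
    ".pptx": "PDF/A (.pdf)",
    ".psd": "TIFF (.tif, .tiff)",
    ".mov": "MPEG-4 (.mp4)",
}


def audit_formats(root: Path) -> list[tuple[str, str]]:
    """List files under root whose extension is in PREFERRED, paired with a suggested format."""
    findings = []
    for path in sorted(root.rglob("*")):
        suffix = path.suffix.lower()  # treat .XLSX and .xlsx the same
        if path.is_file() and suffix in PREFERRED:
            findings.append((path.relative_to(root).as_posix(), PREFERRED[suffix]))
    return findings
```

Keep in mind that exporting a workbook to CSV keeps only the values of one sheet and does not preserve formulas, so a multi-sheet workbook needs one CSV file per sheet.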

See the Library of Congress Recommended Formats for a more extensive, regularly updated list.

Data Security

Data Encryption: Although encryption can make your data more difficult for collaborators and future users to access, sensitive data (e.g., data related to medical records or human subjects) may need to be encrypted.

If you need assistance with encrypting your data, please contact Information Technology at 516-367-8390 or at
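If IT approves encrypting files in software, the practical point is that anyone who needs the data must also have the key, and losing the key means losing the data. As a minimal illustration only, assuming the third-party `cryptography` package is installed (`pip install cryptography`) and that each file fits in memory, its Fernet recipe can encrypt and decrypt a file. The helper names are hypothetical; for real sensitive data, use the tools and key-management practices IT recommends.

```python
from pathlib import Path

from cryptography.fernet import Fernet  # third-party package: pip install cryptography


def encrypt_file(src: Path, dst: Path, key: bytes) -> None:
    """Write an encrypted copy of src to dst (Fernet: AES-128-CBC plus an HMAC-SHA256 check)."""
    dst.write_bytes(Fernet(key).encrypt(src.read_bytes()))


def decrypt_file(src: Path, dst: Path, key: bytes) -> None:
    """Restore the plaintext; raises cryptography.fernet.InvalidToken for a wrong key or altered data."""
    dst.write_bytes(Fernet(key).decrypt(src.read_bytes()))


# Example key generation; keep the key separate from the encrypted files.
# key = Fernet.generate_key()
```

Store the key produced by `Fernet.generate_key()` separately from the encrypted files (never in the same folder or backup set), and share it with collaborators only through a secure channel.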

Depositing Data in a Repository

Depositing your data in a repository is an important part of conducting responsible science: it enhances the FAIRness of your data (making it Findable, Accessible, Interoperable, and Reusable) and makes it easier for other researchers to access and potentially reuse your data. The NIH and other funding agencies either require or encourage depositing data collected in sponsored research in both local/institutional and external repositories.

CSHL Institutional Repository

Our institutional repository collects, preserves, and disseminates CSHL intellectual output, including preprints, published articles, and unpublished data. The overall goal of the CSHL repository is to enhance the discoverability and impact of your research. Our libguide provides more information. To maximize the visibility of your research and to comply with the latest funder mandates and recommendations for data management and sharing, researchers should deposit their data in an external repository and create an entry in the CSHL Institutional Repository that links to their external repository submissions. Importantly, all data submissions should be accompanied by descriptive metadata so that they are more findable, interpretable, and reusable.

Our institutional repository:

  • Collects multiple data types (software code, publications, raw data, theses, presentations, posters, etc.)
  • Helps you comply with federal mandates for data management
  • Provides altmetrics showing who is highlighting your research, on which platforms, and how often
  • Allows for open access to enhance data sharing
  • Highlights the CSHL scientific community
  • Helps you and others cite your data by supplying a persistent identifier

Please contact us, and we will work with you to submit your research.

The CSHL Institutional Repository homepage (left) and an example of an entry page (right). 

External Repositories

There are a number of external disciplinary and multidisciplinary repositories to choose from when submitting your data. The CSHL Library can help you select a suitable data repository.

Commonly used data repositories:

  • Zenodo - Zenodo allows users to upload any file format and accepts figures, datasets, media, papers, posters, presentations, and filesets. The limit is 50 GB per dataset, but more can be requested. Note: CSHL has a Zenodo Community.
  • Figshare - Figshare allows users to upload any file format and accepts figures, datasets, media, papers, posters, presentations, and filesets. The limit is 100 GB per user, but more can be requested.
  • Dryad - Dryad welcomes data files associated with any published article in the sciences or medicine, as well as software scripts and other files important to the article.
  • GitHub - GitHub lets users keep their code under version control as they develop it and deposit it in an individual repository for each project.
  • OMERO - OMERO offers a variety of features, with an emphasis on managing microscope images.
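For Zenodo, deposits can also be scripted through its REST API, which makes it straightforward to attach the descriptive metadata recommended earlier. The sketch below assumes a personal access token created in your Zenodo account settings and uses only the Python standard library. The helper names are hypothetical, the metadata is a minimal subset of what Zenodo accepts, and the draft is deliberately left unpublished so you can review it in the web interface first. Try it against the Zenodo sandbox (https://sandbox.zenodo.org), which uses separate accounts and tokens, before using the production site. The CSHL Zenodo Community identifier is not given here; ask the Library if you want the record added to it.

```python
import json
import urllib.request
from pathlib import Path

ZENODO_API = "https://zenodo.org/api"  # for practice, use "https://sandbox.zenodo.org/api"


def build_metadata(title, description, creators, keywords=()):
    """Assemble a minimal dataset record; creators are ("Family, Given", affiliation) pairs."""
    return {
        "metadata": {
            "upload_type": "dataset",
            "title": title,
            "description": description,
            "creators": [{"name": name, "affiliation": aff} for name, aff in creators],
            "keywords": list(keywords),
        }
    }


def _request(method, url, token, body=None, content_type="application/json"):
    """Send an authenticated request and decode the JSON reply (None if the body is empty)."""
    req = urllib.request.Request(url, data=body, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if body is not None:
        req.add_header("Content-Type", content_type)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read() or b"null")


def deposit(path: Path, record: dict, token: str) -> dict:
    """Create a draft, upload one file to its bucket, then attach metadata. Does NOT publish."""
    draft = _request("POST", f"{ZENODO_API}/deposit/depositions", token, b"{}")
    bucket = draft["links"]["bucket"]
    # Reads the whole file into memory; fine for a sketch, not for very large files.
    _request("PUT", f"{bucket}/{path.name}", token, path.read_bytes(),
             "application/octet-stream")
    return _request("PUT", f"{ZENODO_API}/deposit/depositions/{draft['id']}", token,
                    json.dumps(record).encode())
```

After the draft looks right in the web interface, it can be published there, and the resulting DOI can be used when creating the linked entry in the CSHL Institutional Repository.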

The following resources provide additional lists and search tools to help you identify an appropriate repository for your data: