Organizing your data, documents, and file system is critical for research data management. Fortunately, a number of resources describe standards and best practices that can help. The resources below explain how to organize your data and files, how to future-proof and share your data through documentation and metadata, and how to choose an electronic lab notebook for data organization.
Please contact us for assistance and consultation on any of these resources:
Phone: +1 516.367.6872
Librarian Email: libraryhelp@cshl.edu
As your data are collected, and before they are analyzed, they need to be organized so that they can be easily examined and analyzed. Common ways to organize your data include following spreadsheet best practices and creating data dictionaries, which also serve as a type of metadata (description of the data).
Data dictionaries are files that accompany and describe data files, particularly spreadsheet data, by defining each variable included in the dataset. Data dictionaries should be created for all datasets so that you or others can understand the data now and in the future. As such, data dictionaries are a common type of metadata (data description), providing important context for the data collected.
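If a spreadsheet is exported as CSV, a data dictionary skeleton can be generated automatically and then completed by hand. The Python sketch below is one possible approach, not a standard tool: the column names, sample values, descriptions, and the coarse type-inference rule are all hypothetical. Only variable names, rough data types, and example values can be read from the data itself; descriptions and units still have to be written by someone who knows how the data were collected.

```python
import csv
import io
import sys

# Hypothetical spreadsheet contents; replace with your own exported CSV file.
SAMPLE_CSV = """sample_id,genotype,weight_g,collection_date
S001,WT,21.4,2017-03-07
S002,KO,19.8,2017-03-07
S003,WT,22.1,2017-03-08
"""

# Descriptions and units cannot be inferred from the data; the researcher supplies them.
DESCRIPTIONS = {
    "sample_id": ("Unique sample identifier", ""),
    "genotype": ("Mouse genotype: WT = wild type, KO = knockout", ""),
    "weight_g": ("Body weight at collection", "grams"),
    "collection_date": ("Date the sample was collected, YYYY-MM-DD", ""),
}


def infer_type(values):
    """Return a coarse type label for one column of string values."""
    filled = [v for v in values if v.strip()]
    if not filled:
        return "empty"
    # Try the strictest numeric type first; fall back to text if neither parses.
    for label, parse in (("integer", int), ("decimal", float)):
        try:
            for v in filled:
                parse(v)
        except ValueError:
            continue
        return label
    return "text"


def build_data_dictionary(csv_text, descriptions):
    """Produce one data-dictionary entry per column of the CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for name in reader.fieldnames:
        values = [row[name] for row in rows]
        description, units = descriptions.get(name, ("", ""))
        entries.append({
            "variable": name,
            "description": description,
            "units": units,
            "type": infer_type(values),
            "example": values[0] if values else "",
        })
    return entries


def write_data_dictionary(entries, out):
    """Write the entries as a CSV file that can accompany the dataset."""
    fields = ["variable", "description", "units", "type", "example"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    writer.writerows(entries)


if __name__ == "__main__":
    write_data_dictionary(build_data_dictionary(SAMPLE_CSV, DESCRIPTIONS), sys.stdout)
```

Keeping the dictionary as its own CSV file next to the dataset means it can be opened in the same spreadsheet software as the data it describes.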
Variable definitions can include the following information:
Additional Data Organization Resources
Preparing Tabular Data for Description and Archiving, Cornell University Library
Have you ever had trouble locating raw data, or any other file associated with a research project or publication? File organization is important to establish and maintain throughout a research project, and to aid reuse and reproducibility in the short and long term. The basic components of file organization include file naming, versioning, and file structure.
A File Naming Convention (FNC) is a framework for naming your files in a way that describes what they contain and how they relate to the project and to other files. When establishing an FNC, consider three criteria: organization, context, and consistency. A well-designed FNC provides a preview of the content of each file, orders files logically (based on time of production), and identifies the file's creator.
Aim for file names of no more than 25 characters.
Here are some examples of file names created without and with an FNC:
File names with no FNC | File names with an FNC |
---|---|
Labwork_2017 | Labwork_Matt_03072017 |
Images_test | Images_Leicaconfocal_testsamples1-7_07092016 |
Sequence125 | Sequence_mouse_sample125_06092015 |
Video_387 | Video_behaviour_mouse387_05032016 |
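A convention is easier to follow consistently when names are generated and checked the same way every time. The Python sketch below assumes a convention of descriptive elements joined by underscores and followed by a date written as YYYYMMDD. The table above uses an eight-digit date without stating the day/month order; YYYYMMDD is chosen here only because names written that way sort chronologically. The function names and the allowed-character rule are hypothetical; the 25-character limit is the target suggested earlier in this guide.

```python
import re
from datetime import date

MAX_LENGTH = 25  # length target suggested in this guide
# Assumption: only letters, digits, periods, underscores, and hyphens are allowed.
ALLOWED = re.compile(r"^[A-Za-z0-9._-]+$")


def build_filename(elements, when, extension=""):
    """Join descriptive elements and a YYYYMMDD date with underscores."""
    # Blank elements are dropped and spaces removed, e.g. "Leica confocal" -> "Leicaconfocal".
    parts = [e.replace(" ", "") for e in elements if e.strip()]
    parts.append(when.strftime("%Y%m%d"))
    name = "_".join(parts)
    return f"{name}.{extension}" if extension else name


def check_filename(stem):
    """Return a list of problems with a file name given without its extension."""
    problems = []
    if not ALLOWED.match(stem):
        problems.append("use only letters, digits, periods, underscores, and hyphens")
    if len(stem) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters ({len(stem)})")
    return problems


if __name__ == "__main__":
    name = build_filename(["Labwork", "Matt"], date(2024, 1, 15), "xlsx")
    print(name)                                   # Labwork_Matt_20240115.xlsx
    print(check_filename(name.rsplit(".", 1)[0]))  # []
```

The checker only reports problems instead of renaming files, so lab members can decide how to shorten a name while keeping the information the convention requires.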
Remember that a file naming convention breaks down if it is not followed consistently. When developing one, gather feedback from everyone who will use the FNC (e.g., fellow lab members), make sure it captures all the relevant information, and make sure that everyone is aware of it and knows how to apply it.
File Naming Resources:
File Naming Best Practices Handout from MIT Libraries
Hints and tips for developing your FNC
Versioning allows you to maintain different versions, or iterations, of a file or set of files and to keep track of changes made over time. For example, in a collaborative project you may want to know who made which changes, and why. One way to do this is to include version numbers in file names to distinguish updated versions (e.g., v1.1, v2.4): a change in the first number represents a major revision, and a change in the second number represents a relatively minor revision.
Example: FileName_1.0 (original file); FileName_1.1 (original file with minor changes); FileName_2.0 (original file with major revisions)
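The major/minor numbering rule in the example above can be expressed as a small function. This is a sketch assuming names that end in _MAJOR.MINOR exactly as in that example, with no file extension; next_version is a hypothetical helper, not part of any library.

```python
import re

# Matches names such as "FileName_1.0": a stem, an underscore, then MAJOR.MINOR.
VERSIONED_NAME = re.compile(r"^(?P<stem>.+)_(?P<major>\d+)\.(?P<minor>\d+)$")


def next_version(name, major=False):
    """Return the next name in the FileName_MAJOR.MINOR scheme.

    A major revision increments the first number and resets the second to 0;
    a minor revision increments only the second number.
    """
    match = VERSIONED_NAME.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not end in _MAJOR.MINOR")
    stem = match.group("stem")
    major_num = int(match.group("major"))
    minor_num = int(match.group("minor"))
    if major:
        return f"{stem}_{major_num + 1}.0"
    return f"{stem}_{major_num}.{minor_num + 1}"


if __name__ == "__main__":
    print(next_version("FileName_1.0"))              # FileName_1.1
    print(next_version("FileName_1.1", major=True))  # FileName_2.0
```

One caveat of this scheme: in a plain alphabetical file listing, FileName_1.10 sorts before FileName_1.2, so alphabetical order alone is not a reliable way to find the latest version.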
You can also keep a log that describes the changes between versions in detail. Such versioning logs can be created manually (e.g., in a text file) or automatically (e.g., using Google Drive). See the Versioning Resources below for more details.
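A manual versioning log can be as simple as a plain text file with one line per version. The Python sketch below assumes a tab-separated layout of date, version, author, and summary; the layout, the file name, and the function names are hypothetical, and tools such as Google Drive or Git keep their own history in their own formats.

```python
import tempfile
from datetime import date
from pathlib import Path


def log_change(log_path, version, author, summary, when=None):
    """Append one line (date, version, author, summary) to a plain-text log."""
    when = when or date.today()
    # Tabs separate the fields, so any tabs inside the summary become spaces.
    fields = [when.isoformat(), version, author, summary.replace("\t", " ")]
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write("\t".join(fields) + "\n")


def read_log(log_path):
    """Return the log as a list of (date, version, author, summary) tuples."""
    text = Path(log_path).read_text(encoding="utf-8")
    return [tuple(line.split("\t", 3)) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    log_file = Path(tempfile.mkdtemp()) / "FileName_changelog.txt"
    log_change(log_file, "1.0", "Matt", "Original file", date(2017, 3, 7))
    log_change(log_file, "1.1", "Matt", "Corrected typos in sample labels", date(2017, 3, 9))
    for entry in read_log(log_file):
        print(entry)
```

Appending rather than rewriting the file means earlier entries are never changed, so the log doubles as a record of who changed what, and when.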
Versioning Resources:
Version Control Tools and Techniques handout from MIT Libraries
Using Git for version control from NYU Data Services
Developing a hierarchical file and folder system can seem daunting, but a few simple best practices make it easier to build a system that helps you find files quickly in the short and long term. Once you have developed a file structure that works for you, follow it consistently. In developing a file structure, consider the following:
File Structure Resources:
Naming and Organizing Files and Folders from MIT Libraries
File organization strategies from NYU Libraries
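Once a folder structure has been agreed on, creating it from a template helps apply it the same way in every project. The folder names in this Python sketch are hypothetical placeholders, not a recommended standard; substitute the structure your lab has agreed on.

```python
import tempfile
from pathlib import Path

# Hypothetical template: replace these folder names with your lab's agreed structure.
PROJECT_TEMPLATE = [
    "data/raw",
    "data/processed",
    "analysis/scripts",
    "analysis/results",
    "docs",
]


def create_project(root, template=PROJECT_TEMPLATE):
    """Create every folder in the template under root and return their paths.

    Existing folders are left untouched, so the function is safe to re-run.
    """
    root = Path(root)
    created = []
    for relative in template:
        folder = root / relative
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    project = Path(tempfile.mkdtemp()) / "Project_mouse_behaviour"
    for folder in create_project(project):
        print(folder.relative_to(project).as_posix())
```

Keeping the template in one shared place means a change to the agreed structure is made once rather than remembered by each lab member.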
Data Documentation and Metadata
Describing your data through documentation and metadata ("data about the data") provides the context necessary for you and others to use or reuse your data in the future. Such descriptions should be included with any stored or shared data files. There are multiple ways to document and describe data; when choosing among them, consider both current and anticipated uses of the data.
Common Types of Data Documentation:
Metadata is structured information describing the characteristics (content, context, structure, and other details) of a data product. Creating metadata is important because it supports responsible data discovery, reuse, and preservation.
Common Types of Metadata:
Readme files are fantastic organizational tools that you can use to document and describe anything from your own filing system to a set of data and project-related documents that you share with others.
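A readme can be started from a script so that every shared folder gets the same basic sections. In this Python sketch the sections (title, contact, last-updated date, file list) and the README.txt file name are assumptions rather than a required format; any file without a supplied description is flagged so that it is not shared undocumented.

```python
import tempfile
from datetime import date
from pathlib import Path


def write_readme(folder, title, contact, file_descriptions, when=None):
    """Write README.txt in folder, listing every other file with a description."""
    folder = Path(folder)
    when = when or date.today()
    lines = [
        title,
        "=" * len(title),
        f"Contact: {contact}",
        f"Last updated: {when.isoformat()}",
        "",
        "Files:",
    ]
    files = sorted(p for p in folder.iterdir() if p.is_file() and p.name != "README.txt")
    for path in files:
        # Files without a supplied description are flagged rather than skipped.
        description = file_descriptions.get(path.name, "DESCRIPTION NEEDED")
        lines.append(f"  {path.name}: {description}")
    readme = folder / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


if __name__ == "__main__":
    folder = Path(tempfile.mkdtemp())
    (folder / "Sequence_mouse_sample125.fastq").write_text("")
    (folder / "Labwork_Matt_notes.txt").write_text("")
    readme = write_readme(
        folder,
        "Mouse sequencing pilot",
        "Lab contact (hypothetical)",
        {"Labwork_Matt_notes.txt": "Bench notes for the pilot run"},
    )
    print(readme.read_text(encoding="utf-8"))
```

The generated file is only a starting point; edit it by hand to add anything the script cannot know, such as methods or licensing.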
Readme best practices:
Documentation and Metadata Resources
Metadata Naming Authorities and Taxonomies (NYU Libraries)
Metadata Authoring Software (NYU Libraries)
Metadata standards/schemas: Dublin Core, MODS (Metadata Object Description Schema), Darwin Core
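As an illustration of structured metadata, the sketch below writes a simple Dublin Core record as XML using Python's standard library. It restricts fields to the fifteen elements of the Dublin Core Metadata Element Set in the http://purl.org/dc/elements/1.1/ namespace; the metadata wrapper element and the example values are hypothetical, and a repository you deposit with may require a specific container format or additional fields.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}
ET.register_namespace("dc", DC_NS)  # serialize elements with the dc: prefix


def dublin_core_record(fields):
    """Build an XML string from {element: value or list of values}."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        if isinstance(values, str):
            values = [values]
        # Dublin Core elements are repeatable, e.g. one creator element per person.
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(dublin_core_record({
        "title": "Behaviour videos, mouse 387",
        "creator": ["Researcher, A.", "Researcher, B."],
        "date": "2016",
        "type": "MovingImage",
        "format": "video/mp4",
    }))
```

Rejecting unknown field names catches typos early; schemas such as MODS or Darwin Core define their own element sets and would need a different list.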