
Data Management: Version Control

How to manage data.

Version Control Introduction

Version Control

Version control is the process of tracking successive versions of the same document, code, or set of information.  A simple example is saving a file under a new name with each iteration of a document.  A more robust approach uses a version control system such as Git, with a hosting service such as GitHub, which records each committed change to the code so that a previous version can be recovered at any point.  For data, always keep an untouched copy of the raw data in case processing needs to be redone from the beginning.  In computational experiments, version control provides a form of backup, a historical record that can be used to find bugs or recover previous versions, the ability to “branch” a project, and a collaborative environment.  Versioning can also save time and effort in research.  For example, if someone has code that successfully completes a task and then adds a feature that renders the code unusable, versioning lets them return to the version that worked.  Otherwise, they would have to spend time troubleshooting the code.
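The file-renaming approach mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (next_version_path, save_version) and the _v001 numbering scheme are hypothetical choices, not a standard, and a dedicated version control system is usually the better tool for code.

```python
import shutil
from pathlib import Path


def next_version_path(original: Path) -> Path:
    """Return the first unused path of the form <stem>_vNNN<suffix>.

    The _vNNN naming scheme is an assumption made for this sketch.
    """
    version = 1
    while True:
        candidate = original.with_name(
            f"{original.stem}_v{version:03d}{original.suffix}"
        )
        if not candidate.exists():
            return candidate
        version += 1


def save_version(original: Path) -> Path:
    """Copy the working file to a new numbered snapshot and return its path."""
    snapshot = next_version_path(original)
    # copy2 also preserves file metadata such as the modification time.
    shutil.copy2(original, snapshot)
    return snapshot
```

Calling save_version(Path("analysis.py")) before each significant change would leave analysis_v001.py, analysis_v002.py, and so on next to the working file, so an earlier iteration can be restored by copying it back.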

 

References

Briney, K., Coates, H. & Goben, A. Foundational Practices of Research Data Management. Research Ideas and Outcomes 6, e56505 (2020).

Kanza, S. & Knight, N.J. Behind every great research project is great data management. BMC Research Notes 15, 20 (2022).

Noble, W.S. A Quick Guide to Organizing Computational Biology Projects. PLOS Computational Biology 5, e1000424 (2009).