Versioning refers to saving new copies of your files when you make changes so that you can go back and retrieve specific versions of your files later.
When creating new versions of your files, record what changes are being made to the files and give the new files a unique name. Follow the general advice on the site for naming files, but also consider the following:
Keep track of document versions either sequentially (e.g. v01, v02) or with a unique date and time (e.g. 20140403_1800).
DO: FileNm_Guidelines_20140409_v01.docx
DON’T: FileNm_Guidelines_20140409_Review.docx AND FileNm_Guidelines_20140409_Investigation.docx
BECAUSE: Two years from now, you won’t remember what you meant.
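The naming pattern in the DO example can be generated rather than typed, which helps keep the date format and zero-padded version number consistent. The following is a minimal sketch only; the function name `versioned_name` and its parameters are hypothetical and not part of any tool mentioned in this guide.

```python
from datetime import date


def versioned_name(stem: str, version: int, ext: str, on: date) -> str:
    """Build a name of the form <stem>_<YYYYMMDD>_v<NN>.<ext>."""
    if version < 1:
        raise ValueError("version numbers start at 1")
    # Zero-pad the version so that v02 sorts before v10 in a file listing.
    return f"{stem}_{on:%Y%m%d}_v{version:02d}.{ext.lstrip('.')}"


print(versioned_name("FileNm_Guidelines", 1, "docx", date(2014, 4, 9)))
# prints: FileNm_Guidelines_20140409_v01.docx
```

The description of what changed is left out of the name on purpose; it belongs in a separate change log or notes file.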
Simple file versioning
One simple way to version files is to manually save new versions when you make significant changes. This works well if:
Some useful practices include:
The directory below shows multiple versions of a web page mock-up called DMSSiteHome.jpg. Note the use of v1, v2, etc. to indicate versions. The notations "FISH" and "SandC" indicate different images that were swapped into some versions, i.e. major changes that were made.
Overall, saving multiple versions makes it possible to decide at a later time that you prefer an earlier version. You can then immediately revert to that version instead of having to retrace your steps to recreate it.
The Simple File Versioning method requires that you remember to save new versions when it is appropriate. This method can become confusing when collaborating on a document with multiple people.
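Part of the manual routine can be scripted so that the next version number is chosen for you. The sketch below assumes a hypothetical layout in which numbered copies are kept in a separate folder next to the working file; the function name `save_new_version` and the folder name `versions` are illustrative only. You still have to remember to run it.

```python
import re
import shutil
import tempfile
from pathlib import Path


def save_new_version(working_file: Path, archive_dir: Path) -> Path:
    """Copy working_file into archive_dir as <stem>_vNN<suffix>, using the next free number."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    pattern = re.compile(
        rf"^{re.escape(working_file.stem)}_v(\d+){re.escape(working_file.suffix)}$"
    )
    # Collect the version numbers already present in the archive folder.
    existing = [
        int(m.group(1))
        for p in archive_dir.iterdir()
        if (m := pattern.match(p.name))
    ]
    next_version = max(existing, default=0) + 1
    target = archive_dir / f"{working_file.stem}_v{next_version:02d}{working_file.suffix}"
    shutil.copy2(working_file, target)  # copy2 also preserves the modification time
    return target


# Demonstration in a throwaway directory, so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    draft = Path(tmp) / "DMSSiteHome.jpg"
    draft.write_bytes(b"first mock-up")
    first = save_new_version(draft, Path(tmp) / "versions")
    draft.write_bytes(b"second mock-up")
    second = save_new_version(draft, Path(tmp) / "versions")
    print(first.name, second.name)
    # prints: DMSSiteHome_v01.jpg DMSSiteHome_v02.jpg
```

Earlier copies are never overwritten, so any of them can be opened or restored later.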
What the manual process described above requires most is self-discipline. Another approach is to use a version control system that manages changes to a project. These systems automate some steps while enforcing others and thereby require less self-discipline for more reliable results.
It's hard to know what version control tool is most widely used in research today, but the one that's most talked about is undoubtedly Git. This is largely because of GitHub, a popular hosting site that combines the technical infrastructure for collaboration via Git with a modern web interface. GitHub is free for public and open source projects and for users in academia and nonprofits. GitLab is a well-regarded alternative that some prefer, because the GitLab platform itself is free and open source.
The one caveat may be that these systems do come with a learning curve.
A version control system stores snapshots of a project's files in a repository. Users modify their working copy of the project, then save changes to the repository when they wish to make a permanent record and/or share their work with colleagues. The version control system automatically records when the change was made and by whom, along with the changes themselves.
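To make the idea of snapshots concrete, here is a deliberately simplified toy model. It is not how Git or any real system is implemented; the class names `Snapshot` and `ToyRepository` are hypothetical, and files are represented as an in-memory dictionary of text. It only illustrates that each saved change carries an author, a timestamp, and the state of the files.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Snapshot:
    author: str
    message: str
    timestamp: datetime
    files: dict  # file name -> full text at the time of the commit


@dataclass
class ToyRepository:
    history: list = field(default_factory=list)

    def commit(self, working_copy: dict, author: str, message: str) -> Snapshot:
        """Record a copy of the working files, with who saved it and when."""
        snap = Snapshot(author, message, datetime.now(timezone.utc), dict(working_copy))
        self.history.append(snap)
        return snap

    def checkout(self, index: int) -> dict:
        """Return a fresh working copy of the files as they were at a given snapshot."""
        return dict(self.history[index].files)


repo = ToyRepository()
work = {"notes.txt": "first draft\n"}
repo.commit(work, "alice", "Start notes")
work["notes.txt"] = "second draft\n"
repo.commit(work, "bob", "Revise notes")

print(repo.checkout(0)["notes.txt"], end="")  # prints: first draft
print([s.author for s in repo.history])       # prints: ['alice', 'bob']
```

Real systems store changes far more efficiently and add branching, merging, and synchronization, but the record of who changed what and when is the same basic idea.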
Crucially, if several people have edited one or more files simultaneously, the version control system will detect any overlapping changes and require conflicts to be resolved before storing the result. Modern version control systems allow repositories to be synchronized with each other, so that no one repository becomes a single point of failure. Tool-based version control has several benefits over manual version control:
The benefits of version control systems don't apply equally to all file types. In particular, version control can be more or less rewarding depending on file size and format. First, file comparison in version control systems is optimized for plain text files, such as source code. The ability to see so-called "diffs" is one of the great joys of version control. Unfortunately, while Microsoft Office files (like the .docx files used by Word) or other binary files, e.g., PDFs, can be stored in a version control system, it is not possible to pinpoint specific changes from one version to the next. Tabular data (such as CSV files) can be put in version control, but changing the order of the rows or columns will create a big change for the version control system, even if the actual data have not changed.
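The behaviour of line-based diffs on tabular data can be seen with Python's standard `difflib` module, used here as a stand-in for the comparison a version control system performs. This is a sketch with made-up sample data; the helper name `changed_lines` is hypothetical.

```python
import difflib


def changed_lines(old: str, new: str) -> list:
    """Return only the added (+) and removed (-) lines of a unified diff."""
    diff = list(
        difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="", n=0)
    )
    # The first two entries are the '---' and '+++' file headers; skip them.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


original = "id,value\n1,a\n2,b\n3,c\n"
one_cell_edited = "id,value\n1,a\n2,B\n3,c\n"
rows_reordered = "id,value\n3,c\n2,b\n1,a\n"

# A genuine edit shows up as one removed and one added line.
print(changed_lines(original, one_cell_edited))  # prints: ['-2,b', '+2,B']

# Reordering rows changes no data, yet the diff is not empty.
same_data = sorted(original.splitlines()) == sorted(rows_reordered.splitlines())
print(same_data, len(changed_lines(original, rows_reordered)) > 0)  # prints: True True
```

Sorting tabular files in a fixed order before saving them is one way to keep such diffs meaningful.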
Second, raw data should not change and therefore should not require version tracking. Keeping intermediate data files and other results under version control is also not necessary if you can regenerate them from raw data and software. However, if data and results are small, we still recommend versioning them for ease of access by collaborators and for comparison across versions.
Third, today's version control systems are not designed to handle megabyte-sized files, never mind gigabytes, so large data or results files should not be included (as a benchmark for "large", the limit for an individual file on GitHub is 100 MB). Some emerging hybrid systems such as Git Large File Storage (LFS) put textual notes under version control while storing the large data on a remote server, but these are not yet mature enough for us to recommend.
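Before placing a project folder under version control, it can help to list the files that exceed the size benchmark mentioned above. The sketch below is illustrative only: the function name `oversized_files` is hypothetical, and the default threshold interprets "100 MB" as 100,000,000 bytes, which is an assumption you should check against your hosting service's current documentation.

```python
import tempfile
from pathlib import Path


def oversized_files(root: Path, limit_bytes: int = 100 * 10**6) -> list:
    """Return all files under root that are larger than limit_bytes, sorted by path."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_size > limit_bytes
    )


# Demonstration with a tiny limit in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "analysis.py").write_bytes(b"x" * 10)
    (root / "data" / "raw_scan.bin").write_bytes(b"x" * 2000)
    for path in oversized_files(root, limit_bytes=1000):
        print(path.relative_to(root).as_posix())  # prints: data/raw_scan.bin
```

Files flagged this way are candidates for storage outside the repository.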
Researchers dealing with data subject to legal restrictions that prohibit sharing (such as medical data) should be careful not to put data in public version control systems. Some institutions may provide access to private version control systems, so it is worth checking with your IT department.
Additionally, be sure not to unintentionally place security credentials such as passwords and private keys in a version control system where they may be accessed by others.
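A quick scan of text files before saving them to a shared repository can catch some obvious cases. The following is a rough heuristic sketch, not a substitute for a dedicated secret-scanning tool: the patterns and the function name `find_possible_secrets` are assumptions chosen for illustration, and they will both miss real credentials and flag harmless lines.

```python
import re

# Illustrative patterns only: a private-key header, and "name = value"
# lines whose name suggests a credential.
SUSPICIOUS_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)\b(password|passwd|secret|api[_-]?key|token)\b\s*[:=]\s*\S+"),
]


def find_possible_secrets(text: str) -> list:
    """Return the 1-based line numbers that match any suspicious pattern."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in SUSPICIOUS_PATTERNS):
            hits.append(number)
    return hits


sample = (
    "host = example.org\n"
    "password = hunter2\n"
    "-----BEGIN RSA PRIVATE KEY-----\n"
)
print(find_possible_secrets(sample))  # prints: [2, 3]
```

Any line flagged this way deserves a manual look before the file is shared.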
To learn about local software options, consult IT Services at Brandon University. They can inform you how frequently backups occur.
Pros: As a local service option at BU, there are fewer concerns around storing data in countries whose privacy and security practices may differ from Canada's.
Cons: Backups will likely not happen in real time, so not every version of a file will be captured.
Google Drive's word processing, spreadsheet, and presentation software automatically create versions as you edit.
Pros: The real-time editing feature means that Google Drive works well for collaborating on files with multiple people. And because the files are on Google Drive, they are accessible from anywhere.
Cons: You are restricted to the software provided by Google, which may not have all the bells and whistles of your desktop word processing, spreadsheet, or presentation software.
If you have more sophisticated version control needs, you might consider a distributed version control system like Git. Files are kept in a repository; users clone copies of the repository for editing and commit changes back to the repository when they are done.
Version control systems like Git are most often used by groups writing software and code, but can be used for any kind of files or projects. Many people share their Git repositories on GitHub.
Adapted from:
Stanford Libraries, Research Support, Data Management Services. “Data versioning”. Accessed February 22, 2021. https://library.stanford.edu/research/data-management-services/data-best-practices/data-versioning
Cornell University, Research Data Management Services Group. “File Management”. Accessed February 24, 2021. https://data.research.cornell.edu/content/file-management
Wilson G, Bryan J, Cranston K, Kitzes J, Nederbragt L, Teal TK (2017) Good enough practices in scientific computing. PLoS Comput Biol 13(6): e1005510. https://doi.org/10.1371/journal.pcbi.1005510