Why version control? Why git?

If you have never used version control software for your projects, you might ask yourself what benefits it could possibly offer. The main characteristic of version control is that every state of your project that you commit is stored permanently. Hence, at any time you can easily roll back to any earlier point of your project and check out the exact state it was in back then. Still, it might not be immediately clear why this should be of any great use. I mean, who wants to roll back a project to some long-past, unfinished step in the middle of development? Well, let’s see…

To me, the main benefit of version control lies in collaboration. Compared with simple synchronization software like Dropbox, for example, version control is far more robust for collaborative work. The reason lies in the conceptual design of synchronization software. While Dropbox is incredibly powerful and easy to use for synchronizing and sharing content among collaborators, it is simply not designed for several people working on the same file at the same time! Just suppose that two people change the same line simultaneously: how would Dropbox know which change to keep? Conflicting changes like these always require human interaction to be resolved satisfactorily. The goal of any collaboration software, therefore, must be to make resolving conflicts as cheap as possible. And this is exactly what git does, since it was explicitly built for this purpose. Linus Torvalds, the chief architect of the Linux kernel, created git to improve collaboration on this open source project, which involves thousands of individual developers.

However, don’t be misled into thinking that version control is only for multi-person projects. Nearly every single-person project involves a different kind of multi-dimensionality as well: multiple computers. So far, I use git simply to keep my research synchronized across my work computer, my private computer and my netbook. Of course, Dropbox could do this quite efficiently as well, since conflicting changes should be rare. In a single-person project, you basically only run into them when you accidentally start editing a file on another computer before Dropbox has finished synchronizing it, which usually happens with low bandwidth or large files. Even without conflicting changes, however, you still profit from some of the other features of version control.

Branching, for example, lets you develop new code in a sandbox, without the risk of breaking functioning code in the current stable version of the project. This matters whenever you decide to improve the performance of a part of the code that already works. You do not want to lose the old version, if only as a safety net in case the experiment goes wrong. Branches are especially useful when you improve a code base that a second project simultaneously relies on being in a reliable state: you can switch back to the stable old version in a second. Or, if you want to extend a project that is currently under review, just mark the state of the project at the time you submitted it (git calls such a mark a tag). When you get the paper back with requests for refinements, you can work them in on a second branch that starts from this mark, while the project itself continues to evolve in a different direction on the main branch.

So, in my opinion, the largest benefit of version control is that it allows robust and easy collaboration. And even if you are not involved in a collaborative project right now, it might be a good idea to prepare for when the time comes. Getting accustomed to a version control workflow might even lower the barriers to future collaborations.

But why exactly should you prefer git? What about Mercurial, CVS, Subversion and the like? Honestly, I might not be the best person to answer this question, as I have only worked with git so far. From what I have heard, however, git is said to be unbeaten when it comes to features like branching. It is extremely fast and reliable, and it is heavily used in open source software development.

If you still need further persuasion, and probably more expertise as well, you had better listen to the arguments of git’s inventor himself. In a quite entertaining Google Tech Talk, Linus Torvalds (“a man of strong opinions”, as he describes himself) alternately describes the developers of other version control software as either “morons” or “incredibly stupid people”.

Now that you are convinced: you can find plenty of resources for getting started with git in the documentation section of the official git homepage. Furthermore, GitHub hosts public repositories for free, and Bitbucket offers free repositories for up to 5 users. With one of these hosting services as a remote repository, git immediately doubles as an efficient backup system. Once you are sufficiently familiar with the basic commands, there is plenty of inspiration for making the most of git in your daily workflow. Either take a look at the different git workflows presented on the Atlassian homepage, or read the blog post “A successful Git branching model”.

Posted on 2013/09/29, in tools. 3 Comments.

  1. Nice article. The link to the Google Tech Talk is broken, should be http://www.youtube.com/watch?v=4XpnKHJAok8

    You might have more luck using the shortened version, which doesn’t need the parameters: http://youtu.be/4XpnKHJAok8

  2. Pingback: git for data analysis – part I: putting data under version control? | Quantifying Information
