So I had a conversation last night with my good friend Steve about my decision to start using Mercurial. Talking to him, I realized I hadn’t really posted much on what I think about distributed version control systems (DVCS), so the switch to Mercurial may have taken many people off guard. So, I wanted to spend a post (maybe two) talking about what I see as the advantages of DVCS over a traditional “central server” (VCS) mentality.
The Leap of Faith
Like many, about the idea of version control without a central server scares me. I don’t trust my own machine, or my developer’s machines, to be in any way fault tolerant. My server, on the other hand, has a RAID 5 controller in it, and is backed up off-site weekly (though if I were checking in more I’d probably back up nightly). Having that central server keep track of my changes feels safer, so I shied away from DVCS, opting instead for centralized version control with Subversion with user branches (more on that here). Even with tools like svnmerge though, Subversion utterly fails at merge tracking, especially from multiple branches bidirectionally. It was just never scaled to do that. Since then, I’ve watched several videos on the design of distributed version control (two on Git (here and here), one on Mercurial) and I made the leap of faith to attempt to see what DVCS is like.
The Centralized Model, Distributed
Now, distributed version control is amazing for open source projects, but what about working in a company where team communication and process is key? Where coordination is a requirement? Well, the nice thing about distributed version control is that it can, if you need it to, support a centralized model. Even in open source projects, you have what you call an “upstream” server, which is what everyone consults for the latest and greatest “official” changes to a product. You could, if you wanted to, push every commit you made to the upstream server, and pull every so often to make sure you have the latest version. In that case it would be exactly like using centralized version control (except it takes a more hard drive space, as you’re holding a repository on your hard drive, not just the source code). Sure, you have to train people to commit, then push, but once they understand that, there’s no issue. In addition, your centralized version of your tree can have pre/post-commit hooks just like any other version control system, allowing you to create check-in gauntlets should the need arise. That said, I’m sure other source control providers make it easier (the new Source Safe and Perforce I believe have GUIs for create check-in gauntlets, but I could be wrong). Still, with an easily extended system (like Mercurial), these additions wouldn’t be hard.
Encouraging Experimentation and Checkpoints
So in this case, why use a DVCS if it’s just going to work like centralized version control. Well, even in this situation, you now have more options available to you than you did before. First is the ability to checkpoint whenever you want, without affecting other developers. The key reason most people give for using version control is the Chinese proverb “The palest ink is better than the best memory.” So what not increase your memory by encouraging your developers to commit as often as possible? And if they can do this even when their build is broken or when they’re not completely finished with something, why can’t we allow them to do this? In a centralized model, this is almost impossible (though packing apparently solves this partially). Then, when a developer is done, their commit can have all of the changes they just made, with a history of what they did. Of course, it doesn’t have to (rebasing is always an option) but having it there can help you see potential problems or flaws in thought process.
In addition, distributed version control can encourage experimentation in branches. The problem I have had with branches in the past is that they are a pain in the ass to manage on a centralized server. Even Subversion, which makes branching easy, doesn’t manage merging well at all, especially bi-directional merging. I’ve even had problems with Perforce merging in the past. Regardless of that, though, working off of branches and creating branches is never easy. Not as easy as it is in distributed version control environments anyway. In Mercurial, branches are just clones of the current repository. You can work in them, share them with others, and delete them without the central server being involved. This may sound bad, but it does encourage developer experimentation and sharing. If I can branch simply off of whatever my current state is, experiment a bit, show that change to other developers and get feedback, then merge back into my main development branch easily, that opens up a lot of opportunities for small team experimental work.
Here’s an example of where I could have used this in the past. One of my former companies did not do and did not encourage unit testing, and I could not convince my coworkers (or the management) to let me try it in some of our newer libraries. I decided, though, to do unit testing on any new modules I wrote anyway. I made a copy of the library into a directory that had a unit test environment set up, worked on the library (writing new tests, making them pass, etc), then copied my changes back over when I was done, and committed them to the central server. The unit test folder, though, was completely unversioned. If I’d had a DVCS, I could have cloned the library, done my work, then pulled changes back, leaving the unit tests versioned in their own folder. In addition, any other developer that was also working on those libraries could have easily pulled the unit tests from me, thus spreading a new practice at the company. This may be an edge case, but it is an example where simple branching and merging would have encouraged experimentation between developers.
Shelving, Packing, Branching
This gets into the second option you have with distributed version control. DVCS makes it easy for developers to share uncommitted changes with each other. Other source control providers give you the option through “shelving” or “packing”, but I can’t see it being as easy as it is with distributed version control. I can pull changes from anyone, on any branch, into a cloned version of my own repository to test things out without talking to a central server. Those changes don’t need to be based off of the same trunk and can easily be merged by whichever developer at whatever point, then pushed to the upstream server, all while retaining who made the original changes. This has to be experimented with to really see the full benefit, and honestly I haven’t done it enough, but I can see where it would be useful. Unlimitted simple branching with the ability to push and pull changes from any developer coupled with a strong revision history just sounds nice to me.
Fault Tolerance
The last thing I want to talk about is when things go wrong. We try to avoid it, but every so often, our central server goes down. Sure, we have backups off site, maybe we have a passive failover server, but if we don’t, our central server going down is very problematic. I have had this happened to me, and at fairly large companies that can afford lots of nice servers. If this happens in a traditional VCS, development (basically) stops for the day. Not so much in distributed environments. Since I can commit to my local repository all I want and share with everyone else without the benefit of a central server, a dead upstream doesn’t affect me at all. This, in my mind, is pretty awesome.
More…
I will say that I’m using a DVCS in a very small team environment, but looking at it, I believe it could scale very easily. I also think there are lots of interesting ways to use DVCS, especially in agile environments where small teams may need to communicate changes without affecting central efforts. I really want to go into these, but this is already too long, so maybe I’ll talk about them tomorrow. Until then, let me know what you think.