In the first part we started a community buy-in process to support the migration and we set the technical stage. In this part we’ll map out the road for moving the code from Subversion (SVN) to Mercurial (Hg).
The source and target layouts are most likely different from one another. You need to test whether the selected conversion tool supports the source layout. Most tools handle standard/canonical layouts, but few repositories follow such layouts strictly and consistently over time.
The Hugin SVN repository was itself the result of a migration from an even older tool, CVS. The subdivisions of the Hugin codeline did not follow the canonical trunk/branches/tags subdivision to the letter: we had good reason to distinguish three kinds of branches: development branches, obsolete_branches, releases. Moreover, the repository contained seven unrelated code lines because of the SourceForge limitation of one SVN repository per project. The sensible choice was to separate each of the seven code lines into its own Hg repository. In Hg, branches and tags are not part of the layout and they only need to be addressed in terms of history conversion.
History Clean Up
The next big question is how far back do you need to go? And to what level of detail? We decided to keep the SVN repository publicly accessible to document history. This freed us from the need for a detailed reconstruction of the past.
You will have a wide range of choices, from painstakingly reconstructing every single past changeset to pragmatically starting from scratch with a current code snapshot. The trade-off is between effort, storage requirements, and benefits to the project. I decided to go as far back and into as much detail as the automated tools allowed with little effort, and to step beyond that only where the benefits outweighed the extra effort.
This meant giving up on the history of past development branches. By their nature, SVN merge operations do not carry the history of the development branch into trunk. To fully reconstruct history one must extract the development branch and transplant it into the Hg default code line. This may be feasible, but it is time consuming.
Save that time. You will need it to comb out a few knots you’ll find hidden inside SVN history. The result of less than optimal manipulations, these knots are quickly fixed in subsequent SVN revisions so that they do not affect day-to-day operation. They get forgotten until somebody has to dig up history. We had two such knots in Hugin:
Movie files that do not belong in the repository landed there by mistake. A few revisions later they were removed and stopped affecting daily checkout operations. But they’re still there, represent more than 75% of the weight of the Hugin SVN repository, and will affect the Hugin Hg repository if left untreated.
We also had an unorthodox switch of a branch to replace trunk completely. It worked well while using SVN, but automated conversion tools trip over unconventional layout operations. Luckily, with the tool I ended up using, this left only a small cosmetic scar. I decided not to spend time on cosmetic aspects and left the scar untouched.
The advent of distributed RCS spurred development of a panoply of tools to efficiently move around bits of code. It was difficult to discern upfront which tool would work for my specific scenario. I’ve tried a few of them and the one that worked best for me was Mercurial’s own convert extension. Another tool that was helpful in the process was Mercurial’s hgk extension.
Edit the following lines in your ~/.hgrc file (create it if it does not exist) to activate these extensions. You will also need the directives in the [ui] section:
[extensions]
convert =
hgk =
[ui]
username = YOU <firstname.lastname@example.org>
verbose = True
Changesets are committed by users. The definition of a user in Hg differs from SVN, so we need to map SVN users to Hg users. The mapping lives in a file that is passed to the conversion tool. Its syntax is one user per line, with a statement listing the SVN user and the corresponding Hg user, e.g.
yuv = Yuval Levy <email@example.com>
The following command will produce a file listing alphabetically all users that ever committed to SVN, one per line:
svn -q log | grep ^r | cut -d'|' -f 2 | sort | uniq > svn_users.txt
I used a quick script to generate SourceForge user addresses (@users.sourceforge.net) from that file, but some manual cleanup will be inevitable (and is a good opportunity to keep the buzz going and the stakeholders interested).
While it is possible to enter anything in the username directive of ~/.hgrc, the best practice is to put in a name and an email address. This is important to establish the legitimacy of the code committed.
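The quick script itself is not reproduced here, but a minimal sketch of the idea could look like the following. It assumes that every SVN committer has a SourceForge account of the same name; the function name make_authormap and the output file name are hypothetical. Real names still have to be filled in by hand afterwards.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of SVN usernames (one per line, as in
# svn_users.txt) into draft author-map lines for hg convert.
# Assumption: each committer's address is <svnuser>@users.sourceforge.net.
make_authormap() {
    # Trim surrounding whitespace, drop empty lines, then emit
    #   svnuser = svnuser <svnuser@users.sourceforge.net>
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' \
        -e 's/.*/& = & <&@users.sourceforge.net>/'
}

# Example (file names are placeholders):
#   make_authormap < svn_users.txt > authors_draft.txt
```

Review the draft, replace the repeated username with each contributor's real name where you know it, and only then use the file as the author map.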
Mapping out the conversion is an iterative process: set up the conversion command, kick it off, go for a walk while the computer churns through the repository. When you come back, hopefully there is an Hg repository that you can analyze to determine the next step. Usually the next step will be to refine some of the configuration files or conversion parameters. Rinse/repeat until the resulting Hg repository fulfills your expectations.
I strongly recommend that you document every single step and minuscule change. Even better: if I were to start such a process again, I’d keep a shell script that runs everything from scratch to reconstruct the current state. You will find yourself going back to the same operations again and again, sometimes days or weeks later. Memory may betray you on small details.
Convert, Again, Again, and Again.
The basic command to convert a repository is
hg convert --branchsort --config convert.svn.branches=hugin/branches --config convert.svn.tags=hugin/tags --config convert.svn.trunk=hugin/trunk --authors svn_users.txt --filemap hugin_filemap.txt hugin-mirror hugin-mercurial
The paths to the branches, tags, and trunk depend on the SVN repository’s layout and the intended outcome. You’ll tweak those many times.
When I wanted to add the 2010.0 release branch on top of the converted trunk, the command was:
hg convert --branchsort --config convert.svn.branches= --config convert.svn.tags=hugin/tags --config convert.svn.trunk=hugin/releases/2010.0 --authors svn_users.txt --filemap hugin_filemap.txt hugin-mirror hugin-mercurial
hugin_filemap.txt is used to include/exclude paths. To filter out the heavy movies, I used the following:
exclude "GSoC 2007/Presentation 1"
exclude "GSoC 2007/Presentation 2"
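Since these commands get rerun many times, it helps to wrap them in a script that rebuilds the target from scratch. The following is only a sketch under assumptions: the paths and file names are the ones used above, while the DRY_RUN switch and the function names convert_pass and convert_all are hypothetical. Note that convert_all deletes hugin-mercurial before converting.

```shell
#!/bin/sh
# Sketch of a rerunnable conversion script. Paths and file names are the ones
# used in this article; DRY_RUN and the function names are hypothetical.

# convert_pass TRUNK BRANCHES: run one hg convert pass into hugin-mercurial.
convert_pass() {
    trunk=$1
    branches=$2
    # Keep the command in the positional parameters so that the dry run
    # prints exactly what the real run would execute.
    set -- hg convert --branchsort \
        --config "convert.svn.branches=$branches" \
        --config convert.svn.tags=hugin/tags \
        --config "convert.svn.trunk=$trunk" \
        --authors svn_users.txt \
        --filemap hugin_filemap.txt \
        hugin-mirror hugin-mercurial
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Rebuild the Hg repository from scratch: trunk first, then the 2010.0
# release branch on top of it.
convert_all() {
    [ -n "${DRY_RUN:-}" ] || rm -rf hugin-mercurial
    convert_pass hugin/trunk hugin/branches &&
    convert_pass hugin/releases/2010.0 ""
}
```

Setting DRY_RUN=1 before calling convert_all prints the two commands without touching anything, which is a cheap way to check a tweak before leaving the computer to churn.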
Examine The Results
When you first walk into the newly converted repository with cd hugin-mercurial, it feels empty: there is only a hidden .hg folder, which holds the repository itself. Use hg view to have a first look at the resulting revision tree. You need to hg checkout a revision if you want to see more.
Or delve into the internals. The file .hg/shamap lists every converted SVN revision, with path and revision number, against its Hg SHA1 changeset ID. These mappings are helpful if you need to manipulate history, e.g. to skip some revisions or to link a disconnected part of history, such as a separately extracted branch, to its parent and child changesets. For such manipulations you will use the --splicemap and --branchmap options. Like --filemap they point to files, but they work differently. They are described in hg help convert and can help you fix the most broken of repositories. I was thankful I did not have to deal with this: to add the release branches to the repository, it was sufficient to simply run convert again on the same hugin-mercurial target.
As you proceed, you will see your repository improve iteration after iteration. As soon as you have a result to show, pack it into a tarball and invite community contributors to download it and try the repository. Share as much information as you can, and enable them to do the same as you did. Unless you have unlimited time and resources, this is the only way to go beyond basic repository integrity checks. The tests will reveal corrupted repositories, and if the contributors go one step further and try to build the code, they will also reveal dependencies in the build system that may require committing specially crafted code to support Hg instead of SVN. Keep trying and refining until you have on your hard disk an Hg repository that is ready to replace the old SVN repository. Then you’ll know you’re ready for Implementation Day.
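To illustrate the .hg/shamap lookup, here is a small sketch that prints the Hg changeset ID recorded for a given SVN revision number. It assumes that each line of the map holds a source identifier ending in @ followed by the revision number, then whitespace, then the Hg changeset ID; check a few lines of your own file before relying on it. The function name svn_to_hg is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the Hg changeset ID(s) mapped to an SVN revision.
# Assumption: each map line is "<source id>@<svn revision> <hg changeset id>".
svn_to_hg() {
    rev=$1
    map=${2:-.hg/shamap}
    awk -v rev="$rev" '{
        hg = $NF                          # last field: Hg changeset ID
        src = $0
        sub(/[ \t]+[^ \t]+$/, "", src)    # strip the ID, keep the source id
        n = split(src, parts, "@")
        # Compare the whole revision number so that 12 does not match 123.
        if (n > 1 && parts[n] == rev) print hg
    }' "$map"
}

# Example, run from inside hugin-mercurial (the revision number is a placeholder):
#   svn_to_hg 1234
```

A revision can yield more than one line if it was converted for several paths, for instance when trunk and a release branch were extracted in separate passes.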