Local Transparent Cache of a Mercurial Repository

It’s been a while since I blogged.  January was extremely busy.  I did some interesting stuff with Mercurial and I would like to share some of it, starting with this one.

It is not unusual for developers to work concurrently on different local clones, and cloning clean every time from another repository on the Internet is inefficient.  Many will have a clean local pull and work on copies of it, but this brings the headache of manually updating and keeping track of changes that have not gone upstream yet.  The headache can be done away with by making the clean clone a transparent cache, and cloning it rather than copying it.

These are the commands to make it work with Hugin:

$ mkdir ~/src/hugin
$ cd ~/src/hugin
$ hg clone \
     ssh://${USER}@hugin.hg.sourceforge.net/hgroot/hugin/hugin \
     hugin.cache
$ nano hugin.cache/.hg/hgrc

Add the following lines to hugin.cache/.hg/hgrc:

[hooks]
# make sure we're committing on the latest version upstream
precommit.from_master = hg pull
# after committing, push to the main repo
changegroup.sf = hg push
# make sure that we answer a pull with the latest version upstream
preoutgoing.to_master = mv .hg/store/lock .hg/store/unlock
    && hg pull && mv .hg/store/unlock .hg/store/lock

All in the spirit of sync early, sync often.  The first hook pulls the latest changes from upstream before committing; the second pushes the committed changes to upstream; and the third pulls from upstream before responding to downstream pulls.

Now, when you want to start working on Hugin on a clean slate you go business as usual:

$ cd ~/src/hugin
$ hg clone hugin.cache hugin.work
$ cd hugin.work
$ hg pull && hg up
... hack ... hack ...
$ hg ci
... hack ... hack ...
$ hg ci
$ hg push

And to work on a second copy, e.g. on the next release:

$ cd ~/src/hugin
$ hg clone hugin.cache hugin.release
$ cd hugin.release
$ hg pull && hg up -C 2011.0

Your transparent cache ensures that you’re in sync with the rest of the team.  Happy hacking!

Poetry in the Fourth Dimension

Back from almost one week without connecting to the internet, a pleasant surprise on the Hugin-PTX mailing list.

Every WordPress user knows that code is poetry.  David Haberthür took the Hugin code from its new Mercurial repository into the fourth dimension with Gource.  Inspiring.

From Subversion to Mercurial. Part 3, Implementation Day and Beyond

If you followed the steps described in the first and second parts of this series, you should have a Mercurial (Hg) repository ready to replace your project’s Subversion (SVN) repository.  In this third and last part we’ll go over Implementation Day, with particular detail on how to implement this migration on the SourceForge infrastructure.


Can’t test enough.  Your script produces an Hg repository that looks OK on superficial investigation with tools like hg log and hg view.  But does the code build?  Hugin’s build system had a couple of dependencies on SVN and they needed to be updated.  Thomas Modes and Kornel Benko stepped up to the task.  Can developers and builders use this repository?


On Implementation Day the project will transition from SVN to Hg.  While all relevant contributors are proficient in SVN, Hg was new territory for many.  While progressing on the conversion I kept the community informed and took every opportunity to encourage learning of the new tools, including public tutoring that continues after the transition.  You want to encourage people to share their experiences and learn from each other.  Conceptually the biggest difference between SVN and Hg is that with Hg the repository sits on your local client.  A check in to SVN is the equivalent of a check in and a push to Hg.  Offline operation is not possible with SVN but it is with Hg.  However both are revision control systems (RCS) and very similar to use.

Implementation Day Overview

Warn everybody one last time.  Create a new repository on SourceForge for each migrating code line.  Lock down SVN by revoking write access to everybody but a few maintainers who will clean up after the transition.  Run once again the whole migration on a green field, from scratch, to be sure that everything to the very last SVN commit is included.  Test one last time this new local repository (compare it to previous results); create the new repository on SourceForge and push your local repository to it. Last but not least, configure the repository on SourceForge and announce the transition to the world.  Sounds easy.  The devil is in the detail.

Mercurial on SourceForge

SourceForge has been very generous with the projects it is hosting: we can have unlimited Hg repositories.  Unfortunately there are rough edges.

To activate Mercurial for your project:

  • Login via the web as a project administrator and go to the “Develop” page for your project.
  • Select the Project Admin menu, and click on “Feature Settings”.
  • Select “Available Features”.
  • Select the checkbox to the left of the “Mercurial” heading. Your repository will be instantly enabled.

This first repository will be fine but if you want to activate more than one repository you will have to manually set them to be group writable.  To activate additional repositories:

  • Log on to SourceForge’s shell service (assuming you have set up your SSH key) with `ssh -t USER,PROJECTUNIXNAME@shell.sourceforge.net create`
  • Navigate to your project’s Mercurial space with `cd /home/scm_hg/P/PR/PROJECTUNIXNAME`, e.g. for Hugin this would be `cd /home/scm_hg/h/hu/hugin`
  • Create a new directory with the name you want for the repository.  E.g. for Hugin’s website this was `mkdir hugin-web`
  • Execute `hg init DIRNAME` (where DIRNAME is the directory you just created, e.g. `hg init hugin-web`). This will initialize the new repository.
  • Inside the new repository, edit the configuration file .hg/hgrc (see configuration section below)
  • SourceForge rough edge: group write access must be given manually `chmod -R g+w /home/scm_hg/P/PR/PROJECTUNIXNAME/DIRNAME`
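The steps above could be collected into a small helper to run on the SourceForge shell.  This is only a sketch: the function name create_sf_repo and the HG override variable are hypothetical, not something SourceForge provides, and the .hg/hgrc of the new repository still has to be edited by hand afterwards.

```shell
#!/bin/sh
# Hypothetical helper: create an extra Hg repository and make it group writable.
# HG can be overridden (e.g. for a dry run); it defaults to the hg on the PATH.
HG="${HG:-hg}"

create_sf_repo() {
    base=$1    # e.g. /home/scm_hg/h/hu/hugin
    name=$2    # e.g. hugin-web
    mkdir -p "$base/$name" &&
    "$HG" init "$base/$name" &&
    # SourceForge rough edge: group write access is not set by default
    chmod -R g+w "$base/$name"
}

# Example: create_sf_repo /home/scm_hg/h/hu/hugin hugin-web
```

Chaining the three commands with && stops the helper at the first failure, so chmod never runs on a half-created repository.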

Configuration of the Mercurial Repository on SourceForge

SVN support on SourceForge is mature and projects are used to amenities such as email commit notifications.  Hg support is better than what the scant documentation suggests.  Most standard functionality, including email notification, works, even if it is officially unsupported.  One only has to find out how to configure it.  I played around with some trial and error already when optimizing the Enblend repository last year.  This is the hgrc file template that works for us:

[hooks]
changegroup.notify = python:hgext.notify.hook

[email]
from = NOTIFICATION_ADDRESS@lists.sourceforge.net

[smtp]
host = localhost

[web]
baseurl = http://PROJECT.hg.sourceforge.net/hgweb/PROJECT/DIRNAME

[notify]
sources = serve push pull bundle
test = False
config =
template = \ndetails:   {baseurl}{webroot}/rev/{node|short}\nchangeset: {rev}:{node|short}\nuser:      {author}\ndate:       {date|date}\ndescription:\n{desc}\n
maxdiff = -1

[usersubs]
NOTIFICATION_ADDRESS@lists.sourceforge.net = **

[trusted]
users = *

You’ll have to replace PROJECT with your own project unix name; DIRNAME with your own Hg repository top directory; and NOTIFICATION_ADDRESS with your own mailing list.  The configuration options are documented.

Committer Write Access

With a dRCS like Mercurial write access has a completely different meaning.  Everybody can `hg clone` an existing repository and once cloned has full write access and can publish their own repository.  The d in dRCS stands for distributed.  Technically there are no more hierarchies and no more central control.  All clones are equal.  Whoever owns a clone can decide to publish it on the web, e.g. with `hg serve`, and give write access to whomever they want.  Granting SourceForge write access only means that the committer can push to the repository hosted on SourceForge.  What makes a repository authoritative is the users’ trust, and this is given implicitly by pulling from it.

SourceForge Rough Edges, Again

I wish there was a way to group-manage access rights on SourceForge.  I have not found it.  I needed to revoke SVN access from most developers, and grant them Hg access.  I had to click through each and every contributor registered with the project and manage their access rights one by one.  To make things worse the pseudo-ajax web interface of SourceForge is anything but asynchronous: it reloads the page after each change.  Ajax-cosmetics with underlying old technology from the last century.

One point projects on SourceForge will need to pay attention to is default access rights.  I did not find a place to change those, so every new project member gets SVN access rights by default, unless you explicitly remove them.  It seems to me that the defaults on SourceForge are based on the principle of random uncoordinated historical growth.  Have they ever heard of the generally accepted principle of least privilege?  And the default file access for newly created extra Hg repositories is less than reasonable least privilege (see above).

<rant>And don’t tell me about SourceForge’s IdeaTorrent and ways to request an enhancement.  In my experience it does not work and some things on that site have been broken for years when the fix is simple, easy, and does not take much time.  Have you tried to use a SourceForge mailing list archive?</rant>


Now that everything is set, you can simply `hg push` from your local repository to the SourceForge one.  Or if you’re really confident, you can rsync the .hg directory (but don’t forget to edit the .hg/hgrc configuration file on the SourceForge end).

CMake Build System

Our CMake build system depended on SVN and after the push it was broken.  Kornel Benko and Thomas Modes fixed it.  Bruno Postle added a break in the CMake build system in the SVN repository, to warn users of that repository that newer versions are in Hg.  Harry van der Wolf updated the OSX build system.


The disruption was short.  A few hours after going live, developers started committing again, using Hg.  Builders started building and distributing again, using Hg.  The Google Summer of Code students cloned away their own copies of the source code and started working on the next major developments for Hugin.  After taking on the most complex of the code lines in SVN first, I migrated the remaining ones over a few hours Sunday night.  Hugin and most of its related projects now live happily in Hg and can easily be converted to other formats, including Bazaar, git, and even SVN.  Initially I thought to mirror the default code branch from Hg to SVN, but our project does not really need that.  Subversion has been made completely redundant by a newer, better tool.  Mercurial and its likes would not exist without Subversion, and should be seen as a continuum in the lineage rather than a break from the past.  With Mercurial, Hugin is freer than ever, and you are free to take it further on a journey to the future.  For now Hugin still lives on SourceForge, where the next critical bit of infrastructure is the bug tracker.  But with Mercurial the dependency on SourceForge, and the dependency on any single central service or person, has been further reduced.  Long and Free live Hugin.

From Subversion to Mercurial. Part 2, Mapping the Road

In the first part we started a community buy-in process to support the migration and we set out the technical stage. In this part we’ll map out the road for moving the code from Subversion (SVN) to Mercurial (Hg).

Repository Layout

Source and Target layout are most likely different from one another.  You need to test if the selected conversion tool supports the source layout.  Most tools handle standard/canonical layouts, but few repositories follow such layouts strictly and consistently over time.

The Hugin SVN repository was itself the result of a migration from an even older tool, CVS.  The subdivisions of the Hugin codeline did not follow the canonical trunk/branches/tags subdivision to the letter: we had good reason to distinguish three kinds of branches: development branches, obsolete_branches, releases.  Moreover the repository contained seven unrelated code lines because of the SourceForge limitation to one SVN repository per project.  The sensible choice was to separate each of the seven code lines into its own Hg repository.  In Hg, branches and tags are not part of the layout and they only need to be addressed in terms of history conversion.

History Clean Up

The next big question is how far back do you need to go?  And to what level of detail?  We decided to keep the SVN repository publicly accessible to document history.  This freed us from the need  for a detailed reconstruction of the past.

You will have a wide range of choices, from painstakingly reconstructing every single past changeset to pragmatically starting from scratch with a current code snapshot.  The trade-off is between effort, storage requirements, and benefits to the project.  I decided to go as far back and into as much detail as the automated tools allowed with little effort; and to step beyond that only where the benefits outweighed the extra effort.

This meant giving up on the history of past development branches.  The nature of SVN merge operations implicitly omits carrying the history of the development branch into trunk. To fully reconstruct history one must extract the development branch and transplant it into the Hg default code line.  Maybe feasible but time consuming.

Save that time.  You will need it to comb a few knots you’ll find hidden inside SVN history.  The result of less than optimal manipulations, these knots are quickly fixed in subsequent SVN revisions so that they do not affect day to day operation.  They get forgotten until somebody has to dig up history.  We had two such knots in Hugin:

Movie files that do not belong in the repository landed there by mistake.  A few revisions later they were removed and stopped affecting daily checkout operations.  But they’re still there, represent more than  75% of the weight of the Hugin SVN repository, and will affect the Hugin Hg repository if left untreated.

We also had an unorthodox switch of a branch to replace trunk completely.  It worked well while using SVN but automated conversion tools trip over unconventional layout operations.  Luckily this only left a small cosmetic scar with the tool retained.  I decided not to spend time on cosmetic aspects and left the scar untouched.


The advent of distributed RCS spurred development of a panoply of tools to efficiently move around bits of code.  It was difficult to discern upfront which tool would work for my specific scenario.  I’ve tried a few of them and  the one that worked best for me was Mercurial’s own convert extension. Another tool that was helpful in the process was Mercurial’s hgk extension.

Edit the following lines in your ~/.hgrc file (create it if it does not exist) to activate these extensions.  You will also need the directives in the [ui] section:

[extensions]
convert =
hgext.hgk =

[ui]
username = YOU <your@email.add>
verbose = True

Mapping Users

Changesets are committed by users.  The definition of a user in Hg differs from SVN.  We need to map SVN users to Hg users.  The syntax of the file is one user per line with a statement listing the SVN user and the corresponding Hg user, e.g.
yuv = Yuval Levy <yuv@example.com>

The following command will produce a file listing alphabetically all users that ever committed to SVN, one per line:
svn -q log | grep ^r | cut -d'|' -f 2 | sort | uniq > svn_users.txt

I used a quick script to generate SourceForge users addresses (@users.sourceforge.net) from that file, but some manual cleanup will be inevitable (and is a good opportunity to keep the buzz going and the stakeholders interested).
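The original quick script is not shown here; the following is a minimal sketch of what such a script could look like.  The function name make_authors_map and the output file name are hypothetical, and it assumes that every committer can be reached at USER@users.sourceforge.net and that the SVN user name will do as a display name until it is cleaned up by hand.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of SVN user names (one per line, as in
# svn_users.txt) into "svnuser = Name <address>" lines for hg convert --authors.
# Assumption: the address is USER@users.sourceforge.net for every committer.
make_authors_map() {
    # trim the blanks that cut leaves around each name, drop empty lines
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' |
    while IFS= read -r user; do
        printf '%s = %s <%s@users.sourceforge.net>\n' "$user" "$user" "$user"
    done
}

# Example: make_authors_map < svn_users.txt > authors_draft.txt
```

The draft still needs the manual pass mentioned above to replace the placeholder names with real names and preferred addresses.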

While it is possible to enter anything in the username directive of ~/.hgrc, the best practice is to put in a name and an email address.  This is important to establish the legitimacy of the code committed.

Conversion Process

Mapping out the conversion is an iterative process:  set up the conversion command, kick it off, go for a walk while the computer churns through the repository.  When you come back, hopefully there is an Hg repository that you can analyze to determine the next step.  Usually the next step will be to refine some of the configuration files or conversion parameters.  Rinse/repeat until the resulting Hg repository fulfills your expectations.

I strongly recommend that you document every single step and minuscule change.  Even better: if I were to start such a process again, I’d keep a shell script to run everything from scratch to reconstruct the current state.  You will find yourself going back to the same operations again and again, sometimes days or weeks later.  Memory may betray you on small details.

Convert, Again, Again, and Again.

The basic command to convert a repository is
hg convert --branchsort --config convert.svn.branches=hugin/branches --config convert.svn.tags=hugin/tags --config convert.svn.trunk=hugin/trunk --authors svn_users.txt --filemap hugin_filemap.txt hugin-mirror hugin-mercurial

The paths to the branches, tags, and trunk depend on the SVN repository’s layout and the intended outcome.  You’ll tweak those many times.

When I wanted to add the 2010.0 release branch on top of the converted trunk, the command was:

hg convert --branchsort --config convert.svn.branches= --config convert.svn.tags=hugin/tags --config convert.svn.trunk=hugin/releases/2010.0 --authors svn_users.txt --filemap hugin_filemap.txt hugin-mirror hugin-mercurial

hugin_filemap.txt is used to include/exclude paths.  To filter out the heavy movies, I used the following:

exclude "GSoC 2007/Presentation 1"
exclude "GSoC 2007/Presentation 2"
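The from-scratch script recommended earlier could tie the two convert commands together along these lines.  It is a sketch, not the script used for Hugin: the function names and the HG override are hypothetical, while the hg convert options, paths and file names are the ones shown above.

```shell
#!/bin/sh
# Sketch of a from-scratch conversion script. The function names and the HG
# override are hypothetical; the hg convert options are the ones used above.
HG="${HG:-hg}"
SOURCE=hugin-mirror
TARGET=hugin-mercurial

# convert_line TRUNK BRANCHES: convert one code line into $TARGET
convert_line() {
    "$HG" convert --branchsort \
        --config convert.svn.branches="$2" \
        --config convert.svn.tags=hugin/tags \
        --config convert.svn.trunk="$1" \
        --authors svn_users.txt \
        --filemap hugin_filemap.txt \
        "$SOURCE" "$TARGET"
}

convert_all() {
    rm -rf "$TARGET" &&                           # always start on a green field
    convert_line hugin/trunk hugin/branches &&    # main code line
    convert_line hugin/releases/2010.0 ''         # release branch on top of it
}

# Example: convert_all 2>&1 | tee convert.log
```

Removing the target first also discards its .hg/shamap, so every run really does start from nothing rather than resuming an earlier conversion.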

Examine The Results

When you first walk into the newly converted repository with cd hugin-mercurial, it feels empty.  There is only one invisible .hg folder.  The repository.  Use hg view to have a first look at the resulting revisions tree.  You need to hg checkout a revision if you want to see more.  Or delve into internals.  The file .hg/shamap will list all SVN revisions with path and revision number against Hg SHA1 changeset IDs.  These are helpful in case you need to manipulate history, e.g. to skip some revisions or to link a disconnected part of history, such as a separately extracted branch, with its parent and child changesets.  For such manipulations you will use the --splicemap and --branchmap options.  They point to files, like --filemap, but work differently.  They are described in hg help convert and can help you fix the most broken of repositories.  I was thankful I did not have to deal with this – for adding the release branches into the repository it was sufficient to simply run convert again on the same hugin-mercurial target.
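A lookup in the shamap can be sketched as follows.  The function name svn_to_hg is hypothetical, and the sketch assumes that each line of the file has the form "svn:UUID/PATH@REV HGSHA"; check a few lines of your own .hg/shamap before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print the Hg changeset ID(s) recorded for an SVN revision.
# Assumption: each shamap line reads "svn:UUID/PATH@REV HGSHA".
svn_to_hg() {
    rev=$1
    map=${2:-.hg/shamap}
    # keep only lines ending in "@REV <hex id>" and print the hex id
    sed -n "s/^.*@${rev} \([0-9a-f][0-9a-f]*\)\$/\1/p" "$map"
}

# Example: svn_to_hg 4321 hugin-mercurial/.hg/shamap
```

The changeset IDs found this way are the kind of values a splicemap entry would refer to.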


As you proceed, you will find your repository improving iteration after iteration.  As soon as you have a result to show, pack it into a tarball and invite community contributors to download and try the repository in the tarball.  Share as much information as you can, and enable them to do the same as what you did.  Unless you have unlimited time and resources, this is the only way to go beyond basic repository integrity checks.  The tests will reveal corrupted repositories, and if the contributors go one step further and try to build the code, they will also reveal dependencies in the build system that may require committing specially crafted code to support Hg instead of SVN.  Keep trying and refining until you have on your hard disk an Hg repository that is ready to replace the old SVN repository.  Then you’ll know you’re ready for Implementation Day.

Moved 2: From Subversion to Mercurial. Part 1, Setting the Stage

It’s less than four weeks since I drove that 26′ U-Haul truck full of stuff and I’ve had enough of moving for a while.  So why move again?  This is a different kind of move: a move to more efficient infrastructure.  To a decentralized source code repository.  Thank you Subversion, you’ve served us well over the past years.  Welcome Mercurial, a distributed revision control system (RCS) of the next generation.  In this series of three articles I describe how I moved Hugin from Subversion (SVN) to Mercurial (Hg).  In the first part I’ll describe how to kick off the process in the community and set the technical stage on your machine.  The second part deals with the technical code conversion.  The third part deals with the conversion aftermath and the actual switch.  Once the road is mapped out, the process is a relatively straightforward one.  I made some mistakes while mapping the road and I hope that if you find yourself in the same situation, these articles will help you prevent such mistakes.

Why Mercurial and why Now?

It could have been git, or Bazaar.  They are all equally good.  But I found Mercurial to be the one with the more mature client support, particularly GUI clients on disparate operating systems; and it is well supported at SourceForge where Hugin is currently hosted.  Our project needs to accommodate contributors using Linux, Windows, OSX, BSD and we do not want to leave anybody behind.  To get all stakeholders to buy into the process I started a public discussion.

Spring was the right time for repository cleaning.  With a tight integration schedule the team merged most outstanding development branches into the main code line.  Migrating before branching out again for a new set of Google Summer of Code projects will avoid extra complexity.

For more than two years Hugin has been humming along on an asynchronous development and release process that has helped increase the capacity of the project to absorb changes.  Despite a diligent, disciplined and careful team we seem to have hit a scalability ceiling.  It may be lack of resources (except for the Google Summer of Code students during their three months on Google’s payroll, we’re all here in our spare-time) but I suspect that it is also the infrastructure and I expect Mercurial will further increase the capacity of the project to absorb changes.


One of the first questions to arise from the community was the scope of the change.  If already changing RCS, how about reviewing all infrastructure?  Hugin has been at SourceForge since inception.  A lot has happened in the project hosting arena since.  Sites like Google Code, Launchpad, BerliOS, GitHub offer a panoply of services – RCS; bug tracking; mailing lists; web and download hosting.  Often different implementations of the same Open Source tools.  Mostly “free” (as in beer, but beware of the alcohol) for Open Source projects like Hugin.

The RCS, while central to the project, is just part of a project’s infrastructure.  Migrating the whole infrastructure is beyond the scope of this project.  And beyond the available resources too.   Just moving the nearly 200 open bug reports (many of which are stale or duplicate – the bug tracker needs a good spring cleaning too) to a new bug tracking system can keep a spare-timer busy for months.  SourceForge may not be the most fashionable choice, but it works for us.

Server and Client

On the one side is an existing SVN repository on the SourceForge server.  On the other side is a new Hg repository on the SourceForge server.  How do I move the code from one side to the other?  The first mistake I made was to work on the SourceForge server itself.  This slowed me down and ate their precious bandwidth.  I should have known better: SVN runs on the server sitting in my office closet.  Even that was too much overhead.  The most efficient way to go about the task is to mirror the SVN repository to a local client and work from there.

These are the steps for a K/X/Ubuntu distribution:

sudo apt-get install mercurial subversion python-subversion
svnadmin create hugin-mirror
cd hugin-mirror
echo '#!/bin/sh' > hooks/pre-revprop-change
echo 'exit 0' >> hooks/pre-revprop-change
chmod +x hooks/pre-revprop-change
export FROMREPO=https://hugin.svn.sourceforge.net/svnroot/hugin/
export TOREPO=file://`pwd`
svnsync init ${TOREPO} ${FROMREPO}
svnsync --non-interactive sync ${TOREPO}

The initial sync can take hours or more.  This is a good time to take a break.  If the sync is aborted, you may need to reset the lock state and restart the sync:

svn propdelete svn:sync-lock --revprop -r 0  ${TOREPO}
svnsync --non-interactive sync ${TOREPO}

It’s a good idea to repeat the above two commands in a cron job or a startup job to keep in sync with the repository over time.
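A minimal sketch of such a job follows.  The function name resync_mirror, the SVN and SVNSYNC override variables, and the script path in the crontab comment are all hypothetical; the two commands are the ones shown above.

```shell
#!/bin/sh
# Hypothetical wrapper for a cron or startup job. SVN and SVNSYNC can be
# overridden for a dry run; they default to the tools on the PATH.
SVN="${SVN:-svn}"
SVNSYNC="${SVNSYNC:-svnsync}"

resync_mirror() {
    torepo=$1    # e.g. file:///home/you/hugin-mirror
    # clear a lock left behind by an aborted sync; ignore errors if there is none
    "$SVN" propdelete svn:sync-lock --revprop -r 0 "$torepo" || true
    "$SVNSYNC" --non-interactive sync "$torepo"
}

# Example: resync_mirror "file://$HOME/hugin-mirror"
# Example crontab entry (script path is an assumption), every night at 03:00:
# 0 3 * * * /home/you/bin/resync-hugin-mirror.sh
```

Clearing the lock unconditionally is only safe when no other sync can be running against the same mirror, so schedule the job with a generous interval.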

Your local machine is set for the job.  Keep the discussion in your community going, to get all relevant stakeholders to buy into the process.  In the next installment we’ll look at how to map the road.

Hugin 2010.0 Released

Last week Bruno released Hugin 2010.0.  It was almost immediately followed by binaries for MacOSX (Harry), Debian/Ubuntu (Andreas), Fedora (Terry).  The step from release candidate to release had been longer than usual because of the lack of feedback from Windows builders/users, whose interest seems to have shifted to the development version, which already has a few cool new features such as a masking tool.

Trunk still has a few rough edges, but overall is ready for the next cycle. Consensus is forming on moving the codebase to Mercurial and potential new Google Summer of Code projects are being discussed on the mailing list.