Monday, April 18, 2011

darcs hacking sprint 6 report

The sixth Darcs Hacking Sprint took place on 1-3 April in Paris. As usual we had a lot of fun getting together, thinking and talking about Darcs for 3 days together, meeting new developers and seeing old friends.

Preparing Darcs 2.8

Darcs 2.8 is fast approaching!

The main feature we're pushing for is a new "packs" optimisation, which rolls a repo’s pristine and patch files into single files, making darcs get over HTTP significantly faster. The packs optimisation work was done by Alexey Levan, one of our 2010 Gooogle Summer of Code students (and now our Windows packager). Guillaume spent some of the sprint working on a lot of the finishing touches needed to get packs into our users' hands: enabling them by default, writing new tests to ensure that Darcs attempts to use them, and reducing their size. Guillaume and Jérémie (who joined us as he was a local Darcs user and just wanted to give back, merci!) also worked together to get a couple of benchmarks:

repositorypatchesbeforeafter
Jérémie's repository~900 patches10s1s
darcs screened (full)~9300 patches37m2m
darcs screened (lazy)~9300 patches27s7s

The timings and a couple of other numbers are now available on the darcs wiki optimize --http page

Future of Darcs

If Darcs didn't exist, it would be worth writing

Ganesh gave an excellent talk on the Future of Darcs, shaking us out of some recently infectious gloom (it was even getting to me!). He gave us a much needed reminder what we're fighting for, how to keep it going, and what he's been contributing to the fight.
  1. Darcs is important because it's fundamentally different. Our patch-based view is still unique and brings a lot of novel thinking to the version control table. We need to work on Darcs because nobody else is doing something like it. If Darcs itself didn't exist, we'd want to create it.
  2. Patch-based version control is the secret ingredient to Darcs' very easy user interface. The message we hear most often from Darcs fans is not about theory, but about how simple and easy it is to use Darcs. Our users don't care about patch theory per se, but it's because we work from patch theory that our user interface can be both gentle and powerful.
  3. Darcs provides a path to the future of version control. Ever wished your version control system thought in terms of abstract syntax trees instead of lines of text? We can't do that yet, but because we understand patches in a way nobody else does, we have a good idea for how to get there.
We use Darcs because we love the UI; we hack Darcs because we love the theory. But then what?

First, we still need to catch up; we're making a lot of progress overcoming performance issues, and usability issues such as our rather appalling conflict marking. Second, we need to solve the interoperability problem. Being a research-oriented version control system with a lot of practical catch-up to do puts us in the minority. We need to make it cheaper and cheaper for people who love Darcs to keep using it, when all their friends are using something else; and we need to make it risk free for somebody to give it a spin and see how they feel about it. Finally, we need to start moving towards that future, start tackling some of the really cool features we have in mind for Darcs 3.

So what has Ganesh been up to? Lots! When not taking care of a new baby, he's been implementing new features like rebase, making conflicts marking easier to use (with patch names!) and exploring a potential "graphictors" conflict representation (because every Darcs hacker needs a conflict representation of his own). More below!

Rebase Design

Thomas, Eric and Ganesh
Darcs rebase is a new feature which may be available in Darcs 2.8 on an experimental basis.

One of the great things about Darcs is that we generally do not need rebase; the combination of a patch theory and an friendly interactive UI makes many complicated rebase/cherry-pick use cases effortless for Darcs users. Theory and UI gets very far but sometimes it's not enough. Having a rebase operation would make it possible to do things like smoothing away unwanted conflicts, amending depended-upon patches including mistakes like that 1 GiB file you added one year ago, and consolidating multiple patches into one.


Ganesh gave us a tour of the changes and cleanups to Darcs code needed to make Darcs rebase work. He also brought up a tricky implementation issue about how the rebase "suspended" patches will interact with darcs amend-record, which we pored over together whiteboard markers in hand (this is why we need sprints!). Owen captured some of this discussion so we're hoping to have some nice diagrams explaining the issue in more detail.

Bridge - from Darcs to Git and everything else


Git is a great choice of version control system for many users and projects. Great hosting sites like GitHub, code review tools like Gerrit and graphical interfaces like GitTower add a lot of value to the Git universe and add testimony to the self-reinforcing power of network effects. So what kind of role does Darcs play in an increasingly Git-dominated world? We debated the issue for a while and eventually agreed on a single goal which is to work towards best-effort interoperability between Darcs and other version control systems, but being cautious to avoid tying ourselves tightly to any particular system.

Owen on Day 1
What we envisioned was an advanced bridge based on darcs-fastconvert that would allow us to maintain an incremental bidirectional mapping between repositories in Darcs and another version control system. We want it to be easy, for example to contribute to a GitHub hosted project as a Darcs user, or conversely to maintain a Darcs project, but make it easy for Git-using contributors to submit patches and track your upstream work. One exciting feature of the bridge is that it would allow us to smooth the progression from Darcs 1 to Darcs 2 repositories and eventually to Darcs 3.

Owen and Ganesh worked on identifying the technical challenges behind creating such a bridge and fleshing it out into a Google Summer of Code proposal:
  1. Creating a mapping of multi-head repos to Darcs repositories.
  2. Import/Export of foreign patch formats.
  3. Efficiently mapping between patch-based and snapshot-based models.
  4. Robust translation between Darcs versions.
  5. Mapping from “Darcs only” patch-types e.g. replace.
Check out Owen's proposal for more details. Looks like we may have a fun summer ahead of us!

Review backlog

Getting together with a video projector was a good way to catch up on some patch review. Over the three days, Ganesh and Guillaume cleaned out the patch tracker, reviewing patches (6 accepted, 2 rejected, 1 follow-up requested). Thanks, guys!

Development workflow


So if "Adam" submits a patch to the tracker... 
Speaking of the patch tracker, there's been lots of experimentation lately, namely, a "screened" branch and a more relaxed review policy. The overall aim is to reduce administrative overhead: the screened branch reduces the need for developers to maintain long term branches, and the relaxed review policy allows us to keep patches flowing, so that minor patches that don’t actually need any review are not jammed up by bureaucratic process. But change is messy! We now have users who aren't entirely sure how to send patches to Darcs or when they can or should amend their patches.

We spent some time thinking about how to achieve our goal of reforming the review process to accommodate the needs of patch submitters and reviewers, while keeping the overall process simple and lightweight. Our conclusion was to:
  1. preserve the screened/reviewed branch distinction and start explicitly calling the reviewed branch that
  2. simplify the general flow of patches to a linear screened, reviewed, release sequence (except the usual release-specific backports)
  3. eventually point http://darcs.net to the screened branch
  4. shift definitively from an amend-oriented culture to a follow-up oriented one. Once a patch has been accepted to screened, it can no longer be amended
  5. simplify release process by removing the notion of a soft freeze
Eric, Ganesh and Guillaume also spent some time on the developer documentation. We clarified which repository to send patches to (screened), when you can amend patches (only until they've been screened), and how to propose/add a new feature to Darcs (carefully). We also discussed briefly discussed the high level developer documentation concluding that we want to move towards it reading more like a book.

Healing paper cuts

Owen and Iago worked on improving Darcs' user interface in some corner cases. Owen studied the infamous "thisrepo" problem. The thisrepo cache entry is a piece of local-filesystem tracking information that causes more recent versions of Darcs to generate an annoying warning when a repository is moved to new location. Having established that it was safe to do so, he got Darcs to stop producing the entry and to ignore it when it is present. This keeps the warning system useful and relevant, by removing false alarms.

Guillaume and Iago
Iago started out by fixing a deceptively easy bug where darcs amend-record would let you add primitive patches to a tag patch. Darcs tags are implemented as empty named patches that depend on other patches; in theory tags could also contain patches of their own, but this use case is not supported in the implementation. Removing the ability to sneak these patches in avoids potential bugs and usability issues that may arise from it. Iago's work on this bug opened up into a host of related patch name parsing corner cases. Iago spent some time analysing the different possible ways to specify the name of a patch, how they behave with respect to empty names and names starting with 'TAG'. This took quite a bit reading the Darcs.Commands.Record code to figure out how to could effectively fix these problems with minor effort.

An interesting recurring theme came up talking about these issues with Owen, Iago and Ganesh, which is that changing Darcs behaviour can sometimes be a delicate affair because we have to take into account (a) repositories produced by older versions of Darcs and (b) how older versions of Darcs will react to repositories produced by more recent ones. It's the sort of issue that makes Darcs a great place for anybody who wants to confront "real world" software issues, where getting things right includes, and goes beyond, simply nailing down the theory.

Garbage collecting pristine

The darcs hashed repository format (one idea we've stolen from Git!) makes for much better robustness, allows for performance improvements such as the global cache and lazy fetching, and opens the door to future work on verifiability (short secure version identifiers). After a lot of performance work on hashed repositories, we've reached a point where we're ready to deprecate the old-fashioned format and get everybody updated to hashed repositories.

Florent formulates a plan
But there's a lingering performance issue in the back of our minds. To avoid race conditions we have disable automatic garbage collection of ununsed hashed repository entries. Repository owners would have to remember to run an occasional darcs optimize to delete these files. Is there a better way?

Guillaume led a discussion which led to a suggestion by Florent: keep track of pristine root hash timestamps and delete files older than 24 hours. Getting a repository should not take that long, so it should be safe to delete older files!

Darcs Testing and Code Quality

Iago gave a couple of nice presentations on work he has been doing for his MSc work, in the context of an MFES circular unit on formal methods: verifying darcs patch theory properties with the Alloy model checker, and assessing/improving the maintainability of Darcs code. He showed us some rather interesting examples of things that Darcs code does which makes testing hard (really long functions seems to be killer) and discussed improvements he made to our QuickCheck generators. It was great to see him break down our randomly generated tests into categories with varying degrees of meaningfulness and how a few simple tweaks to our generators could make the Darcs tests a lot more useful and informative. Iago will post the slides and his sample Alloy code when he has obtained his qualifications.

Darcs needs to work harder on code quality and testing, but what are you going to do with a handful of hobby-hacking hackers doing the best with their free time? Iago suggests candidates for the two lowest hanging fruit to pick: (a) cutting down our function sizes (check out urlThread in our URL module!) and (b) developing a sort of standard test suite template to be used for each Darcs module. He later spent a bit of time in the airport fleshing this out with the Darcs.Util module as an example.

Experience reports

We had two new participants at the sprint, Owen and Iago.  Let's hear it from them. Owen said:
The sprint was a great introduction to the Darcs team, having face-to-face discussions (and being able to quickly answer my beginner-questions) really helped bring me up to speed with not only the code-base but also the underlying concepts and ideas. My primary motivation for coming was to get to know the team and code-base, so that I could make an effective GSoC project proposal, and the sprint certainly helped in that respect. 
Since I had only had limited experience with the code-base prior to the sprint, I didn't actually do much coding (I was primarily focusing on my GSoC ideas), however, at future sprints, I'll definitely be able to code more, having "learnt my way around".
I came away feeling very positive about Darcs, with a strong feeling of wanting to contribute, to bring Darcs to the top of the VCS world. It is obvious to me that people do care about Darcs! 
I think attending a Darcs sprint is a great chance to have a fun weekend coding and talking about a project that is very much alive, with a few beers and a kebab afterwards! :)
And Iago:
It was a very nice experience. Personally, I though the Sprint will consist on three days of almost full-time work from 9am to 6pm; now I see that the hard work is done during the time between Sprints, whilst Sprints are just time for discussing some project-management stuff, interesting work in progress, ideas, suggestions, etc. Although I have started contributing with Darcs few months before the Sprint, I think that it is a very good opportunity to meet Darcs people and start to contribute with Darcs.
Thanks guys! It was great having you. I hope you can join us for future sprints.

Presentations

  • Ganesh on Future of Darcs
  • Iago on Model Checking Darcs
  • Iago on Darcs Code Quality

Participants 

It's on timer! (Iago, Guillaume, Owen, Ganesh and Eric)
  • Eric Kow
  • Florent Becker
  • Ganesh Sittampalam
  • Guillaume Hoffmann
  • Iago Abal
  • Owen Stephens
Short visits
  • Jérémie
  • Thomas Refis
  • Nicolas Pouillard

Thank-you!

We had a great time in Paris. Many thanks to the Initiative de Recherche et Innovation sur le Logiciel Libre (IRILL) for making such nice facilities available to us with so little fuss!

Thanks also to our donors for making it sprints as accessible to a wide public, and to to the Software Freedom Conservancy for taking on the nitty gritty administrative detail that makes it possible for us to focus on the core mission of making Darcs rock.

Merci à tous!

3 comments:

winterkoninkje said...

When it comes to Darcs gloom, my number one complaint (and the only one ATM) is the lack of a vibrant equivalent to BitBucket, GoogleCode, GitHub, etc. Ironically, this has little to to with the DVCS itself. In part it's a community problem, but actually the biggest feature I'd like to see in Darcsden, PatchTag, or similar, is an integrated bugtracker that's easy to use (e.g., like GoogleCode, not like Trac). That has little to do with Darcs, but it is an extremely compelling benefit that the Darcs ecosystem doesn't currently offer (that I've seen).

Ronan said...

Thank you all for your work (especially Eric: you are an example of leadership I try to follow -- although I can only see your work from the outside). Darcs is awesome. It is very easy to use! I like it because with it I can focus on coding, not on reading VCS manuals, and it does just what I want most of the time.

I have seen Darcs making great progresses, and I hope it continues for a long time. If I ever start Haskell, that will be your fault :).

kowey said...

@winterkoninjke indeed, the lack of really easy and featureful hosting has been a problem for Darcs. I hope PT and/or darcsden find a solution to the bug tracking problem soon. I think Alex was working on a simple bug tracker last time I heard

@Ronan, thanks! I doubt I'm any sort of example to follow, but I'm happy to see that Darcs is a good fit for you. We'll see about the Haskell; if you do get hooked, you'll know the perfect place to hack :-)

Followers