Thursday, October 28, 2010

darcs hacking sprint 5 report

The fifth Darcs hacking sprint took place in Orleans over the weekend of 15-17 October.

We seem to be starting a tradition of sprints coinciding with social movements. Last year, the sprint venue in Vienna was squatted by students protesting university fee reforms. This time we were caught in the French pension reform strikes, which knocked out one of our would-be participants and made another lose a day.

The sprint was small but productive. We had four people attending: Florent Becker (also local organizer), Guillaume Hoffmann, Eric Kow and Reinier Lamers.

Talks and discussions

Scribble Scribble Think Think!

Maintaining Darcs

The day before the sprint, Eric gave a talk to undergraduate and masters students on Free and Open Source Software Projects, in particular the principles that we try to apply within the Darcs team.

Therapy session

We kicked off the sprint with a discussion of some of the challenges that we have been facing in the Darcs community and how we hope to rise to them over the long term.

  1. Code quality - darcs code is a gem buried in a big pile of muck. We've been making progress tidying the mess and moving towards a clean, well thought out library... but we still have a long way to go.

  2. Portability - darcs relies on GHC, which takes a long time to build and which simply does not support certain niche platforms.

  3. Usability - for all its power, Darcs has a reputation among its fans for being exceptionally easy to use. While we can be proud of our friendly UI and simple mental model, we need to also recognise the parts of Darcs that make life difficult. Here are three areas we should explore:

    • Patch annotations would allow Darcs to start tracking a repository history while still allowing for patch reordering. People should be able to ask questions like "who signed off on this patch?" and "when was it pulled into the current repository?"

    • Short version identifiers would make it easier for users to communicate with each other - "Bob, could you please fetch version 83dc9fa3?"

    • Better conflict marking and summaries would help maintainers to merge sets of patches from long term branches.

  4. Network effects - Darcs is most useful when many other people are also using Darcs or something compatible.

    • Unlike Git/Hg/Bzr, Darcs lacks bridges with the other DVCSes. The other three can more or less talk to each other because they have similar models. Darcs is a bit different, so making good bridges can be tricky.

    • The Darcs community lacks services that facilitate collaboration to the same extent that Github does. We need patch-tag, darcsden and friends to get better still!

    • One piece of low-hanging fruit to pluck is the ability to host Darcs repositories on servers that lack a Darcs binary. How do we push patches over SFTP without the luxury of a remote Darcs binary?
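
To make the short version identifier idea from the usability list a little more concrete, here is a minimal Python sketch. It is not how Darcs works today. It assumes that every version already has a full, fixed-length hexadecimal identifier, and the names short_ids and resolve are made up for illustration. The idea is simply to hand out the shortest prefix that is still unique, and to refuse ambiguous prefixes when reading them back.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def short_ids(full_ids, min_len=4):
    """Map each full identifier to its shortest unique prefix.

    Assumption: all identifiers have the same length (as hashes do),
    so no identifier is a strict prefix of another.
    """
    ids = sorted(set(full_ids))
    result = {}
    for i, ident in enumerate(ids):
        need = min_len
        # After sorting, only the direct neighbours can share the
        # longest common prefix with this identifier.
        for j in (i - 1, i + 1):
            if 0 <= j < len(ids):
                need = max(need, common_prefix_len(ident, ids[j]) + 1)
        result[ident] = ident[:need]
    return result


def resolve(prefix, full_ids):
    """Return the single full identifier starting with prefix."""
    matches = [i for i in set(full_ids) if i.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError("%r matches %d identifiers" % (prefix, len(matches)))
    return matches[0]
```

With something like this, Bob could be asked to fetch "83dc9" rather than a full hash, and a prefix that matches two versions would be rejected instead of silently picking one.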

File formats

We continued the discussion from the darcs-users mailing list on machine-readable formats for the Darcs data and command output. This discussion was tricky because it touches many different parts of Darcs and involves juggling some conflicting goals:

  • Standards compliance: preference for well-documented and understood standards.
  • Familiarity: attention to de-facto standards used in the revision control community.
  • Conceptual integrity: all Darcs formats should work the same way
  • Extensibility: ability to accommodate future requirements
  • Easy parsing: people should be able to whip up quick little scripts to slice and dice Darcs output
  • Agnosticism: need a good story for arbitrary bytes because Darcs is fairly agnostic to the content of text files
  • Transparency: no complicated escaping mechanisms, as these tend to fail in rare corner cases and would be easy to get wrong.
  • Conservatism: the less we change Darcs, the better

Reinier had the idea of bringing this discussion to a whiteboard -- this is why you need hackathons! -- which allowed us to take a more global view of the problem. After much discussion, we reached the conclusion that we should converge on 4 formats:

  1. Unified diff format [SAME] (familiarity, conservatism) - there's no reason to move away from diff/patch style output for low-level diffs.

  2. High level patch format [SAME] (conceptual integrity, agnosticism, conservatism) - this is a high-level representation of patches which is unique to Darcs. For instance, it can describe file renames and word replacement. We plan to continue using this format whenever we need to represent high-level patch contents.

  3. Line-separated annotate format [NEW] (easy parsing, agnosticism, transparency) - We will deprecate the annotate --xml format and shift to a line-based one. If community standards exist, we'll try to follow them as much as possible.

  4. Hashed context file format [NEW] (agnosticism, transparency, conceptual integrity) - we will deprecate the changes --xml format and converge to an extended version of the context file format. New features:

    • file contents hashes (issue1550)
    • format version information
    • is-context-file flag (need deps to be safe to use)

Note that where forced to choose, we have essentially sacrificed the otherwise worthy goals of standards compliance and extensibility.
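
To give a feel for what "easy parsing" without escaping could look like for a line-based annotate format, here is a small Python sketch. The actual format was not settled at the sprint, so the layout below is purely hypothetical: each output line is assumed to be a hexadecimal patch identifier, the separator " | ", and then the raw bytes of the file line. Since the identifier can never contain the separator, splitting on its first occurrence is enough, and the content needs no escaping.

```python
SEP = b" | "  # hypothetical separator, not an actual Darcs convention


def parse_annotate(output):
    """Parse hypothetical line-based annotate output into a list of
    (patch_id, content) pairs.

    The output is handled as bytes throughout, so file content that is
    not valid text in any particular encoding passes through untouched.
    """
    lines = output.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final newline
    records = []
    for raw in lines:
        # Split on the FIRST separator only; anything after it,
        # including further separators, belongs to the content.
        patch_id, sep, content = raw.partition(SEP)
        if not sep:
            raise ValueError("malformed annotate line: %r" % raw)
        records.append((patch_id.decode("ascii"), content))
    return records
```

A quick script could then group the pairs by identifier to see which patch last touched each line, which is the kind of slicing and dicing the easy parsing goal has in mind.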

Hacking

Reinier and Guillaume hacking away

Darcs 2.5

Darcs 2.5 is almost here! The release was delayed for quality control reasons, but after many betas and bug fixes, we think we're ready to ship. Reinier put the finishing touches on our first release candidate.

Infrastructure

Eric made a handful of improvements to the issue tracking infrastructure, improving integration with our darcs repository and darcswatch.

User interface

Eric and Reinier polished off some user interface work:

  • Removed a confirmation prompt asking whether you really want to record your patch when you choose to edit the long comment but make no changes (undo beats confirmation).
  • Tested a UI regression fix by Adolfo: Darcs was overzealous in warning about unreachable cache entries.
  • Improved checking of commands that work on file paths

Pristine cache handling

Guillaume documented much of Darcs pristine cache handling, fixing a darcs repair bug along the way. He:

  1. Studied the problem of garbage collecting the Darcs pristine cache http://wiki.darcs.net/Using/GrowingPristineProblem
  2. Fixed darcs repair when pristine cache is missing.
  3. Designed some improvements to darcs get handling of missing pristine cache items. http://bugs.darcs.net/issue1976
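
As a rough illustration of what garbage collecting a content-addressed cache involves, here is a minimal mark-and-sweep sketch in Python. It does not model the real pristine.hashed layout. The store is an assumed in-memory dictionary from hash to either ("file", bytes) or ("dir", {name: hash}), and all names are invented for the example. The interesting decision is which roots still count as live: keeping older roots around for a while wastes space, but dropping them too eagerly could pull files out from under someone who is still reading them.

```python
def reachable(store, roots):
    """Return the set of hashes reachable from the given root hashes."""
    seen = set()
    stack = list(roots)
    while stack:
        h = stack.pop()
        if h in seen or h not in store:
            continue  # already visited, or missing from the store
        seen.add(h)
        kind, payload = store[h]
        if kind == "dir":
            # A directory entry refers to its children by hash.
            stack.extend(payload.values())
    return seen


def collect_garbage(store, roots):
    """Delete every entry not reachable from roots.

    Returns the sorted list of hashes that were removed.
    """
    live = reachable(store, roots)
    dead = sorted(h for h in store if h not in live)
    for h in dead:
        del store[h]
    return dead
```

In this toy model, recording a change to one file creates a new root and a new file entry while sharing every unchanged entry with the old root, so collecting with only the newest root removes just the superseded pieces.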

No working directory: towards passive repositories

We want to make it as easy as possible for people to use and host Darcs repositories. In particular, we think it would be great if you could host a Darcs repository on any server, without caring if Darcs is installed there or not. While it is already possible to fetch Darcs repositories from such a server, what we now need is the ability to push to such repositories without a remote copy of Darcs.

Working in this direction, Florent implemented a long-requested feature for repositories without a working directory. This is useful for repositories which are only meant to be used for pushing/pulling, where the notion of a working directory is superfluous and makes some Darcs operations harder to implement.

Faster annotate (we'll get there!)

Unfortunately, Benedikt could not join us for the sprint as travel from Zurich to Orleans was disrupted by strikes. Luckily, he was still able to participate over IRC. He ported over his work on the "patch index" optimisation to the latest version of the Darcs code in progress (that's a 6 month leap!) and will continue by exposing the patch index to Darcs commands.

Experience report

Guillaume (right) with a question for Reinier

Our Darcs Weekly News editor attended his first sprint 6 months ago in Zurich, starting work on some ProbablyEasy bugs. It was great to see him again and very encouraging to see how much deeper he was getting into Darcs internals. Let's hear it from Guillaume:

I arrived at the sprint with this bug report in mind, written by a NetBSD user who could not build Darcs on his system. I wondered how easy it would be to write a minimal Darcs client that could only fetch a working copy from a Darcs repository, in a programming language more common than Haskell (Python comes to mind).

Thus began my discovery of the hashed repository format. The most surprising thing I discovered was the lack of documentation: currently someone who wants to write a Darcs client can only count on the existing source code. So I started to document what I understood by asking the other sprinters and looking at the code.

I also documented the Growing Pristine Problem, as it was cited as a low point of the hashed repository format compared with the old-fashioned format. After understanding why this phenomenon happens, I believe that it is unavoidable if one wants to avoid breakage when someone pushes to a repository while someone else is getting it. Also, it becomes a problem only in really big repositories.

However, some parts of Darcs could be improved. Darcs could do a better job of handling its pristine.hashed files. For instance, as of now, deleting the pristine.hashed directory leads to an almost dead-end situation since "darcs repair" refuses to work unless a dummy pristine.hashed directory is created. I sent a test case and a fix for this problem.

Missing pristine files are generally not handled gracefully by Darcs, even though their presence is not necessary (albeit very important for speed). As of now, "darcs get" refuses to work when a pristine file is missing, and this has already bitten me in the past. I proposed an enhancement of this behaviour. Other local commands that use the pristine files fail if one file is missing, but never tell the user to run darcs repair. I will probably work on these two proposals soon.

The aim is to make Darcs as robust as possible with its current format, and above all to prevent users from being exposed to unhelpful error messages.

Thanks

Thanks to Florent and to the rest of the laboratoire LIFO for hosting the Darcs team this weekend! Hosting sprints is an excellent way to support and to interact with the Darcs community.

The obligatory Jeanne d'Arc statue photo

A special shout-out also goes to Yannick Parmentier, a LIFO researcher (and coincidentally Eric's former office mate) who very kindly visited us to take photos and shuttle us back and forth between Orleans and the lab. Merci, Yannick!

See you next time!

This was a really fun sprint. We hope you can join us next time, hopefully in 6 or so months. In the meantime, check out the flickr tag darcs-2010-10 for more photos from the sprint.
