Tuesday, November 30, 2010

Coming in Darcs 2.8: new features

In the previous post we talked about Darcs 2.8's read-only support for old-fashioned (OF) repositories. This reduced support will help us simplify the code base and focus on new features and optimizations for Hashed repositories, which are now the recommended way to use Darcs.

New features

Let us review the features that were not in 2.5 and that we are working on now:
  • rebase: this is a long-wanted feature, familiar from Git, that may be merged into Darcs 2.8. It is currently maintained on a separate branch. We will include it in the 2.8 release only if we are confident that the UI is right by then. We have already had some feedback, but if you want to try it, please go ahead and tell us what you think.
  • repositories without working copies: this has not yet been accepted into HEAD, but we are confident it will be ready in time.
Let us now consider an improvement that is already present in Darcs HEAD and is thus guaranteed to appear in 2.8: packs. Basically this optimization makes getting a repository over HTTP much faster.

What happens with darcs get

Let us come back to the darcs get command and see why fetching Hashed repositories is faster than OF (see also the previous post for more information on OF repositories). A Darcs repository contains (among other things):
  • the patches
  • a pristine cache, that is, the "sum of patches"
Here is what Darcs does when getting an OF repository:
  1. get patch 0, get patch 1, ..., get patch n
  2. build the local pristine and working directory trees
Hence, to start hacking on a project hosted like this, you have to wait for a time proportional to the size of the project's history. This is only bearable for small histories. For instance, the Xmonad repository (>1100 patches) can take up to 7 minutes to retrieve on an ADSL line, even though the repository itself is only 5 MBytes.

On the other hand, when getting a Hashed repository, Darcs does:
  1. get pristine file 0, get pristine file 1, ..., get pristine file m
  2. get patch n, get patch n-1, ... , get patch 0
  3. build local working directory tree
Step 2 can be skipped with the --lazy flag, or interrupted by hitting CTRL-C. This is fine, since the new local repo has the address of the remote repo and Darcs can fetch patches on demand. Hence the wait can be reduced a lot: downloading with --lazy takes time proportional to the size of the working copy rather than the size of the history. The Xmonad repository has about 30 pristine files, weighing less than 200 KBytes, so getting those is only a matter of seconds.
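
To make the comparison concrete, here is a rough Python sketch that counts how many files each kind of get has to fetch. It is only a back-of-the-envelope model, not how Darcs is implemented: the function name requests_for_get is made up for illustration, inventories and other metadata files are ignored, and the Xmonad numbers are the rounded figures quoted above.

```python
def requests_for_get(patches, pristine_files, hashed, lazy=False):
    """Rough count of files fetched by a get (hypothetical model).

    Ignores inventories and other metadata; `lazy` is ignored for
    old-fashioned repositories in this model.
    """
    if not hashed:
        # Old-fashioned: every patch is fetched, pristine is built locally.
        return patches
    # Hashed: pristine files first, then the patches unless the get is lazy.
    return pristine_files + (0 if lazy else patches)


# Xmonad, rounded from the text: >1100 patches, about 30 pristine files.
print("OF get:          ", requests_for_get(1100, 30, hashed=False))
print("Hashed lazy get: ", requests_for_get(1100, 30, hashed=True, lazy=True))
print("Hashed full get: ", requests_for_get(1100, 30, hashed=True))
```

Under these assumptions a lazy get of a Hashed repository fetches a few dozen files where an OF get fetches over a thousand, which is where the difference between seconds and minutes comes from.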

By the way, getting a Darcs repo with --lazy is strictly more powerful than doing a checkout of a Subversion repository: even though you don't have patches locally, you have the high-level history of the project (i.e., you can use darcs changes offline). However, looking into the patch contents (darcs changes -v) does require you to access the remote repository.

Darcs 2.8 will contain a new optimization called packs. Running the command darcs optimize --http in a repository will store a pack of the pristine and a pack of the patches inside of it. Packs are basically tar.gz archives. Darcs will detect these packs when doing a get and act as follows:
  1. get pristine pack
  2. get missing pristine files
  3. get patches pack
  4. get missing patches
  5. build working directory tree
Steps 1 and 3 each consist of transferring a single file via HTTP, which is much faster than transferring many little files. Steps 3 and 4 are skipped if the --lazy flag is used.
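
The sequence above can be written down as a small Python sketch that lists the steps of a get with or without packs, lazy or not. This is an illustration of the ordering described in this post, not the actual Darcs code; the function get_plan and the step strings are invented for the example.

```python
def get_plan(has_packs, lazy=False):
    """Return the ordered steps of a get on a Hashed repository (illustrative)."""
    steps = []
    if has_packs:
        # One big file first, then whatever was added since the pack was made.
        steps.append("get pristine pack")
        steps.append("get missing pristine files")
        if not lazy:
            steps.append("get patches pack")
            steps.append("get missing patches")
    else:
        steps.append("get pristine files one by one")
        if not lazy:
            steps.append("get patches one by one")
    steps.append("build working directory tree")
    return steps


for step in get_plan(has_packs=True, lazy=True):
    print(step)
```

With has_packs=True and lazy=True the plan contains only the two pristine steps and the final build, matching the remark that the patch steps are skipped for a lazy get.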

How much faster is this? Here are the figures we got on the Darcs repository (>8000 patches, >600 pristine files):
  • full get without packs: 40 minutes
  • full get with packs: 3 minutes
  • lazy get without packs: 20 seconds
  • lazy get with packs: 11 seconds
You need to run darcs optimize --http manually on the public repository of your project from time to time, as this will not happen automatically. The best moment to do that is after pushing a tag, since this enables people who use get --tag X (with X being the last tag) to take advantage of the optimization.
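
For readers who prefer ratios, the timings listed above can be turned into speedup factors with a few lines of Python. The only inputs are the four measurements from this post; the dictionary layout and the function name speedup are just one way to write it.

```python
# Timings from the measurements above, converted to seconds.
timings_seconds = {
    "full": {"without_packs": 40 * 60, "with_packs": 3 * 60},
    "lazy": {"without_packs": 20, "with_packs": 11},
}


def speedup(mode):
    """Ratio of time without packs to time with packs for a given mode."""
    t = timings_seconds[mode]
    return t["without_packs"] / t["with_packs"]


for mode in timings_seconds:
    print(f"{mode} get: {speedup(mode):.1f}x faster with packs")
```

On these figures a full get is roughly 13 times faster with packs, while a lazy get, which was already quick, is a little under twice as fast.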


These are probably not going to be the only new features that we will try to fit into the 2.8 release, so when we have more we will let you know. Before May 2011 we are going to try to release feature-based alpha versions of Darcs. They will be called Darcs 2.7.x and will be for users who want to try out bleeding-edge features and give us feedback.

If you can't wait, you can simply build the current Darcs HEAD by running darcs get --lazy http://darcs.net and then cabal update && cabal install -f-library inside the resulting directory.

EDIT: removed darcs-fastconvert from the list since as of now this is not a supported feature of Darcs.


Mark Stosberg said...

Exciting. Thanks for the update!

dixie said...

Just one remark: In the case of the repository with few patches but a lot of files the fetching of the patches can lead to faster download than loading the pristine files.

It is not very common kind of repo. Do you see a needed for a switch for "darcs get" to skip download of the pristine ?
Isn't that already covered by the --complete flag ?

Anyway the packs seems to be the best option :-)