Tuesday, November 30, 2010

darcs weekly news #80

News and discussions

  1. Alexey Levan is our new Windows packager, and has put online Windows binaries of Darcs 2.5:
  2. We posted two roadmap blogs about Darcs 2.8:

Issues resolved in the last week (0)

Patches applied in the last week (1)

See darcs wiki entry for details.

Coming in Darcs 2.8: new features

In the previous post we talked about Darcs 2.8's read-only support for old-fashioned (OF) repositories. This reduced support will help us simplify the code base and focus on new features and optimizations for Hashed repositories, which are now the recommended way to use Darcs.

New features

Let us review the features that were not in 2.5 and that we are working on now:
  • rebase: this long-wanted feature, familiar from Git, may be merged into Darcs 2.8. It is currently maintained on a separate branch. We will include it in the 2.8 release only if we are confident that we have the UI right by then. We have already had some feedback, but if you want to try it, please go ahead and tell us what you think.
  • repositories without working copies: this has not yet been accepted into HEAD, but we are confident we can have it ready in time.
Let us now consider an improvement that is already present in Darcs HEAD and is thus guaranteed to appear in 2.8: packs. Basically this optimization makes getting a repository over HTTP much faster.

What happens with darcs get

Let us come back to the darcs get command and see why fetching Hashed repositories is faster than OF (see also the previous post for more information on OF repositories). A Darcs repository contains (among other things):
  • the patches
  • a pristine cache, that is the "sum of patches"
Here is what Darcs does when getting an OF repository:
  1. get patch 0, get patch 1, ..., get patch n
  2. build the local pristine and working directory trees
Hence, to start hacking on a project hosted like this, your wait grows with the length of the project's history. This is only bearable for small histories. For instance, the Xmonad repository (>1100 patches) can take up to 7 minutes to retrieve over an ADSL line, even though the repository itself is only 5 MBytes.

On the other hand, when getting a Hashed repository, Darcs does:
  1. get pristine file 0, get pristine file 1, ..., get pristine file m
  2. get patch n, get patch n-1, ... , get patch 0
  3. build local working directory tree
Step 2 can be skipped with the --lazy flag, or interrupted by hitting CTRL-C. This is fine, since the new local repo records the address of the remote repo and Darcs can fetch patches on demand. Hence the wait can be reduced a lot: the time to download with --lazy depends on the size of the working copy rather than on the length of the history. The Xmonad repository has about 30 pristine files, weighing less than 200 KBytes, so getting those would only be a matter of seconds.
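To make the difference concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: it assumes that every small file costs about the same amount of time to fetch over HTTP, and it derives that per-file cost from the Xmonad figures quoted above. The function names are ours, not part of Darcs.

```python
# Rough cost model: over HTTP, each small file costs about one request.
# SECONDS_PER_REQUEST is an assumption derived from the Xmonad figures
# in the text (about 7 minutes for about 1100 patches).

def requests_of_get(n_patches):
    """An OF get must download every patch before it can build the pristine."""
    return n_patches

def requests_hashed_get(n_pristine_files, n_patches, lazy):
    """A Hashed get downloads the pristine files first; patches are optional."""
    return n_pristine_files + (0 if lazy else n_patches)

XMONAD_PATCHES = 1100        # ">1100 patches" in the text
XMONAD_PRISTINE_FILES = 30   # "about 30 pristine files" in the text
SECONDS_PER_REQUEST = 7 * 60 / XMONAD_PATCHES  # assumed uniform cost per file

of_requests = requests_of_get(XMONAD_PATCHES)
lazy_requests = requests_hashed_get(XMONAD_PRISTINE_FILES, XMONAD_PATCHES, lazy=True)

for label, n in [("OF get", of_requests), ("Hashed get --lazy", lazy_requests)]:
    print(f"{label}: {n} requests, roughly {n * SECONDS_PER_REQUEST:.0f} s")
```

Under this assumption the lazy get needs a few dozen requests instead of more than a thousand, which is why the wait drops from minutes to seconds.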

By the way, getting a Darcs repo with --lazy is strictly more powerful than doing a checkout of a Subversion repository: even though you don't have patches locally, you have the high-level history of the project (i.e., you can use darcs changes offline). However, looking into the patch contents (darcs changes -v) does require you to access the remote repository.

Darcs 2.8 will contain a new optimization called packs. Running the command darcs optimize --http in a repository will store a pack of the pristine and a pack of the patches inside of it. Packs are basically tar.gz archives. Darcs will detect these packs when doing a get and act as follows:
  1. get pristine pack
  2. get missing pristine files
  3. get patches pack
  4. get missing patches
  5. build working directory tree
Steps 1 and 3 each consist of transferring a single file via HTTP, which is much faster than transferring many little files. Steps 3 and 4 are skipped if the --lazy flag is used.
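The sequence of steps can be summarised in a few lines of Python. This is a sketch of the behaviour described above, not the actual Darcs code; the function name get_plan and the step labels are made up for illustration.

```python
def get_plan(has_packs, lazy):
    """Return the ordered download steps of a get on a Hashed repository.

    has_packs: the remote repository was prepared with darcs optimize --http
    lazy:      the user passed --lazy, so patches are not fetched up front
    """
    steps = []
    if has_packs:
        steps.append("get pristine pack")
        steps.append("get missing pristine files")
        if not lazy:
            steps.append("get patches pack")
            steps.append("get missing patches")
    else:
        steps.append("get pristine files one by one")
        if not lazy:
            steps.append("get patches one by one")
    steps.append("build working directory tree")
    return steps

for step in get_plan(has_packs=True, lazy=False):
    print(step)
```

The "missing" steps are there because a pack is a snapshot: anything recorded after the pack was built still has to be fetched file by file.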

How much faster is this? Here are the figures we got on the Darcs repository (>8000 patches, >600 pristine files):
  • full get without packs: 40 minutes
  • full get with packs: 3 minutes
  • lazy get without packs: 20 seconds
  • lazy get with packs: 11 seconds
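Expressed as ratios, these figures work out as follows. The snippet only does arithmetic on the four timings listed above; the dictionary layout and the speedup helper are ours.

```python
# Timings from the list above, converted to seconds.
timings_seconds = {
    "full": {"without packs": 40 * 60, "with packs": 3 * 60},
    "lazy": {"without packs": 20, "with packs": 11},
}

def speedup(kind):
    """How many times faster a get of the given kind is with packs."""
    t = timings_seconds[kind]
    return t["without packs"] / t["with packs"]

for kind in ("full", "lazy"):
    print(f"{kind} get: about {speedup(kind):.1f}x faster with packs")
```

So packs help a full get by an order of magnitude, while a lazy get, which was already fast, gains a little under a factor of two.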
You need to run darcs optimize --http manually on the public repository of your project from time to time, as this will not happen automatically. The best moment to do that is after pushing a tag, since this enables people who use get --tag X (with X being the last tag) to take advantage of the optimization.

Conclusion

These are probably not the only new features that we will try to fit into the 2.8 release, so we will let you know when we have more. Before May 2011 we are going to try to release feature-based alpha versions of Darcs. They will be called Darcs 2.7.x and will be for users who want to try out bleeding-edge features and give us feedback.

If you can't wait, you can simply build the current Darcs HEAD with darcs get --lazy http://darcs.net and cabal update && cabal install -f-library inside of the obtained directory.

EDIT: removed darcs-fastconvert from the list since as of now this is not a supported feature of Darcs.

Tuesday, November 23, 2010

Coming in Darcs 2.8: read-only support for old-fashioned repositories

A month ago, we released Darcs 2.5. This version brought performance improvements and a few nice features like trackdown --bisect and UTF8 patch metadata. Since we follow a time-based release schedule, Darcs 2.8, the next major release, will be released in May 2011, and we are at the beginning of the work that will be part of it. So what are we working on now?

One of the hot topics of Darcs 2.8 will be repository formats. As of now, Darcs can work with two kinds of repositories: the old-fashioned (OF) ones and the Hashed ones. OF has been around since the very first Darcs, while the Hashed format was introduced in Darcs 2.0 in April 2008.

Why Hashed

The reasons why the Hashed format was introduced were robustness and performance.

To understand robustness, one has to know that Darcs repositories contain patches (of course), but also a "pristine cache", which is the latest recorded state of the files stored in the repository. It can be rebuilt from the patches of the repository, so in theory it is not strictly necessary; however, it makes some commands run in reasonable time. The command whatsnew, for instance, shows the current unrecorded modifications in your working copy, and works by comparing your working copy with the pristine.

In OF repositories, the pristine directory is simply a plain filesystem tree. As a consequence, if an external program (say Subversion or Unison) adds files to that directory, Darcs will believe they really belong to the pristine cache! Hence, the command whatsnew would report that some files are missing from your working copy, and the command record would record a wrong patch! More generally, OF repositories cannot really be checked for integrity: Darcs cannot know that you modified a pristine file by hand.
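The problem can be reproduced with a toy model in Python. Here a tree is just a dictionary from paths to contents, and whatsnew is a simplified comparison that we wrote for illustration; it is not how Darcs implements the command, and the change labels are only indicative.

```python
def whatsnew(pristine, working):
    """Toy comparison of two {path: contents} trees.

    Returns the changes needed to go from the pristine to the working copy.
    """
    changes = []
    for path in sorted(set(pristine) | set(working)):
        if path not in working:
            changes.append(("rmfile", path))
        elif path not in pristine:
            changes.append(("addfile", path))
        elif pristine[path] != working[path]:
            changes.append(("hunk", path))
    return changes

pristine = {"README": "hello\n"}
working = {"README": "hello\n"}
print(whatsnew(pristine, working))   # nothing unrecorded yet

# An external tool drops one of its own files into the plain pristine tree.
pristine[".svn/entries"] = "svn metadata\n"
print(whatsnew(pristine, working))   # the stray file now looks like a removal
```

Because the pristine is a plain tree, nothing distinguishes the stray file from a file that was really recorded, so the comparison reports a removal that the user never made.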

In Hashed repositories, the pristine is a directory containing files named by the SHA1 hash of their contents, and a special file gives the hash of the root directory of the pristine. Hence, adding bogus files to the pristine directory is harmless: they will never be read. So bring on all your Subversions, Unisons and Dropboxes: Darcs can no longer be tricked.
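The idea behind this can be sketched in Python. This is a minimal content-addressed store, not the actual on-disk format of Darcs: we assume a single flat listing for the root directory (real repositories have nested directories), and the helper names store, store_tree and read_tree are ours.

```python
import hashlib

def sha1(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def store(objects, data):
    """Store data under the SHA1 of its contents and return that name."""
    name = sha1(data)
    objects[name] = data
    return name

def store_tree(objects, files):
    """Store every file plus a root listing of 'path hash' lines.

    Returns the hash of the listing, which plays the role of the
    special file pointing at the root of the pristine.
    """
    listing = "".join(
        f"{path} {store(objects, data)}\n" for path, data in sorted(files.items())
    )
    return store(objects, listing.encode())

def read_tree(objects, root):
    """Rebuild {path: contents} by following hashes from the root only."""
    def fetch(name):
        data = objects[name]
        if sha1(data) != name:          # integrity check on every read
            raise ValueError(f"corrupt object {name}")
        return data

    files = {}
    for line in fetch(root).decode().splitlines():
        path, name = line.rsplit(" ", 1)
        files[path] = fetch(name)
    return files

original = {"README": b"hello\n", "src/Main.hs": b"main = return ()\n"}
objects = {}
root = store_tree(objects, original)

# A stray file from another tool is never reached from the root.
objects["stray-file-from-another-tool"] = b"junk"
print(read_tree(objects, root) == original)
```

Two properties follow from naming files by their hash: anything not reachable from the root is ignored, and a file edited by hand no longer matches its name, so the modification can be detected.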

OF repositories have another downside: getting them via HTTP is very slow. Indeed, they contain no list of pristine files, so you cannot directly copy the remote pristine files to obtain a working copy. Instead, you need to get all the patches and then rebuild a pristine and a working copy locally, which makes the delay before you have a working copy linear in the length of the project's history. On the other hand, the latest working copy of a Hashed repo can be obtained with darcs get --lazy, which does not need to retrieve the history of the project.

The developers are not amused

Those are the downsides of OF for users. But we, as developers, are also not very pleased with some parts of the Darcs codebase. The code that handles repositories is not really great and lacks modularity. Moreover, nobody is really motivated to maintain code for OF, since we know Hashed repositories are better and there is ongoing work to make Darcs perform better with them. For instance, the work done in Summer of Code 2009 was incorporated into Darcs 2.4 and boosted the performance of Darcs a lot. In Darcs 2.5, the performance of commands like record and pull has been substantially improved. None of these changes concern OF repositories.

Hence we have decided to stop struggling with old code, and from Darcs 2.8, only read-only access will be provided for OF repositories. This means you will still be able to use the commands get, pull and send to interact with remote OF repositories. On the other hand, writing to an OF repository with record, pull or apply will no longer work, and commands that rely on the pristine files, like whatsnew, diff or dist, will also fail. We will make sure the user interface remains clear and helpful in those cases.

For us, this is a further step towards simplifying the code base and making it generally more modular. This will help us focus on performance improvements for the Hashed format and modern features for the Darcs client without having to think about OF maintenance.

Should it concern you?

Switching to Hashed repositories has a drawback: Darcs 1 binaries do not know how to interact with them. If you manage a project whose source code is hosted in a Darcs repository, then you should ensure that all contributors use a Darcs 2 binary (darcs --version).

If you or someone in your project uses a Darcs 1 binary, you should check whether it is possible for you to upgrade to Darcs 2.0 or greater. Please consult http://wiki.darcs.net/Binaries for more information and http://wiki.darcs.net/DarcsTwo for converting from OF to Hashed.

For the record, Darcs 2.0 was released in April 2008 and is now adopted in most current versions of the major Linux distributions. The release of Darcs 2.8 is planned for May 2011, so no problematic situation due to OF deprecation will happen before then.

    Consolidating the move

    We know that a few shortcomings remain with Hashed repositories. For instance, darcs add is noticeably slower on big Hashed repositories: http://bugs.darcs.net/issue1938 . We will try to address these issues by the release of 2.8. We are very interested in knowing whether there are other issues, so please let us know on the bug tracker or by commenting on this post.

    Okay, we talked about an "un-feature", but Darcs 2.8 will also contain a few new features that will probably please a lot of you. We will cover them in another blog post very soon.

    Monday, November 22, 2010

    darcs weekly news #79

    News and discussions

    1. Eric called for a Windows darcs packager:
    2. The lists darcs-users and darcs-devel now follow a more usual separation, that is, patches discussions are now being done on darcs-devel. This will make darcs-users less noisy for non-developers:
    3. Gour explained why he switched from darcs to Fossil:

    Issues resolved in the last week (6)

    issue332 Gabriel Kerneis
    issue1397 Alexey Levan
    issue1637 Dmitry Tsygankov
    issue1965 Reinier Lamers
    issue1970 Florent Becker
    issue1988 Gabriel Kerneis

    Patches applied in the last week (89)

    See darcs wiki entry for details.

    Sunday, November 14, 2010

    darcs weekly news #78

    News and discussions

    1. Reinier talked about the release process of darcs 2.4 and 2.5:
    2. Petr released darcs-fastconvert 0.2:

    Issues resolved in the last week (2)

    issue1551 Eric Kow
    issue1977 Guillaume Hoffmann

    Patches applied in the last week (25)

    See darcs wiki entry for details.

    Sunday, November 7, 2010

    darcs weekly news #77

    News and discussions

    1. Simon Michael worked on adding support for darcs repositories for ohloh.net and called for volunteers:
    2. Christian Maeder asked how to apply a darcs patch to a non-darcs source tree:

    Issues resolved in the last week (2)

    issue1266 Alexey Levan
    issue1984 Dmitry Tsygankov

    Patches applied in the last week (14)

    See darcs wiki entry for details.

    Monday, November 1, 2010

    darcs weekly news #76

    News and discussions

    1. Darcs 2.5 was released!

    2. The report from the Orleans Sprint was posted:

    Issues resolved in the last week (3)

    issue182 Eric Kow
    issue734 Gabriel Kerneis
    issue1809 Ganesh Sittampalam

    Patches applied in the last week (51)

    See darcs wiki entry for details.
