In the previous post we talked about Darcs 2.8's read-only support for old-fashioned (OF) repositories. This reduced support will help us simplify the code base and focus on new features and optimizations for Hashed repositories, which are now the recommended way to use Darcs.
Let us review the features that were not in 2.5 and that we are working on now:
rebase: this is a long-requested feature, known from Git, that may be merged into Darcs 2.8. It is currently maintained on a separate branch. We will include it in the 2.8 release only if we are confident that we have the UI right by then. We have already had some feedback, but if you want to try it, please go ahead and tell us what you think.
Here is what Darcs does when getting an OF repository:
1. get patch 0, get patch 1, ..., get patch n
2. build the local pristine and working directory trees
Hence, to start hacking on a project hosted like this, you have to wait for a time proportional to the length of the project's history. This is only bearable for small histories. For instance, the Xmonad repository (>1100 patches) can take up to 7 minutes to retrieve over an ADSL line, while the repository itself is only 5 MBytes.
On the other hand, when getting a Hashed repository, Darcs does:
1. get pristine file 0, get pristine file 1, ..., get pristine file m
2. get patch n, get patch n-1, ..., get patch 0
3. build local working directory tree
Step 2 can be skipped with the --lazy flag, or interrupted by hitting CTRL-C. This is fine, since the new local repo has the address of the remote repo and Darcs can fetch patches on demand. Hence, the wait can be reduced a lot: downloading with --lazy takes time proportional to the size of the working copy rather than to the length of the history. The Xmonad repository has about 30 pristine files, weighing less than 200 KBytes, so getting those is only a matter of seconds.
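To make the comparison concrete, here is a rough Python sketch that counts how many files must be transferred over HTTP before a working copy is available. It is a back-of-the-envelope model, not Darcs code: the function name and format labels are made up for illustration, inventories and other metadata files are ignored, and the Xmonad figures are the rounded numbers quoted above.

```python
def transfers_before_working_copy(n_patches, n_pristine_files, repo_format, lazy=False):
    """Count the files fetched before the working directory can be built.

    Simplified model: one HTTP transfer per patch or pristine file,
    ignoring inventories and other repository metadata.
    """
    if repo_format == "old-fashioned":
        # The pristine cannot be copied, so every patch is needed to rebuild it.
        return n_patches
    if repo_format == "hashed":
        # Pristine files are always fetched; patches only for a full (non-lazy) get.
        return n_pristine_files if lazy else n_pristine_files + n_patches
    raise ValueError("unknown repository format: %r" % (repo_format,))


if __name__ == "__main__":
    # Rounded Xmonad figures from the post: about 1100 patches, about 30 pristine files.
    for fmt, lazy in [("old-fashioned", False), ("hashed", False), ("hashed", True)]:
        count = transfers_before_working_copy(1100, 30, fmt, lazy)
        print(fmt, "lazy" if lazy else "full", count)
```

In this model, a lazy get of a Hashed repository depends only on the number of pristine files, which is why its duration follows the size of the working copy and not the length of the history.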
By the way, getting a Darcs repo with --lazy is strictly more powerful than doing a checkout of a Subversion repository: even though you don't have patches locally, you have the high-level history of the project (i.e., you can use darcs changes offline). However, looking into the patch contents (darcs changes -v) does require you to access the remote repository.
Darcs 2.8 will contain a new optimization called packs. Running the command darcs optimize --http in a repository will store a pack of the pristine and a pack of the patches inside of it. Packs are basically tar.gz archives. Darcs will detect these packs when doing a get and act as follows:
1. get pristine pack
2. get missing pristine files
3. get patches pack
4. get missing patches
5. build working directory tree
Steps 1 and 3 each consist of transferring a single file via HTTP, which is much faster than transferring many little files. Steps 3 and 4 are skipped if the --lazy flag is used.
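The "pack first, then fetch whatever is missing" idea can be sketched in a few lines of Python. This only illustrates the principle with an in-memory tar.gz archive. It does not reproduce the actual layout of Darcs packs, and the function names and the `remote_files` dictionary (standing in for the remote repository) are hypothetical.

```python
import io
import tarfile


def build_pack(files):
    """Bundle a {name: bytes} mapping into a single tar.gz archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack(pack):
    """Extract a tar.gz archive (as bytes) back into a {name: bytes} mapping."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(pack), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                files[member.name] = tar.extractfile(member).read()
    return files


def get_with_pack(pack, remote_files):
    """Take everything the pack offers, then fetch the remaining files one by one.

    Returns the local files and the sorted names that needed an individual fetch.
    """
    local = unpack(pack)                      # one big transfer
    missing = sorted(set(remote_files) - set(local))
    for name in missing:
        local[name] = remote_files[name]      # stands in for one HTTP request each
    return local, missing
```

A pack built some time ago stays useful: only the files added since then need individual requests, which is why refreshing the pack from time to time keeps the second step short.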
You need to run darcs optimize --http manually on the public repository of your project from time to time, as this will not happen automatically. The best moment to do that is after pushing a tag, since this enables people who use get --tag X (with X being the last tag) to take advantage of the optimization.
These are probably not going to be the only new features that we will try to fit into the 2.8 release, so when we have more we will let you know. Before May 2011 we are going to try to release feature-based alpha versions of Darcs. They will be called Darcs 2.7.x and will be for users who want to try out bleeding-edge features and give us feedback.
If you can't wait, you can simply build the current Darcs HEAD with darcs get --lazy http://darcs.net and cabal update && cabal install -f-library inside of the obtained directory.
EDIT: removed darcs-fastconvert from the list since as of now this is not a supported feature of Darcs.
A month ago, we released Darcs 2.5. This version brought performance improvements and a few nice features like trackdown --bisect and UTF8 patch metadata. Since we follow a time-based release schedule, Darcs 2.8, the next major release, will be released in May 2011, and we are at the beginning of the work that will be part of it. So what are we working on now?
One of the hot topics of Darcs 2.8 will be repository formats. As of now, Darcs can work with two kinds of repositories: the old-fashioned (OF) ones and the Hashed ones. OF has been around since the very first Darcs, while the Hashed format was introduced in Darcs 2.0 in April 2008.
The reasons why the Hashed format was introduced were robustness and performance.
To understand robustness, one has to know that Darcs repositories contain patches (of course), but also a "pristine cache", which is the latest recorded state of the files stored in the repository. It can be rebuilt from the patches of the repository, so it is not really mandatory in theory; however, it makes some commands run in reasonable time. The command whatsnew, for instance, shows the current unrecorded modifications in your working copy, and works by comparing your working copy with the pristine.
In OF repositories, the pristine directory is simply a plain filesystem tree. The consequence is that if an external program (say Subversion or Unison) adds files to that directory, Darcs will believe they really belong to the pristine cache! Hence, the command whatsnew would report that some file is missing from your working copy, and the command record would record a wrong patch! More generally, OF repositories cannot really be checked for integrity: Darcs cannot know that you modified a pristine file by hand.
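The problem can be shown with a toy model in Python. This is a deliberately simplified sketch, not how Darcs implements whatsnew: trees are flat dictionaries from file name to contents, the function name is ours, and files that exist only in the working copy are ignored.

```python
def toy_whatsnew(pristine, working):
    """Report differences between a trusted pristine tree and the working copy.

    Both arguments are flat {file name: bytes} mappings. Files present only
    in the working copy are ignored in this simplified model.
    """
    changes = []
    for name in sorted(pristine):
        if name not in working:
            changes.append("removed " + name)
        elif pristine[name] != working[name]:
            changes.append("modified " + name)
    return changes


if __name__ == "__main__":
    pristine = {"Main.hs": b"main = return ()\n"}
    working = {"Main.hs": b"main = return ()\n"}
    print(toy_whatsnew(pristine, working))   # nothing to report

    # An external tool drops a file into the plain-tree pristine directory.
    pristine["stray-file"] = b"not ours"
    print(toy_whatsnew(pristine, working))   # the stray file now looks "removed"
```

Because the plain tree is trusted as is, nothing in this model can tell the stray file apart from a file that was really recorded.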
In Hashed repositories, the pristine is a directory containing files named by the SHA1 hash of their contents, and there is a special file that tells the hash of the root directory of the pristine. Hence, adding bogus files to the pristine directory is harmless: they will never be read. So bring on all your Subversions, Unisons and Dropboxes: Darcs can no longer be tricked.
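Here is a minimal Python sketch of such a content-addressed store, to show why stray files are never read and why tampering can be detected. It follows the description above (SHA1 of the contents as file name, one root hash as the entry point), but the listing format and the function names are invented for the example and do not match the on-disk format Darcs uses.

```python
import hashlib


def sha1(data):
    return hashlib.sha1(data).hexdigest()


def store_tree(tree, store):
    """Store a nested {name: bytes or sub-dict} tree; return the hash of its root listing."""
    lines = []
    for name, node in sorted(tree.items()):
        if isinstance(node, dict):
            lines.append("dir %s %s" % (store_tree(node, store), name))
        else:
            digest = sha1(node)
            store[digest] = node
            lines.append("file %s %s" % (digest, name))
    listing = "\n".join(lines).encode("utf-8")
    digest = sha1(listing)
    store[digest] = listing          # the directory listing is stored by hash too
    return digest


def load_blob(digest, store):
    """Fetch a blob by hash and check that its contents still match its name."""
    blob = store[digest]
    if sha1(blob) != digest:
        raise ValueError("corrupt pristine entry: " + digest)
    return blob


def load_tree(root, store):
    """Rebuild the tree by following hashes from the root; nothing else is read."""
    tree = {}
    for line in load_blob(root, store).decode("utf-8").splitlines():
        kind, digest, name = line.split(" ", 2)
        tree[name] = load_tree(digest, store) if kind == "dir" else load_blob(digest, store)
    return tree
```

Reading starts from the root hash and only follows hashes listed there, so an extra entry in `store` is simply never looked up, while a modified entry no longer matches its own name and is rejected.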
OF repositories have another downside: getting them via HTTP is very slow. Indeed, they contain no list of pristine files, so you cannot directly copy the remote pristine files in order to get a working copy. Instead, you need to get all the patches and then locally rebuild a pristine and a working copy, which makes the delay until you have a working copy linear in the history of the project. On the other hand, the latest working copy of a Hashed repo can be obtained with darcs get --lazy, which does not need to retrieve the history of the project.
The developers are not amused
Those are downsides of OF for users. But we, as developers, are also not very pleased with some parts of the Darcs codebase. The code that handles repositories is not really great and lacks modularity. Moreover, nobody is really motivated to maintain code for OF, since we know Hashed repositories are better and there is ongoing work to make Darcs perform better with them. For instance, the work done in Summer of Code 2009 was incorporated into Darcs 2.4 and boosted the performance of Darcs a lot. In Darcs 2.5, the performance of commands like record and pull has been substantially improved. None of these changes concern OF repositories.
Hence we have decided to stop struggling with old code, and from Darcs 2.8, only read-only access will be provided for OF repositories. This means you will still be able to use the commands get, pull and send to interact with remote OF repositories. On the other hand, writing to a local OF repository with record, pull or apply will no longer work, and commands that rely on the pristine files, like whatsnew, diff or dist, will also fail. We will make sure the user interface remains clear and helpful in those cases.
For us, this is a further step towards simplifying the code base and making it generally more modular. This will help us focus on performance improvements for the Hashed format and modern features for the Darcs client without having to think about OF maintenance.
Should it concern you?
Switching to Hashed repositories has a drawback: Darcs 1 binaries do not know how to interact with them. If you manage a project whose source code is hosted in a Darcs repository, then you should ensure that all contributors use a Darcs 2 binary (darcs --version).
For the record, Darcs 2.0 was released in April 2008 and is now adopted in most current versions of the major Linux distributions. The release of Darcs 2.8 is planned for May 2011, so no problematic situation due to OF deprecation will happen before then.
Consolidating the move
We know that a few shortcomings remain with Hashed repositories. For instance, darcs add is noticeably slower on big Hashed repositories: http://bugs.darcs.net/issue1938 . We will try to address these issues by the release of 2.8. We are very interested in knowing whether there are other issues, so please let us know on the bug tracker or by commenting on this post.
Okay, we talked about an "un-feature", but Darcs 2.8 will also contain a few new features that will probably please a lot of you. This will be for another blog post very soon.