Friday, April 24, 2009

darcs hacking sprint #2 report

Our second darcs hacking sprint was held in Utrecht, the Netherlands from 17-19 April.

Overview

This is our second sprint, but it was also the first one that was held in a single location. Being all in one place was extremely helpful for us because it gave everybody a chance to interact with everybody else (especially with other Haskellers)! It also meant that instead of hacking a lot of code, we focused on sharing knowledge within the team and with the rest of the community, developing new ideas and planning for the future.
  • Total sprint patches: 24
  • Participants: 11 onsite and 3 offsite

Presentations

An alternative patch theory notation?

whiteboard scribbles from Marnix Klooster's patch theory notation

Marnix Klooster presented an alternative notation for patch theory based on principles elucidated by Edsger Dijkstra. The goal is to develop a way of writing patch theory that is easier to understand. For example, instead of treating patch commutation as a primitive, as in (p,q) ↔ (q',p'), we could talk about pushing a patch over another patch (e.g. p ⊳ q), which also has the benefit of avoiding new patch names such as p'. Also, instead of writing the inverse of p as p⁻¹, we could write it as -p.
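To make commutation and inversion concrete, here is a minimal Python sketch on a toy patch type that adds or removes a single line of a file. The `Patch` class and the `apply_patch`, `invert` and `commute` names are invented for this illustration. Real darcs patches and their commutation rules are much richer, and this toy only knows how to commute two line additions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Patch:
    kind: str   # "add" or "rm" (toy patch kinds, not darcs's own)
    index: int  # line position the patch acts on
    text: str   # the line that is added or removed


def apply_patch(patch: Patch, lines: List[str]) -> List[str]:
    """Return a new list of lines with the patch applied."""
    out = list(lines)
    if patch.kind == "add":
        out.insert(patch.index, patch.text)
    else:
        if out[patch.index] != patch.text:
            raise ValueError("patch does not apply in this context")
        del out[patch.index]
    return out


def invert(patch: Patch) -> Patch:
    """The inverse of p (written p⁻¹, or -p in the alternative notation)."""
    return Patch("rm" if patch.kind == "add" else "add", patch.index, patch.text)


def commute(p: Patch, q: Patch) -> Optional[Tuple[Patch, Patch]]:
    """(p, q) <-> (q', p'): reorder two patches without changing the result.

    Returns None when this toy does not know how to commute the pair.
    """
    if p.kind != "add" or q.kind != "add":
        return None
    if q.index <= p.index:
        # q lands at or before p's line, so p's line shifts down by one.
        return Patch("add", q.index, q.text), Patch("add", p.index + 1, p.text)
    # q lands after p's line, so without p it sits one position earlier.
    return Patch("add", q.index - 1, q.text), Patch("add", p.index, p.text)


base = ["a", "c"]
p = Patch("add", 1, "b")
q = Patch("add", 3, "d")
q2, p2 = commute(p, q)
# Applying p then q gives the same file as applying q2 then p2.
same = apply_patch(q, apply_patch(p, base)) == apply_patch(p2, apply_patch(q2, base))
```

Note how the commuted pair needs the fresh names `q2` and `p2`, which is exactly the bookkeeping that the "push one patch over another" notation tries to avoid.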

question and answer session

On Saturday morning, Eric led a Question and Answer session in which the wider Haskell community could ask questions about darcs usage or share their experiences using darcs and learn more about the upcoming darcs 3 transition.

Among the features we discussed were patch matching with --match, context files (which give you a precise notion of repository version), and the darcs annotate command.

One interesting thing we learned from the Haskell community was that it is often useful to collect a set of completely unrelated repositories into one. A problem that arises in this use case is that the repositories may have common filenames like "Makefile", which give rise to conflicts. As a result of this discussion, Ganesh had some new ideas about how darcs could better support this scenario.

Thanks to Ben Moseley for the suggestion and to Ganesh and Petr for answers!

time-sliced HPC code coverage

Thorkil Naur gave a demonstration of his creative technique for using the HPC code coverage tool to examine why some darcs commands take so long. With the help of HPC and a small shell script, Thorkil can show which parts of the darcs code are being called at various time intervals during a long darcs session.
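The core idea of time-slicing coverage data can be sketched in a few lines of Python. This is only a guess at the general shape, not Thorkil's script: it assumes we already have a series of cumulative tick-count snapshots taken at regular intervals, and the `time_slices` function and the location names in the example are hypothetical.

```python
from typing import Dict, List


def time_slices(snapshots: List[Dict[str, int]]) -> List[Dict[str, int]]:
    """Turn cumulative tick counts into per-interval activity.

    Each snapshot maps a code location to the number of times it has been
    entered so far. The result keeps, for every interval, only the
    locations whose count grew during that interval.
    """
    slices = []
    previous: Dict[str, int] = {}
    for snapshot in snapshots:
        delta = {}
        for location, count in snapshot.items():
            growth = count - previous.get(location, 0)
            if growth > 0:
                delta[location] = growth
        slices.append(delta)
        previous = snapshot
    return slices


# Hypothetical snapshots taken at three points during a long command.
snapshots = [
    {"readInventory": 3},
    {"readInventory": 3, "applyPatch": 10},
    {"readInventory": 4, "applyPatch": 50},
]
activity = time_slices(snapshots)
```

Reading `activity` from left to right shows which code was busy in each interval, which is the kind of view that helps explain where a slow command spends its time.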

Discussions and implementation

The camp/darcs-3 transition

Ian, Ganesh and Eric discussed plans for transitioning to camp or darcs 3. We still have a lot of questions left to answer, but four points are clear to us:
  1. We at least want the camp-core (patch theory 3) component to be part of future darcs.
  2. Ian is focused on camp fundamentals for now, that is, proving things in Coq.
  3. We anticipate at least three more darcs releases (2010-07) before this fundamentals work is done.
  4. Meanwhile, the darcs team should work on converging to camp, for example, by turning as much code as possible into small, reusable libraries.
More on the roadmap in a future e-mail!

Camp/darcs-3 and Coq

Ian and Florent (hacking from Berlin) worked on formal proofs of Camp theory using the Coq proof assistant. One of the major discussions that came up was how to merge their two approaches. As a result of this discussion, Florent has come up with a plan for merging their patch theory proof initiatives.

Overall, the sprint was very productive for Ian. He achieved his goal of finishing the proof that he was working on, and also did some refactoring of it, making it much simpler and shorter than it started off. This will be even more important as he gets to the larger proofs that are still to come. Ian also refactored everything to use the Coq module system properly. Being able to pick the brains of some more experienced Coq folk at the Hackathon was key to getting so much done. Also valuable were some good discussions, which have given him a lot to think about!

implementation
  • Proof completed and refactored by Ian
  • Some minor bugs fixed in the camp theory and paper.

Documentation

Daniel Carrera continued his work on making the darcs wiki and patch theory documentation friendlier.
Daniel Carrera and Sigrid Kronenberger
Despite a lot of recent technical difficulties -- the darcs team is in the middle of a switch from MoinMoin to a darcsit wiki -- Daniel has made some great progress and has put together a static demo of the new content.

screenshot of patch theory documentation page

Filecache optimisation

The filecache is a mapping from a filename at a certain point in history to the creation name of the file and all patches touching the file. It can be used to speed up commands such as "changes filename" or "annotate filename" where performance is important for repository browsing tools, e.g. darcsweb or darcsit.

Benedikt and other darcs hackers discussed the design of the filecache and how changes and annotate can be optimized by making use of it, as well as the interaction between this and the new hashed-storage work. This led to a simpler design of the filecache based on unique filenames (creation name + creation patch).
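As a rough illustration of the "unique filename" idea, here is a small Python sketch of an in-memory index keyed by (creation name, creation patch). It is not Benedikt's prototype or its on-disk format: the `build_filecache` and `changes` functions and the `add`/`move`/`hunk` operation names are assumptions made for the example, and it only resolves filenames as they stand at the end of the history.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FileId = Tuple[str, str]  # (creation name, creation patch)


def build_filecache(history):
    """Index an ordered list of (patch_name, operations).

    Operations are ("add", name), ("move", old, new) or ("hunk", name).
    Returns (current, touching): the file id behind each current filename,
    and the patches touching each file id, in history order.
    """
    current: Dict[str, FileId] = {}
    touching: Dict[FileId, List[str]] = defaultdict(list)
    for patch_name, operations in history:
        for op in operations:
            if op[0] == "add":
                file_id = (op[1], patch_name)
                current[op[1]] = file_id
            elif op[0] == "move":
                file_id = current.pop(op[1])
                current[op[2]] = file_id
            else:  # "hunk": a content change to an existing file
                file_id = current[op[1]]
            if patch_name not in touching[file_id]:
                touching[file_id].append(patch_name)
    return current, dict(touching)


def changes(filecache, filename: str) -> List[str]:
    """Patches touching a file, without scanning the whole history."""
    current, touching = filecache
    return touching[current[filename]]


history = [
    ("p1", [("add", "README")]),
    ("p2", [("hunk", "README"), ("add", "Makefile")]),
    ("p3", [("move", "README", "README.txt")]),
    ("p4", [("hunk", "README.txt")]),
]
cache = build_filecache(history)
readme_patches = changes(cache, "README.txt")
```

Because the file id never changes after creation, a rename only updates the name lookup, and the list of touching patches follows the file across moves.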

implementation:
  • Prototype that creates filecache with "optimize --filecache" and uses it for "changes filename" implemented. These will be submitted for review when "annotate" has been adapted.
  • Optimize parseDate by switching from String to ByteString to speed up reading the inventory file. Pending review.

Source file mechanism

Petr and Eric discussed the current problems with the caching mechanism used by hashed repositories.

Darcs's --hashed repositories make darcs a lot safer to use and also allow for performance gains such as --lazy patch fetching and the global patch cache. But there are some annoying bugs in their implementation. For example, stale entries in our cache could cause darcs to waste time connecting to and timing out from non-existent servers.

Petr has developed a plan to solve these issues by ignoring bad cache locations, warning about them, and deleting them when it is appropriate to do so. He has begun studying the darcs caching source code to prepare for future implementation work.
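The policy described above (ignore bad locations, warn about them, delete them when appropriate) could look roughly like the following Python sketch. It is an illustration of the plan only, not darcs code: the `SourceList` class, its methods, and the convention that an unreachable location raises `OSError` are all assumptions made for the example.

```python
from typing import Callable, List, Set


class SourceList:
    """Ordered places to look for a patch: caches, then remote repos."""

    def __init__(self, locations: List[str]):
        self.locations = list(locations)
        self.bad: Set[str] = set()     # locations that failed this session
        self.warnings: List[str] = []  # messages to show the user

    def fetch(self, patch_hash: str, fetch_from: Callable[[str, str], bytes]) -> bytes:
        """Try each usable location in order, skipping known-bad ones."""
        for location in self.locations:
            if location in self.bad:
                continue  # do not pay for the same timeout twice
            try:
                return fetch_from(location, patch_hash)
            except OSError as err:
                self.bad.add(location)
                self.warnings.append(f"ignoring unreachable source {location}: {err}")
        raise LookupError(f"no source could provide {patch_hash}")

    def prune(self) -> None:
        """Forget bad locations for good, once it is appropriate to do so."""
        self.locations = [loc for loc in self.locations if loc not in self.bad]


# A stand-in for real network or disk access, for demonstration only.
attempts = []

def fake_fetch(location: str, patch_hash: str) -> bytes:
    attempts.append(location)
    if location == "http://dead.example/repo":
        raise OSError("timed out")
    return b"patch " + patch_hash.encode()


sources = SourceList(["http://dead.example/repo", "/home/user/.darcs/cache"])
first = sources.fetch("abc", fake_fetch)
second = sources.fetch("def", fake_fetch)
```

In this sketch the dead server costs one timeout per session rather than one per patch, and `prune` is kept separate so that deletion stays an explicit decision.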

zlib and CRC errors

Ganesh also worked on the corrupted gzip patch files problem. After discussions with Duncan Coutts he implemented code in the Haskell zlib library to support continuing after finding patch corruption, and then added code to darcs to make use of this new API. Still to do is to clean up the zlib patches and submit to Duncan, and to make darcs repair able to fix the broken patches.
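To show what "continuing after finding corruption" can mean in practice, here is a short sketch using Python's standard zlib module rather than the Haskell zlib library Ganesh extended. It inflates the raw deflate stream inside a gzip file and reports the CRC mismatch instead of aborting. The `gunzip_ignoring_crc` name is invented for this example, and the code assumes a minimal 10-byte gzip header with no optional fields.

```python
import gzip
import zlib
from typing import Tuple


def gunzip_ignoring_crc(blob: bytes) -> Tuple[bytes, bool]:
    """Decompress a gzip blob and return (data, crc_ok).

    Unlike a normal gunzip, a wrong CRC in the trailer is reported to the
    caller rather than raised as an error.
    """
    # Assumption: plain 10-byte header (magic bytes, no flags set).
    if blob[:2] != b"\x1f\x8b" or blob[3] != 0:
        raise ValueError("unsupported gzip header")
    # Negative wbits selects a raw deflate stream: no header, no checksum.
    inflater = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
    data = inflater.decompress(blob[10:])
    # Whatever follows the deflate stream is the gzip trailer: CRC32, size.
    trailer = inflater.unused_data
    stored_crc = int.from_bytes(trailer[:4], "little")
    return data, stored_crc == zlib.crc32(data)


original = b"hunk ./README 1\n+hello\n"
damaged = bytearray(gzip.compress(original))
damaged[-8] ^= 0xFF  # flip bits in the stored CRC to simulate corruption
recovered, crc_ok = gunzip_ignoring_crc(bytes(damaged))
```

Here `recovered` still holds the patch contents while `crc_ok` tells the caller that the file needs attention, which is the information a repair command would need.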

Test suite improvements

Reinier Lamers began some work on optimising the darcs QuickCheck test suite so that we could run it all the time. This turned out to be very challenging because the performance problems in the unit tests were not as localized in two bugs as we had expected. Reinier was able to fix one of these bugs, but the resulting speedup is not of an order-of-magnitude kind and is hardly noticeable without a timer. Future work will be to investigate how to reduce the number of patch equality tests, which will require us to delve further into the patch theory code.

implementation:

  • Profiling of darcs and test suites
  • Patches sent and applied for some small optimisations.

Future work: filenames and semi-conflicts

Spurred by a question from Peter Verswyvelen in the Darcs Q&A about merging two unrelated repositories into separate subdirectories, Ganesh figured out a way of making the resulting add-add conflicts be more manageable than in current darcs.

Roughly speaking, darcs could associate each filename with some sort of unique ID. If we then try to merge two unrelated repositories that happen to contain the same filename, darcs need not treat this as an actual conflict. These files would have a semi-conflicted state, where darcs does not really think of them as conflicting, but warns the user about the clashing filenames. This mechanism could also be easily extended to deal with issues such as case-insensitive filesystems, or specific unsupported filenames (e.g. COM1 on Windows).
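A very rough Python sketch of the semi-conflict idea follows. It is not Ganesh's design, only one way to picture it: trees are modelled as maps from a unique file ID to a filename, and the `merge_trees` function, its `key` parameter and the ID strings are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def merge_trees(
    ours: Dict[str, str],
    theirs: Dict[str, str],
    key: Callable[[str], str] = lambda name: name,
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Merge two trees that map unique file IDs to filenames.

    Files are identified by ID, so two files that merely share a name are
    both kept. Name clashes are returned separately as semi-conflicts for
    the user to sort out. `key` normalises names before comparison, e.g.
    str.lower for a case-insensitive filesystem.
    """
    merged = dict(ours)
    merged.update(theirs)
    by_name: Dict[str, List[str]] = {}
    for file_id, name in merged.items():
        by_name.setdefault(key(name), []).append(file_id)
    semi_conflicts = {
        name: sorted(ids) for name, ids in by_name.items() if len(ids) > 1
    }
    return merged, semi_conflicts


ours = {"id-a1": "Makefile", "id-a2": "src/Main.hs"}
theirs = {"id-b1": "Makefile", "id-b2": "README"}
merged, semi_conflicts = merge_trees(ours, theirs)
```

Both Makefiles survive the merge, and `semi_conflicts` records that they currently want the same name. Passing `key=str.lower` would flag "Makefile" and "makefile" in the same way.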

Future work: conflict state

Nicolas Pouillard developed a proposal for storing conflict markings in a "conflict state" file, which can then be passed on to high-level conflict resolution tools, for example ones with a graphical interface.

Future work: commute properties tool

Ganesh continued work on a tool he has been writing for some months which is intended to help with designing new patch types and their commutation rules. He doesn't have anything ready to unleash on the world yet, but hopes to soon.

Participants

In addition to the work listed above, there was lots of other interesting work and discussion, not to mention time invested in getting (re)acquainted with the darcs source (Arjan, Eric and Ben). Thanks to everybody who participated, and we hope to see you at hacking sprint #3!

Here's the list of on-site participants in alphabetical order.
  1. Arjan Boeijnk
  2. Daniel Carrera
  3. Marnix Klooster
  4. Eric Kow
  5. Ian Lynagh
  6. Ben Moseley
  7. Thorkil Naur
  8. Nicolas Pouillard
  9. Petr Ročkai
  10. Benedikt Schmidt
  11. Ganesh Sittampalam

Thanks to...

Haskell Hackathon organisers and sponsors

The Hac5 organisers did a brilliant job. We had Internet access, food and, most importantly, a very warm welcome! Thanks to the Utrecht team for their very hard work getting the logistics down, and congratulations for keeping 50-odd Haskell hackers happily hacking away.

Thanks also to the sponsors of the Hackathon for their generous support:

Darcs Donors

Speaking of generosity, this is the first darcs hacking sprint to be supported with donations from the darcs community. We raised $1072 with contributions from 24 donors. This allowed us to subsidize travel for three of our darcs hackers and have some money left over for future darcs hacking sprints.

Participants

Finally, thanks to all the participants of the Haskell Hackathon! It was a real joy for so many of us darcs hackers to be in a single room at the same time, and also for us to chat with folks from the wider Haskell community. Let's do this again!

Acknowledgements

Except for the group photo above (by Thomas Davie), and the photo of Ganesh by the whiteboard (by Reinier Lamers), these photos are by Martijn van Steenbergen and are available under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 license. Many thanks to Martijn, both for the photos and for tips on the pronunciation of Dutch vowels!

Thanks also to darcs hackers who contributed to this report.

9 comments:

Martijn said...

It's good to know that you made so much progress! And thanks for using my pictures. :-) Teaching you guys Dutch vowels was a lot of fun. :-)
