Responses to more Meta-CVS Criticisms.

In early 2003, Tom Lord, the developer of Arch, wrote something on the Arch mailing list to the effect that projects like Meta-CVS and DCVS are better replacements for CVS than Subversion, for various reasons. A year later, Greg Hudson, a Subversion developer, stumbled upon it and wrote a response which contains some clueless, ``theoretical'' criticisms of Meta-CVS.

The URL is: http://web.mit.edu/ghudson/thoughts/undiagnosing

The relevant passages of the document are quoted below, each followed by my response.

From a practical perspective, neither appears to have a significant user or developer community, so let's evaluate this claim from the theoretical perspective.

Good grief, Greg! So a working, robust program is theoretical if it doesn't have a large developer community? What do you mean by ``theoretical perspective''? I hope it's not ``I have not tried or looked at these programs, but since they don't appear to have big communities, I can say the following things ...''

What do you count as the developer community of Meta-CVS? Meta-CVS is a solution made up of some 5800 lines of Lisp that I wrote, plus CVS and all of the surrounding tools. The developer community is not just me, but everyone involved in CVS. One developer is plenty for the ongoing maintenance of the Lisp part. How many people should it take to maintain 5800 lines of code?

I don't know how many users there are of Meta-CVS, but they exist and I'm in touch with some of them. They are all happy, and don't send me any bug reports whatsoever. Consequently, we don't need a big website or mailing list or any of that.

Subversion already had a user base when I started working on Meta-CVS in January 2002. That's a significant head start. Why did I start working on Meta-CVS? Because I finally tried Subversion after waiting for it to become ready all through 2001. I was quite excited that there would be a nice replacement for CVS. I had ideas for Meta-CVS, but shelved them because of announcements that a new replacement for CVS would be ready within a year. When I finally tried Subversion, I realized that it was not what I wanted, and was nowhere close to being ready. My appetite for a better CVS was whetted by then, though, so I knew I just had to write one according to my own vision. I wanted a working, robust system, with complete directory structure versioning handling all corner cases, in my hands as soon as possible. I got one.

I represent the ``old school'' of development which places an emphasis on producing working code first. These days, it seems as if everyone is too busy trying to create communities first, hoping that the code can be fixed later. It shows.

Meta-CVS's layering strategy reduces initial development time, but it does not achieve any kind of meaningful compatibility with the installed base.

Understatement of the year! Zero to self-hosting in three weeks is very reduced development time. Meaningful compatibility with the installed base? Are you kidding? You can convert a CVS project to Meta-CVS directly from its unmodified RCS files! There is no information loss whatsoever. All the tags, branches, checkin comments, everything is preserved. You get to keep your CVS server, and the configuration that surrounds it. Your CVSROOT environment variable works with Meta-CVS, and so does your SSH setup or CVS pserver password database. Chances are your server-side CVS scripts like commitinfo work too. Is this a lack of any kind of meaningful compatibility?

[T]he CVS codebase ... is widely acknowledged to be awful.

The CVS codebase is well debugged and stable. Such codebases sometimes look awful. The reason they look awful is that they start out as clean, academic programs which do not handle all of the corner cases needed to make them suitable for the real world. What looks like cruft is actually the result of a pile of very necessary fixes for proper behavior, platform portability and so forth. There is an excellent Joel on Software article about messy code and rewrites. Every developer should read it.

[Meta-CVS] cannot achieve whole-tree versioning or failure-atomic commits

There is an assumption here that everyone agrees that whole-tree versioning is superior and the ``right way'' to implement version control. Whole-tree versioning means that every checkin, even of changes to just one file, generates a new tree. This is very much like functional programming, which constructs a new object when some constituent element needs to be changed. For example, in the Lisp language, suppose we have the tree (1 (2 3)) and would like to change it to (2 (2 3)). One way to do it is to use functional programming, which avoids destruction of the original object. Every place in the program that has a reference to (1 (2 3)) will continue to see that object. What we do is make a new tree (2 (2 3)). To save space and time, the (2 3) constituent is just a pointer to the (2 3) constituent of the original (1 (2 3)) object. This is called substructure sharing. Whole-tree versioning works similarly: the directory structure is a bunch of pointers on disk, and new trees of pointers are allocated to create new versions.
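Here is a minimal Lisp sketch of that idea (the variable names are just for illustration):

;; Build the original tree (1 (2 3)).
(defvar *old-tree* (list 1 (list 2 3)))

;; Functional update: make a new head cell that reuses the old tail,
;; instead of mutating *old-tree* in place.
(defvar *new-tree* (cons 2 (cdr *old-tree*)))

*new-tree*                                ; => (2 (2 3))
*old-tree*                                ; => (1 (2 3)), untouched
(eq (cadr *old-tree*) (cadr *new-tree*))  ; => T, the (2 3) cell is shared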

Not everyone agrees that whole-tree versioning is a good idea. Part of the reason I wrote Meta-CVS instead of becoming a Subversion user is that I specifically like the CVS approach to versioning, and don't require anything different. Files are versioned independently, and baselines are simply set-associations based on some common property such as a tag. This approach provides a lot of flexibility. It's easy to do things like selecting arbitrary revisions of files independently and associating them to form a logical baseline that can be used as the basis of a branch, and so on. It's also a low-risk approach; if something goes wrong and recovery is needed, one deals with individual files. It also helps that those files are in a text format.
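To illustrate what ``baselines as set-associations'' means (a conceptual sketch only, not an actual CVS or Meta-CVS data structure): a tag simply picks one revision of each independently versioned file, and that set of pairs is the baseline:

;; Conceptual sketch: a baseline is just an association between files
;; and the particular revisions that carry some common tag.
(defparameter *release-1-0*
  '(("driver.c"   . "1.7")
    ("protocol.c" . "1.12")
    ("protocol.h" . "1.3")))

;; Branching from this baseline just means branching each file from the
;; revision recorded here; nothing whole-tree is involved.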

A big disadvantage of whole-tree versioning is that the directory structure is not versioned in the same way as the document contents. A separate set of algorithms is needed to operate on the directory structure as a versioned object. In Meta-CVS, branching and merging work the same way on the directory structure as on text documents. It's stored as just another text document that is versioned independently. For instance, it's possible to compute a delta between arbitrary versions of the directory structure, and apply that as a patch to your local version. The directory structure is a simple, flat, text database that can be manipulated with a text editor.
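As a rough illustration of why that is simple (a hypothetical sketch, not the actual Meta-CVS file format or code): if the directory structure is kept as a flat list of pairs mapping stable object identifiers to paths, then a delta between two versions of it is just the set of entries whose path changed, and applying that delta is an ordinary substitution:

;; Hypothetical sketch: the directory structure as a flat alist of
;; (object-id . path) pairs, versioned like any other text file.
(defun map-delta (old-map new-map)
  "List (id old-path new-path) for every object whose path changed."
  (loop for (id . new-path) in new-map
        for old-path = (cdr (assoc id old-map :test #'string=))
        when (and old-path (not (string= old-path new-path)))
          collect (list id old-path new-path)))

(defun apply-map-delta (map delta)
  "Apply DELTA to MAP, renaming the affected entries."
  (loop for (id . path) in map
        for change = (assoc id delta :test #'string=)
        collect (cons id (if change (third change) path))))

;; Example: one file was renamed between the two versions.
;; (map-delta '(("F-01" . "src/old-name.c") ("F-02" . "src/util.c"))
;;            '(("F-01" . "src/new-name.c") ("F-02" . "src/util.c")))
;; => (("F-01" "src/old-name.c" "src/new-name.c"))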

It is written on the Subversion website that ``directory structure versioning is a Hard Problem'', but the truth is that what they have is a solution that is hard. The problem isn't hard at all. It's sometimes easy to fall into the trap of viewing a problem as hard when one has only generated hard solutions. It's like mathematics: sometimes a clever algebraic trick, a change to some other coordinate system or some other piece of insight reduces a problem to shreds. In physics, the discovery of some underlying unifying principle like conservation of energy can blow away a myriad of calculations. In computer science, sometimes choosing the right data structures makes nice, simple algorithms ``pop'' right out, though those algorithms are not always efficient.

To address the second point, one version of CVS, namely CVSNT, now has atomic commits. I believe that Meta-CVS can be used on the client side with CVSNT on the server side, thereby benefiting from atomic commits. It's just a matter of time before CVSNT merges into mainstream CVS.

Moreover, the layering strategy almost certainly introduces puzzling failure modes where a CVS error message is presented in response to a Meta-CVS command.

Almost certainly? Is this more theoretical guessing from a non-user of the program? Here is the truth: CVS error messages are perfectly meaningful through Meta-CVS. The only regard in which CVS errors might be seen as puzzling is that they refer to objects by their CVS filenames, which are 128-bit identifiers represented in hex. For situations where this is a problem, Meta-CVS originally provided a stream editing command that substituted pathnames for these object IDs. That filtering was later rolled into the program as a default behavior which can be disabled. After this object-ID-to-name substitution, the CVS output is perfectly clear.
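The substitution itself is conceptually trivial. Here is a hypothetical sketch of that kind of filtering (not the actual Meta-CVS implementation): each line of CVS output is scanned for known object names, which are replaced with the paths they map to:

;; Hypothetical sketch only: ID-TO-PATH is an alist mapping CVS-side
;; object names (the F-... files) to user-visible paths.
(defun filter-line (line id-to-path)
  "Replace the first occurrence of each known object name in LINE."
  (loop for (id . path) in id-to-path
        for pos = (search id line)
        when pos
          do (setf line (concatenate 'string
                                     (subseq line 0 pos)
                                     path
                                     (subseq line (+ pos (length id))))))
  line)

;; (filter-line "Index: F-EFC489EB97D9B5DAC50DB9539838C69D.html"
;;              '(("F-EFC489EB97D9B5DAC50DB9539838C69D.html"
;;                 . "clisp-backquote-patch.html")))
;; => "Index: clisp-backquote-patch.html"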

For instance, here is what a diff looks like:

$ mcvs diff -u clisp-backquote-patch.html
Index: clisp-backquote-patch.html
===================================================================
RCS file: /home/projects/cvsroot/old-website/F-EFC489EB97D9B5DAC50DB9539838C69D.html,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 clisp-backquote-patch.html
--- clisp-backquote-patch.html	7 Aug 2010 03:13:38 -0000	1.1.1.1
+++ clisp-backquote-patch.html	7 Aug 2010 17:24:55 -0000
@@ -1,13 +1,14 @@
 <html>
-  <title>New backquote implementation for CLISP</title>
+  <title>Not-so-new-anymore backquote implementation for CLISP</title>
   <body>
-    <h2>New backquote implementation for CLISP</h2>
-    I have developed a new implementation of the backquote syntax
-    for CLISP. This is available as a patch 
+    <h2>Not-so-new-anymore backquote implementation for CLISP</h2>
+    In the spring of 2003, I developed a new implementation of the backquote
+    syntax for CLISP. This was originally available available as a patch 
     <a href="clisp-2-30-backquote-2003-04-01.diff">against CLISP 2.30</a>
     and
     <a href="clisp-cvs-2003-04-01-backquote.diff">against the CLISP CVS trunk (2003-04-26)</a>.
-    The patch has now been integrated into the CLISP CVS.
+    Not long afterward, the patch has now been integrated into the CLISP CVS,
+    and further hacked on by Bruno Haible and others.
     <p>
     This backquote implementation eliminates a number of defects that
     are present in the original, and also provides a list-based target

Without the implicit filtering (disabled by --nofilt), it's like this:

$ mcvs --nofilt diff clisp-backquote-patch.html
Index: F-EFC489EB97D9B5DAC50DB9539838C69D.html
===================================================================
RCS file: /home/projects/cvsroot/old-website/F-EFC489EB97D9B5DAC50DB9539838C69D.html,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 F-EFC489EB97D9B5DAC50DB9539838C69D.html
--- F-EFC489EB97D9B5DAC50DB9539838C69D.html	7 Aug 2010 03:13:38 -0000	1.1.1.1
+++ F-EFC489EB97D9B5DAC50DB9539838C69D.html	7 Aug 2010 17:24:55 -0000
@@ -1,13 +1,14 @@
[ ... et cetera]

I have not received one e-mail from a user who was stumped by a bizarre failure. Meta-CVS is poorly documented, and yet people are running with it, without any help from me. Go figure.

So, while [DCVS] can accomplish atomic commits and other improvements over CVS, it cannot accomplish proper directory structure versioning.

I really have to use the word ``doh'' here, because Meta-CVS can be modified to work over DCVS! The DCVS client program has almost exactly the same interface as CVS, except that the command is renamed to ``dcvs'', and the environment variables and the administrative directory are named with a DCVS prefix rather than CVS. All we have to do in Meta-CVS is support these different names, and we have a distributed CVS with directory structure versioning, symbolic links, and so on.
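A rough sketch of what that support could amount to (hypothetical names, not the actual Meta-CVS source): the places that mention the back-end tool consult a handful of parameters instead of hard-coded strings:

;; Hypothetical sketch: parameterize the back end instead of hard-coding it.
(defparameter *vc-command* "cvs")        ; "dcvs" for a DCVS back end
(defparameter *vc-admin-dir* "CVS")      ; "DCVS"
(defparameter *vc-root-var* "CVSROOT")   ; "DCVSROOT"

(defun vc-command-line (&rest args)
  "Build the command line for the underlying version control tool."
  (format nil "~a~{ ~a~}" *vc-command* args))

;; (vc-command-line "update" "-d")  => "cvs update -d"
;; With *vc-command* rebound to "dcvs", the same call yields "dcvs update -d".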

I've already opened a dialog with the DCVS developers regarding Meta-CVS integration, to find out what it would take and whether there are any hidden surprises. I was assured that there aren't any.

I'm not aware, by the way, that DCVS accomplishes atomic commits. To my understanding, commits are done to a CVS repository in the normal way. DCVS achieves distribution among repositories by way of the cvsup codebase, with some important enhancements that allow local lines of development not to conflict with other lines when they are distributed. Namely, tags are augmented with namespace qualifiers, and nodes in the DCVS network are assigned non-conflicting ranges of integers for use as local branch numbers. DCVS might get atomic commits one day if it merges with CVSNT.
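To make the branch-number idea concrete (an illustrative sketch of the idea only, not DCVS code): if each node may only allocate local branch numbers from its own assigned range, branches created independently on different nodes can never collide:

;; Illustrative sketch of non-conflicting branch number ranges.
(defstruct node name range-start range-end (next 0))

(defun allocate-branch-number (node)
  "Hand out the next branch number from NODE's private range."
  (let ((n (+ (node-range-start node) (node-next node))))
    (assert (<= n (node-range-end node)) ()
            "Node ~a has exhausted its branch number range." (node-name node))
    (incf (node-next node))
    n))

;; (defvar *node-a* (make-node :name "a" :range-start 100 :range-end 199))
;; (defvar *node-b* (make-node :name "b" :range-start 200 :range-end 299))
;; (allocate-branch-number *node-a*)  => 100
;; (allocate-branch-number *node-b*)  => 200, which never collides with node a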