RE: How well does CVS handle other types of data?

From: Greg A. Woods
Subject: RE: How well does CVS handle other types of data?
Date: Tue, 24 Jul 2001 13:36:24 -0400 (EDT)

[ On Saturday, July 21, 2001 at 00:02:51 (-0400), Ralph A Mack wrote: ]
> Subject: RE: How well does CVS handle other types of data?
> Ah, I am learning something here! The primary problem with simply storing
> and retrieving binary files is that there is a finite probability that one
> day a binary file will simply refuse to be retrieved. Let me see if the above
> information gives me a better understanding of the mechanism of this
> problem:
> If the file contains not one 0x0a character, it will treat the entire file
> as a single line and thereby overflow even the most generous buffer. This
> is why it can fail? In the more common case, of course, there will be a
> generous sprinkling of accidental 0x0a characters and so it will work.
> However this problem is almost guaranteed to surface under sufficient load.
> All a remote unlikelihood requires in order to become a near-certainty
> of happening at least once is a sufficient sample size.

The lack of a newline in a sufficiently large file is certainly one of
the potential problems.

I'm not 100% sure that RCS handles every case of a revision without a
trailing newline correctly (SCCS flat out doesn't).  (In unix the <LF>
is a line _separator_, not a line terminator, and as such it's not
necessary at the end of a file, except when you're dealing with tools
like "diff" which treat it as a terminator.)
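
As a rough illustration of both points (a sketch only, not how RCS or
diff are actually implemented), the following Python function reports
how a line-oriented tool would see a blob of bytes: how many LF-delimited
lines it has, how long the longest one is, and whether the data ends
with an LF.  The function name `line_stats` is made up for this example.

```python
from typing import Tuple


def line_stats(data: bytes) -> Tuple[int, int, bool]:
    """Return (line count, longest line in bytes, ends with LF?)."""
    if not data:
        return 0, 0, False
    # Split on LF (0x0a) only, the way a line-oriented unix tool sees it.
    lines = data.split(b"\n")
    ends_with_lf = data.endswith(b"\n")
    if ends_with_lf:
        lines.pop()  # drop the empty piece that follows the final LF
    longest = max(len(line) for line in lines)
    return len(lines), longest, ends_with_lf


# A megabyte of "binary" data that happens to contain no LF at all
# is a single, unterminated, million-byte line.
blob = bytes([0x89, 0x50, 0x4E, 0x47]) * 250_000
print(line_stats(blob))  # (1, 1000000, False)
```

Any tool that reads a whole line into a fixed-size buffer has to cope
with that worst case.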

There are perhaps other implementation specific problems in diff and
diff3.  There are several known cases where GNU diff can try to use
effectively infinite amounts of VM, for example.  That's one of the
reasons why diff normally gives up by default on "binary" files and thus
why RCS (and CVS) must always tell it "--binary".
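
For reference, the GNU diff manual describes its text-versus-binary
guess as a check for null bytes in the first part of the file.  A
minimal sketch of that kind of heuristic follows; the name
`looks_binary` and the 4096-byte probe size are assumptions made for
this example, not values taken from diff itself.

```python
def looks_binary(data: bytes, probe_size: int = 4096) -> bool:
    """Guess "binary" if a NUL byte appears in the first probe_size bytes.

    probe_size is a hypothetical default; the real amount examined by
    GNU diff is system dependent.
    """
    return b"\x00" in data[:probe_size]


print(looks_binary(b"int main(void) { return 0; }\n"))  # False
print(looks_binary(b"\x7fELF\x02\x01\x01\x00"))          # True
```

Note that a heuristic like this only inspects a prefix, so a file whose
first NUL byte comes later would still be guessed to be text.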

> For the truly paranoid, the "required files" can include the closure of
> software needed to build the system. - yeah, that's a lot of binary files to
> store.

Rumour has it that one of the largest software vendors in the world
includes not just all the required files, but the hardware too.  They
supposedly literally "freeze" the entire build system in a vault.  If
this is true it would be very interesting to know if they've ever done
anything with such systems after the fact, and if so what and why....

> A less paranoid person might exclude those files for which version
> changes are deemed least likely to have significant impact.

I myself generally assume that the system headers and libraries, kernel,
compiler, shell, etc., will generally produce a working binary
given the same source code as input.  I've been caught off-guard by that
though, either with library or kernel bugs, library or kernel
incompatibilities, or even by compiler bugs being either introduced, or
even fixed.  The march down the line of tracking NetBSD-current is an
excellent example of where things can go wrong.  There are many many
many instances of places where tools simply have to be manually rebuilt
and re-installed on the build system before the rest of the system can
be built successfully.  Almost always you have to rebuild the kernel and
reboot before you even begin with the rest of the system.  When you're
lucky the tool in question will simply refuse to work right (e.g. make
dies because new makefiles make use of new syntax).  However when you're
not so lucky the re-built system simply refuses to work.

Software construction is a horribly messy business.

> As long as it only uses a single set of underlying tools (i.e. rcs, diff,
> diff3) you are absolutely right. If it permitted different sets of tools to
> operate on different kinds of files, as Paul Sander has argued, then it could
> operate conceptually identically as it does today but fully support a wider
> range of files. I would only extend the model by allowing separate tools for
> identifying and marking meaningful regions of difference between files and
> for merging the differences thus marked. This allows for insertion of true
> computer-assisted manual merge, etc.

I don't disagree -- but I don't hold my breath either, especially not
for any implementation that might preserve sufficient repository
compatibility to be of any real use to me personally.

> Both may be true. Unfortunately, that doesn't get me where I want to go. So
> once again, I get to the point of "put up or shut up" and, as with merge
> tracking, this is also too big a task for me to contemplate under present
> circumstances.

Well, the trick is to use the right tool for the job!  :-)

You see, if you take the unix tool-building philosophy as your approach
to the problem then you can use CVS as one of the tools within your
entire SCM system.  It actually works very very well in that capacity!

> Despite my limited time, I really would like to be involved in a project like
> this, though. My plan of action will be first to evaluate TCCS to see if it is
> closer to what I want.

Though in many respects it's not much more of a total SCM solution than
CVS is, you might also be interested in the abilities of Aegis.....

                                                        Greg A. Woods

+1 416 218-0098      VE3TCP      <address@hidden>     <address@hidden>
Planix, Inc. <address@hidden>;   Secrets of the Weird <address@hidden>
