RE: binary files bad idea? why?

From: Greg A. Woods
Subject: RE: binary files bad idea? why?
Date: Thu, 8 Jul 2004 23:58:42 -0400 (EDT)

[ On Tuesday, July 6, 2004 at 15:18:50 (-0700), Paul Sander wrote: ]
> Subject: RE: binary files bad idea? why?
> If you can come up with a time frame and subject thread in which such
> agreement was made, then I'll be happy to review the entire discussion and
> debate its merits.

How about from the "beginning of time" in this forum, say a decade ago.  ;-)

And BTW, you keep waving over RCS compatibility with arguments about
minimum reproducibility that simply do not wash.  RCS compatibility
means _all_ of RCS' features, warts, and concepts.  More about the
reasons for this below.

In the meantime, and in the real world, RCS (and thus CVS), along with
all the tools they use and that are used with them, work _best_ and
primarily with text files.  I.e. until someone provides working code that makes the
diff/diff3 toolset used _internally_ in CVS (_and_ RCS) to be selectable
(on a per-revision basis!!!!), there's no point to even pretending that
non-text files can be handled generically by CVS.

And BTW, the point about "on a per-revision basis" is/was supposed to be
a strong clue to you to show just how hare-brained and nearly impossible
to achieve your ideas are.  It's also a _necessary_ requirement for both
RCS and CVS, which manage _files_ and groups of _files_, but not the
structure of the grouping.

The main idea of change management is to capture and identify _changes_,
not to record exact replicas of specific revisions across time.  The
latter comes from the former, not the other way around.  Changes are
best specified as the edit actions that were done to make them.  Why
do you think it is that deltas are stored as edit scripts in both RCS
and SCCS files?  I'll tell you for certain it wasn't just because there
were already well-known algorithms (and already-implemented tools) to
create and make use of those edit scripts (though that was of course a
big part of it).
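
To make the "deltas as edit scripts" idea concrete, here is a minimal
sketch in Python.  It is NOT the RCS or SCCS on-disk format; it only
illustrates recording a change as line edits and replaying them.  The
function names make_edit_script and apply_edit_script are hypothetical,
and difflib stands in for the diff tool.

```python
import difflib


def make_edit_script(old_lines, new_lines):
    """Return (op, start, end, replacement) edits that turn old into new.

    Illustration only: this tuple layout is an assumption made for this
    sketch, not the delta format RCS or SCCS actually stores.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    script = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Only the changed regions are kept; unchanged lines cost nothing.
            script.append((tag, i1, i2, new_lines[j1:j2]))
    return script


def apply_edit_script(old_lines, script):
    """Replay the edits against the old revision to rebuild the new one."""
    result = list(old_lines)
    # Work from the bottom up so earlier line numbers stay valid.
    for _tag, i1, i2, replacement in reversed(script):
        result[i1:i2] = replacement
    return result


if __name__ == "__main__":
    old = ["int cnt = 0;", "cnt += 1;", "return cnt;"]
    new = ["int cnt = 0;", "cnt += 2;", "return cnt;", "/* done */"]
    script = make_edit_script(old, new)
    for edit in script:
        print(edit)
    assert apply_edit_script(old, script) == new
```

The script holds only the edits, and the exact new revision falls out
of replaying them against the old one.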

If you want to capture the essence of the changes made to binary files
then you _must_ capture the actions used to effect those changes no
matter what form those actions take -- that's the whole idea.  In
computer science one of the most logical, and most widely used, ways to
do that is to design a human/computer language that can specify the
binary file's internal structure and content.  Once you have done that
then it's trivial to use the text editing and comparison tools we
already have to capture and record and document the changes in the
binary file by capturing the edits made to the text file(s) containing
the instructions in the language used to create the binary file.  And
what do you know but that's _exactly_ what a programmer does when
writing and modifying a program in a compiled language such as C (or
even a CAD operator using a drawing tool such as xfig which is just a
complicated GUI editor for a human/computer language used to define and
describe drawings).  Note though that most of us (the sane ones amongst
us :-) don't bother to record copies of revisions of the intermediate or
product files created from our source code, nor the deltas between their
revisions, because the changes between those revisions of binary files
are meaningless (and besides we can reproduce those intermediate and
product files on demand assuming we've also captured enough of the
relevant information about our build process within the rest of our
software configuration management environment).  Even the changes
between revisions of some text files that are created from other text
files are meaningless (e.g. PostScript generated from Troff or Lout; or
"configure" scripts generated from "configure.m4" sources), and so
storing those text-form intermediate files, especially in any scenario
where they might ever have to be merged, is ludicrous and wasteful and

I.e. the mere idea of wanting to store a binary file in a change control
system, especially a generic one like CVS, and most especially one
that's designed explicitly and specifically for the primary goal of
supporting concurrent editing, is very wrong and completely nonsensical
from the get go.  If all you have are binary files then _any_ other and
_all_ other version management tools are better than CVS (and RCS and
SCCS and anything else using diff, diff3, & patch).  One sure as hell
doesn't need to use RCS files to store revisions of binary files if the
deltas between those revisions are meaningless to most any human when
they're presented to that human in the format they're stored in (i.e. in
the RCS internal form).  If anyone's going to go to the trouble of
trying to record and document revision history of any set of files then
they'd be outright stupid to use an inappropriate tool to do so,
especially if their sole reason for using the wrong tool was simply that
it was what they had at hand or what they happened to already know.

Now if you really wanted to make some progress in computer science and
software engineering technology then you'd think about designing and
implementing tools that could identify and capture a more expressive
form of edits by comparing two copies of a text file (*), instead of just
continuing to blow your horn about storing in a change control system
such as CVS what should always be considered to be intermediate and/or
product files.

(*) e.g. how about inventing an extension to existing diff/merge
algorithms that could spot identifier and word substitutions
(e.g. variable renames, etc.) just by textual comparison of a revised
file with its ancestor.  If you could compress a variable rename where,
say, 25% of the lines of a file are changed as a result down into one
single edit command then you'd do wonders for conflict avoidance in
merges where such edits would certainly conflict with other variable
renames done on other branches, not to mention structural changes, and
so on.
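
A rough sketch of the simplest case of such a detector, in Python, might
look like the following.  It is an illustration under strong
assumptions, not an extension of any real diff/merge tool: the name
detect_rename is hypothetical, the tokenizer is naive (it knows nothing
about strings, comments, or scoping), and it only recognizes a revision
that differs from its ancestor by exactly one whole-word substitution.

```python
import re

# Naive tokenizer: identifiers, digit runs, whitespace runs, single symbols.
TOKEN = re.compile(r"[^\W\d]\w*|\d+|\s+|[^\w\s]")
IDENT = re.compile(r"[^\W\d]\w*")


def detect_rename(old_text, new_text):
    """Return (old_name, new_name) if one rename explains the whole change.

    Returns None when the texts are identical or when the difference is
    anything other than a single consistent identifier substitution.
    """
    old_tokens = TOKEN.findall(old_text)
    new_tokens = TOKEN.findall(new_text)
    if len(old_tokens) != len(new_tokens):
        return None  # structural change, not a pure substitution

    pairs = set()
    for a, b in zip(old_tokens, new_tokens):
        if a != b:
            if not (IDENT.fullmatch(a) and IDENT.fullmatch(b)):
                return None  # something other than an identifier changed
            pairs.add((a, b))
    if len(pairs) != 1:
        return None

    old_name, new_name = next(iter(pairs))
    # Confirm the rename was applied everywhere, not just in some places.
    pattern = r"\b" + re.escape(old_name) + r"\b"
    if re.sub(pattern, new_name, old_text) != new_text:
        return None
    return old_name, new_name


if __name__ == "__main__":
    ancestor = "int cnt = 0;\ncnt += 1;\nreturn cnt;\n"
    revised = "int count = 0;\ncount += 1;\nreturn count;\n"
    print(detect_rename(ancestor, revised))
```

Here every line of the file changed, yet the whole delta collapses to
the single edit "rename cnt to count".  Doing this usefully inside a
real merge would also require handling renames mixed with other edits,
which this sketch does not attempt.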

                                                Greg A. Woods

+1 416 218-0098                  VE3TCP            RoboHack <address@hidden>
Planix, Inc. <address@hidden>          Secrets of the Weird <address@hidden>
