gnu-arch-users

Re: [Gnu-arch-users] arch lkml


From: Tom Lord
Subject: Re: [Gnu-arch-users] arch lkml
Date: Sat, 13 Dec 2003 10:21:31 -0800 (PST)



    > From: address@hidden (Eric W. Biederman)

    > I guess my feel is that if you have the concept of a distributed
    > archive where each instance of it had some subset of the entire
    > archive it would be quite similar to the current case.
    > Different instances would play the roles of caches, mirrors, and
    > development archives.

That is what arch provides for.


    > As far as I can tell because of the current structure of arch
    > repositories I cannot have two copies of the same repository on
    > two separate machines because then there would be no way to
    > prevent a collision when two simultaneous revisions of the same
    > base version are checked into the different archives.

Can not have two _writable_ copies -- yes, that's true.

That's for two reasons: (1) maintaining two writable copies would
require providing a "two-stage commit" operation.  It is physically
impossible (in this universe, as far as we know) to implement such an
operation without simultaneously introducing additional complexities
to the recovery operation in the event such a commit is interrupted.
(In the Star Trek universe, it appears that the situation may be
different.)  (2) nothing needed would be gained by providing two-stage
commit and multiple writable instances: all committers would need to
be able to (in effect) contact both copies at every commit, so there's
no savings in network traffic, for example (in fact, there's a penalty).
You'll always have better performance and reliability by either having
all committers contact a single writable archive, or having them
commit to distinct archives (but branch between them).  Since arch is
able to operate across archive boundaries without penalty, there is no
reason not to use multiple archives.

But of course you can have an unbounded number of read-only, full or
partial mirrors of an archive.
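To make the naming problem concrete, here is a minimal Python sketch of
why two writable copies collide while read-only mirrors cannot.  It
assumes a simplified model in which each commit to a version just takes
the next patch level (base-0, patch-1, ...); the `Archive` and `Mirror`
classes and the `collisions` helper are hypothetical illustrations, not
arch's actual code or archive format.

```python
# Hypothetical model, not arch's real code or archive layout: each commit to
# a version simply takes the next patch level (base-0, patch-1, patch-2, ...).


class Archive:
    """A writable archive holding one version's revisions."""

    def __init__(self, name):
        self.name = name
        self.revisions = {}  # revision name -> changeset description

    def commit(self, changeset):
        # The archive itself picks the next free name, so within a single
        # writable archive no two commits are ever given the same name.
        n = len(self.revisions)
        rev = "base-0" if n == 0 else f"patch-{n}"
        self.revisions[rev] = changeset
        return rev


class Mirror:
    """A read-only copy: it only copies names its source already assigned."""

    def __init__(self, source):
        self.source = source
        self.revisions = {}

    def sync(self):
        for rev, changeset in self.source.revisions.items():
            self.revisions.setdefault(rev, changeset)

    def commit(self, changeset):
        raise PermissionError("mirrors are read-only")


def collisions(a, b):
    """Revision names present in both copies but bound to different changesets."""
    shared = a.revisions.keys() & b.revisions.keys()
    return sorted(r for r in shared if a.revisions[r] != b.revisions[r])


if __name__ == "__main__":
    # Two *writable* copies of the same archive, each accepting commits alone.
    copy1, copy2 = Archive("copy1"), Archive("copy2")
    copy1.commit("import tree")
    copy2.commit("import tree")
    copy1.commit("fix bug A")  # named patch-1 in copy1
    copy2.commit("fix bug B")  # also named patch-1 in copy2
    print(collisions(copy1, copy2))  # ['patch-1']

    # One writable master plus a read-only mirror.
    master = Archive("master")
    master.commit("import tree")
    master.commit("fix bug A")
    mirror = Mirror(master)
    mirror.sync()
    print(collisions(master, mirror))  # []
```

Because each writable copy picks the next patch level on its own, the
two copies bind `patch-1` to different changes.  The mirror never
chooses a name; it only copies names the master has already assigned,
so it can lag behind but can never disagree.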



    >> Look at it this way: we have some set of "raw data" (all the
    >> data stored by `commit', `tag', or `import' across all of the
    >> archives in some domain of consideration (e.g., "the free
    >> software developer community" or "the developers employed by
    >> XYZZY Corp.")).

    >> That raw data expands over time according to some simple, core,
    >> transactional rules (i.e., what `commit', `import', and `tag' mean).

    > And this is fundamentally where my concern lies.  What `commit',
    > `import', and `tag' mean.  These things are tied very closely to the
    > archive design and format.  

No they aren't.  Why do you think they are?  The current archive
format happens to make them easy to implement and inexpensive -- but
then what else would you expect (a format that makes them hard to
implement and expensive?).

    > If the semantics are too limited an archive can hit a wall.  
    [....]
    > The problem is once an archive system is a fundamental part of
    > your process it becomes very hard to change.

The semantics of the three write operations (import, commit, tag) are
not defined operationally in terms of the archive format -- they are
defined abstractly in terms of changeset semantics and namespace structure.
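One way to picture the difference between abstract and operational
semantics is the following Python sketch.  It assumes a hypothetical
`RevisionStore` interface with two interchangeable backends (one entry
per revision versus a single append-only log); `import_tree` and
`commit_changeset` are illustrative names, not arch commands, and the
rules they enforce are simplified.  The write operations are stated only
in terms of the namespace (a version starts at base-0 and each commit
adds the next patch level), so either backend yields the same
observable history.

```python
from abc import ABC, abstractmethod

VERSION = "hello--main--1.0"  # hypothetical category--branch--version name


class RevisionStore(ABC):
    """Hypothetical storage interface; the physical layout stays private."""

    @abstractmethod
    def revisions(self, version): ...

    @abstractmethod
    def put(self, version, revision, changeset): ...

    @abstractmethod
    def get(self, version, revision): ...


class DirectoryStore(RevisionStore):
    """One entry per revision, loosely like a directory-per-revision layout."""

    def __init__(self):
        self._dirs = {}  # "version/revision" -> changeset, in insertion order

    def revisions(self, version):
        prefix = version + "/"
        return [k[len(prefix):] for k in self._dirs if k.startswith(prefix)]

    def put(self, version, revision, changeset):
        self._dirs[f"{version}/{revision}"] = changeset

    def get(self, version, revision):
        return self._dirs[f"{version}/{revision}"]


class LogStore(RevisionStore):
    """A single append-only log, standing in for a packed or binary format."""

    def __init__(self):
        self._log = []  # (version, revision, changeset) records

    def revisions(self, version):
        return [r for v, r, _ in self._log if v == version]

    def put(self, version, revision, changeset):
        self._log.append((version, revision, changeset))

    def get(self, version, revision):
        for v, r, changeset in self._log:
            if (v, r) == (version, revision):
                return changeset
        raise KeyError((version, revision))


# The write operations below mention only the namespace, never a backend.
def import_tree(store, version, tree):
    if store.revisions(version):
        raise ValueError(f"{version} already has revisions")
    store.put(version, "base-0", tree)
    return "base-0"


def commit_changeset(store, version, changeset):
    existing = store.revisions(version)
    if not existing:
        raise ValueError(f"{version} has no base revision; import first")
    rev = f"patch-{len(existing)}"
    store.put(version, rev, changeset)
    return rev


if __name__ == "__main__":
    for store in (DirectoryStore(), LogStore()):
        import_tree(store, VERSION, "initial tree")
        commit_changeset(store, VERSION, "fix typo")
        # Both backends print ['base-0', 'patch-1'].
        print(type(store).__name__, store.revisions(VERSION))
```

Swapping the backend changes how bytes are laid out but not what
`import_tree` or `commit_changeset` mean, which is the property being
claimed for arch's core write operations.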


    > The best practical test I can think of for having semantics that
    > are a superset of other systems is if you can import and
    > reexport other archives without loss of information.  One of the
    > things Unicode got right.  Of course once you have done
    > sophisticated things you may no longer be able to reexport into
    > a lesser format without data loss, but that does not apply to
    > the import/export case.

A superset of, for example, CVS would be a mistake.  CVS has an ad
hoc, needlessly complex semantics.

    > Looking for a base format that can do everything efficiently is
    > certainly the wrong thing.  But at the same time it is stupid to
    > stop the consideration of other formats because you have found
    > something.  If a better layout can be found that meets all of
    > your original requirements but can do more things simply that is
    > a better thing.

It would indeed be "stupid to stop the consideration of other formats"
-- but why do you think anyone in arch world has done so?

There is nothing in the semantics of arch's core operations that
presumes a particular archive format.  I've worked hard to preserve
that property and you can see a recent example of that in the design
effort to add support for signatures.  There are other approaches to
signatures -- including some that have been proposed on the list or in
#arch in the past -- which would "break" the design of arch by linking
the semantics of core functionality to details of the archive format
which are purely internal.  Those other approaches have therefore been
rejected.

So, the arch design process is very much on top of the issues you are
fretting over.   We're way ahead of you, so to speak.


    > What are my requirements and problems?  
    > I wish I could easily give a list, and make this problem easy to solve
    > but I can't.  The best I can do is think things through one piece
    > at a time and look.  At least for now.

    > The open questions I have for making my decision are:

    > 1) Is there truly a benefit to the binary data structures used by
    >    xdelta and svn, and discussed in several academic papers?

    > 2) Are the semantics of arch rich enough to keep it from running
    >    into a wall I care about in the future?

You might want to spend some time on two other questions:

    3) could arch be upwardly compatibly modified to make effective
       use of one of the "binary data structures" you refer to?
       In fact, could this be done so that such structures interoperate
       fully with the existing archive formats?

       (The answer is "yes" but if you are really interested in these
        questions you should try to figure out for yourself _why_ the
        answer is yes.)


    4) is it really worth the trouble to implement such formats?

       (This is an open question.   In my opinion, the answer is 
        "probably not," but then my opinion has varied over time on
        that question.)

-t




