[Gnu-arch-users] Re: libraries / cacherevs


From: Stefan Monnier
Subject: [Gnu-arch-users] Re: libraries / cacherevs
Date: 04 Mar 2004 15:35:29 -0500
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50

>> Why not only keep the currently-non-hardlinked files (and rely on the
>> inventory to discover that the file exists in the revision and then look
>> for older revs in the lib if you ever need to look at that file).

> There are three reasons.

> One is simply convenience.   By using hardlinks, you get a complete
> tree that all tools understand rather than only tools that understand
> the format of an arch revision library.

OK, I guess it's minor but indeed convenient at times when tla itself
fails you.

> Another is storage management.   Suppose that I have two trees
> sharing a common file.   With hardlinks, I can delete either tree and
> the other is still valid.   Without hardlinks, if I delete one of the
> trees, then the revision library no longer contains all of the files
> needed for the other.

But if the ,,index file indicated which rev each file can be found in, we
could easily auto-rebuild that part of the revlib.  Keeping this info could
also be useful in that we'd know which previous revision last modified a file
(not an indispensable piece of info, but convenient to have; Subversion
provides it, for example).
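
To make the idea concrete, here is a minimal sketch of such an index.  It is
not how tla stores anything: the input format (an oldest-first list of
revision names with the ids of the files each one changed) and the function
name `build_last_changed` are made up for illustration, and deletions and
renames are ignored.

```python
def build_last_changed(history):
    """Map each revision to {file_id: revision that last modified it}.

    `history` is a hypothetical oldest-first list of
    (revision_name, set_of_changed_file_ids) pairs.
    """
    index = {}
    current = {}  # running view: file_id -> last revision that touched it
    for rev, changed in history:
        for file_id in changed:
            current[file_id] = rev
        index[rev] = dict(current)  # snapshot of the view as of this revision
    return index


history = [
    ("patch-1", {"README", "main.c"}),
    ("patch-2", {"main.c"}),
    ("patch-3", {"Makefile"}),
]
index = build_last_changed(history)
# As of patch-3, README still has to be fetched from patch-1.
print(index["patch-3"]["README"])
```

With a table like this, a revision tree would only need to hold the files it
changed itself; everything else could be located (or rebuilt) from the
revision the index points to.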

> A third is good performance without code complexity.   Naively
> implementing a user-space directory stack (which is what you're
> describing) would be comparatively slow for common operations like
> mkpatch or get.

Why is `get' a common operation?
As for `mkpatch', if we kept a file similar to `CVS/Entries' we could discover
the "potentially modified files" first without looking at any revlib, and
then we'd only need to look for a few files in the revlib.
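
A rough sketch of what I mean, assuming we record size and mtime for each
file at checkout/commit time (the names `snapshot` and `potentially_modified`
are invented here, and this is neither the real `CVS/Entries' format nor
anything tla does today):

```python
import os


def snapshot(tree):
    """Record (size, mtime_ns) for every regular file under `tree`,
    keyed by path relative to `tree`."""
    entries = {}
    for dirpath, _dirnames, filenames in os.walk(tree):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            entries[os.path.relpath(path, tree)] = (st.st_size, st.st_mtime_ns)
    return entries


def potentially_modified(tree, entries):
    """Return the sorted paths whose size/mtime differ from the recorded
    entries, including files added or removed since the snapshot."""
    now = snapshot(tree)
    return sorted(
        path
        for path in now.keys() | entries.keys()
        if now.get(path) != entries.get(path)
    )
```

Only the paths returned by `potentially_modified` would then need to be
compared against the revlib.  A file whose size and mtime are both unchanged
is assumed untouched, which is the same shortcut CVS takes.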

> It can surely be optimized but the incentives for
> doing so in this case are relatively slight.

Admittedly I don't care much for how the revlib is implemented and the
current implementation has the advantage of "transparency".

What I care about is speed and I'm not always impressed.  One case that
comes up all the time in my daily use is:

      tla file-diffs <file>

which can take several seconds to go through the whole revlib (probably
checking its consistency or somesuch) whereas all it has to do is look for
that one file, check its arch-tag and run diff (i.e. it should be
instantaneous as is Subversion in this case).

Another case is:

      tla commit <file1>
      ...
      tla file-diffs <file2>

where <file2> might be the same as <file1> or not.  The file-diffs is damn
slow because it has to rebuild a whole new revision in the library just to
find this one file which it already has in a previous revision or which it
has just committed.

Another problem I have with tla is that the full patch-log is always attached to
the tree.  Maybe it's convenient to have the whole log available offline,
but the problem of working offline is already addressed by archive mirrors
and revision libraries, so I don't think it's a good justification to spend
so much disk space and time (all the revlib operations also have to build/check
those umpteen patch-log file thingies).  Just for the record, `tla' is not
very old, but its patch-log already takes up three times the disk space
and 4 times the inode space of the actual src/tla code:

   src/tla-0% du -s \{arch\}/ .
   15200   {arch}
   21164   .
   src/tla-0% find \{arch\} -type f | wc
      1997    1997  173749
   src/tla-0% find . -type f | wc
      2532    2532  191645
   src/tla-0%

The inode space is particularly significant since sadly many/most tla
operations take time proportional to the number of inodes in a tree.
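
For the record, the arithmetic behind those ratios, assuming the totals
reported for `.' include {arch} (it lives inside the tree) and using the
file count as a stand-in for inodes (directories are not counted):

```python
# Figures copied from the du/find transcript above.
du_arch, du_total = 15200, 21164
files_arch, files_total = 1997, 2532


def ratio(part, total):
    """Size of `part` relative to everything else in `total`."""
    return part / (total - part)


disk_ratio = ratio(du_arch, du_total)          # 15200 / 5964
inode_ratio = ratio(files_arch, files_total)   # 1997 / 535
print(f"disk: {disk_ratio:.1f}x, files: {inode_ratio:.1f}x")
# prints "disk: 2.5x, files: 3.7x"
```

So "three times" and "4 times" are rounded up a bit, and {arch} may hold a
little more than just the patch-log, but the order of magnitude stands.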


        Stefan



