
Re: [Gnu-arch-users] GNU Arch 2.0 -- first source


From: Thomas Lord
Subject: Re: [Gnu-arch-users] GNU Arch 2.0 -- first source
Date: Sun, 10 Jul 2005 16:23:31 -0700

On Sun, 2005-07-10 at 17:45 +0400, nick wrote:

> I have tried 'revc', but I have a question.
> Why are changed files stored in the revc archive as full content?
> Tla consumes less space in such cases.
> I have a project with a small number of big text files that are modified in
> every commit, and revc doesn't seem reasonable for me.
> Could the space consumption problem be resolved while revc is in the alpha stage?

You are talking about the on-disk format of local archives.

In that format, each changed file in a commit is stored as a zipped copy
of the modified form of the file.

This storage format optimizes random access to individual files within
arbitrary revisions.  That's valuable because if individual file access
is very fast, many higher-level operations can be implemented in simple
ways yet also be fast.
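
To illustrate the random-access point, here is a minimal sketch in Python of a store that keeps a compressed full copy of every changed file. It is not revc's actual on-disk format: the class name `FullCopyStore`, the per-revision manifest, and the use of zlib are all assumptions made for the example. What it shows is that reading any file at any revision costs one lookup and one decompression, no matter how long the history is.

```python
import zlib


class FullCopyStore:
    """Hypothetical toy store: each changed file is kept as a compressed full copy."""

    def __init__(self):
        self._blobs = {}      # (revision, path) -> zlib-compressed full content
        self._manifests = {}  # revision -> {path: revision that last changed it}

    def commit(self, revision, parent, changed_files):
        # Start from the parent's manifest, then record the files changed here.
        manifest = dict(self._manifests.get(parent, {}))
        for path, content in changed_files.items():
            self._blobs[(revision, path)] = zlib.compress(content)
            manifest[path] = revision
        self._manifests[revision] = manifest

    def read(self, revision, path):
        # One manifest lookup plus one decompress; no chain of deltas to replay.
        origin = self._manifests[revision][path]
        return zlib.decompress(self._blobs[(origin, path)])


store = FullCopyStore()
store.commit(1, None, {"notes.txt": b"first draft\n", "README": b"hello\n"})
store.commit(2, 1, {"notes.txt": b"first draft\nsecond line\n"})
```

A diff-based store would instead have to start from some base copy and apply each intervening diff in order to answer the same `read` call.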

In tla archives, in contrast, a changed text file is stored as a stream
of `diff' output, relative to the ancestor, stashed in a tar bundle and
compressed.

In general, tla archives have a big space advantage when dealing with
text files, especially (as you point out) very large text files that are
modified often, with only a few lines changing at a time.
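
The size gap for that case can be sketched roughly in Python. This is an illustration under stated assumptions, not a measurement of either tool: the helper names `full_copy_size` and `diff_size` are made up, the test file is synthetic, and zlib applied to a unified diff only stands in for a compressed tar bundle of `diff' output.

```python
import difflib
import zlib


def full_copy_size(new_text):
    """Bytes needed to store the whole modified file, compressed."""
    return len(zlib.compress(new_text.encode()))


def diff_size(old_text, new_text):
    """Bytes needed to store a compressed unified diff against the ancestor."""
    diff = "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        "old", "new"))
    return len(zlib.compress(diff.encode()))


# Synthetic large text file with a single-line change (assumed workload).
old_lines = [f"line {i}: some ordinary text content\n" for i in range(5000)]
new_lines = list(old_lines)
new_lines[2500] = "line 2500: edited\n"
old = "".join(old_lines)
new = "".join(new_lines)

print("full copy:", full_copy_size(new), "bytes")
print("diff:     ", diff_size(old, new), "bytes")
```

For a workload like this the compressed diff should come out far smaller than the compressed full copy, and the gap widens as the file grows while the edits stay small.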

Crudely put, the main answer as to why revc is more space hungry is
that, at the scales of use cases of greatest interest, nobody is
(sanely) keeping score about this difference in space consumption.

I'll say it differently: a modern revision control system can and
probably should gleefully consume local disk space at rates 10x..1000x
greater than, for example, CVS if, in return for doing so, user benefits
can be realized.

Even at that stepped-up rate of space consumption, the cost-of-operation
of a modern system will still be less than the relative cost for
operating CVS in its heyday.

tla revision libraries are a good hint that this analysis is right:
afaict, plenty of users like me have revlibs that just grow and grow
and include essentially every revision I've got archived or mirrored.
Revision libraries cost more per revision than revc commits yet provide
pretty much the same benefit.

Other good hints are monotone and git -- we crossed the line at which
sub-file delta-compression ceases to be important in the common cases
some time ago.

-t





