gnu-arch-users

Re: [Gnu-arch-users] Storage efficiency of revlibs


From: Mikhael Goikhman
Subject: Re: [Gnu-arch-users] Storage efficiency of revlibs
Date: Mon, 12 Dec 2005 16:34:39 +0000
User-agent: Mutt/1.4.2.1i

On 12 Dec 2005 09:11:12 +0100, Ludovic Courtès wrote:
> 
> For a single revision, tar+gz will obviously always be better than no
> compression at all.  Is this what you mean?

No, that would be too trivial to be worth saying. I meant that you may
have reached your conclusion too early. You said your revlib compression
ratio is 4 or 8. I asked you to show a sample ratio of the cacherev of
patch-300 versus the revlib tree of the same revision. Then we can compare
these two ratios and judge whether it is really correct to conclude for
your project that
"revlib consumes slightly less disk than cacherev per _every_ revision".

> [...]
> 
> Right, but without knowing the actual implementation, it seems quite
> relevant to consider that the size of meta-data is much lower than the
> size of data in general (assuming an average file size of 8 KiB).
> 
> [...]
> 
> Right.  But again, it doesn't seem too stupid to consider this overhead
> negligible compared to the gain.

Why assume and consider when we can talk about real numbers? :)

If anyone still thinks that the current arch revlib is a good default
for everyone, they can repeat this experiment.

Start with an empty greedy revlib (sparse or not). Touch 1000 empty files
and import them into a local archive. Then repeat this in a loop:

  tla commit -s ''; tla changes; find ../arch-revlib | wc -l
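For anyone who wants to automate the loop, here is a minimal sketch. The
function names, the default revlib path and the stop limit are my own
assumptions for illustration; only the tla commands and the find | wc
count come from the line above. It only defines functions and runs
nothing until run_experiment is called from inside the project tree.

```shell
# Count every entry (files and directories) under a directory,
# the same way "find DIR | wc -l" does in the transcript.
count_revlib_files() {
    n=$(find "$1" | wc -l)
    echo $((n))          # normalise any whitespace that wc may emit
}

# Hypothetical driver: keep making empty commits until the revlib
# holds at least LIMIT entries. Assumes a greedy revlib at REVLIB
# and that the current directory is the imported project tree.
run_experiment() {
    revlib=${1:-../arch-revlib}
    limit=${2:-1000000}
    commits=0
    while [ "$(count_revlib_files "$revlib")" -lt "$limit" ]; do
        tla commit -s ''
        tla changes          # this is what populates the greedy revlib
        commits=$((commits + 1))
        echo "commit $commits: $(count_revlib_files "$revlib") entries"
    done
}
```

Calling "run_experiment ../arch-revlib 1000000" in the test project should
then print the growing entry count after each commit.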

The revlib reaches one million files at patch-442 (much earlier if real,
non-empty changesets are committed). And what is the size of a revlib
that includes nothing but hardlinked zero-sized project files?

  % find *.txt | wc -l
  1000
  % find . | wc -l         # 1000 + 1000 ids + 443 patch logs + dirs
  2460
  % du -s --block-size=1 .
  7524352
  % ls -s --block-size=1 .../testproject--devel--0--patch-442.tar.gz
  69632

  % find ../arch-revlib | wc -l
  1001616
  % du -s --block-size=1 ../arch-revlib
  386289664
  % du -sl --block-size=1 ../arch-revlib
  3087196160

My FS is formatted with 4 KB blocks; roughly double all sizes for 8 KB.
So anyone who wants to work with manageable trees of 7 MB will soon end
up with an unmanageable revlib of a million files and 400 MB.
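To put the figures above side by side, here is a small sketch in plain
integer shell arithmetic. The byte counts are copied from the transcript;
the revision count of 443 (base-0 plus patch-1 to patch-442) is my
reading of the "443 patch logs" comment, and the variable names are only
illustrative. The 64-bit arithmetic of bash or any modern sh is assumed.

```shell
tree_bytes=7524352                  # du -s of the working tree
cacherev_bytes=69632                # size of the patch-442 cacherev tarball
revlib_bytes=386289664              # du -s of the revlib (hardlinks counted once)
revlib_apparent_bytes=3087196160    # du -sl of the revlib (hardlinks counted each time)
revisions=443                       # assumed: base-0 plus patch-1..patch-442

# Real disk cost of one revision in the revlib.
per_rev=$((revlib_bytes / revisions))

# Revlib cost per revision relative to one cacherev, in tenths.
ratio_x10=$((per_rev * 10 / cacherev_bytes))

# How much the hardlink sharing saves, in tenths.
sharing_x10=$((revlib_apparent_bytes * 10 / revlib_bytes))

# Whole revlib relative to a single working tree.
growth=$((revlib_bytes / tree_bytes))

echo "revlib bytes per revision:    $per_rev"                                  # 871985
echo "revlib/cacherev per revision: $((ratio_x10 / 10)).$((ratio_x10 % 10))"     # 12.5
echo "hardlink sharing factor:      $((sharing_x10 / 10)).$((sharing_x10 % 10))" # 7.9
echo "revlib / working tree:        $growth"                                   # 51
```

So even in this best case each revision costs about 850 KB in the revlib
against 68 KB for a cacherev, although hardlinking already saves a factor
of nearly 8.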

I think these numbers are the most optimistic possible. With real-life
changesets the revlib is even larger and its sharing is weaker (most
files here are simply hardlinked 443 times). Share your numbers.

Regards,
Mikhael.



