From: Markus Schiltknecht
Subject: Re: cvssync (was Re: [Monotone-devel] Re: big repositories inconveniences (partial pull?))
Date: Fri, 08 Sep 2006 11:25:38 +0200
User-agent: Thunderbird 1.5.0.5 (X11/20060812)
Hi,
Please excuse the longish mail. I got carried away a little with two or
three thoughts...
Christof Petig wrote:
> Now I have to come up with a coding for push certificates (which, in the
> past were a simple xdiff to a specified .mtn-sync-cvs file). And I have
> to think about flagging a revision as synched (a changed attribute might
> still indicate that this revision is synched).
> I don't want to attach another certificate to each and every revision
> (which it would easily gain if certificates flag synchronisation).
Hm... that makes me think again about how certificates are stored. I
know certs can store arbitrary text, but this cert only stores a flag,
i.e. having the cert vs. not having it would be enough. Other certs
change their values only rarely, or only in part.
To understand how certs are stored, I took a look at schema.sql and found:
CREATE TABLE revision_certs
	(
	hash not null unique,   -- hash of remaining fields separated by ":"
	id not null,            -- joins with revisions.id
	name not null,          -- opaque string chosen by user
	value not null,         -- opaque blob
	keypair not null,       -- joins with public_keys.id
	signature not null,     -- RSA/SHA1 signature of "[name@id:val]"
	unique(name, id, value, keypair, signature)
	);
I understand most of it, but what are the 'remaining fields'?
(The same question applies to manifest_certs and public_keys.)
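For illustration, here is one possible reading of the "hash of remaining fields separated by ':'" comment, as a Python sketch using the standard sqlite3 and hashlib modules. It assumes the remaining fields are id, name, value, keypair and signature, joined in that order with ':' and hashed with SHA-1; the field order, the encoding of value and signature, and the sample data are assumptions, not taken from monotone's source.

```python
import hashlib
import sqlite3

# revision_certs as quoted from schema.sql (SQLite accepts untyped columns).
SCHEMA = """
CREATE TABLE revision_certs
    (
    hash not null unique,
    id not null,
    name not null,
    value not null,
    keypair not null,
    signature not null,
    unique(name, id, value, keypair, signature)
    );
"""


def cert_hash(rev_id, name, value, keypair, signature):
    # ASSUMPTION: SHA-1 over the other five columns joined with ":".
    # monotone's real field order and value/signature encoding may differ.
    joined = ":".join([rev_id, name, value, keypair, signature])
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


def add_cert(db, rev_id, name, value, keypair, signature):
    """Insert a cert row, deriving the hash column from the other fields."""
    h = cert_hash(rev_id, name, value, keypair, signature)
    db.execute(
        "INSERT INTO revision_certs (hash, id, name, value, keypair, signature)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (h, rev_id, name, value, keypair, signature),
    )
    return h


# Hypothetical revision id, key and signature, for illustration only.
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
print(add_cert(db, "0123abcd", "branch", "net.venge.monotone",
               "alice@example.com", "c2lnbmF0dXJl"))
```

Because the hash is derived from all other columns, inserting the same cert twice violates both the unique hash column and the unique(...) constraint, so SQLite rejects it with sqlite3.IntegrityError.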
I was thinking about delta-compressing cert values, but it soon became
clear that this can't be done so easily: one would need to choose a good
base cert to delta-compress against. 'Good' meaning one that is attached
to a revision in the same branch, that compresses well, and that is itself
close to a full base (not nested too deeply in a delta chain).
How about using plain compression only? (Or is the cert value already
compressed?)
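To get a feeling for the difference between plain compression and compressing against a base, here is a small Python sketch using the standard zlib module. Plain compression treats each cert value on its own; passing an earlier cert value as a preset dictionary (zdict) is used here as a cheap stand-in for delta compression against a base cert, not as monotone's actual delta format. The sample values (one "path RCS-revision" line per file) are made up.

```python
import zlib


def make_value(n_files, changed=-1):
    """Hypothetical cert value: one 'path revision' line per file."""
    lines = []
    for i in range(n_files):
        minor = i % 7 + 1
        if i == changed:
            minor += 1  # one file got a new RCS revision
        lines.append(f"src/dir{i // 50}/file{i}.c 1.{minor}")
    return ("\n".join(lines) + "\n").encode("ascii")


def compress_plain(value):
    return zlib.compress(value, 9)


def compress_against_base(value, base):
    # The base cert value serves as a preset dictionary; the reader needs
    # the same base to decompress, just as with a real delta.
    c = zlib.compressobj(level=9, zdict=base)
    return c.compress(value) + c.flush()


def decompress_against_base(data, base):
    d = zlib.decompressobj(zdict=base)
    return d.decompress(data) + d.flush()


base = make_value(200)
new = make_value(200, changed=42)
print(len(new), len(compress_plain(new)), len(compress_against_base(new, base)))
```

In a real database the base would still have to be chosen as described above (same branch, short delta chain); the sketch simply hands it in.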
To be more modest and realistic now: is this an issue at all? (Except for
CVS revision info, which would better be stored elsewhere.) If
not, at least I understand monotone better now ;-)
Another thought I had was to use some sort of 'inverted index' to store
'flag certs' (which don't have a value, but are boolean in the sense
that attached = true, missing = false), e.g.:
flag cert 'PUSHED' is attached to revisions A, C, D and E,
flag cert 'COMPILES_CLEANLY' is attached to revisions A, B, C and E
but that would at least also need to take the keypair into account, so
it would look more like:
flag cert 'PUSHED' with key 'address@hidden'
is attached to rev A, C and E
flag cert 'PUSHED' with key 'address@hidden'
is attached to rev B
etc.
And such an inverted index would not be trivial to implement in
sqlite (it would run into performance problems as soon as there are
lots of revisions to store).
Regarding the CVS information again:
Nathaniel Smith wrote in another mail:
> E.g., if monotone's tree has ~1800 files, and if it were created by
> importing from cvs, writing down such a cert would take on the order
> of 64kB. (Calculated by 'mtn ls known | wc -c' to get filename
> lengths, plus some fudge for the version numbers.) Certs are not
> delta compressed nor, in the current implementation, even gzipped.
> My database has ~7000 revisions in it. If every revision in it had
> such a cert on it (again, as if it were imported from CVS), then that
> would come to ~450 megabytes of certs, so almost 7 times more data
> than the entire history combined.
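The back-of-the-envelope numbers in the quote can be re-checked in a few lines of Python. The inputs (about 1800 files, about 64 kB per cert, about 7000 revisions, "almost 7 times") come from the quote; taking 1 kB = 1000 bytes and deriving the history size from the ratio are simplifications made here.

```python
FILES = 1800          # files in monotone's tree (from the quote)
CERT_BYTES = 64_000   # ~64 kB per CVS cert (1 kB = 1000 bytes assumed)
REVISIONS = 7_000     # revisions in the database (from the quote)
RATIO = 7             # "almost 7 times more data than the entire history"

bytes_per_file = CERT_BYTES / FILES
total_cert_bytes = CERT_BYTES * REVISIONS
implied_history_bytes = total_cert_bytes / RATIO

print(f"{bytes_per_file:.1f} bytes per file entry")      # 35.6 bytes per file entry
print(f"{total_cert_bytes / 1e6:.0f} MB of certs")        # 448 MB of certs
print(f"~{implied_history_bytes / 1e6:.0f} MB of history")  # ~64 MB of history
```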
The filenames of all files are already stored in the manifest of the
revision, right? Why not drop them from the calculation above and store
only the RCS revision numbers in the cert, in the same order as the files
appear in the manifest? For example:
sample manifest:
format_version "1"
dir ""
dir "fs"
file "fs/readdir.c"
content [f2e5719b97...]
file "fs/read_write.c"
content [fe238a9d34...]
CVS revision cert:
1.6
1.1
That would reduce the amount of CVS history data stored per revision to
the minimum required.
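A sketch of how the ordering trick could work: the cert carries only RCS revision numbers, and a reader recovers the file names by walking the file lines of the manifest in order. It is written in Python and assumes the simple one-entry-per-line manifest layout of the sample above (a real parser would have to handle basic_io quoting and escaping); the function names are hypothetical.

```python
from __future__ import annotations

import re

MANIFEST = '''format_version "1"
dir ""
dir "fs"
file "fs/readdir.c"
content [f2e5719b97...]
file "fs/read_write.c"
content [fe238a9d34...]
'''

# Matches 'file "path"' lines; escaped quotes inside paths are not handled.
FILE_LINE = re.compile(r'^\s*file "([^"]*)"\s*$', re.MULTILINE)


def manifest_files(manifest: str) -> list[str]:
    """File paths in the order they appear in the manifest."""
    return FILE_LINE.findall(manifest)


def encode_cvs_cert(manifest: str, rcs_revs: dict[str, str]) -> str:
    """Cert value: one RCS revision per line, in manifest order."""
    return "\n".join(rcs_revs[p] for p in manifest_files(manifest)) + "\n"


def decode_cvs_cert(manifest: str, cert_value: str) -> dict[str, str]:
    """Pair each manifest path with its RCS revision from the cert."""
    paths = manifest_files(manifest)
    revs = cert_value.splitlines()
    if len(paths) != len(revs):
        raise ValueError("cert does not match manifest")
    return dict(zip(paths, revs))


cert = encode_cvs_cert(MANIFEST, {"fs/readdir.c": "1.6", "fs/read_write.c": "1.1"})
print(repr(cert))  # '1.6\n1.1\n'
```

Assuming around 5 bytes per line, a tree of about 1800 files would give a cert of roughly 9 kB per revision instead of about 64 kB, before any compression.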
Or even better: use the manifest to store that information... AFAICT
manifests are delta compressed and store the filenames and file
revisions anyway. Why not store 'origin VCS' information from imports
there, per revision? That also looks like a much better fit for other
VCSes such as svn and git. For example:
sample manifest:
format_version "1"
dir ""
dir "fs"
file "fs/readdir.c"
content [f2e5719b97...]
RCS_rev "1.6"
file "fs/read_write.c"
content [fe238a9d34...]
RCS_rev "1.1"
CVS_server_path ":pserver:address@hidden:/foo/cvsroot"
CVS_module "monotone"
CVS_revision_date_start "02/07/1997 13:41:05"
CVS_revision_date_end "02/07/1997 13:41:12"
CVS_revision_conflicts_with [e6a903d31...]
That would suffice to store all the information known at import time. Of
course this cannot be changed later on, but for most CVS imports that's
not necessary.
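To show that the proposed layout is a straightforward extension of the existing manifest text, here is a Python sketch that renders it. The RCS_rev and CVS_* keys are the ones proposed above (they are not part of monotone's manifest format), the quoting rule is an assumption, and id-valued entries such as CVS_revision_conflicts_with (written with [...] instead of quotes) are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FileEntry:
    path: str
    content: str                # content id, written inside [...]
    rcs_rev: str | None = None  # RCS revision recorded at import time, if any


def quote(s: str) -> str:
    # ASSUMPTION: basic_io-style quoting, escaping backslash and double quote.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_manifest(dirs: list[str], files: list[FileEntry],
                    origin: list[tuple[str, str]]) -> str:
    """Render the proposed manifest: per-file RCS_rev lines, followed by
    per-revision origin-VCS lines (hypothetical keys such as CVS_module)."""
    lines = ['format_version "1"']
    lines += [f"dir {quote(d)}" for d in dirs]
    for f in files:
        lines.append(f"file {quote(f.path)}")
        lines.append(f"content [{f.content}]")
        if f.rcs_rev is not None:
            lines.append(f"RCS_rev {quote(f.rcs_rev)}")
    lines += [f"{key} {quote(value)}" for key, value in origin]
    return "\n".join(lines) + "\n"


# Shortened, hypothetical content ids.
print(render_manifest(
    ["", "fs"],
    [FileEntry("fs/readdir.c", "f2e5719b97", "1.6"),
     FileEntry("fs/read_write.c", "fe238a9d34", "1.1")],
    [("CVS_module", "monotone")],
))
```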
Regards
Markus