monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Monotone-devel] Re: database schema questions


From: Derek Scherger
Subject: [Monotone-devel] Re: database schema questions
Date: Wed, 21 Apr 2004 19:33:55 -0600
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040225

graydon hoare wrote:

this is a very important and rarely-stated aspect of monotone's design: file and manifest storage is *unrelated* to history. history is a description of what happened, in terms of certificates which point to SHA1 values. the method of constructing the data for a given SHA1 value is *completely independent* of that history.

ahh... I had missed that

yes, but that would consume more bytes than the current form. in some cases quite a few more bytes. it is important to note that we store manifests as *gzipped* text. this means we get free statistical data compression. look at a large manifest (say gcc: about 900k). most of the bytes on most of the lines of that manifest are identical path prefixes:

I was expecting that the file names might be encoded and possibly compressed as well, but "compressing" single names would probably make them longer.

I'm curious, how does monotone perform on something this large? I've been wondering when I do a monotone diff on a large tree how bad computing lots of sha1 id's would be. I assume that diff/status must recompute the hash of every file and compare them with the value in the manifest.

now, we could possibly achieve the same (or even better) compression result "explicitly" at the storage level by storing directories as first class citizens, and putting files as leaves in directories, and chaining through the directory table to get to a file.. but that adds a lot of complexity. I didn't want to bother.

No, the way you have it now has some definite appeal over handling directories explicitly. It's simple and it works!

 - merging is more difficult than SQL union in all but the most trivial

I did expect that this was over simplifying things!

   cases, so you wind up having to construct a memory representation
   anyways. diff_patch.cc::merge3 is one of the largest and most
   complex functions in monotone (300 lines), and that's the simplified

You're doing pretty well if that's your worst example of complexity I think!

 - my representation of "a file" at the database level is really not
   compatible with the notion people have of "a history line". storage
   delta chains can be broken, or mis-ordered, or whatever. they don't

This might take a bit of time to sink in.... ;)

if you want to conduct an experiment and try rewriting part of the schema in terms of explicit directories and manifests-as-tuples, I won't stop you. it's possible it would be a space win and maybe simplify some algorithms. I found the current arrangement beneficial though.

I was just wondering why you hadn't gone with what looked like the obvious way to store manifests. It doesn't surprise me that it didn't work out that way.

Thanks again for all the info!
--
Cheers,
Derek




reply via email to

[Prev in Thread] Current Thread [Next in Thread]