monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Monotone-devel] Re: Support for binary files, scalability andWindows po


From: graydon hoare
Subject: [Monotone-devel] Re: Support for binary files, scalability andWindows port
Date: Wed, 21 Jan 2004 15:44:41 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4

Asger Kunuk Ottar Alstrup wrote:

Yes, that is of course perceivable, but that would defeat the advantage
of a truly distributed system, where a user does not have to be online
to work. I'd prefer another approach. Maybe we will set up separate
databases for different purposes - we can divide the work into sealed
compartments.

yes, I'd recommend the approach -- if possible -- of creating multiple "collections" of files. I'll briefly describe what I have in mind with the hashtrees:

each database will have a set of "collections", each of which has a hashtree associated with it. collections can share members -- they're working on the same underlying store of blocks and keys and whatnot -- but the hashtree picks out a subset and names it. each collection will also map to a set of peers which the database syncs that collection with normally (eg. when you just type "monotone sync <collection>"). one of the states in a hashtree is called "tombstone", which marks a block you do not have and do not want, but hashes to the same value as if you had the block. this makes it possible to expire blocks from your copy of a collection. of course if you un-expire the block (set from tombstone -> empty) you will fetch it on the next sync.

in answer to the previous question about distribution costs: obviously you have to send, at least once, every block you expect to exist at the other end of a connection. the hashtree is an auxiliary structure used to discover which blocks are missing from either end of a connection. as of 2 nights ago, I have a prototype working which synchronizes block collections in a pair of sqlite databases, over a TCP connection. it shows pretty good promise; the interactive protocol isn't *quite* pipelined enough, plus the encodings could use some tuning; it's at least twice as bulky as it needs to be. even now it finds and sends the 50 missing random 512-byte blocks amongst a collection of 10,000 such, with only 16k written and 130k received (including the 34.7k worth of base64-encoded data blocks). it will add a little overhead in the case of small collections, but the cost curve is a very flat logarithm of the collection size.

I'm not certain a 256-ary tree is the best fan-out though; I only chose it because it's easy to prototype in ASCII. a 16-ary tree is equally easy, so maybe I'll run through that too.. it is a subtle issue: larger nodes mean fewer round trips and less protocol chatter, but also more retransmission of hash values for unchanged portions of the tree and a bit more asymmetry in the transmit/receive load (as in this instance). I'll see if it's sufficiently easy to make the whole protocol and hashtree calculator depend on a couple template constants, and try to write it that way so we can wiggle it around and find a sweet spot.

OK. I tried with VS.NET, but did not have time to complete it. It seems
there are a couple of places where the code uses some non-standard C++
features not supported by VS.NET. Also, it uses a bunch of Unix-only
#includes.

But AFAIK it's nothing that can not be handled with a few days of work -
it should also be possible to use VS6.

I think you might find some of the fancy-pants stuff spirit does with templates will make VS6 upset. I think that compiler series really only approximates the ISO standard around version 7 (around the ".net" edition)

-graydon




reply via email to

[Prev in Thread] Current Thread [Next in Thread]