From: graydon hoare
Subject: Re: [Monotone-devel] Re: Support for binary files, scalability and Windows port
Date: Sat, 17 Jan 2004 01:11:14 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4

Ori Berger wrote:

> I had been thinking about this as well.

there are some things in here I'm curious about, some I disagree with..

> First, it's possible to use a suffix tree to locate repeating parts. It's possible to implement [it] much, much faster than rsync's algorithm.

rsync's algorithm is a 2-level thing which works when only one source is available and you only have an [adler32,md4] tuple list from the other guy; xdelta's "modification" is to drop that assumption and extend adler32-matched windows forwards and backwards by directly comparing source and destination regions. cost per byte in that phase is 1 compare. the cost per byte when scanning for matching windows is 1 add, 1 subtract, a mask, and a hashtable lookup (the expensive part). I don't really see how or why a suffix array would find anchor regions any faster. wouldn't the cost per byte be a partial (failing) suffix array lookup? can you elaborate?
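to make that cost concrete, the inner loop I'm describing is roughly the following (illustrative python only, nothing monotone actually contains; the window size, mask, and all the names are made up):

WINDOW = 64
MASK = (1 << 16) - 1

def weak_sum(window):
    """non-rolling adler-style checksum of one window, used to build the index."""
    a = b = 0
    for byte in window:
        a = (a + byte) & MASK
        b = (b + a) & MASK
    return (b << 16) | a

def index_source(source):
    """map weak checksum -> offset for each non-overlapping source window."""
    return {weak_sum(source[i:i + WINDOW]): i
            for i in range(0, len(source) - WINDOW + 1, WINDOW)}

def find_anchors(target, source_index):
    """yield offsets in target whose window checksum appears in the source.
    a real matcher would then extend each anchor forwards and backwards by
    direct byte comparison, the way xdelta does."""
    if len(target) < WINDOW:
        return
    a = b = 0
    for byte in target[:WINDOW]:
        a = (a + byte) & MASK
        b = (b + a) & MASK
    for off in range(len(target) - WINDOW + 1):
        if ((b << 16) | a) in source_index:          # the hashtable lookup
            yield off
        if off + WINDOW < len(target):
            gone, new = target[off], target[off + WINDOW]
            a = (a - gone + new) & MASK              # one subtract, one add
            b = (b - WINDOW * gone + a) & MASK       # ... and a mask

the hashtable probe is the only per-byte cost in there that isn't trivial arithmetic.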

> <http://www.cl.cam.ac.uk/~cpk25/libstree/>. This implementation doesn't use persistent storage, but if it did, then it would take more or less zero time to locate _all_ sources for a file almost immediately.

I don't know what you mean by "all sources for a file". could you be more specific about where you mean to use a suffix array, and how it helps with a particular bit of retrieval or storage? the way I see it, a persistent suffix array makes the on-disk representation of a string considerably larger, without helping with the xdelta problem (which is: find all common regions between these two strings).

> Find a way to partition files into blocks, in such a way that:
> 1. Block lengths are reasonable (definitions later)
> 2. The block boundaries are likely to be unchanged by small modifications - i.e. an insertion of 1 byte somewhere should have a small probability of changing the boundary area, a smaller probability of changing two boundaries, ... and a vanishingly small probability of changing all boundaries.
>
> Now, given a file, break it into blocks; store each block independently, with its own hash; store a manifest that says how to rebuild the file from its blocks. And that's it.

yes, I've been thinking something somewhat along these lines (for dealing with the file storage issue, anyways). I would make a few modifications to this scheme, though.
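before getting to those, here's roughly how I read the partitioning step itself: cut a block wherever a cheap hash of the trailing bytes hits a fixed bit pattern, with min/max bounds so block lengths stay reasonable. the python below is only a sketch of that idea; every constant and name in it is invented for illustration:

import hashlib

MIN_BLOCK = 1 << 12              # 4KB
MAX_BLOCK = 1 << 16              # 64KB
BOUNDARY_MASK = (1 << 13) - 1    # ~8KB average block on random data

def split_blocks(data):
    """split data at positions that depend only on nearby content, so a
    1-byte insert usually disturbs just the block it lands in."""
    blocks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF   # crude hash of the last ~32 bytes
        length = i - start + 1
        if length >= MIN_BLOCK and ((h & BOUNDARY_MASK) == BOUNDARY_MASK
                                    or length >= MAX_BLOCK):
            blocks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        blocks.append(data[start:])
    return blocks

def store_file(data, block_store):
    """store each block under its SHA1 and return manifest-ish lines that
    say how to rebuild the file (hash and length per block)."""
    lines = []
    for block in split_blocks(data):
        digest = hashlib.sha1(block).hexdigest()
        block_store.setdefault(digest, block)
        lines.append("%s %d" % (digest, len(block)))
    return lines

since the boundary test only looks at bytes near the cut point, a one-byte insert perturbs the hash inside a single block, so at most that block's own boundary moves.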

  - since this scheme doesn't involve coalescing or extending matching
    blocks (like xdelta does) the blocks need to be larger than you're
    thinking. a block identified by a SHA1 needs to be a few hundred
    bytes at minimum. the SHA1 itself is 20 bytes (binary) and as I
    will note momentarily I think adding extents will help, so a
    reference to a block extent will likely be 20 + 8 + 8 = 36 bytes
    (sha1,off,len).

  - for inserts smaller than minimum block size, it is better to just
    put an "insert len [literal data]" marker in the script, rather
    than a reference to a tiny block fragment. if the sum of all
    the literals in a script is larger than block size, you can
    concatenate them and push that one block down to the block layer,
    replacing the literals with references.

  - for deletes less than the size of the block containing the delete,
    or any sort of insert in the middle of a block, your script will
    benefit from being able to denote extents within blocks, rather than
    all-or-nothing references to blocks (there's a sketch of such a
    script after this list).

  - I'm still not precisely sure how to accommodate searching for
    matching blocks or sub-block extents between a new file and "the
    entire database". it's possible your thoughts on suffix arrays
    will come in handy here; otherwise I am thinking a simpler
    algorithm:

           - when the file is "new" to the database (has no previous
             matching manifest entry) split it up into new blocks.

           - when a file is being "patched", run fine-grained xdelta
             to locate matching extents with a table containing the
             adler32s of all small windows within all blocks of the
             previous version of the file

           - after each xdelta-like operation, check to see if there are
             enough inserts to coalesce into a new block

    that technique is really nothing more than extending xdelta to work
    over block-structured files, which is (as I'll get to) advantageous
    since it permits Very Large Files (~2 ** 40 or so) and also makes
    network sync work.

    hashing Very Large Files would still be a bit of a bitch, though.
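anyway, to pin down what I mean by a "script", it has roughly this shape (illustrative python; the class names and the block-size constant are made up, this isn't a real format):

import hashlib

BLOCK_SIZE = 1 << 12            # assumed coalescing threshold

class Extent:
    """reference to a slice of a stored block: (sha1, off, len)."""
    def __init__(self, sha1, off, length):
        self.sha1, self.off, self.length = sha1, off, length

class Literal:
    """raw bytes too small to be worth a block of their own."""
    def __init__(self, data):
        self.data = data

def rebuild(script, block_store):
    """reconstruct the file a script describes."""
    out = []
    for op in script:
        if isinstance(op, Extent):
            out.append(block_store[op.sha1][op.off:op.off + op.length])
        else:
            out.append(op.data)
    return b"".join(out)

def coalesce_literals(script, block_store):
    """if the literals in a script sum to more than a block, concatenate
    them into one new block and replace each literal with an extent into it."""
    total = sum(len(op.data) for op in script if isinstance(op, Literal))
    if total <= BLOCK_SIZE:
        return script
    blob = b"".join(op.data for op in script if isinstance(op, Literal))
    digest = hashlib.sha1(blob).hexdigest()
    block_store[digest] = blob
    result, cursor = [], 0
    for op in script:
        if isinstance(op, Literal):
            result.append(Extent(digest, cursor, len(op.data)))
            cursor += len(op.data)
        else:
            result.append(op)
    return result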

> The blocks could be identified by their SHA1, and the manifest could either be file-specific, or the standard manifest extended to include a part specification, e.g.

hmm, no, not changing the manifest format. the file remains identified by its SHA1. if you don't like SHA1, substitute some other function which makes an identifier given an input string, but if the filesystem can maintain the notion of a "file" using inodes, so can manifests.

as far as I'm concerned, I'd keep doing what I'm doing now and reuse the block + delta storage system for *storing* manifests as well; they're data too.

> This also interoperates nicely with network stuff - to fetch a version from a depot, you fetch the manifest, and then fetch all SHA1 blocks listed in the manifest that you don't already have. As simple as that.

hm, for the network I have a broader idea: use hash trees over the entire space of SHA1 to synchronize my collection + your collection into the union of both (on both ends). with some special accounting to manage singletons and tombstones, and a good spread factor, it's very efficient to synchronize hash trees, and I'd use the exact same scheme to sync the collection of blocks, the collection of manifests, the collection of files, the collection of keys, and the collection of certs.
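the sync step itself would look very roughly like this (a sketch only: the spread factor, the names, and the recursion details are invented, not the actual protocol; 'mine' and 'theirs' are sets of 40-character hex SHA1 strings):

import hashlib

SPREAD = 16   # one hex digit of fan-out per level

def node_hash(items):
    """summary hash of a sorted set of item hashes (one tree node)."""
    h = hashlib.sha1()
    for item in sorted(items):
        h.update(item.encode())
    return h.hexdigest()

def children(items, depth):
    """partition a set of hex hashes into buckets by their depth-th digit."""
    buckets = {}
    for item in items:
        buckets.setdefault(item[depth], set()).add(item)
    return buckets

def missing(mine, theirs, depth=0):
    """return the items theirs has that mine lacks, descending only into
    subtrees whose summary hashes disagree."""
    if node_hash(mine) == node_hash(theirs):
        return set()                      # identical subtrees: skip entirely
    if len(theirs) <= SPREAD or depth >= 40:
        return theirs - mine              # small enough: just exchange leaves
    need = set()
    my_kids, their_kids = children(mine, depth), children(theirs, depth)
    for digit, their_bucket in their_kids.items():
        need |= missing(my_kids.get(digit, set()), their_bucket, depth + 1)
    return need

# each side runs missing() in both directions, so both databases converge
# on the union of the two collections.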

I'm cooking up a little proof of concept for this scheme now. it'll be a few days before I have anything working, but I'm pretty excited about it.

> There's no need to have a staging queue for the depot.

indeed. removing queues and all the associated logic is something I would very much like.

> About NNTP, email, and "dumb web" distribution - all you have to do is record, for each block, whether or not it was sent to a specific destination.

no, I think I'd just remove these things altogether, let people sync between databases directly as the primary mode of operation.

> * Requires more storage and/or bandwidth; a one-byte change in a 100MB
>   file would cost ~500KB with this scheme, and ~100B with the current
>   delta scheme.

nah, it's not quite so bad. a 100mb file will have something like 10 blocks of 10mb each. the script for building it will be at minimum 320 bytes long (though I'd prefer a human-readable script form, so more like 500 bytes). inserting 1 byte will add perhaps 10 bytes to the script, plus require transmission of a new script: 510 bytes total. deleting 1 byte will probably split 1 extent reference into 2, adding maybe 50 bytes: 550 bytes total.

for a heavily edited file, it'll slowly get worse, but maybe you could have a "defragment" routine which builds some fresh blocks (especially if a bunch of blocks appear with refcount=1; might as well toss them)

> * Latest version, unless properly cached, will take longer to
>   construct (need to pull all blocks, which will be scattered
>   in the database). And proper caching costs space...

true. this could get to be a noticeable cost. again, defragmentation is a possibility, or else just building an LRU file cache in the db. I'm not averse to that.

-graydon



