
[Monotone-devel] Re: netsync status


From: graydon hoare
Subject: [Monotone-devel] Re: netsync status
Date: Tue, 24 Feb 2004 10:00:37 -0500
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4

Asger Ottar Alstrup wrote:

Do you have an indication of the overhead when synching? If X bytes are
different in a repository of size Y, how much data is transferred?  I
understand that you can not give an accurate formula, but I hope it is a
linear function in X only, and that the constant factor in front of X is
less than 2.

it is not purely a function of X, but it is close. I'll explain the protocol so you can see where the overhead comes from.

  - the hashed index exchange happens only over manifest certs and keys.
    after that, the manifest certs imply an ancestry graph; the ancestry
    graph implies a manifest data / delta structure; and each manifest
    edge implies a file data / delta structure. so once the keys and
    certs are exchanged there is no further indexing overhead, just
    streaming requests and responses.

  - the manifest cert synchronization will probably be the only source
    of "increasing" overhead with time. it will be bounded by something
    like (forgive the inaccuracy, I haven't done detailed analysis):

         ~ log_B(K) * N

    where N is the number of certs eventually found missing (and thus
    transmitted), K is the size of your cert set, and B is a tunable
    branching factor serving as the log base, currently set to 16.

    two further qualifications should be made. first, the effective N
    should really be smaller, since each path of length L through the
    hashed index has roughly an N*(B^-L) probability of sharing a prefix
    with another path (or some such factor), and a shared prefix only
    has to be exchanged once. second, the overall *size* of each index
    node, in bytes, varies with the load of the tree, though the formula
    for that is proving a bit uglier than I feel like working out in
    the first email of the morning.

  - note that this only scales with the number of *manifest* certs, so
    K is really about 4 * number-of-change-sets-in-branch, which for
    most practical users will mean no more than 4 (or 5 if you're
    talking about the linux kernel) index exchanges per missing element.
    each index node varies between about 50 and 400 bytes (depending on
    load), so in the worst realistically imaginable case, with these
    branching settings, it could cost say 1.6k of sync traffic to pick
    out a missing node from among a quarter million changesets (which,
    ballpark, is about 75 MB of certificates). but you'd only see that
    if there were just a couple of missing nodes; if there are "lots",
    the number of shared prefixes will rise and the efficiency will
    improve a bit.

  - in practical terms: I just tested a netsync of a change to monotone
    against an HTTP post of the same change (using old-style packets),
    and the netsync used fewer bytes despite including the
    synchronization of a 196-cert collection. the netsync encodings are
    all a bit tighter than those used for packets.
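The hashed index walk described in the list above can be sketched in Python. This is not monotone's netsync code: it assumes each cert is identified by the hex SHA-1 of its bytes, uses one hex digit per trie level (so B = 16), and the names cert_id, summarize, sync and LEAF_SIZE are hypothetical. Both peers' sets live in one process here; real peers would send each node's hash and count over the wire instead.

```python
import hashlib
from typing import Dict, Optional, Set, Tuple

HEX_DIGITS = "0123456789abcdef"  # one hex digit per trie level, so B = 16
LEAF_SIZE = 4                    # hypothetical: at or below this, swap raw id lists


def cert_id(data: bytes) -> str:
    """Identify a cert by the hex SHA-1 of its bytes (assumption)."""
    return hashlib.sha1(data).hexdigest()


def summarize(ids: Set[str], prefix: str) -> Tuple[str, int]:
    """Stand-in for one index node: a hash over every id under `prefix`,
    plus how many ids that is. Recomputed by scanning, for brevity."""
    under = sorted(i for i in ids if i.startswith(prefix))
    return hashlib.sha1("".join(under).encode()).hexdigest(), len(under)


def sync(local: Set[str], remote: Set[str], prefix: str = "",
         stats: Optional[Dict[str, int]] = None
         ) -> Tuple[Set[str], Set[str], Dict[str, int]]:
    """Walk the hashed index from `prefix` down.

    Returns (need, give, stats): ids only the remote has, ids only we
    have, and a count of node summaries compared along the way."""
    if stats is None:
        stats = {"node_exchanges": 0}
    stats["node_exchanges"] += 1
    local_hash, local_count = summarize(local, prefix)
    remote_hash, remote_count = summarize(remote, prefix)
    if local_hash == remote_hash:
        # identical subtrees: nothing below this prefix needs to move
        return set(), set(), stats
    if local_count <= LEAF_SIZE or remote_count <= LEAF_SIZE:
        # small node: cheaper to send the ids than to keep descending
        mine = {i for i in local if i.startswith(prefix)}
        theirs = {i for i in remote if i.startswith(prefix)}
        return theirs - mine, mine - theirs, stats
    need: Set[str] = set()
    give: Set[str] = set()
    for digit in HEX_DIGITS:  # compare all B children; matching ones stop at once
        child_need, child_give, _ = sync(local, remote, prefix + digit, stats)
        need |= child_need
        give |= child_give
    return need, give, stats


local = {cert_id(f"cert {i}".encode()) for i in range(1000)}
remote = local | {cert_id(b"new cert A"), cert_id(b"new cert B")}
need, give, stats = sync(local, remote)
print(len(need), len(give), stats["node_exchanges"])
```

With 1,000 local certs and two extra certs on the remote, sync returns exactly the two missing ids after comparing dozens of node summaries rather than all 1,000 ids. The counter grows roughly like the log_B(K) * N estimate, times B in this sketch because each child is compared as a separate summary. A real index would cache node hashes in a tree and update them on insert instead of rescanning the whole set for every summary.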

What about security? Can you encrypt the data transferred?

not at the moment, but it's not out of the question. it does authenticate connecting peers, using RSA signatures over nonces, and it calls a Lua hook to decide their read/write access to the collection they're syncing. if you want transport encryption, you could tunnel the connection over SSH; that should work fine.
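Authenticating a peer with an RSA signature on a nonce can be illustrated with a deliberately tiny, unpadded "textbook RSA" challenge-response. This is a sketch under stated assumptions, not monotone's actual handshake: the toy key, the SHA-256 reduction and the sign()/verify() helpers are hypothetical, and a real implementation would use a proper RSA library with large keys and padding.

```python
import hashlib
import secrets

# Toy "textbook RSA" key from two tiny primes. Insecure: real keys are
# thousands of bits long and real signatures use padding (e.g. PSS).
P, Q = 61, 53
N = P * Q                          # public modulus (3233)
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest_mod_n(nonce: bytes) -> int:
    """Hash the nonce and reduce it into [0, N) so it can be signed."""
    return int.from_bytes(hashlib.sha256(nonce).digest(), "big") % N


def sign(nonce: bytes) -> int:
    """Connecting peer: prove it holds the private key for its public key."""
    return pow(digest_mod_n(nonce), D, N)


def verify(nonce: bytes, signature: int) -> bool:
    """Serving peer: check the signature against the claimed public key."""
    return pow(signature, E, N) == digest_mod_n(nonce)


# One challenge-response round.
challenge = secrets.token_bytes(16)   # server picks a fresh, unpredictable nonce
response = sign(challenge)            # client signs it and sends it back
print(verify(challenge, response))    # True
```

A fresh nonce per connection is what stops a recorded response from being replayed later. Nothing here hides the data, though: the challenge and the response both travel in the clear, which is why confidentiality needs something extra such as an SSH tunnel.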

-graydon



