monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Monotone-devel] Hash collisions resiliency


From: Nathaniel Smith
Subject: Re: [Monotone-devel] Hash collisions resiliency
Date: Thu, 14 Apr 2005 00:22:55 -0700
User-agent: Mutt/1.5.8i

On Wed, Apr 13, 2005 at 08:56:36PM -0700, tekHedd wrote:
> A good point, and oen that has been (heh) rehashed on the list over and
> over. However, there are two aspects to risk management:
> 
>   1) Knowing what the odds of failure are, and minimizing that chance
> 
>   2) Understanding what is at risk. Knowing _what_ you lose when you
>      lose), minimizing the loss, and having a plan for dealing with
>      it if it happens.
> 
> I've seen #1 exhaustively discussed, and of course we all can see that
> the risk is pretty darn amazingly close to 0. Sadly, as long as it is
> not _exactly_ 0, a discussion of failure modes is appropriate.
> 
> (Or have you never considered what would happen in the unlikely
> situation where you are driving your family across town and your car is
> in a terrible accident? If you have, you may have purchased something
> called insurance.) (Is it just me, or are these analogies getting
> gruesome? :o )

It is worth keeping in mind the orders of magnitude here.  Hash
collisions are not only less likely than that car crash, they are also
less likely than a simultaneous car crash and your insurance company
(and all its government backers, etc.) collapsing simultaneously --
do you have meta-insurance for that?  (And who _do_ you buy it from?
:-))

It's also worth remembering that hashes play a pretty critical role in
all sorts of crypto applications, and anyone who can collide a hash
has a decent chance of being able to break into the machine hosting
any other VCS's repository and inserting or removing whatever data
they like there.

That said:

> Once a (hypothetical) hash key conflict is actually detected, it can be
> solved by switching to a (hypothetical) new algorithm. I am cool with
> this. It is good. The unanswered question is: what damage can occur
> before the conflict is detected? I would love to have these questions
> answered:
> 
>   1) How long can you run with a duplicate key before monotone will
>      simply refuse to work?
> 
>   2) How much damage could this cause? If undetected for some time (and
>      the program doesn't crash), would more damage follow?

Detecting a problem: this depends on what collides.  If it's a merkle
node, you will notice when you realize you are mysteriously missing
stuff that the other side was supposed to send you.  (But this will
eventually fix itself, as new items enter the database, and change
the values being hashed.)  If it's two manifests or two revisions,
then you will probably notice immediately, because there are strong
consistency constraints on this -- applying a changeset to the old
manifest has to give the new manifest, etc. -- which monotone checks.
If it's two files, then monotone will may notice the next time
the file in question is changed, since it will download a delta
against the old version, apply it to what it incorrectly believes to
be the old version, and discover that the result is nonsense.  It's
possible this wouldn't happen, though, since with common hashes
colliding blocks can generally have arbitrary identical data appended
to them and stay colliding, so it would depend on quirks of xdelta...
sorry, anyway.

In summary, you don't have any guarantee that monotone will notice.
It is highly likely in some accidental collision cases, less so if
someone is maliciously attacking and can create a hash collision with
properties that they want.

Understandably, this makes some people nervous -- these consequences
aren't fun, and even if you _know_ that 1/2^80 is ridiculous and
impossible, our minds really aren't set up to deal with that kind of
number.  (Perhaps because the way we think about probabilities,
anything that's very very unlikely is thought of as "impossible" --
suddenly gaining the ability to fly, for instance, is a
low-but-nonzero probability event, but we don't think of it that way.
For most purposes, we really define "possible" to mean "high enough
probability to need to deal with".  Therefore, when we hear that
mathematically hash collisions are possible but extremely unlikely, we
latch onto the "possible" part, and can't help but think about them as
if they were much more likely than they actually are...)

Monotone is designed for humans; we're non-judgemental about this kind
of thing.  We think that the benefits of using hashes far outweigh
the negatives, but hey, if it makes you nervous, that's a bug too and
worth trying to compensate for.  Here's on idea: we used to have
something called "vcheck" certs, that contained a hmac of the data
they were attached to.  The idea is that each vcheck cert on a
revision gives you another 2^80 factor of protection, so you can
improve things to whatever your safety limit is.  (They also would
make things much harder for an attacker who can collide hashes.)

They got ripped out a few months ago, when I was cleaning up old
pre-0.15 cruft, because the implementation didn't work and depended on
other cruft that was being removed; but they could certainly be
re-added, and in fact it's on the todo list.  I probably won't get
around to re-implementing these any time too soon, though; patches
welcome.  (Anyone interested in adding this should take a look at
tests/t_vcheck.at, which is a placeholder test with a lot of notes on
how to do this.)

-- Nathaniel

-- 
Eternity is very long, especially towards the end.
  -- Woody Allen




reply via email to

[Prev in Thread] Current Thread [Next in Thread]