monotone-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Monotone-devel] Re: Hash collisions resiliency


From: Bruce Stephens
Subject: [Monotone-devel] Re: Hash collisions resiliency
Date: Wed, 13 Apr 2005 23:39:53 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.0.50 (gnu/linux)

William Uther <address@hidden> writes:

>    Couldn't you just have a table of hashes in the database.  When
>    you add a new hash, you add it to the table.  If it is already
>    there, *THEN* you do a full-text check of the two files with the
>    matching hashes. 

Which we already have, of course (which is the whole point).  There
are several tables, but that's OK: it's not a problem if a file
version happens to have the same hash as a changeset, I think?

>    If the two files actually are the same, then you're ok.  If the
>    two files are different, you have a problem, *and* you have
>    detected it.

I suspect the problem is that things with the same hash as something
that's already there are very common: most commits change a few files,
and you know which files have changed by checking their hashes.  And
checking the contents instead is likely to be expensive enough that
you wouldn't want to do it.

I guess you could compute a different hash on checkout and store that
somewhere, and use that as a second check (rather than constructing
contents).  That would work for unchanged files, and then any other
collisions would be unusual enough that constructing and comparing
contents wouldn't be so bad.  

Or just storing length as well as the hash; that might be useful
anyway.  (So you'd detect hash collisions, so long as you didn't get a
hash+length collision.)




reply via email to

[Prev in Thread] Current Thread [Next in Thread]