[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Chicken-hackers] [PATCH] Proper fix for hash collision attack [Was: Re:
From: |
Peter Bex |
Subject: |
[Chicken-hackers] [PATCH] Proper fix for hash collision attack [Was: Re: On Hash Collisions (28C3)] |
Date: |
Wed, 4 Jan 2012 22:16:04 +0100 |
User-agent: |
Mutt/1.4.2.3i |
On Sun, Jan 01, 2012 at 01:41:26PM +0100, Peter Bex wrote:
> That's confirmed. Attached is a preliminary patch to fix this issue.
This patch was indeed *very* preliminary and provided almost zero
protection. That's because:
- When a hash collision is found in the *underlying* algorithm's
output as opposed to the output of a call to (hash ...), no amount
of the xoring that HASH will help, the bucket will still be the same.
To fix this properly, the random perturbation factor needs to fed
into the hashing function itself as an initial "seed" value.
- A special case of this is when strings of different lengths containing
just "NUL"-characters are hashed, they all map to the same value
in almost all algorithms. Again, xoring afterwards won't help so
the perturbation factor must be used as an initialization value.
- The underlying hash implementation itself was extremely weak; it
provided almost no dispersion. When given strings with shared
suffixes of 8 characters or more, it *always* caused a collision.
This is also bad for non-malicious cases as many identifiers share
longer suffixes (think -lambda*, -values*, -output-port etc).
A randomization factor doesn't help perturb the output in this
case, even if it was used as a seed value since after 8 steps
all randomization would be pushed out of the register (on 32 bits).
So the proper fix entails modifying the C functions to accept an
extra "seed" argument (the random value). Unfortunately this causes
an API-incompatibility, something I tried to avoid in the original
patch (a fundamentally flawed approach). I worked around this by
adding new functions and deprecating the old ones.
For the interested, you can find a good introduction here:
http://www.eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
including a link to a survey of existing hashes (and implementation of
a very good but extremely hairy and big hashing implementation):
http://burtleburtle.net/bob/hash/doobs.html
I chose to use a shift-add-xor function because it's pretty simple to
understand and reasonably well-studied, for instance in the 1997 paper
"Performance in Practice of String Hashing Functions":
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.7520
It's also what Perl's hash function basically does.
> If you'd like, you could help out testing this. The patch ensures
> no two hash tables use the same offset, so even if the hash value of
> a given table were to leak out (which the user still should take every
> precaution to avoid!), it won't compromise the randomization of other
> hash tables.
This has not changed for SRFI-69 hash tables; in the new patch they
also get an individual value.
> It has been pointed out that symbol interning also uses a hash table
> which is not based on SRFI-69. This needs to be fixed as well.
> This patch does not address that.
Attached is a proper, hopefully final patch that addresses this too.
It's also pushed to git in the sjamaan-pending branch, as changeset
05047644ee380f9d80c0c497641011c29fee610a.
Besides SRFI-69 and symbol table hashes this also changes the
so-called "low-level" hash tables used throughout the compiler and
for property lists (used, among other things, by the macro expander
and reader macro registration).
I had initially overlooked the symbol table hashes because
they're strewn about the source. One thing this patch does is move
the hashing algorithm itself into one function, which is called by
all the others. This should also make it easier to experiment with
different hashing algorithms (like those listed in the survey paper
above).
What we could still use is a better testsuite (maybe one that actually
looks for real collisions, not just the quick 'n dirty check for the
pathological cases the original implementation had) and some benchmarks.
I'll leave that for someone else who feels like it :)
Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
is especially attractive, not only because it can be economically
and scientifically rewarding, but also because it can be an aesthetic
experience much like composing poetry or music."
-- Donald Knuth
0001-Provide-protection-against-algorithmic-complexity-at.patch
Description: Text document
- Re: [Chicken-hackers] On Hash Collisions (28C3), Jörg F . Wittenberger, 2012/01/01
- Re: [Chicken-hackers] On Hash Collisions (28C3), Peter Bex, 2012/01/01
- Re: [Chicken-hackers] On Hash Collisions (28C3), John Cowan, 2012/01/01
- Re: [Chicken-hackers] On Hash Collisions (28C3), Peter Bex, 2012/01/01
- Re: [Chicken-hackers] On Hash Collisions (28C3), John Cowan, 2012/01/01
- Re: [Chicken-hackers] On Hash Collisions (28C3), Peter Bex, 2012/01/01
- Re: [Chicken-hackers] On Hash Collisions (28C3), Alan Post, 2012/01/01
- Re: [Chicken-hackers] On Hash Collisions (28C3), Jörg F . Wittenberger, 2012/01/01
- Re: [Chicken-hackers] On Hash Collisions (28C3), Ivan Raikov, 2012/01/01
- Re: [Chicken-hackers] On Hash Collisions (28C3), Jörg F . Wittenberger, 2012/01/02
[Chicken-hackers] [PATCH] Proper fix for hash collision attack [Was: Re: On Hash Collisions (28C3)],
Peter Bex <=