From: Peter Bex
Subject: [Chicken-hackers] [PATCH] Proper fix for hash collision attack [Was: Re: On Hash Collisions (28C3)]
Date: Wed, 4 Jan 2012 22:16:04 +0100
User-agent: Mutt/1.4.2.3i

On Sun, Jan 01, 2012 at 01:41:26PM +0100, Peter Bex wrote:
> That's confirmed.  Attached is a preliminary patch to fix this issue.

This patch was indeed *very* preliminary and provided almost zero
protection.  That's because:

- When a hash collision occurs in the *underlying* algorithm's
   output, as opposed to the output of a call to (hash ...), no amount
   of xoring the output afterwards will help; the bucket will still
   be the same.  To fix this properly, the random perturbation factor
   needs to be fed into the hashing function itself as an initial
   "seed" value.
- A special case of this arises when strings of different lengths
   consisting only of NUL characters are hashed: under almost all
   algorithms they all map to the same value.  Again, xoring
   afterwards won't help, so the perturbation factor must be used
   as an initialization value.
- The underlying hash implementation itself was extremely weak; it
   provided almost no dispersion.  When given strings with shared
   suffixes of 8 characters or more, it *always* caused a collision.
   This is also bad for non-malicious cases, as many identifiers
   share longer suffixes (think -lambda*, -values*, -output-port,
   etc.).

   A randomization factor doesn't help perturb the output in this
   case, even if it were used as a seed value, since after 8 steps
   all randomization would be pushed out of the register (on a
   32-bit machine).

So the proper fix entails modifying the C functions to accept an
extra "seed" argument (the random value).  Unfortunately this causes
an API incompatibility, something I tried to avoid in the original
patch (a fundamentally flawed approach).  I worked around this by
adding new functions and deprecating the old ones.

For the interested, you can find a good introduction here:
http://www.eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx
including a link to a survey of existing hashes (and the
implementation of a very good but extremely hairy and large hash
function):
http://burtleburtle.net/bob/hash/doobs.html

I chose to use a shift-add-xor function because it's pretty simple to
understand and reasonably well-studied, for instance in the 1997 paper
"Performance in Practice of String Hashing Functions":
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.7520
It's also what Perl's hash function basically does.

> If you'd like, you could help out testing this.  The patch ensures
> no two hash tables use the same offset, so even if the hash value of
> a given table were to leak out (which the user still should take every
> precaution to avoid!), it won't compromise the randomization of other
> hash tables.

This has not changed for SRFI-69 hash tables; in the new patch they
also get an individual value.

> It has been pointed out that symbol interning also uses a hash table
> which is not based on SRFI-69.  This needs to be fixed as well.
> This patch does not address that.

Attached is a proper, hopefully final patch that addresses this too.
It's also pushed to git in the sjamaan-pending branch, as changeset
05047644ee380f9d80c0c497641011c29fee610a.
Besides SRFI-69 and symbol table hashes this also changes the
so-called "low-level" hash tables used throughout the compiler and
for property lists (used, among other things, by the macro expander
and reader macro registration).

I had initially overlooked the symbol table hashes because
they're strewn about the source.  One thing this patch does is move
the hashing algorithm itself into one function, which is called by
all the others.  This should also make it easier to experiment with
different hashing algorithms (like those listed in the survey paper
above).

What we could still use is a better test suite (maybe one that actually
looks for real collisions, not just the quick 'n dirty check for the
pathological cases the original implementation had) and some benchmarks.
I'll leave that for someone else who feels like it :)

Cheers,
Peter
-- 
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                                                        -- Donald Knuth

Attachment: 0001-Provide-protection-against-algorithmic-complexity-at.patch
Description: Text document

