[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Chicken-hackers] On Hash Collisions (28C3)

From: Peter Bex
Subject: Re: [Chicken-hackers] On Hash Collisions (28C3)
Date: Sun, 1 Jan 2012 16:04:50 +0100
User-agent: Mutt/

On Sun, Jan 01, 2012 at 09:57:15AM -0500, John Cowan wrote:
> Peter Bex scripsit:
> > Agreed.  Changing datastructures is not an option.
> Why not?

Because not fixing hash tables makes them into another footgun.
Also because fixing them transparently is simple and has almost
zero overhead.

> It's a matter of replacing a datastructure whose worst-case
> cost is quadratic with one whose worst-case cost is linear.

Yes, and doing it in *every* *freaking* program.  Including third-party
libraries written long ago or by people assuming a sane srfi-69
implementation (or more likely, not having thought about it).

And you don't want to unneccessarily go change every external
library you import.

> Do we even know what the break-even point is between a-lists with their
> higher algorithmic complexity vs. hash tables with their higher constant
> factors?  I tried writing a little benchmark to determine this, but so
> far without success.

That's besides the point here.

> > The patch ensures no two hash tables use the same offset, so even
> > if the hash value of a given table were to leak out (which the user
> > still should take every precaution to avoid!), it won't compromise the
> > randomization of other hash tables.
> What you are doing is imposing an additional cost on *all* users of hash
> tables so as to make them more resistant to malicious attack.  The vast
> majority of hash tables do not need to deal with malicious data.

There is almost no cost; just one call to get a random number when
creating the hash table and one extra xor when hashing a value.
The user doesn't need to do anything; it's completely transparent.

> > Indeed.  If other languages have fixed it transparently, so can we.
> We aren't Perl or Python, where hash tables are deeply embedded in
> the implementation.  Using a hash table in Scheme should always be an
> informed choice, not a default one.  Right now the information is lacking.

Whether it is a correct choice or not depends on the application.  We're
not talking about an application, we're talking about the library.

"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                                                        -- Donald Knuth

reply via email to

[Prev in Thread] Current Thread [Next in Thread]