[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Chicken-hackers] SRFI-69 compatibility problem and hash table segfaults

From: Peter Bex
Subject: [Chicken-hackers] SRFI-69 compatibility problem and hash table segfaults
Date: Sat, 7 Jan 2012 14:55:48 +0100
User-agent: Mutt/

Hi all,

The thought just popped into my head that my hash table patch breaks
SRFI-69 compatibility.  When the user passes a custom hash procedure,
this procedure is expected to accept an extra argument, which is not
part of the SRFI-69 API.

While looking into this, I also found that the result of user-supplied
hash functions is used without checking whether it returns a fixnum and
whether the fixnum is within the hash table's bounds.  This can cause
a segfault:

#;1> (use srfi-69)
; loading library srfi-69 ...
#;2> (define h (make-hash-table equal? (lambda x 'boom)))
#;3> (hash-table-ref h 'y)
zsh: segmentation fault  csi

The attached patch also fixes this by adding a check, but only
in the case of user-supplied functions, so that we don't pay the
penalty in the common case of not supplying a hash function
or using one of the builtins.  I'm assuming that people hooking
the number-hash-hook know what they're doing and will not
return out-of-bounds values.  (that reminds me, the numbers
egg should use this hook...)

For fixing the extra argument problem, here are some options I

1) Pass the randomization value by a SRFI-49 parameter.
2) Get rid of randomization-per-hashtable and have one global
    randomization value instead, like low-level hash tables.
    Symbol tables could still have per-table randomization.
    Users must take care of randomization themselves.
3) Assume the user knows what they're doing when passing a custom
    hash function and let them handle the randomization themselves.
    This will mean when re-using existing core hash procedures
    the user must remember to pass their own value to that
    procedure.  If the user wants per-hash-table randomization
    they must take care of this themselves by passing a different
    closure for each table.

Of course, all approaches have disadvantages.  1 seems the least
invasive and error-prone but it's just ugly.  2 is less secure because
when for whatever reason some hash table's hash values are leaked out
all other values are also known.  3 means that in some cases the
user might want to use the chicken-supplied ones but needs to
remember to use the randomization factor.  3 is of course the easiest
since it basically just punts on the issue and lets the user sort it
out (but only when passing a custom procedure!).

I removed the ability to get at the hash-table's randomization
value and to pass it because this just introduces more chance for
mistakes; when a user supplies a custom procedure and wants to
make use of the hash table's randomization value, this becomes
difficult because then when you use hash-table-copy the first hash
table stays around in the closure and never gets GCed.

To keep the property that each hash table has its own randomization
value, the original procedure is stored in the hash table and the
a new hashing closure is generated when the table is created, or
when using hash-table-copy.  Unfortunately, for user-supplied functions
this can't be done.  This could optionally be simplified to not store
the original value and just copy the closure around.

Sorry for the messy handling of the original fix.

"The process of preparing programs for a digital computer
 is especially attractive, not only because it can be economically
 and scientifically rewarding, but also because it can be an aesthetic
 experience much like composing poetry or music."
                                                        -- Donald Knuth

Attachment: hash-compatibility.patch
Description: Text document

reply via email to

[Prev in Thread] Current Thread [Next in Thread]