[Chicken-hackers] SRFI-69 compatibility problem and hash table segfaults
From: Peter Bex
Subject: [Chicken-hackers] SRFI-69 compatibility problem and hash table segfaults
Date: Sat, 7 Jan 2012 14:55:48 +0100
User-agent: Mutt/1.4.2.3i
Hi all,
The thought just popped into my head that my hash table patch breaks
SRFI-69 compatibility. When the user passes a custom hash procedure,
this procedure is expected to accept an extra argument, which is not
part of the SRFI-69 API.
While looking into this, I also found that the result of user-supplied
hash functions is used without checking whether it returns a fixnum and
whether the fixnum is within the hash table's bounds. This can cause
a segfault:
#;1> (use srfi-69)
; loading library srfi-69 ...
#;2> (define h (make-hash-table equal? (lambda x 'boom)))
#;3> (hash-table-ref h 'y)
zsh: segmentation fault csi
The attached patch also fixes this by adding a check, but only for
user-supplied functions, so that we don't pay the penalty in the
common case of not supplying a hash function or using one of the
builtins. I'm assuming that people hooking number-hash-hook know
what they're doing and will not return out-of-bounds values.
(That reminds me: the numbers egg should use this hook...)
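For illustration, the kind of check the patch adds could be sketched
like this (illustrative only; check-user-hash and its arguments are
made-up names, not the identifiers used in the actual patch):

;; Sketch of validating a user-supplied hash procedure's result
;; before using it as a bucket index.  Names are illustrative only.
(define (check-user-hash result bound loc)
  (if (and (fixnum? result)
           (fx>= result 0)
           (fx< result bound))
      result
      (error loc "hash procedure returned invalid value" result)))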
For fixing the extra argument problem, here are some options I
considered:
1) Pass the randomization value by a SRFI-39 parameter.
2) Get rid of randomization-per-hashtable and have one global
randomization value instead, like low-level hash tables.
Symbol tables could still have per-table randomization.
Users must take care of randomization themselves.
3) Assume the user knows what they're doing when passing a custom
hash function and let them handle the randomization themselves.
This means that when re-using existing core hash procedures,
the user must remember to pass their own randomization value
to that procedure. If the user wants per-hash-table randomization
they must take care of this themselves by passing a different
closure for each table.
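To illustrate option 3 from the user's side: someone re-using a core
hash procedure would generate their own randomization value and close
over it. This is a sketch; the extra randomization argument to
equal?-hash is an assumption about the patched interface, not the
current API:

;; Option 3, from the user's perspective.  The third argument to
;; equal?-hash is an assumed extension, not the existing signature.
(use extras srfi-69)

(define my-rand (random 65536))

(define h
  (make-hash-table equal?
                   (lambda (obj bound)
                     (equal?-hash obj bound my-rand))))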
Of course, all approaches have disadvantages. Option 1 seems the least
invasive and least error-prone, but it's just ugly. Option 2 is less
secure: if, for whatever reason, one hash table's hash values leak out,
the randomization for every other table is compromised as well.
Option 3 means that in some cases the user might want to use the
chicken-supplied procedures but has to remember to apply the
randomization factor themselves. On the other hand, option 3 is the
easiest, since it basically punts on the issue and lets the user sort
it out (but only when passing a custom procedure!).
I removed the ability to get at a hash table's randomization value
and to pass it along, because that just introduces more chances for
mistakes. When a user supplies a custom procedure that makes use of
a hash table's randomization value, things get awkward with
hash-table-copy: the first hash table stays reachable through the
closure and never gets GCed.
To keep the property that each hash table has its own randomization
value, the original procedure is stored in the hash table and a new
hashing closure is generated when the table is created or copied with
hash-table-copy. Unfortunately, this can't be done for user-supplied
functions. This could optionally be simplified by not storing the
original procedure and just copying the closure around.
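The closure-regeneration scheme could be sketched roughly as follows
(all names here are hypothetical, not the patch's actual internals):

;; Sketch: store the original procedure, and regenerate a fresh
;; randomized closure at creation time and at hash-table-copy time,
;; so each table gets its own randomization value.
(use extras)

(define (randomize-hash base-hash)
  (let ((rand (random 65536)))
    (lambda (obj bound)
      (base-hash obj bound rand))))

;; At table creation and inside hash-table-copy one would call
;; (randomize-hash original-hash) on the stored original procedure.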
Sorry for the messy handling of the original fix.
Cheers,
Peter
--
http://sjamaan.ath.cx
--
"The process of preparing programs for a digital computer
is especially attractive, not only because it can be economically
and scientifically rewarding, but also because it can be an aesthetic
experience much like composing poetry or music."
-- Donald Knuth
Attachment: hash-compatibility.patch (text document)