bug#36447: 27.0.50; New "Unknown keyword" errors

From: Daniel Colascione
Subject: bug#36447: 27.0.50; New "Unknown keyword" errors
Date: Tue, 9 Jul 2019 20:19:38 -0700
User-agent: SquirrelMail/1.4.23 [SVN]

>> From: Stefan Monnier <address@hidden>
>> Cc: Eli Zaretskii <address@hidden>,  address@hidden,
>> address@hidden,  address@hidden
>> Date: Tue, 09 Jul 2019 17:05:53 -0400
>> I think we should get Daniel's opinion on this.
> Daniel, could you please comment on the issues discussed in this bug?

Thanks for debugging this problem. It really is nasty. AIUI, the problem
is that hash-consing the hash vectors can unify vectors that happen to
have the same contents under one hashing scheme but different contents
under another. When we then rehash lazily, we correctly rehash one hash
table and, as a side effect, corrupt the index array of a different hash
table. There are two proposed solutions:

1) Copy the hash table internal vectors on rehash, and
2) "Freeze" hash tables by eliminating the index arrays and rebuilding
them all eagerly on dump start.

#1 works, but it's somewhat inefficient.

#2 is a variant of #1, in a sense. Instead of copying the hash table
vectors and mutating them, we rebuild them from scratch. I don't
understand why we have to do that eagerly.
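
To make the failure mode and approach #1 concrete, here is a minimal toy
model in Python. It is not the Emacs C code: the `Table` class, the
open-addressing layout, and the two hash functions are all hypothetical
stand-ins. Two tables share one dumped index array (as if hash-consing had
unified it); rehashing in place lets the second rehash clobber the first
table's index, while copying before rehashing keeps them independent.

```python
def fill_index(index, keys, hash_fn):
    """Rebuild an open-addressing index in place: slot -> position in keys."""
    size = len(index)
    for i in range(size):
        index[i] = -1
    for pos, key in enumerate(keys):
        slot = hash_fn(key) % size
        while index[slot] != -1:          # linear probing
            slot = (slot + 1) % size
        index[slot] = pos


class Table:
    """Toy hash table whose index array is stale after loading a dump."""

    def __init__(self, keys, hash_fn, index):
        self.keys = keys
        self.hash_fn = hash_fn
        self.index = index                # may be shared with another table
        self.stale = True                 # rehash lazily on first access

    def rehash(self, copy):
        if copy:
            # Approach #1: take a private copy before mutating anything.
            self.index = list(self.index)
        fill_index(self.index, self.keys, self.hash_fn)
        self.stale = False

    def lookup(self, key, copy):
        if self.stale:
            self.rehash(copy)
        size = len(self.index)
        slot = self.hash_fn(key) % size
        for _ in range(size):
            pos = self.index[slot]
            if pos == -1:
                return None
            if self.keys[pos] == key:
                return pos
            slot = (slot + 1) % size
        return None


def demo(copy):
    # One dumped index array, unified because both tables' arrays had
    # identical contents at dump time.
    shared = [0, 1, -1, -1]
    a = Table([10, 21], lambda k: k, shared)        # hashing scheme A
    b = Table([10, 21], lambda k: k + 1, shared)    # hashing scheme B
    first = b.lookup(21, copy)    # rehashes b
    a.lookup(10, copy)            # rehashes a; in place, this rewrites `shared`
    second = b.lookup(21, copy)   # b is no longer stale, so it trusts its index
    return first, second
```

With `demo(copy=False)` the second lookup in `b` fails even though the
first one succeeded; with `demo(copy=True)` both lookups find the key.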

#1 isn't as bad as you might think. The existing hash table rehashing
stuff is inefficient anyway. Suppose we dump a hash table, load a dump,
and access the hash table. Right now, we do the rehashing and take COW
faults for the arrays we mutate. So far, so good. What happens if we grow
the hash table past its load factor limit? We allocate new vectors and
rehash into those, forgetting the old vectors. In the non-pdumper case, GC
will collect those older vectors eventually. In the pdumper case, those
COWed pages will stick around in memory forever, unused. I don't think it
counts as a "leak", since the memory waste is bounded, but it's still
memory waste.

If we use approach #1, we don't mutate the hash table pages mapped from
the dump. Besides, the amount of work we do is actually the same. In the
COW-page case, the kernel allocates a new page, reads the dump bytes, and
writes them to the new page. In the Fcopy_sequence case, *we* allocate a
new vector, read the dump bytes, and write them into anonymous memory.
It's the same work either way, except that if we copy, when we grow the
hash table, we can actually free the original vectors.
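
As a rough illustration of that argument, the following Python sketch
tallies the memory still held after a rehash followed by one growth step.
It is only toy accounting under stated assumptions: the function name is
made up, page granularity is ignored, and the growth factor of 2 is an
assumption rather than what Emacs actually uses.

```python
def resident_after_growth(vector_bytes, copy_on_rehash):
    """Toy count of bytes still held after a rehash and one growth step."""
    dirtied_dump_bytes = 0   # COWed dump pages: never handed back
    heap_bytes = 0           # ordinary heap vectors: GC can reclaim them

    # Step 1: the first access after loading the dump triggers a rehash.
    if copy_on_rehash:
        heap_bytes += vector_bytes          # we copy into a fresh heap vector
    else:
        dirtied_dump_bytes += vector_bytes  # the kernel copies the touched pages

    # Step 2: the table grows past its load factor limit and rehashes into
    # new vectors (assumed here to be twice as large).
    if copy_on_rehash:
        heap_bytes -= vector_bytes          # the old copy is garbage; GC frees it
    heap_bytes += 2 * vector_bytes

    return dirtied_dump_bytes + heap_bytes
```

Under these assumptions the copying work in step 1 is the same size either
way, but only the copying variant gets the original vector's memory back
after the table grows.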

IMHO, the right approach is to check in #1 for now and switch to a #2-like
approach once master is stable. Noticing that we don't actually have to
store the hash table internal arrays in the dump is a good catch.
