Modulo the usual comments about irrelevant hunks :) it looks good,
thanks! However, can you ensure the invariant that the queue is empty
upon entry to _gst_mark, by making it accept an OOP? I believe this
simplifies the code and lets you add back the tail recursion
optimization you removed.
It would be even better if you could keep the distinction between
marking one OOP and marking a range, because it removes a lot of traffic
to/from the mark queue (the queue is a bit cheaper than the stack because
there is no stack frame overhead, but it is still expensive cache-wise).
Which in turn means: keep the API as is! :)
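
To make the suggestion concrete, here is a rough sketch on a toy object model. It is not the real libgst code: the names obj, mark_one, mark_range, queue_push and queue_pop are made up for the example, the fixed-size LIFO work list only stands in for the mark queue, and the actual _gst_mark signature may well differ. It only shows the shape I have in mind: the single-OOP entry point asserts that the queue is empty on entry, follows the last slot in a loop instead of queueing it (the tail recursion), and the range entry point walks its elements directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy object: a mark bit plus an array of child pointers.
   This is NOT the real OOP layout, just enough to show the control flow. */
typedef struct obj {
  bool marked;
  size_t n_slots;
  struct obj **slots;
} obj;

/* Stand-in for the mark queue: a small fixed-size LIFO work list. */
#define QUEUE_MAX 1024
static obj *queue[QUEUE_MAX];
static size_t queue_len = 0;

static void queue_push(obj *o)
{
  assert(queue_len < QUEUE_MAX);
  queue[queue_len++] = o;
}

static obj *queue_pop(void)
{
  assert(queue_len > 0);
  return queue[--queue_len];
}

/* Mark everything reachable from a single object.
   Invariant: the queue is empty on entry and empty again on return. */
static void mark_one(obj *o)
{
  size_t i;

  assert(queue_len == 0);
  for (;;) {
    /* Tail recursion as a loop: queue all slots but the last one,
       then continue with the last slot without touching the queue. */
    while (o != NULL && !o->marked) {
      o->marked = true;
      if (o->n_slots == 0)
        break;
      for (i = 0; i + 1 < o->n_slots; i++)
        if (o->slots[i] != NULL && !o->slots[i]->marked)
          queue_push(o->slots[i]);
      o = o->slots[o->n_slots - 1];
    }
    if (queue_len == 0)
      break;
    o = queue_pop();
  }
}

/* Mark a range [first, last) of object pointers.  The elements are
   visited in place rather than being pushed on the queue first, and
   the invariant holds at every call because mark_one drains the queue. */
static void mark_range(obj **first, obj **last)
{
  for (; first < last; first++)
    mark_one(*first);
}
```

With this shape a caller marking a single root calls mark_one(root), and a caller holding an array of roots calls mark_range(roots, roots + n), so neither has to care about the state of the queue.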