bug#24937: "deleting unused links" GC phase is too slow

From: Ludovic Courtès
Subject: bug#24937: "deleting unused links" GC phase is too slow
Date: Tue, 16 Nov 2021 14:54:13 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)


Ludovic Courtès <ludo@gnu.org> skribis:

> Files smaller than 4 KiB typically represent ~60% of the entries in
> /gnu/store/.links but only contribute to ~2.5% of the space savings
> afforded by deduplication.
> Not considering these files for deduplication speeds up file insertion
> in the store and, more importantly, leaves 'removeUnusedLinks' with
> fewer entries to traverse, thereby speeding it up proportionally.
> Partly fixes <https://issues.guix.gnu.org/24937>.

Pushed a variant of this as commit
472a0e82a52a3d5d841e1dfad6b13e26082a5750, with a threshold of 8 KiB.
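
For illustration, here is a minimal Python sketch of size-thresholded
deduplication. It is not the daemon's code (the daemon is written in
C++). The function name 'deduplicate', the use of SHA-256 to name
entries, and the temporary-link replacement step are assumptions made
for this example only.

```python
import hashlib
import os
import stat

# Assumed threshold, matching the 8 KiB mentioned above.
DEDUP_THRESHOLD = 8192


def deduplicate(path, links_dir, threshold=DEDUP_THRESHOLD):
    """Hard-link PATH to a content-addressed entry in LINKS_DIR.

    Return True if PATH now shares its inode with an entry in
    LINKS_DIR, False if it was skipped (not a regular file, or
    smaller than THRESHOLD bytes).
    """
    st = os.lstat(path)
    if not stat.S_ISREG(st.st_mode) or st.st_size < threshold:
        # Small files are left alone: no hashing, no entry in LINKS_DIR.
        return False

    # Reads the whole file at once; fine for a sketch.
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    entry = os.path.join(links_dir, digest)

    try:
        # First file with this content: it becomes the entry itself.
        os.link(path, entry)
        return True
    except FileExistsError:
        pass

    if os.path.samestat(os.lstat(entry), st):
        return True  # already linked to the entry

    # Replace PATH with a hard link to the existing entry.
    tmp = path + ".tmp-link"  # hypothetical temporary name
    os.link(entry, tmp)
    os.replace(tmp, path)
    return True
```

With this shape, every file below the threshold costs a single lstat
call at insertion time and never adds an entry for the GC to visit.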

Concretely, the number of .links entries shrinks by ~70%, from
2M to 700K on my laptop, and (presumably) from 64M to 19M on berlin.

I’ll deploy it within a few days on berlin.  I hope the speedup will
reduce pressure there, though obviously it’ll still be an expensive
operation; fundamentally, I think it’ll always be linear in the size
of the store.
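
To make the linear cost concrete, here is a rough Python sketch of what
a "deleting unused links" pass amounts to: visit every entry once and
remove those whose link count has dropped to 1, meaning no store file
refers to them any more. The function name 'remove_unused_links' and
its return value are assumptions for illustration; the daemon's
'removeUnusedLinks' is C++ and differs in detail.

```python
import os


def remove_unused_links(links_dir):
    """Delete entries of LINKS_DIR that no other file links to.

    Return a pair (entries visited, entries deleted).
    """
    visited = deleted = 0
    with os.scandir(links_dir) as entries:
        for entry in entries:
            visited += 1
            # A link count of 1 means only the entry itself remains.
            if entry.stat(follow_symlinks=False).st_nlink == 1:
                os.unlink(entry.path)
                deleted += 1
    return visited, deleted
```

Each entry costs one stat call no matter how few entries end up being
deleted, which is why cutting the number of entries cuts the running
time proportionally.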

I’m preparing an update of the ‘guix’ package to make this readily
available.  When you deploy the new daemon, .links will be trimmed of
entries for files smaller than 8 KiB the first time you run ‘guix gc’.


