bug-guix
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#24937: "deleting unused links" GC phase is too slow


From: Mark H Weaver
Subject: bug#24937: "deleting unused links" GC phase is too slow
Date: Sun, 11 Dec 2016 09:23:38 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

address@hidden (Ludovic Courtès) writes:

> Here’s a proposed patch that follows your suggestion, Mark, but places
> an upper bound on the number of directory entries loaded in memory.
>
> On my laptop, which has ~500k entries in /gnu/store/.links, the result
> is something like this (notice the inode numbers in ‘lstat’ calls):
>
> 13738 write(4, "gmlo\0\0\0\0\31\0\0\0\0\0\0\0deleting unused "..., 48) = 48
> 13738 open("/gnu/store/.links", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 8
> 13738 fstat(8, {st_dev=makedev(8, 2), st_ino=4014083, st_mode=S_IFDIR|0755, 
> st_nlink=2, st_uid=0, st_gid=0, st_blksize=4096, st_blocks=119224, 
> st_size=60977152, st_atime=2016/12/11-12:01:59, st_mtime=2016/12/11-09:39:45, 
> st_ctime=2016/12/11-09:39:45}) = 0
> 13738 getdents(8, {{d_ino=4014083, d_off=4294967296, d_reclen=24, d_name="."}
> [...]
> 13738 
> lstat("/gnu/store/.links/1f2f3g8waxwymp9sl2slcfyara164i8w1y2sz3h9js2fcviv2rnc",
>  {st_dev=makedev(8, 2), st_ino=47, st_mode=S_IFREG|0444, st_nlink=2, 
> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=88, st_size=41482, 
> st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, 
> st_ctime=2015/11/25-11:29:20}) = 0
> 13738 
> lstat("/gnu/store/.links/1p25kpyfw354in4kykmgh5sy9h925hnil1jdzgxhz7n6abbws8px",
>  {st_dev=makedev(8, 2), st_ino=65, st_mode=S_IFREG|0444, st_nlink=9, 
> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2313, 
> st_atime=2016/12/08-21:02:26, st_mtime=1970/01/01-01:00:01, 
> st_ctime=2016/12/11-01:44:49}) = 0
> 13738 
> lstat("/gnu/store/.links/163187g637br9ys5pmshb01wjav53bs1g1a83m7c2alpdyx3yqz2",
>  {st_dev=makedev(8, 2), st_ino=68, st_mode=S_IFREG|0444, st_nlink=2, 
> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=32, st_size=13561, 
> st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, 
> st_ctime=2015/11/25-11:29:20}) = 0
> [...]
> 13738 getdents(8, {{d_ino=4257114, d_off=1734093409898198492,
> [...]
> 13738 
> lstat("/gnu/store/.links/1m6g06i01ybbkhjjbirjnj7fckw1b772cwygkvbd6v6zgkln7f7m",
>  {st_dev=makedev(8, 2), st_ino=19, st_mode=S_IFREG|0444, st_nlink=4, 
> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2553, 
> st_atime=2016/12/08-21:02:54, st_mtime=1970/01/01-01:00:01, 
> st_ctime=2016/12/07-00:05:19}) = 0
> 13738 
> lstat("/gnu/store/.links/1ml8n3q55ikn8h60sn67jq1y7z7mvdp5kwr33pqrid2r6kk1d4kb",
>  {st_dev=makedev(8, 2), st_ino=30, st_mode=S_IFREG|0444, st_nlink=2, 
> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=2090, 
> st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, 
> st_ctime=2015/11/25-11:29:21}) = 0
> 13738 
> lstat("/gnu/store/.links/1c8cvwlqyyqby3k13cwm40g26pwca5iiz5dcj43xrgn9y91lfvc2",
>  {st_dev=makedev(8, 2), st_ino=33, st_mode=S_IFREG|0444, st_nlink=2, 
> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=16, st_size=7958, 
> st_atime=2015/11/04-18:55:32, st_mtime=1970/01/01-01:00:01, 
> st_ctime=2016/01/05-11:35:49}) = 0
> [...]
> 13738 getdents(8, {{d_ino=328672,
> [...]
> 13738 
> lstat("/gnu/store/.links/1l55l59dxb74himmkfzx5v63cv7287i6rjhdns1fdlwajqd73lnz",
>  {st_dev=makedev(8, 2), st_ino=21, st_mode=S_IFREG|0444, st_nlink=31, 
> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=615, 
> st_atime=2016/12/08-21:02:30, st_mtime=1970/01/01-01:00:01, 
> st_ctime=2016/12/11-01:44:47}) = 0
> 13738 
> lstat("/gnu/store/.links/1c7mm5amw743mb45f1zg4d4r3g549ch35wks9izkcgkx0jirpxsg",
>  {st_dev=makedev(8, 2), st_ino=48, st_mode=S_IFREG|0444, st_nlink=2, 
> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=360, st_size=176750, 
> st_atime=2015/03/10-11:29:06, st_mtime=1970/01/01-01:00:01, 
> st_ctime=2015/11/25-11:29:20}) = 0
> 13738 
> lstat("/gnu/store/.links/0z5s7b0yk8mfn1np6gk3cdbmpnjgxg1g0l8vfq1aa01zwp06d3f0",
>  {st_dev=makedev(8, 2), st_ino=61, st_mode=S_IFREG|0444, st_nlink=2, 
> st_uid=0, st_gid=0, st_blksize=4096, st_blocks=8, st_size=1440, 
> st_atime=2016/11/03-20:37:50, st_mtime=1970/01/01-01:00:01, 
> st_ctime=2016/11/07-00:01:57}) = 0
>
> I can’t tell exactly how this affects performance because my machine has
> an SSD where this operation takes ~3 seconds on a cold cache.  I suspect
> it has performance comparable to that of doing all the ‘readdir’ at once
> followed by all the ‘lstat’.
>
> Mark, how does that sound?

I think we should sort the entire directory using merge sort backed to
disk files.  If we load chunks of the directory, sort them and process
them individually, I expect that this will increase the amount of I/O
required by a non-trivial factor.  In each pass, we would load blocks of
inodes from disk, almost all of which are likely to be present in the
store and thus linked from the directory, but in this scheme we will
process only a small number of them and drop the rest on the floor to be
read again in the next pass.  Given that even my fairly optimal
implementation takes about 35 minutes to run on Hydra, I'd prefer to
avoid multiplying that by a non-trivial factor.

Why not just use GNU sort?  It already exists, and does exactly what we
need.

If you object to using an external program for some reason, I would
prefer to re-implement a similar algorithm in the daemon.

     Thanks,
       Mark





reply via email to

[Prev in Thread] Current Thread [Next in Thread]