From: Alejandro Colomar
Subject: Re: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
Date: Sun, 9 Apr 2023 14:17:57 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.9.1
On 4/9/23 14:05, Alejandro Colomar wrote:
> [Added back linux-man@, and people that commented on this (sub)topic]
> [Added Sam, I've got a question for you]
>
> Hi Alexis,
>
> Please keep (at least) linux-man@ in the loop.
>
> On 4/9/23 08:44, Alexis wrote:
>>
>> As a related data point, i'd like to mention Gentoo's position on
>> this, i.e. that man pages will continue to be bzip2-compressed by
>> default:
>>
>> "app-text/mandoc bzip2 support"
>> https://bugs.gentoo.org/854267
>>
>> "Remove /usr/share/man from default inclusion list for docompress"
>> https://bugs.gentoo.org/836367
>
> As Ingo said[1] 3 years ago, I don't think it makes any sense to
> compress pages nowadays. However, since it's simple for me to add
> support for that, and it can be interesting for testing purposes, I
> added support for installing the Linux man-pages compressed with
> bzip2 using the Makefile[2]. While I was at it, I also added support
> for generating .tar.bz2 release tarballs[3].
>
> With this, I was able to test a bit more than what I did yesterday:
>
>
> $ sudo rm -rf /opt/local/man/
> $ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
> 2570
> $ du -sh /opt/local/man/*
> 5.4M /opt/local/man/bz2
> 5.5M /opt/local/man/gz_
> 9.4M /opt/local/man/man
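In relative terms, the rounded du(1) figures above mean roughly 41% less disk usage for gz and 43% for bz2. The function below is a small sketch of that arithmetic. savings() is a made-up helper. Because du -sh rounds and counts allocated blocks rather than bytes, these numbers may understate how well the page contents themselves compress.

```shell
# savings BASE NAME SIZE [NAME SIZE]...: hypothetical helper that prints
# how much smaller each SIZE is than BASE, as a whole percentage.
# All sizes must use the same unit (here MiB, as rounded by du -sh).
savings() {
	base=$1; shift
	while [ "$#" -ge 2 ]; do
		awk -v n="$1" -v s="$2" -v b="$base" \
			'BEGIN { printf "%s: %.0f%% smaller\n", n, (1 - s / b) * 100 }'
		shift 2
	done
}

# Figures from the du -sh output above; the uncompressed tree is the base.
savings 9.4 gz_ 5.5 bz2 5.4
# gz_: 41% smaller
# bz2: 43% smaller
```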
>
>
> $ export MANPATH=/opt/local/man/gz_/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.24
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.14
>
>
> $ export MANPATH=/opt/local/man/bz2/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 10.90
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.33
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.21
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.22
>
>
> $ export MANPATH=/opt/local/man/man/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
>
> Weird thing: today, the symlink bug in man(1) was reproducible in
> all kinds of pages, while yesterday it only reproduced in
> uncompressed ones.
>
> Another weird thing: the times for the find(1) pipelines changed
> considerably today (about half of yesterday's). It's not due to
> using dash(1), because I get similar times with bash(1) and its
> time keyword.
>
> Important note: Sam, are you sure you want your pages compressed
> with bz2? Have you seen the 10 seconds it takes man-db's man(1) to
> find a word in the pages? I suggest that you at least try to
> reproduce these tests on your machine, to see whether it's just me
> or man-db's man(1) really is that bad with non-gz pages.
>
> Test results:
>
> - man-db's man(1) is slower with plain man(7) source than with .gz
> pages, for some mysterious reason.
>
> - man-db's man(1) is turtle slow with .bz2 pages.
>
> - xargs -P0 doesn't make a significant difference. As Ralph said,
> this is probably because the main issue with find(1) was the
> bottleneck in clone/fork+exec, and xargs(1) already solves that.
>
> Expanding the pipeline to use zcat(1) instead of zgrep(1)
> improves things a little more, because the zgrep(1) script is
> probably quite inefficient, while zcat(1) is just a simple
> wrapper around gzip(1). We see that zgrep(1) is less efficient
> than running a few programs per file in a pipeline ourselves!
>
> Calling gzip(1) directly is even faster, since we avoid invoking
> a shell for such a small script.
>
> Expanding the bzgrep(1) pipeline into one using bzcat(1) has
> similar improvements. However, since bzcat(1) is a binary, we
> don't get further improvement from calling bzip2(1) directly.
And I forgot the obvious one:
- Using plain man(7) source is blazingly fast; so much so that I
don't miss mdoc(7)'s indexability that much.
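The per-file pipelines that came out fastest above can be folded into a single helper. It picks the decompressor from the file suffix, so the same command works on any of the three trees. This is only a sketch under a few assumptions:

* The name search_pages is made up.
* It uses grep -q instead of grep -l >/dev/null. Only the exit status matters here, so the two are equivalent for this purpose.
* Like the find(1) pipelines above, it only looks at regular files, so symlinked pages are skipped.

```shell
# search_pages PATTERN DIR: hypothetical helper that prints the path of
# every regular file under DIR whose decompressed contents match PATTERN.
search_pages() {
	pattern=$1
	find "$2" -type f | while IFS= read -r f; do
		case $f in
		*.gz)  gzip -dc <"$f" ;;   # like gzip -d - <file above
		*.bz2) bzip2 -dc <"$f" ;;  # like bzip2 -d - <file above
		*)     cat <"$f" ;;        # plain man(7) source
		esac | grep -q -e "$pattern" && printf '%s\n' "$f"
	done
}

# Usage, counting matches like the benchmarks above:
#   search_pages RLIMIT_NOFILE /opt/local/man/gz_/share/man | wc -l
```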
However, I must admit that I do sometimes miss mdoc(7)'s power.
The man_lsfunc() and man_lsvar() functions for finding function
prototypes and variable declarations in man(7) source would be
much simpler using mdoc(7), and I could even use mandoc(1) to
find such things.
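Here is why mdoc(7) would make this simpler. It marks function names with semantic macros such as .Fn and .Fo, so a plain awk(1) filter can pick them out without having to understand the SYNOPSIS layout. This is a rough illustration only, not how man_lsfunc() works:

* The helper name mdoc_funcs is made up.
* It also reports functions that are merely referenced in the text with .Fn, not only prototypes.

```shell
# mdoc_funcs FILE...: hypothetical helper that prints each function name
# given as the first argument of an .Fn or .Fo macro in the named mdoc(7)
# sources, sorted and with duplicates removed.
mdoc_funcs() {
	awk '$1 == ".Fn" || $1 == ".Fo" { print $2 }' "$@" | sort -u
}

# Usage:
#   mdoc_funcs man2/*.2
```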
>
>
> Cheers,
> Alex
>
>>
>>
>> Alexis.
>>
>
>
> [1]: <https://marc.info/?l=mandoc-discuss&m=160668087317110&w=2>
>
> [2]: <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=6a828d5b6879ef19c3f59034fe1d0850d25d0056>
>
> [3]: <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=e5b23b9c5b318d69ee78af0906e3bf0c665f9ae5>
>
--
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5