From: Sam James
Subject: Re: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
Date: Wed, 12 Apr 2023 09:13:13 +0100
User-agent: mu4e 1.10.1; emacs 29.0.90

Alejandro Colomar <alx.manpages@gmail.com> writes:

> [Added back linux-man@, and people that commented on this (sub)topic]
> [Added Sam, I've got a question for you]
>
> Hi Alexis,
>
> Please keep (at least) linux-man@ in the loop.
>
> On 4/9/23 08:44, Alexis wrote:
>> 
>> As a related data point, i'd like to mention Gentoo's position on 
>> this, i.e. that man pages will continue to be bzip2-compressed by
>> default:
>> 
>> "app-text/mandoc bzip2 support"
>> https://bugs.gentoo.org/854267
>> 
>> "Remove /usr/share/man from default inclusion list for docompress"
>> https://bugs.gentoo.org/836367
>
> As Ingo said[1] 3 years ago, I don't think in this year it makes any
> sense to compress pages anymore.  However, since it's simple for me
> to add support for that, and it can be interesting for testing
> purposes, I added support for installing the Linux man-pages
> compressed with bzip2 using the Makefile[2].  While I was at it, I
> also added support for generating .tar.bz2 release tarballs[3].
>
> With this, I was able to test a bit more than what I did yesterday:
>
>
> $ sudo rm -rf /opt/local/man/
> $ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
> 2570
> $ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
> 2570
> $ du -sh /opt/local/man/*
> 5.4M  /opt/local/man/bz2
> 5.5M  /opt/local/man/gz_
> 9.4M  /opt/local/man/man
>
>
> $ export MANPATH=/opt/local/man/gz_/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.24
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.14
>
>
> $ export MANPATH=/opt/local/man/bz2/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 10.90
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.33
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
> 17
> 1.31
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.21
> $ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
> 17
> 1.22
>
>
> $ export MANPATH=/opt/local/man/man/share/man
> $ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
> 37
> 0.56
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
> $ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
> 17
> 0.01
>
> Weird thing: today, the symlink bug in man(1) was reproducible in
> all kinds of pages, while yesterday it only reproduced in
> uncompressed ones.
>
> Another weird thing: times today changed considerably for the
> find(1) pipelines (half of yesterday's).  It's not a thing of
> using dash(1), because I get similar times with bash(1) and its
> builtin time(1).
>
> Important note: Sam, are you sure you want your pages compressed
> with bz2?  Have you seen the 10 seconds it takes man-db's man(1) to
> find a word in the pages?  I suggest that at least you try to
> reproduce these tests in your machine, and see if it's just me or
> man-db's man(1) is pretty bad at non-gz pages.
>
> Test results:
>
> -  man-db's man(1) is slower with plain man(7) source than with .gz
>    pages for some mysterious reason.
>
> -  man-db's man(1) is turtle slow with .bz2 pages.
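
For anyone wanting to reproduce the find(1) pipelines above on their own
machine, here is a minimal sketch in POSIX sh. The function name
count_matches and its argument order are made up for illustration; the
decompressor is passed as a command so the same loop covers .gz, .bz2,
and uncompressed trees. It assumes file names contain no newlines.

```shell
#!/bin/sh
# Sketch: count files under a man tree whose decompressed text contains a word.
# count_matches TREE WORD DECOMPRESSOR...
#   TREE          directory to walk (e.g. "$MANPATH")
#   WORD          fixed string to look for
#   DECOMPRESSOR  command reading a page on stdin and writing plain text to
#                 stdout, e.g. "gzip -dc", "bzip2 -dc", or "cat"
count_matches() {
    tree=$1
    word=$2
    shift 2
    # -type f skips symlinked pages, as in the quoted pipelines.
    find "$tree" -type f | while IFS= read -r f; do
        # Feed the page on stdin so odd file names never reach the tool.
        if "$@" <"$f" 2>/dev/null | grep -qF -e "$word"; then
            printf '%s\n' "$f"
        fi
    done | wc -l | tr -d ' '
}
```

With the trees installed as in the quoted message, a call would look like
"count_matches /opt/local/man/gz_/share/man RLIMIT_NOFILE gzip -dc".
To time it, put the call in a script and run that script under
/bin/time -f %e, since /bin/time cannot time a shell function directly.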

I started looking into changing to xz (or just... not bz2, anyway),
partly motivated by https://gitlab.com/man-db/man-db/-/issues/4 and
partly out of local interest (I haven't done measurements to see whether
a global change would be worth it). The xz maintainer ended up
recommending an entirely different implementation of how man-db
currently handles external utilities (which I have a draft of).

The xz author had some suggestions on the best parameters to use
for man pages too which I need to look into and dig up...

https://bugs.gentoo.org/169260 was an interesting discussion
about our choice of bz2 (it came up a bit in
https://bugs.gentoo.org/372653 too).

(I'll get back and read the rest of the thread later, but wanted
to add this tidbit.)

Definitely surprised to learn bz2 is *that* bad though!
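
For scale, the "man -Kaw" timings quoted above can be turned into ratios
with a few lines of sh and awk. The helper name ratio is made up for
illustration; the three numbers are copied from the quoted runs and
reflect that one machine only.

```shell
#!/bin/sh
# ratio A B: print A/B rounded to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Elapsed seconds for "man -Kaw RLIMIT_NOFILE", copied from the quoted runs.
gz=0.31
bz2=10.90
plain=0.56

echo "bz2 vs gz:    $(ratio "$bz2" "$gz")x"
echo "bz2 vs plain: $(ratio "$bz2" "$plain")x"
```

On those figures, the bz2 tree is roughly 35 times slower than the gz
tree and roughly 19 times slower than the uncompressed one.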

best,
sam
