groff

From: Alejandro Colomar
Subject: Compressed man pages (was: Accessibility of man pages (was: Playground pager lsp(1)))
Date: Sun, 9 Apr 2023 14:05:08 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.9.1

[Added back linux-man@, and people that commented on this (sub)topic]
[Added Sam, I've got a question for you]

Hi Alexis,

Please keep (at least) linux-man@ in the loop.

On 4/9/23 08:44, Alexis wrote:
> 
> As a related data point, i'd like to mention Gentoo's position on 
> this, i.e. that man pages will continue to be bzip2-compressed by 
> default:
> 
> "app-text/mandoc bzip2 support"
> https://bugs.gentoo.org/854267
> 
> "Remove /usr/share/man from default inclusion list for docompress"
> https://bugs.gentoo.org/836367

As Ingo said[1] 3 years ago, I don't think it makes any sense to
compress pages nowadays.  However, since it was simple for me to add,
and it can be useful for testing purposes, I added Makefile support
for installing the Linux man-pages compressed with bzip2[2].  While I
was at it, I also added support for generating .tar.bz2 release
tarballs[3].
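For readers who want to build comparable trees without the man-pages
Makefile, here is a minimal sketch of the idea.  install_man_compressed
is a hypothetical function (its name and arguments are illustrative,
and this is not the Makefile's actual recipe): it copies every regular
file under a source tree into a destination, compressed according to a
suffix that plays the role of the Makefile's Z= variable.  It does not
implement the symlink handling that LINK_PAGES=symlink selects.

```sh
#!/bin/sh
# Hypothetical sketch, not the man-pages Makefile recipe: copy every regular
# file under SRC into DST, compressed according to the suffix Z, which plays
# the role of the Makefile's Z= variable (".gz", ".bz2", or "" for none).
install_man_compressed() {
	src=$1 dst=$2 z=$3

	case $z in
	.gz)  compress='gzip -9 -n -c' ;;  # -n: don't store name/timestamp
	.bz2) compress='bzip2 -9 -c' ;;
	'')   compress='cat' ;;
	*)    echo "install_man_compressed: unsupported suffix '$z'" >&2
	      return 1 ;;
	esac

	# Make DST absolute, because the subshell below changes directory.
	case $dst in
	/*) ;;
	*)  dst=$PWD/$dst ;;
	esac

	(
		cd "$src" || exit 1
		# One child shell per batch of files, as with xargs(1).
		find . -type f -exec sh -c '
			dst=$1 z=$2 compress=$3
			shift 3
			for f do
				mkdir -p "$dst/${f%/*}" || exit 1
				$compress <"$f" >"$dst/$f$z" || exit 1
			done
		' sh "$dst" "$z" "$compress" {} +
	)
}

# Example (./man is an assumed source directory):
#   install_man_compressed ./man /opt/local/man/bz2/share/man .bz2
```

Batching with find -exec ... {} + keeps the number of child shells
small; the compressor itself still runs once per file.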

With that Makefile support, I was able to test a bit more than I did
yesterday:


$ sudo rm -rf /opt/local/man/
$ sudo make install-man prefix=/opt/local/man/gz_ -j LINK_PAGES=symlink Z=.gz | wc -l
2570
$ sudo make install-man prefix=/opt/local/man/bz2 -j LINK_PAGES=symlink Z=.bz2 | wc -l
2570
$ sudo make install-man prefix=/opt/local/man/man -j LINK_PAGES=symlink Z= | wc -l
2570
$ du -sh /opt/local/man/*
5.4M    /opt/local/man/bz2
5.5M    /opt/local/man/gz_
9.4M    /opt/local/man/man


$ export MANPATH=/opt/local/man/gz_/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
0.31
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs zgrep -l RLIMIT_NOFILE | wc -l"
17
1.56
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 zgrep -l RLIMIT_NOFILE | wc -l"
17
1.56
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do zcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.24
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do gzip -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.14


$ export MANPATH=/opt/local/man/bz2/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
10.90
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l"
17
1.33
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 bzgrep -l RLIMIT_NOFILE | wc -l"
17
1.31
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzcat \$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.21
$ /bin/time -f %e dash -c "find $MANPATH -type f | while read f; do bzip2 -d - <\$f | grep -l RLIMIT_NOFILE >/dev/null && echo \$f; done | wc -l"
17
1.22


$ export MANPATH=/opt/local/man/man/share/man
$ /bin/time -f %e dash -c "man -Kaw RLIMIT_NOFILE | wc -l"
37
0.56
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs grep -l RLIMIT_NOFILE | wc -l"
17
0.01
$ /bin/time -f %e dash -c "find $MANPATH -type f | xargs -P0 grep -l RLIMIT_NOFILE | wc -l"
17
0.01
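A note on the du -sh figures above: du(1) counts allocated disk
blocks, so with many small files each file's last, partly used block
is counted in full, which can hide part of the size difference between
the .gz and .bz2 trees.  A minimal sketch that also sums the bytes
actually stored in the regular files, using only POSIX du(1), find(1),
cat(1), and wc(1); tree_sizes is a hypothetical helper name:

```sh
#!/bin/sh
# Hypothetical helper: for each directory argument, print the space that
# du(1) reports as allocated (in KiB) next to the total number of bytes in
# its regular files.  find -type f skips symlinks (as installed with
# LINK_PAGES=symlink), so they add nothing to the byte count.
tree_sizes() {
	for d do
		alloc=$(du -sk "$d" | cut -f1)
		# find -exec cat {} + batches files into few cat(1) invocations.
		bytes=$(find "$d" -type f -exec cat {} + | wc -c | tr -d ' ')
		printf '%s\t%s KiB allocated\t%s bytes of content\n' \
		    "$d" "$alloc" "$bytes"
	done
}

# Example:
#   tree_sizes /opt/local/man/bz2 /opt/local/man/gz_ /opt/local/man/man
```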

Weird thing: today, the symlink bug in man(1) was reproducible in
all kinds of pages, while yesterday it only reproduced in
uncompressed ones.

Another weird thing: today, the find(1) pipelines took considerably
less time than yesterday (about half).  It's not due to using
dash(1), because I get similar times with bash(1) and its builtin
time.

Important note: Sam, are you sure you want your pages compressed
with bz2?  Have you seen the almost 11 seconds it takes man-db's
man(1) to find a word in the pages?  I suggest that you at least try
to reproduce these tests on your machine, and see whether it's just
me or man-db's man(1) is pretty bad at non-gz pages.
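Since the find(1) timings moved by about a factor of two between days,
it may help to repeat each measurement several times when reproducing
these tests, so that cold-cache and warm-cache runs can be told apart.
A minimal sketch, assuming GNU time(1) at /bin/time (as in the commands
above); bench and RUNS are hypothetical names, and the commands run
under sh(1), which is dash(1) on Debian-based systems:

```sh
#!/bin/sh
# Hypothetical benchmark helper (bench and RUNS are illustrative names):
# run each command string RUNS times and print the elapsed wall-clock
# seconds of every run, so that cold-cache and warm-cache runs show up
# as separate numbers.  Assumes GNU time(1) at /bin/time.
RUNS=${RUNS:-3}

bench() {
	for cmd do
		printf '%s\n' "$cmd"
		i=0
		while [ "$i" -lt "$RUNS" ]; do
			# Discard the command's stdout; GNU time prints its %e line
			# to stderr last, so keep only the final line of stderr.
			/bin/time -f %e sh -c "$cmd" 2>&1 >/dev/null | tail -n 1
			i=$((i + 1))
		done
	done
}

# Example:
#   export MANPATH=/opt/local/man/bz2/share/man
#   RUNS=5 bench 'man -Kaw RLIMIT_NOFILE | wc -l' \
#       'find "$MANPATH" -type f | xargs bzgrep -l RLIMIT_NOFILE | wc -l'
```

The single quotes in the example leave "$MANPATH" to be expanded by
the inner shell, which sees it because it is exported.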

Test results:

-  man-db's man(1) is slower with plain man(7) source than with .gz
   pages (0.56 s vs 0.31 s), for some mysterious reason.

-  man-db's man(1) is turtle slow with .bz2 pages (10.90 s, about 35
   times the .gz time).

-  xargs -P0 doesn't make a significant difference.  As Ralph said,
   this is probably because the main bottleneck with find(1) was the
   clone/fork+exec for every file, and xargs(1) already solves that.

   Expanding the pipeline to use zcat(1) instead of zgrep(1)
   improves things a bit more (1.56 s to 1.24 s), probably because
   the zgrep(1) script is quite inefficient, while zcat(1) is just a
   simple wrapper around gzip(1).  So zgrep(1) is even less
   efficient than running a few programs per file ourselves in a
   pipeline!

   Calling gzip(1) directly is even faster (1.14 s), since we avoid
   invoking a shell for such a small script.

   Expanding the bzgrep(1) pipeline into one using bzcat(1) gives a
   similar improvement (1.33 s to 1.21 s).  However, since bzcat(1)
   is a binary, we don't get any further improvement from calling
   bzip2(1) directly.
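These results suggest skipping the zgrep(1)/bzgrep(1) wrappers
altogether.  A minimal sketch of that idea, extended to a tree with
mixed compression: grep_man_pages is a hypothetical function (not an
existing tool) that picks the decompressor from the file suffix, calls
gzip(1) or bzip2(1) directly, and lets find -exec ... {} + batch many
files into each child shell, much as xargs(1) does.  It uses grep -q
instead of grep -l >/dev/null.  No timings are claimed for it.

```sh
#!/bin/sh
# Hypothetical helper: list the regular files under DIR that contain the
# fixed string PATTERN, decompressing by suffix (.gz, .bz2, or none) with
# gzip(1)/bzip2(1) directly instead of the zgrep(1)/bzgrep(1) scripts.
grep_man_pages() {
	dir=$1 pattern=$2
	find "$dir" -type f -exec sh -c '
		pattern=$1
		shift
		for f do
			case $f in
			*.gz)  gzip -dc <"$f" ;;
			*.bz2) bzip2 -dc <"$f" ;;
			*)     cat <"$f" ;;
			esac | grep -qF -e "$pattern" && printf "%s\n" "$f"
		done
		# Succeed even if the last file in the batch did not match.
		exit 0
	' sh "$pattern" {} +
}

# Example:
#   grep_man_pages "$MANPATH" RLIMIT_NOFILE | wc -l
```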


Cheers,
Alex

> 
> 
> Alexis.
> 


[1]: <https://marc.info/?l=mandoc-discuss&m=160668087317110&w=2>

[2]: <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=6a828d5b6879ef19c3f59034fe1d0850d25d0056>

[3]: <https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=e5b23b9c5b318d69ee78af0906e3bf0c665f9ae5>

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5


