bug-tar

Re: [Bug-tar] use optimal file system block size


From: Joerg Schilling
Subject: Re: [Bug-tar] use optimal file system block size
Date: Thu, 19 Jul 2018 12:24:08 +0200
User-agent: Heirloom mailx 12.5 7/5/10

Christian Krause <address@hidden> wrote:

> To clarify: I do not mean to change the **record size**, which would result 
> in an incompatible tar file. I am only interested in the buffer sizes that 
> are used to read from and write to block devices.

This has been noticed.

BTW: for better readability, could you please use a line length of 79 chars,
as in the mail RFC?

> ```
> $ strace -T -ttt -ff -o tar-1.30-factor-4k.strace tar cbf 4096 data4k.tar data
>
> $ strace-analyzer io tar-1.30-factor-4k.strace.72464 | grep data | column -t
> read   84M  in  1.520   s   (~  55M  /  s)  with  43  ops  (~  2M  /  op,  ~  2M  request  size)  data/blob
> write  86M  in  61.316  ms  (~  1G   /  s)  with  43  ops  (~  2M  /  op,  ~  2M  request  size)  data4k.tar
> ```

Are you mainly interested in the # of "ops" in your output?

>
> Due to changing the **record size**, this creates a different, 
> not-so-compatible tar file:
>
> ```
> $ stat -c %s data.tar data4k.tar
> 88084480
> 90177536
>
> $ md5sum data.tar data4k.tar
> 4477dca65dee41609d43147cd15eea68  data.tar
> 6f4ce17db2bf7beca3665e857cbc2d69  data4k.tar
> ```
>
>
> Please verify: The fact that input buffer and output buffer sizes are the 
> same as the record size is an implementation detail. The input buffer and 
> output buffer sizes could be decoupled from the record size to improve I/O 
> performance without changing the resulting tar file. Decoupling would entail 
> a huge refactoring, like Jörg suggests.

Well, since the related changes were implemented 30 years ago already and
since the star FIFO mode was intentionally made the default from the beginning,
this is still rock solid code. It has been tested millions of times, and star
is at least one of the most stable tar implementations, if not the most stable.
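
To illustrate the decoupling discussed in the quoted text, here is a minimal
Python sketch. It is not star's FIFO and not gtar code: CountingSink,
write_records and the 2 MiB BUFFER_SIZE are hypothetical names and values
chosen for this example. It writes the same 10 KiB records (GNU tar's default
record size of 20 x 512 bytes) once directly and once through a larger buffer,
then compares the resulting bytes and the number of write calls.

```python
import io

RECORD_SIZE = 20 * 512          # 10 KiB: GNU tar's default record size
BUFFER_SIZE = 2 * 1024 * 1024   # hypothetical 2 MiB I/O buffer (assumption)


class CountingSink(io.RawIOBase):
    """In-memory stand-in for the archive file that counts write calls."""

    def __init__(self):
        super().__init__()
        self.data = bytearray()
        self.ops = 0

    def writable(self):
        return True

    def write(self, b):
        # Each call here corresponds to one write(2) on a real file.
        self.ops += 1
        self.data += b
        return len(b)


def write_records(records, buffer_size=None):
    """Write all records, optionally through a buffer of buffer_size bytes.

    Returns the bytes that reached the sink and the number of write calls.
    """
    sink = CountingSink()
    out = sink if buffer_size is None else io.BufferedWriter(sink, buffer_size)
    for rec in records:
        out.write(rec)
    out.flush()
    return bytes(sink.data), sink.ops


if __name__ == "__main__":
    records = [bytes([i % 256]) * RECORD_SIZE for i in range(100)]
    direct_data, direct_ops = write_records(records)
    buffered_data, buffered_ops = write_records(records, BUFFER_SIZE)
    print("identical output:", direct_data == buffered_data)
    print("write calls: direct", direct_ops, "buffered", buffered_ops)
```

The record size still determines the archive layout. Only the size of the
requests handed to the operating system changes.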

If you ran the same test with star (using default parameters), you would get
only 11 "read ops".

If you used "star fs=100m ..." you would only get one read.
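
The relation between request size and the number of "ops" is a ceiling
division. The sketch below uses the size of data.tar quoted earlier in this
thread (88084480 bytes) as a stand-in for the amount of data moved. The
function name ops_needed is made up for this example, and the 8 MiB figure is
only a request size that would be consistent with 11 ops, not a documented
star default.

```python
KIB = 1024
MIB = 1024 * KIB

ARCHIVE_BYTES = 88084480  # size of data.tar as quoted in this thread


def ops_needed(total_bytes, request_size):
    """Number of full-or-partial requests needed to move total_bytes."""
    return -(-total_bytes // request_size)  # ceiling division on integers


if __name__ == "__main__":
    for label, size in (("10 KiB", 10 * KIB), ("2 MiB", 2 * MIB),
                        ("8 MiB (assumed)", 8 * MIB), ("100 MiB", 100 * MIB)):
        print(label, "->", ops_needed(ARCHIVE_BYTES, size), "ops")
```

With 10 KiB requests this matches the 8602 write ops in the quoted bsdtar
output, and with 2 MiB requests it matches the 43 ops in the quoted gtar run.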

If you run performance tests, you'll notice that the I/O size reported
by stat is not the optimum but merely the smallest size that gives improved
performance. If you read with even bigger I/O sizes, you get better
performance (see the star results).
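
A rough way to try this is to read one file with several request sizes,
starting at the st_blksize value reported by stat, and count the read calls.
The following Python sketch assumes a Unix-like system; count_reads and
make_test_file are hypothetical helpers written for this example, and the
multiples of st_blksize are arbitrary choices. It counts operations only.
Timing a real workload is left to the reader.

```python
import os
import tempfile


def count_reads(path, request_size):
    """Read path with fixed-size requests; return (read calls, bytes read)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        ops = 0
        total = 0
        while True:
            # May return fewer bytes than requested; an empty result is EOF.
            chunk = os.read(fd, request_size)
            if not chunk:
                break
            ops += 1
            total += len(chunk)
        return ops, total
    finally:
        os.close(fd)


def make_test_file(size):
    """Create a temporary file of the given size and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
    return path


if __name__ == "__main__":
    path = make_test_file(4 * 1024 * 1024)
    try:
        # st_blksize is the "IO size reported by stat"; 4096 is a fallback
        # for platforms that do not provide the field.
        blksize = getattr(os.stat(path), "st_blksize", 4096)
        for size in (blksize, 16 * blksize, 256 * blksize):
            ops, total = count_reads(path, size)
            print(size, "byte requests:", ops, "reads for", total, "bytes")
    finally:
        os.remove(path)
```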

> ```
> $ bsdtar --version
> bsdtar 3.2.2 - libarchive 3.2.2 zlib/1.2.8 liblzma/5.0.4 bz2lib/1.0.6
>
> $ strace -T -ttt -ff -o bsdtar-3.2.2-create.strace bsdtar -cf data-bsdtar.tar data
>
> $ strace-analyzer io bsdtar-3.2.2-create.strace.14101 | grep data | column -t
> read   84M  in  388.927  ms  (~  216M  /  s)  with  42    ops  (~  2M   /  op,  ~  2M   request  size)  data/blob
> write  84M  in  4.854    s   (~  17M   /  s)  with  8602  ops  (~  10K  /  op,  ~  10K  request  size)  data-bsdtar.tar
> ```

I checked and it seems that "bsdtar" (which differs from "BSD tar") reads 64 KB 
blocks.

This gives slightly better results than gtar but does not give you what you may 
get from star.

Let me give you a simple performance result from a run on a FreeBSD
11.1-RELEASE-p10 virtual instance.

I ran each tar several times and report only the fastest result:

gtar-1.30:

sudo /tmp/tar-1.30/src/tar -cf /dev/zero /usr
tar: Removing leading `/' from member names
tar: Removing leading `/' from hard link targets
42.668127 real 1.901632 user 13.029537 sys 34% cpu 243590+0io 0pf+0w

Note that gtar needs /dev/zero to prevent it from cheating: when the archive
is /dev/null, gtar does not read the file contents at all.

bsdtar 3.3.1:

sudo tar -cf /dev/null /usr                                                  
tar: Removing leading '/' from member names
47.094941 real 5.640348 user 13.939351 sys 41% cpu 177123+0io 12pf+0w

star-1.5.4:

sudo star -c -f /dev/null /usr
star: Cannot allocate memory. Cannot lock fifo memory.
star: 175948 blocks + 0 bytes (total of 1801707520 bytes = 1759480.00k).
26.913171 real 1.403554 user 10.688413 sys 44% cpu 174209+0io 5pf+0w

Jörg

-- 
 EMail:address@hidden                    (home) Jörg Schilling D-13353 Berlin
    address@hidden (work) Blog: http://schily.blogspot.com/
 URL: http://cdrecord.org/private/ http://sf.net/projects/schilytools/files/


