bug-tar

[Bug-tar] use optimal file system block size


From: Christian Krause
Subject: [Bug-tar] use optimal file system block size
Date: Wed, 18 Jul 2018 14:58:13 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

Dear tar Community,

We are using **tar** on the High-Performance Computing (HPC) cluster at our 
research institute, iDiv. The networked file system serving (scientific) data on 
our cluster uses a block size of 2 MiB:

```
$ mkdir data
$ dd if=/dev/zero bs=2M count=42 of=data/blob status=none
$ stat -c %o data/blob
2097152
```

**tar** does not use the block size of the file system on which the files are 
located. Instead, for a reason I don't know (feel free to educate me), it uses 
10 KiB:

```
$ tar --version | head -1
tar (GNU tar) 1.30

$ strace -T -ttt -ff -o tar-1.30.strace tar cf data.tar data

$ strace-analyzer io tar-1.30.strace.59539 | grep data | column -t
read   84M  in  444.041  ms  (~  189M  /  s)  with  8602  ops  (~  10K  /  op,  ~  10K  request  size)  data/blob
write  84M  in  404.483  ms  (~  208M  /  s)  with  8602  ops  (~  10K  /  op,  ~  10K  request  size)  data.tar.gz
```

If you're interested, you can find strace-analyzer 
[here](https://github.com/wookietreiber/strace-analyzer). It more or less just 
computes some statistics over the strace log.
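The 10 KiB figure matches GNU tar's default record size: archives are written in records of *blocking factor* × 512-byte blocks, and the default blocking factor is 20. The short shell sketch below does that arithmetic and, for comparison, works out the blocking factor that would correspond to the 2 MiB block size reported by `stat` above (the variable names are only for illustration):

```shell
# GNU tar writes archives in records made of 512-byte blocks;
# record size = blocking factor * 512 bytes, default blocking factor 20.
tar_block=512
default_blocking_factor=20
default_record=$(( default_blocking_factor * tar_block ))
echo "default record size: $default_record bytes"              # 10240 bytes = 10 KiB

# Blocking factor whose records would match the 2 MiB block size
# reported by `stat -c %o` above.
fs_block=2097152
echo "blocking factor for 2 MiB: $(( fs_block / tar_block ))"   # 4096
```

GNU tar lets you change the blocking factor per invocation with `-b` (`--blocking-factor`), e.g. `tar -b 4096 -cf data.tar data`, which sets the archive record size by hand; the proposal here is for tar to pick a suitable size automatically.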

Especially on a networked file system, the comparatively large number of I/O 
operations that this block size requires results in suboptimal performance. 
Using the native file system block size would generally yield better performance.
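To put a number on it: if every call moves one full buffer, the number of calls is the file size divided by the buffer size, rounded up. This small shell sketch (the `ops_for` helper is made up for illustration) reproduces the 8602 operations from the trace above and compares them with a 2 MiB buffer:

```shell
# Hypothetical helper: number of read()/write() calls needed to move N bytes
# with a buffer of BS bytes, assuming every call transfers a full buffer
# except possibly the last one (i.e. ceiling division).
ops_for() {
  n=$1 bs=$2
  echo $(( (n + bs - 1) / bs ))
}

size=$(( 42 * 2 * 1024 * 1024 ))   # the 84 MiB data/blob created above
echo "10 KiB buffer: $(ops_for "$size" 10240) calls"     # 8602
echo "2 MiB buffer:  $(ops_for "$size" 2097152) calls"   # 42
```

For the same 84 MiB that is about 200 times fewer calls. How much wall-clock time that saves depends on the file system (e.g. client-side caching and read-ahead); the call count alone does not predict the speed-up.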

I would like to propose using the native file system block size instead of the 
currently used 10 KiB. The block size can be queried with the `stat` syscall, 
just as with the `stat` command above. If the syscall does not return a block 
size, e.g. because the file system does not support it, the current default of 
10 KiB could still be used as a fallback.
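A minimal sketch of the proposed selection logic at the shell level, assuming GNU coreutils `stat` (whose `%o` format prints the `st_blksize` hint). The `io_size_for` helper is hypothetical, and rounding down to whole 512-byte blocks is an extra assumption of this sketch, since tar records are made of 512-byte blocks:

```shell
# Hypothetical helper sketching the proposal: choose an I/O size from the
# file system's preferred block size (st_blksize, printed by GNU stat's %o),
# falling back to tar's current 10 KiB default.
io_size_for() {
  fallback=10240
  bs=$(stat -c %o -- "$1" 2>/dev/null) || bs=
  case $bs in
    ''|*[!0-9]*) bs=0 ;;   # stat failed or printed something non-numeric
  esac
  # Assumption of this sketch: round down to whole 512-byte tar blocks.
  bs=$(( bs / 512 * 512 ))
  if [ "$bs" -gt 0 ]; then
    echo "$bs"
  else
    echo "$fallback"
  fi
}
```

On the cluster file system described above, `io_size_for data/blob` would print 2097152; for a path that cannot be stat'ed it prints 10240. Inside tar, the same value would come from the `st_blksize` field that `stat()`/`fstat()` fill in.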

What do you think about an improvement like this?

I can offer to try implementing this myself and to provide a patch. I'm fairly 
new to GNU Savannah, so I'm still a bit unsure about the preferred way to submit 
patches to the project (I'm used to the fork plus pull request / merge request 
model found on GitHub/GitLab).

Best Regards

--

Christian Krause

Scientific Computing Administration and Support

-----------------------------------------------------------------------------

Email: address@hidden

Office: BioCity Leipzig 5e, Room 3.201.3

Phone: +49 341 97 33144

-----------------------------------------------------------------------------

German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig

Deutscher Platz 5e

04103 Leipzig

Germany

-----------------------------------------------------------------------------

iDiv is a research centre of the DFG – Deutsche Forschungsgemeinschaft

iDiv is a central institution of Universität Leipzig within the meaning of § 92 
para. 1 SächsHSFG and is operated jointly with Martin-Luther-Universität 
Halle-Wittenberg and Friedrich-Schiller-Universität Jena, as well as in 
cooperation with the Helmholtz-Zentrum für Umweltforschung GmbH – UFZ. 
The participating cooperation partners are the following non-university 
research institutions: the Helmholtz-Zentrum für Umweltforschung GmbH - UFZ, 
the Max-Planck-Institut für Biogeochemie (MPI BGC), the Max-Planck-Institut für 
chemische Ökologie (MPI CE), the Max-Planck-Institut für evolutionäre 
Anthropologie (MPI EVA), the Leibniz-Institut Deutsche Sammlung von 
Mikroorganismen und Zellkulturen (DSMZ), the Leibniz-Institut für 
Pflanzenbiochemie (IPB), the Leibniz-Institut für Pflanzengenetik und 
Kulturpflanzenforschung (IPK) and the Leibniz-Institut Senckenberg Museum für 
Naturkunde Görlitz (SMNG). VAT ID no. DE 141510383



