Re: [Help-tar] Extraction performance problem

From:	Jakob Bohm
Subject:	Re: [Help-tar] Extraction performance problem
Date:	Thu, 05 Feb 2015 19:57:53 +0100
User-agent:	Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0

On 05/02/2015 19:22, Paul Eggert wrote:
> On 02/05/2015 09:04 AM, Mark Lehrer wrote:
>> I haven't done a full blktrace analysis yet, but the 250MB/sec speed seems to match other tests I have done with a 512 byte block size. Is it possible to do the equivalent of dd's "obs" option, to increase the output block size to 4k, 64k, or even 1M?
>
> Sure, the -b option does that. The default is 20 (i.e., 20 x 512 = 10 KiB).

Which happens not to be a multiple of 4 KiB.

>> has anyone tried to make a multi-threaded version?
>
> Not as far as I know. It's not clear that going multithreaded would be worth the hassle.
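
As a quick check of the blocking-factor arithmetic above, here is a minimal Python sketch. It assumes only what the thread states: -b counts 512-byte blocks per record. The helper names (record_size, blocking_factor_for, is_aligned) are made up for illustration and are not part of tar.

```python
BLOCK_SIZE = 512  # bytes per tar block; the -b value is counted in these units


def record_size(blocking_factor: int) -> int:
    """Record size in bytes for a given -b value."""
    return blocking_factor * BLOCK_SIZE


def blocking_factor_for(record_bytes: int) -> int:
    """-b value that yields exactly record_bytes per record."""
    if record_bytes <= 0 or record_bytes % BLOCK_SIZE != 0:
        raise ValueError("record size must be a positive multiple of 512")
    return record_bytes // BLOCK_SIZE


def is_aligned(blocking_factor: int, alignment: int = 4096) -> bool:
    """True if the record size is a whole multiple of `alignment` bytes."""
    return record_size(blocking_factor) % alignment == 0


if __name__ == "__main__":
    print(record_size(20), is_aligned(20))  # the default discussed above
    for size in (4096, 65536, 1 << 20):     # 4k, 64k, 1M from the question
        print(size, blocking_factor_for(size))
```

Under these assumptions, the 4k, 64k and 1M sizes from the question correspond to -b 8, -b 128 and -b 2048. Whether a larger record size actually helps extraction speed in this case would still have to be measured.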

I would agree, but given the typical behavior of correctly implemented file system flush logic, it might pay to somehow overlap the closing of extracted regular files with the extraction of subsequent files. The reason is that close(fd) must imply a flush of fd, which must wait for disk I/O and thus prevents efficient coalescing of metadata writes on any media, even SSDs. Additionally, if an on-access "virus scanner" is installed, close(fd) may also trigger a wait while the scanner process checks the extracted file.

A simple form could be to batch together the close() calls in bundles of, e.g., a few dozen or a few hundred. A more advanced form would delegate the close() calls to a group of helper threads which are awoken for each batch of file handles, and which pick up queued close() calls one at a time from a shared queue (because the time to complete each call will vary, and having multiple close() calls in parallel gives kernels the best chance to coalesce metadata writes).

During archive creation, two optimizations would help with disk cache interactions:

1. Do open(O_RDONLY) calls ahead of time in background threads so the file handles will usually be ready for read() calls as soon as the end of the previous file is reached. This should be a speedup when tarring up many small files, such as a large source code tarball.

2. In --sparse mode, use a large buffer to do only one pass over each file. With current tar formats this means building up an in-memory representation of the non-zero file blocks and their locations until the buffer is filled, then outputting the chunk using the special tar records already defined for a huge file that has been split into multiple tar entries for other reasons. This should be a speedup when tarring up highly sparse larger-than-memory files such as virtual machine disk images.
It should also protect against race conditions caused by blocks changing from zero to non-zero between passes, and/or the file length changing during file reading (such things can happen when dumping a live file system without the benefit of a copy-on-write disk snapshot mechanism).
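
To make the deferred-close idea for extraction more concrete, here is a rough Python sketch, not a patch for tar. The class DeferredCloser and the function extract_members are hypothetical names invented for this illustration, and the "archive" is just a list of (name, bytes) pairs. For simplicity it queues each descriptor individually rather than waking the helpers once per batch.

```python
import os
import queue
import threading


class DeferredCloser:
    """Hypothetical helper: worker threads close queued file descriptors."""

    def __init__(self, workers=4):
        self._queue = queue.Queue()
        self._errors = []
        self._lock = threading.Lock()
        self._threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(workers)
        ]
        for thread in self._threads:
            thread.start()

    def _run(self):
        # Each worker takes one descriptor at a time, so slow closes
        # do not hold up the other workers.
        while True:
            fd = self._queue.get()
            if fd is None:  # sentinel: no more work for this worker
                return
            try:
                os.close(fd)
            except OSError as exc:
                with self._lock:
                    self._errors.append((fd, exc))

    def close_later(self, fd):
        """Queue fd for closing and return immediately."""
        self._queue.put(fd)

    def finish(self):
        """Wait until every queued descriptor is closed; return any errors."""
        for _ in self._threads:
            self._queue.put(None)
        for thread in self._threads:
            thread.join()
        return list(self._errors)


def extract_members(members, dest_dir, closer):
    """Write each (name, data) pair to dest_dir, deferring the close()."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    for name, data in members:
        fd = os.open(os.path.join(dest_dir, name), flags, 0o644)
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        closer.close_later(fd)  # move on to the next file right away
```

A caller would create one DeferredCloser, run extract_members, and then call finish() before reporting success, since close() errors only become visible at that point. One cost of this design is that many descriptors can be open at once, so a real implementation would need to bound the queue to stay under the process's file descriptor limit.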

Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  http://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
