[Help-tar] tar + lbzip2 proposal
From: ERSEK Laszlo
Subject: [Help-tar] tar + lbzip2 proposal
Date: Tue, 6 Oct 2009 16:06:20 +0200 (CEST)
Dear GNU Tar Maintainer,
here's an idea for adding support for lbzip2 and other parallel bzip2
implementations to GNU Tar. I'm asking for your opinion. I'm willing to implement the
suggested functionality if you accept the proposal (after necessary
amendments, of course).
In my preliminary understanding, the name of the compression program to use
is pointed to by the "use_compress_program_option" global variable. This
variable can be set up in a multitude of ways:
1. -j / --bzip2 and the like set it up by a call to
set_use_compress_program_option() with a fixed argument. Specifying more
than one distinct value via this function (e.g. with -j -z) makes tar exit
with an error.
2. -I / --use-compress-program allows the user to specify the argument to
set_use_compress_program_option() directly.
3. -a / --auto-compress selects the compression program at archive creation
time from the name suffix of the file to be created, by calling
set_compression_program_by_suffix(). If this attempt fails because of an
unknown suffix, then tar doesn't override a compression program specified
otherwise (see 1 and 2 above). Thus, when creating an archive, if both -j
(or --use=bzip2) and -a are specified, and -f has argument "file.tar.gz",
then -a takes precedence and gzip will be selected. If -f has argument
"file.tar.qqq", then -j takes effect.
4. If the user didn't specify a compression program via methods 1 or 2, then
at testing/extraction time tar selects the compression program according
to the magic signature stored in the file. If that fails, tar falls back
to the suffix-based method.
open_compressed_archive()
  -> compress_program()
    -> magic[].program
  -> set_compression_program_by_suffix()
    -> find_compression_program()
      -> compression_suffixes[].program
This list is possibly incomplete and/or inaccurate. It would be important to
identify all write access sites to "use_compress_program_option"; please
verify the list! Thank you.
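To make the precedence described in 3 concrete, here is a minimal sketch in C
of how I understand it. It is not taken from the tar sources: the names
suffix_table, program_by_suffix() and choose_for_create() are made up for
illustration, and the table holds only a few example suffixes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for tar's compression_suffixes[] table. */
struct suffix_entry
{
  const char *suffix;
  const char *program;
};

static const struct suffix_entry suffix_table[] = {
  { ".gz",  "gzip"  },
  { ".tgz", "gzip"  },
  { ".bz2", "bzip2" },
  { ".tbz", "bzip2" },
};

/* Return the program matching the suffix of NAME, or NULL if unknown. */
static const char *
program_by_suffix (const char *name)
{
  size_t nlen = strlen (name);
  size_t i;

  for (i = 0; i < sizeof suffix_table / sizeof suffix_table[0]; i++)
    {
      size_t slen = strlen (suffix_table[i].suffix);

      if (nlen > slen
          && strcmp (name + nlen - slen, suffix_table[i].suffix) == 0)
        return suffix_table[i].program;
    }
  return NULL;
}

/* Creation time with -a: a known suffix wins; otherwise the program
   chosen explicitly (methods 1 and 2, possibly NULL) is kept.  */
static const char *
choose_for_create (const char *explicit_program, const char *archive_name)
{
  const char *by_suffix = program_by_suffix (archive_name);

  return by_suffix ? by_suffix : explicit_program;
}
```

With these definitions, choose_for_create ("bzip2", "file.tar.gz") yields
"gzip", while choose_for_create ("bzip2", "file.tar.qqq") yields "bzip2".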
The array "compression_suffixes" could be static, I think, just like "magic"
is.
In general, --use-compress-program cannot be added to TAR_OPTIONS.
Proposal:
* Introduce new global variable "bzip2_filter", with default value "bzip2".
The variable has type "const char *".
* Introduce new command line option "--bzip2-filter" to change the value of
the variable "bzip2_filter". Thus the option requires an argument. The
option can be passed only once on the command line and only before setting
"use_compress_program_option" in any way.
* The character array pointed to by "bzip2_filter" lives in either static
storage (default "bzip2") or automatic storage (parameter to main()). It
can't be modified or freed.
* Modify case 1 (-j / --bzip2) to pass the value of "bzip2_filter" to
set_use_compress_program_option(), instead of a fixed "bzip2" string.
* Case 2 is unchanged.
* Change the compress_program() macro definition into a real static function
that handles the bz2 magic value as an exception, and returns the value of
"bzip2_filter". The strings currently returned by compress_program() from
magic[] also have static storage class.
* Change set_compression_program_by_suffix() to handle the bzip2 suffixes as
exceptions, and to return the value of "bzip2_filter". The strings
currently returned by this function from compression_suffixes[] also have
static storage class.
* Due to the last two points, the auto-selection methods in 3 and 4 will use
the program passed by --bzip2-filter (or bzip2 by default) where bzip2 is
auto-selected now.
* As development advances, more and more multi-threaded alternatives might be
added to tar, with --gzip-filter for pigz, for example. Once the
exceptions in compress_program() and set_compression_program_by_suffix()
start to proliferate, flat tables would become desirable again, i.e.
extending the current magic[] and compression_suffixes[] arrays with
pointers to global variables, each holding the selected alternative for
that family of compression. Maybe this is the preferred way to start out
with even now.
* Usage: user prepends "--bzip2-filter=lbzip2" to her TAR_OPTIONS.
* On Debian, the tar source could be patched, so that "bzip2_filter"
defaults to "/etc/alternatives/bzip2-filter", which would be a symlink to
/bin/bzip2 by default. Packages like "lbzip2" and "pbzip2" would add
alternatives.
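The flat-table variant of the proposal could look roughly like the following
C sketch. It is an illustration under assumptions, not a patch against tar:
only "bzip2_filter" comes from the proposal itself, while gzip_filter,
set_bzip2_filter(), compress_program_chosen, filter_magics[],
filter_suffixes[], filter_by_magic() and filter_by_suffix() are hypothetical
names, and the tables list only a few example entries.

```c
#include <stddef.h>
#include <string.h>

/* Proposed global: the program to run wherever bzip2 is selected. */
static const char *bzip2_filter = "bzip2";
/* Hypothetical sibling, so that every table entry points to a variable. */
static const char *gzip_filter = "gzip";

/* Hypothetical guards for "only once, and only before a program is
   chosen".  Real code would set compress_program_chosen wherever
   use_compress_program_option is written.  */
static int bzip2_filter_set;
static int compress_program_chosen;

/* Handle --bzip2-filter=ARG.  Return 0 on success, -1 if the option is
   repeated or comes too late.  ARG is stored, never modified or freed.  */
static int
set_bzip2_filter (const char *arg)
{
  if (bzip2_filter_set || compress_program_chosen)
    return -1;
  bzip2_filter = arg;
  bzip2_filter_set = 1;
  return 0;
}

/* Flat tables: each entry points to the variable holding the selected
   alternative for its compression family.  */
struct filter_magic
{
  size_t length;
  const char *magic;
  const char **program;
};

static const struct filter_magic filter_magics[] = {
  { 2, "\037\213", &gzip_filter  },
  { 3, "BZh",      &bzip2_filter },
};

struct filter_suffix
{
  const char *suffix;
  const char **program;
};

static const struct filter_suffix filter_suffixes[] = {
  { ".gz",   &gzip_filter  },
  { ".bz2",  &bzip2_filter },
  { ".tbz2", &bzip2_filter },
};

/* Look up the filter by the first LEN bytes of the archive, or NULL. */
static const char *
filter_by_magic (const void *buf, size_t len)
{
  size_t i;

  for (i = 0; i < sizeof filter_magics / sizeof filter_magics[0]; i++)
    if (len >= filter_magics[i].length
        && memcmp (buf, filter_magics[i].magic, filter_magics[i].length) == 0)
      return *filter_magics[i].program;
  return NULL;
}

/* Look up the filter by the suffix of NAME, or NULL if unknown. */
static const char *
filter_by_suffix (const char *name)
{
  size_t nlen = strlen (name);
  size_t i;

  for (i = 0; i < sizeof filter_suffixes / sizeof filter_suffixes[0]; i++)
    {
      size_t slen = strlen (filter_suffixes[i].suffix);

      if (nlen > slen
          && strcmp (name + nlen - slen, filter_suffixes[i].suffix) == 0)
        return *filter_suffixes[i].program;
    }
  return NULL;
}
```

Because the tables store pointers to the variables rather than the strings,
a call like set_bzip2_filter ("lbzip2") made during option parsing is seen by
both lookups afterwards, and adding a --gzip-filter later would need no new
special cases in the lookup functions.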
I'm greatly interested in your opinion,
thanks,
lacos