
[Help-tar] tar + lbzip2 proposal


From: ERSEK Laszlo
Subject: [Help-tar] tar + lbzip2 proposal
Date: Tue, 6 Oct 2009 16:06:20 +0200 (CEST)

Dear GNU Tar Maintainer,


here's an idea for adding support for lbzip2 and other parallel bzip2
implementations to GNU Tar. I'd like to hear your opinion. I'm willing to
implement the suggested functionality if you accept the proposal (after the
necessary amendments, of course).

As I understand it so far, the name of the compression program to use is
pointed to by the global variable "use_compress_program_option". This
variable can be set in several ways:

1 -j / --bzip2 and the like set it by calling
  set_use_compress_program_option() with a fixed argument. Specifying more
  than one distinct value via this function (e.g. with -j -z) makes tar exit
  with an error.

2 -I / --use-compress-program allows the user to specify the argument to
  set_use_compress_program_option() directly.

3 -a / --auto-compress selects the compression program at archive creation
  time from the suffix of the name of the archive being created, by calling
  set_compression_program_by_suffix(). If this fails because the suffix is
  unknown, tar doesn't override a compression program specified otherwise
  (see 1 and 2 above). Thus, when creating an archive with both -j (or
  --use=bzip2) and -a, if -f has the argument "file.tar.gz", then -a takes
  precedence and gzip is selected; if -f has the argument "file.tar.qqq",
  then -j takes effect.

4 If the user didn't specify a compression program via methods 1 or 2, then
  at testing/extraction time tar selects the compression program according
  to the magic signature stored in the file. If that fails, tar falls back
  to the suffix-based method.

  open_compressed_archive()
    -> compress_program()
      -> magic[].program
    -> set_compression_program_by_suffix()
      -> find_compression_program()
        -> compression_suffixes[].program
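The four cases above can be modeled roughly as follows. This is a simplified sketch under stated assumptions, not tar's source: the tables are abbreviated and their layouts hypothetical, the error message is made up, select_program_for_reading() is a hypothetical stand-in for the relevant part of open_compressed_archive(), and restricting -a to archive creation is left to the caller. It shows the precedence rules: conflicting explicit choices are fatal, a known suffix overrides -j, and magic-based detection runs only when nothing was chosen explicitly.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N_ELEMS(a) (sizeof (a) / sizeof (a)[0])

/* Stand-in for tar's global of the same name (declaration assumed). */
static const char *use_compress_program_option = NULL;

/* Cases 1 and 2: -j, -z, -I ... all go through here.  Asking for two
   different programs is fatal (message text is hypothetical). */
static void
set_use_compress_program_option (const char *string)
{
  if (use_compress_program_option
      && strcmp (use_compress_program_option, string) != 0)
    {
      fprintf (stderr, "tar: Conflicting compression options\n");
      exit (EXIT_FAILURE);
    }
  use_compress_program_option = string;
}

/* Abbreviated, illustrative suffix table (layout assumed). */
struct compression_suffix { const char *suffix; const char *program; };
static const struct compression_suffix compression_suffixes[] = {
  { "gz", "gzip" }, { "tgz", "gzip" },
  { "bz2", "bzip2" }, { "tbz", "bzip2" }, { "tbz2", "bzip2" },
  { "Z", "compress" },
};

/* Return the program for NAME's last suffix, or NULL if unknown. */
static const char *
find_compression_program (const char *name)
{
  const char *dot = strrchr (name, '.');
  size_t i;

  if (!dot)
    return NULL;
  for (i = 0; i < N_ELEMS (compression_suffixes); i++)
    if (strcmp (dot + 1, compression_suffixes[i].suffix) == 0)
      return compression_suffixes[i].program;
  return NULL;
}

/* Case 3 (-a): a known suffix overrides -j / -I without a conflict
   error; an unknown suffix leaves the earlier choice in place. */
static void
set_compression_program_by_suffix (const char *name)
{
  const char *program = find_compression_program (name);
  if (program)
    use_compress_program_option = program;
}

/* Abbreviated magic table: leading bytes of each compressed format. */
struct magic_entry { const char *magic; size_t length; const char *program; };
static const struct magic_entry magic[] = {
  { "\037\213", 2, "gzip" },      /* 0x1f 0x8b */
  { "BZh", 3, "bzip2" },
  { "\037\235", 2, "compress" },  /* 0x1f 0x9d */
};

/* Case 4 (hypothetical helper): only if cases 1 and 2 chose nothing,
   try the magic signature, then fall back to the name's suffix. */
static void
select_program_for_reading (const char *name, const char *buf, size_t len)
{
  size_t i;

  if (use_compress_program_option)
    return;
  for (i = 0; i < N_ELEMS (magic); i++)
    if (len >= magic[i].length
        && memcmp (buf, magic[i].magic, magic[i].length) == 0)
      {
        use_compress_program_option = magic[i].program;
        return;
      }
  set_compression_program_by_suffix (name);
}
```

The suffix path assigns use_compress_program_option directly instead of calling set_use_compress_program_option(); otherwise -j combined with -a on "file.tar.gz" would be reported as a conflict rather than selecting gzip.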

This list is possibly incomplete and/or inaccurate. It is important to
identify all write access sites of "use_compress_program_option"; please
verify the list! Thank you.

The array "compression_suffixes" could be made static, I think, just like
"magic" is.

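In general, --use-compress-program cannot usefully be added to TAR_OPTIONS:
it would then apply to every invocation and disable the magic- and
suffix-based selection of case 4, even for archives compressed with a
different program.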


Proposal:

* Introduce new global variable "bzip2_filter", with default value "bzip2".
  The variable has type "const char *".

* Introduce a new command line option "--bzip-filter" to change the value of
  the variable "bzip2_filter". The option therefore requires an argument. It
  can be passed only once on the command line, and only before
  "use_compress_program_option" has been set in any way.

* The character array pointed to by "bzip2_filter" lives either in static
  storage (the default "bzip2" string literal) or in the argv strings passed
  to main(), which persist until the program terminates. Tar must neither
  modify nor free it.

* Modify case 1 (-j / --bzip2) to pass the value of "bzip2_filter" to
  set_use_compress_program_option(), instead of a fixed "bzip2" string.

* Case 2 is unchanged.

* Change the compress_program() macro into a real static function that
  treats the bz2 magic value as a special case and returns the value of
  "bzip2_filter" for it. The strings compress_program() currently returns
  from magic[] also live in static storage, so every returned pointer stays
  valid for the life of the program.

* Change set_compression_program_by_suffix() to treat the bzip2 suffixes as
  special cases and return the value of "bzip2_filter" for them. The strings
  this function currently returns from compression_suffixes[] also live in
  static storage.

* As a result of the last two points, the auto-selection methods 3 and 4
  will use the program given by --bzip-filter (bzip2 by default) wherever
  they currently auto-select bzip2.

* As development advances, more multi-threaded alternatives might be added
  to tar, for example --gzip-filter for pigz. Once the special cases in
  compress_program() and set_compression_program_by_suffix() start to
  proliferate, flat tables would become desirable again, i.e. extending the
  current magic[] and compression_suffixes[] arrays with pointers to global
  variables, each holding the selected alternative for that compression
  family. Maybe this is the preferred way to start out even now.

* Usage: the user prepends "--bzip-filter=lbzip2" to her TAR_OPTIONS.

* On Debian, the tar source could be patched so that "bzip2_filter"
  defaults to "/etc/alternatives/bzip2-filter", which would be a symlink to
  /bin/bzip2 by default. Packages like "lbzip2" and "pbzip2" would register
  alternatives.
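The proposal, in its flat-table variant, could look roughly like the sketch below. Everything here is an assumption for illustration: set_bzip2_filter_option(), handle_bzip2_option(), fatal(), the gzip_filter variable and the simplified table layouts are hypothetical names, not existing tar code, and compress_program() takes a plain table index instead of tar's own type.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N_ELEMS(a) (sizeof (a) / sizeof (a)[0])

static const char *use_compress_program_option = NULL;

/* Proposed globals, one per compression family.  Each points to a string
   literal or an argv element; neither is ever modified or freed. */
static const char *bzip2_filter = "bzip2";
static const char *gzip_filter = "gzip";  /* hypothetical --gzip-filter */

static void
fatal (const char *msg)
{
  fprintf (stderr, "tar: %s\n", msg);
  exit (EXIT_FAILURE);
}

/* Hypothetical handler for --bzip-filter=PROG: once only, and only before
   any compression program has been selected. */
static void
set_bzip2_filter_option (const char *arg)
{
  static int seen = 0;

  if (seen)
    fatal ("--bzip-filter may be given only once");
  if (use_compress_program_option)
    fatal ("--bzip-filter must precede the compression option");
  seen = 1;
  bzip2_filter = arg;
}

static void
set_use_compress_program_option (const char *string)
{
  if (use_compress_program_option
      && strcmp (use_compress_program_option, string) != 0)
    fatal ("Conflicting compression options");
  use_compress_program_option = string;
}

/* -j / --bzip2 passes the variable instead of a fixed "bzip2". */
static void
handle_bzip2_option (void)
{
  set_use_compress_program_option (bzip2_filter);
}

/* Flat tables whose entries point at the per-family variables, so the
   lookups stay plain loops with no special cases. */
struct magic_entry
{
  const char *magic;            /* leading bytes of the format */
  size_t length;
  const char *const *program;   /* address of the family's variable */
};
static const struct magic_entry magic[] = {
  { "\037\213", 2, &gzip_filter },
  { "BZh", 3, &bzip2_filter },
};

struct compression_suffix { const char *suffix; const char *const *program; };
static const struct compression_suffix compression_suffixes[] = {
  { "gz", &gzip_filter }, { "tgz", &gzip_filter },
  { "bz2", &bzip2_filter }, { "tbz2", &bzip2_filter },
};

/* Formerly a macro returning magic[t].program; now one more indirection. */
static const char *
compress_program (size_t t)
{
  return *magic[t].program;
}

/* Suffix lookup: returns the current alternative for the family, or NULL. */
static const char *
find_compression_program (const char *name)
{
  const char *dot = strrchr (name, '.');
  size_t i;

  if (dot)
    for (i = 0; i < N_ELEMS (compression_suffixes); i++)
      if (strcmp (dot + 1, compression_suffixes[i].suffix) == 0)
        return *compression_suffixes[i].program;
  return NULL;
}
```

With the tables pointing at the variables, a later --gzip-filter needs only one more global and one more option handler; the lookup loops stay unchanged.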


I'm greatly interested in your opinion,
thanks,
lacos



