[Bug-tar] Making recovery from corrupted tarballs more reliable


From: Marc Aurele La France
Subject: [Bug-tar] Making recovery from corrupted tarballs more reliable
Date: Thu, 28 Oct 2004 10:33:28 -0600 (MDT)

Hi.

I was recently faced with the task of recovering as much as I could out of a corrupted tarball of some 600GB in size. Attached are changes to GNU tar that greatly facilitated this process. The changes are described below. Do with these as you see fit.

The changes are not the full story, however. I ended up writing a small utility that looked for unaligned headers before piping the stream to tar. I have not spent the time to incorporate this latter functionality into GNU tar.
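
For illustration only, here is a minimal Python sketch of that kind of scan. It is not the utility described above. It assumes the data fits in memory, looks only for headers carrying a "ustar" magic string at offset 257, and accepts a candidate only if its header checksum verifies. The names find_headers and header_checksum_ok are hypothetical.

```python
BLOCK = 512            # tar block size in bytes
MAGIC = b"ustar"       # common prefix of the POSIX and old GNU magic values
MAGIC_OFFSET = 257     # offset of the magic field within a header block


def header_checksum_ok(block: bytes) -> bool:
    """Return True if the stored octal checksum matches the computed one."""
    if len(block) != BLOCK:
        return False
    field = block[148:156].strip(b" \0")
    if not field or any(c not in b"01234567" for c in field):
        return False
    stored = int(field, 8)
    # The checksum is the byte sum of the header with the chksum field
    # itself counted as eight spaces.
    computed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:])
    return stored == computed


def find_headers(data: bytes):
    """Yield (offset, is_aligned) for every plausible header in data."""
    pos = data.find(MAGIC)
    while pos != -1:
        start = pos - MAGIC_OFFSET
        if start >= 0 and header_checksum_ok(data[start:start + BLOCK]):
            yield start, start % BLOCK == 0
        pos = data.find(MAGIC, pos + 1)
```

Each result gives the byte offset of a candidate header and whether that offset falls on a 512-byte boundary. A tool meant for a 600GB stream would have to scan in chunks rather than hold everything in memory, and V7 headers (which carry no magic string) would not be found this way.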

There is also no guarantee that the contents of all the files I ended up with are actually valid, given that the tar format does not provide for checksumming of file data blocks.

The changes were originally developed against GNU tar 1.14 and affect only list.c. I have verified that they apply cleanly to today's CVS HEAD at Savannah.

The changes affect the following:

1) I found read_header() to be much too lax about what it considers to be a
   valid header before making decisions based on that header.  The changes
   cause read_header() to return HEADER_FAILURE early on (after the
   HEADER_ZERO_BLOCK check) if one of two additional checks fails:

   a) The header must contain a valid magic field.  Potential breakage here is
      that this check assumes the magic field is a null string in V7 tarballs.

   b) The header's chksum field must be formatted as GNU tar produces it.
      Potential breakage here, of course, is that other tar implementations
      might produce a differently formatted chksum.

2) The remainder of this diff affects the --block-number option:

   a) Print the block number on the same unit (stdlis or stderr) as the message
      to be thus prefixed;

   b) Prefix the block number to more of list.c's messages;

   c) Change the block number printed to be the number of the very first block
      relevant to the current filename, rather than the number of the block
      current when the message is produced.

   The only "odd" behaviour I've noticed so far with these --block-number
   changes is an occasional double prefix when stdout & stderr are directed to
   the same unit.
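
As an illustration of the two header checks in item 1, here is a minimal Python sketch. It is not the attached patch, and several details are assumptions: the function name classify_header and the "PASSED_CHECKS" result are hypothetical, a V7 header is taken to have an all-NUL magic field, and the GNU chksum layout is taken to be six octal digits followed by a NUL and a space. The sketch checks only the format of the chksum field, not its value.

```python
import re

BLOCK = 512

# Assumed GNU tar layout of the 8-byte chksum field:
# six octal digits, then NUL, then a space.
CHKSUM_GNU = re.compile(rb"[0-7]{6}\x00 ")

POSIX_MAGIC = b"ustar\x0000"    # magic "ustar\0" followed by version "00"
OLDGNU_MAGIC = b"ustar  \x00"   # old GNU magic, spans magic and version


def classify_header(block: bytes) -> str:
    """Apply the zero-block check, then the two stricter checks."""
    if len(block) != BLOCK:
        raise ValueError("a tar header is exactly 512 bytes")
    if block == bytes(BLOCK):
        return "HEADER_ZERO_BLOCK"

    # Check (a): the magic field must hold a known value.
    magic_and_version = block[257:265]
    is_v7 = block[257:263] == bytes(6)   # assumption: V7 leaves it NUL
    if magic_and_version not in (POSIX_MAGIC, OLDGNU_MAGIC) and not is_v7:
        return "HEADER_FAILURE"

    # Check (b): the chksum field must be laid out as GNU tar writes it.
    if CHKSUM_GNU.fullmatch(block[148:156]) is None:
        return "HEADER_FAILURE"

    return "PASSED_CHECKS"
```

Rejecting a block this early means that later decisions, such as how many data blocks to skip, are never based on a size field read from garbage.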

Please reply should you have concerns regarding these changes.

Thanks.

Marc.

+----------------------------------+-----------------------------------+
|  Marc Aurele La France           |  work:   1-780-492-9310           |
|  Computing and Network Services  |  fax:    1-780-492-1729           |
|  352 General Services Building   |  email:  address@hidden           |
|  University of Alberta           +-----------------------------------+
|  Edmonton, Alberta               |                                   |
|  T6G 2H1                         |     Standard disclaimers apply    |
|  CANADA                          |                                   |
+----------------------------------+-----------------------------------+
XFree86 developer and VP.  ATI driver and X server internals.

Attachment: tar-20041028.diff.gz
Description: Binary data

