help-tar

Re: Decorrupting a .tar file


From: Jakob Bohm
Subject: Re: Decorrupting a .tar file
Date: Thu, 19 Nov 2020 04:16:52 +0100
User-agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0

I don't know whether the tool you linked to does the equivalent of the (potentially memory-hungry) "perl -pe 'BEGIN{binmode(STDIN);binmode(STDOUT);};s/\r\n/\n/sg'", which is a one-line version of what I suggested.
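For comparison, here is a rough Python sketch of the same \r\n to \n replacement that reads fixed-size chunks instead of "lines", so memory use stays bounded even when the file contains very long runs without a \n. The function names and the chunk size are my own choices for illustration, not part of any existing tool.

```python
import io


def crlf_to_lf(src, dst, chunk_size=1 << 20):
    """Copy binary stream src to dst, replacing every \\r\\n with \\n.

    A trailing \\r at the end of a chunk is held back until the next
    chunk arrives, so a \\r\\n pair split across two reads is still found.
    """
    carry = b""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            dst.write(carry)  # a final lone \r is kept as-is
            break
        buf = carry + chunk
        if buf.endswith(b"\r"):
            buf, carry = buf[:-1], b"\r"
        else:
            carry = b""
        dst.write(buf.replace(b"\r\n", b"\n"))


def crlf_to_lf_bytes(data, chunk_size=1 << 20):
    """Convenience wrapper for in-memory data (mainly for experiments)."""
    out = io.BytesIO()
    crlf_to_lf(io.BytesIO(data), out, chunk_size)
    return out.getvalue()
```

For a real file you would open the input and output in binary mode ("rb" / "wb") and pass the two file objects to crlf_to_lf.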

GNU od and the xxd tool from the vim package are the main Linux hex dump tools that cope with giant files.

Byte-level copying to extract the portion of a file between two offsets can be done with clever use of the "dd bs=1" command together with its skip= and count= options.
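If dd is not at hand, the same byte-range extraction can be sketched in a few lines of Python. This is only an illustration: extract_range and its parameters are names I made up, and the offsets are whatever you read off the hex dump.

```python
def extract_range(src_path, dst_path, start, end, chunk_size=1 << 20):
    """Copy bytes [start, end) of src_path into dst_path.

    Returns the number of bytes actually copied, which is smaller than
    end - start if the source file ends early.
    """
    if not 0 <= start <= end:
        raise ValueError("need 0 <= start <= end")
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(start)
        remaining = end - start
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:  # hit end of file before reaching `end`
                break
            dst.write(chunk)
            copied += len(chunk)
            remaining -= len(chunk)
    return copied
```

With dd, the corresponding call would use skip=START and count=LENGTH (where LENGTH is end minus start) while bs=1 keeps both values in bytes.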


In your previous mail, you are wrong about the worst case scenario in search/replace.

If an original file contained an actual \r\n (0D 0A hex), a clean dos2unix-like tool would wrongly reduce it to a single \n (single 0A byte).

On the other hand, a single \r in the original might have been mangled to \r\n by whatever damaged the file, and a dos2unix-like tool would then mangle it further to \n.
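To make these cases concrete, here is a small Python experiment. The three "damage" functions are guesses at what the corrupting program might have done (I don't know which one, if any, matches your file), and repair is the plain \r\n to \n replacement.

```python
import re


def damage_strict(data):
    """Assumed model 1: every \\n becomes \\r\\n, no exceptions."""
    return data.replace(b"\n", b"\r\n")


def damage_smart(data):
    """Assumed model 2: only a \\n not already preceded by \\r is expanded."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def damage_lone_cr(data):
    """Assumed model 3: a \\r not followed by \\n is expanded to \\r\\n."""
    return re.sub(rb"\r(?!\n)", b"\r\n", data)


def repair(data):
    """What a dos2unix-like tool does: \\r\\n becomes \\n."""
    return data.replace(b"\r\n", b"\n")


unix_line = b"x\ny"    # original had a bare \n
dos_line = b"x\r\ny"   # original had a genuine \r\n
mac_line = b"x\ry"     # original had a lone \r

# Model 1 is reversible: the genuine \r\n survives the round trip.
print(repair(damage_strict(dos_line)) == dos_line)
# Model 2 maps two different originals to the same damaged bytes.
print(damage_smart(unix_line) == damage_smart(dos_line))
# Model 3 turns the lone \r into \r\n, which repair then reduces to \n.
print(repair(damage_lone_cr(mac_line)) == mac_line)
```

Under the second and third models the damaged file no longer says which original it came from, so no search/replace can be right everywhere; the tar headers have to be used to decide how many bytes each file really had.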

On 2020-11-17 23:55, I. Hope Nothing wrote:
On Tue, 17 Nov 2020 at 21:53, I. Hope Nothing <ihopenothinghappens@gmail.com> wrote:

        3. Look up the tar file format specifications, it is actually a
            relatively simple file format and you will need to understand it
            to do the manual data rescue.  In particular, you will need to
            understand the PAX and GNU extensions to the format.


    This is one of the first things that came to my mind.  So far I know
    of the following sources of information:

    -   The GNU Tar documentation and source code
    -   Schily's star documentation and source code

    **If you know of any other source code or specifications I should be
    aware of, please let me know.**


Try searching for the POSIX specification of the PAX format, on which
modern tars are based.  For example, com.apple.acl.text is probably
a custom PAX attribute.  (PAX is an extensible tar variant that tries to
make peace between warring UNIX factions, each with its own tar
modifications, including the old GNU tar.)

The PAX-2001 specification (maybe in draft form) is inside the manpage
at (scroll down to EXTENDED DESCRIPTION):
  https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html
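
As a rough illustration of what that section describes: the data of a PAX extended header entry is a sequence of records of the form "LENGTH KEY=VALUE\n", where LENGTH is the decimal byte length of the whole record, including the length digits themselves and the trailing \n. The Python sketch below (function names are mine, and it does no more validation than shown) builds and parses such records. Note that every record ends in \n, so a \n to \r\n corruption damages these records too.

```python
def make_pax_record(key: bytes, value: bytes) -> bytes:
    """Build one 'LENGTH KEY=VALUE\\n' record with a self-consistent length."""
    base = len(key) + len(value) + 3  # space, '=', newline
    length = prev = 0
    while True:
        length = base + len(str(prev))
        if length == prev:
            break
        prev = length
    return b"%d %s=%s\n" % (length, key, value)


def parse_pax_records(data: bytes) -> dict:
    """Parse extended header data into {key: value}; ValueError if damaged."""
    records = {}
    pos = 0
    while pos < len(data):
        space = data.index(b" ", pos)   # ValueError if no space is found
        length = int(data[pos:space])   # ValueError if not decimal digits
        record = data[pos:pos + length]
        if (length <= space - pos + 1 or len(record) != length
                or not record.endswith(b"\n")):
            raise ValueError("damaged PAX record at offset %d" % pos)
        key, _, value = record[space - pos + 1:-1].partition(b"=")
        records[key.decode("utf-8")] = value  # values may be binary
        pos += length
    return records
```

A record whose stated length no longer lands on a \n is a strong hint that bytes were inserted or removed somewhere inside it.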

A nice summary can be found at
  https://en.wikipedia.org/wiki/Tar_(computing)

Fundamentally, a PAX-2001 file is like this:

For each file:

1. Optionally, a file entry like #2-4 storing PAX extended header
  attributes of later files instead of actual file data.
2. A fixed-format 512-byte header "line" specifying the file name, size
  and some traditional attributes.
3. The actual file bytes, exactly as many bytes as the header line said.
4. Zero padding up to the next 512-byte boundary.

Thus any file corruption that adds or removes \r bytes would cause the
next file's header to start at an offset that is not a multiple of 512
bytes, thus messing up tar listings.
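
A minimal Python sketch of that layout, useful for finding the first place where a damaged archive goes off the 512-byte grid. It only understands the classic header fields (name at bytes 0-99, octal size at 124-135, checksum at 148-155), ignores name prefixes and GNU base-256 sizes, and all function names are mine. demo_archive just builds a tiny test archive with Python's own tarfile module.

```python
import io
import tarfile

BLOCK = 512  # tar headers and padding work in 512-byte blocks


def parse_octal(field: bytes) -> int:
    """Parse a NUL/space-padded octal field; raises ValueError on garbage."""
    text = field.strip(b"\0 ")
    return int(text, 8) if text else 0


def checksum_ok(block: bytes) -> bool:
    """Verify the header checksum (bytes 148..155 count as eight spaces)."""
    try:
        stored = parse_octal(block[148:156])
    except ValueError:
        return False
    computed = sum(block[:148]) + 8 * 0x20 + sum(block[156:])
    return stored == computed


def walk_headers(data: bytes):
    """Yield (offset, name, size) per entry; (offset, None, None) where the
    expected header fails its checksum, after which the walk stops."""
    offset = 0
    while offset + BLOCK <= len(data):
        block = data[offset:offset + BLOCK]
        if block == bytes(BLOCK):  # all-zero block: end-of-archive marker
            return
        if not checksum_ok(block):
            yield offset, None, None
            return
        name = block[:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        size = parse_octal(block[124:136])
        yield offset, name, size
        # skip the header plus the file data rounded up to whole blocks
        offset += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK


def demo_archive() -> bytes:
    """Build a two-file ustar archive in memory for experiments."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        for name in ("a.txt", "b.txt"):
            payload = b"line\n" * 10
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()
```

Running walk_headers over demo_archive().replace(b"\n", b"\r\n") reports the first entry normally and then a bad header at the position where the second one should have been, because the ten inserted \r bytes pushed it ten bytes further along.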

Small correction: It seems GNU tar still defaults to its own "gnu" format and only outputs the PAX-based "posix" format when requested.



P.S. I just realized something that is probably quite relevant.  The .tar file that has been corrupted was probably originally created with Mac OS X Tar.  This makes sense considering that this is the machine I was backing up, and there are tell-tale signs of "Mac OS X formattedness" in the .tar file, such as the strings "com.apple.acl.text" and "Mac OS X", and various permissions clues.

This means that the original implementation of Tar *may* have been derived from BSD Tar (I have both GNU Tar and stock system Tar installed on that machine).  So this gives me yet another set of specifications to look through.

There was a messy point in history, years ago, when BSD Tar was patched on Darwin/Mac OS X to store resource forks.  I think (hope) those days are now behind us.  I will ask around in Darwin/Mac OS X hacker groups to see if I can get any further information.


Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded


