Re: Decorrupting a .tar file
From: Jakob Bohm
Subject: Re: Decorrupting a .tar file
Date: Thu, 19 Nov 2020 04:16:52 +0100
User-agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.4.0
I don't know if the tool you linked to would do the equivalent of a
large-memory "perl -pe 'BEGIN{binmode(STDIN);binmode(STDOUT);};s/\r\n/\n/sg'",
which is a one-line version of what I suggested.
GNU od and the xxd tool from the vim package are the major Linux hex
dump tools (not interactive editors) that cope with giant files.
Byte-level copying to extract the file portion between some file offsets
can be done with the "dd bs=1" command and its skip= and count= options
(with bs=1 both are counted in bytes).
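If byte-at-a-time dd turns out to be too slow on a giant file, the same extraction can be sketched in Python. This is only an illustration of the idea; the function name, argument names and chunk size are my own choices, not part of any tool.

```python
def extract_range(src_path, dst_path, start, length, chunk_size=1 << 20):
    """Copy `length` bytes beginning at byte offset `start` of src into dst.

    Roughly what "dd bs=1 skip=START count=LENGTH" does, but reading in
    larger chunks. Returns the number of bytes actually copied.
    """
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        src.seek(start)
        while copied < length:
            chunk = src.read(min(chunk_size, length - copied))
            if not chunk:  # reached end of file before `length` bytes
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

For example, extract_range("backup.tar", "piece.bin", 1024, 4096) would write bytes 1024 through 5119 of a (hypothetical) backup.tar to piece.bin.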
In your previous mail, you are wrong about the worst case scenario in
search/replace.
If an original file contained an actual \r\n (0D 0A hex), a clean
dos2unix-like tool would wrongly reduce it to a single \n (single 0A byte).
On the other hand, a single \r in the original might be mangled to \r\n
by whatever damaged the file, causing it to be further mangled to \n by
a dos2unix-like tool.
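To make the first case concrete, here is a small sketch. The damage model is an assumption on my part (a bare \n becomes \r\n, an existing \r\n is left alone); I don't know what actually damaged the file. Under that model two different originals end up identical after the damage, so a blind search/replace cannot restore both.

```python
import re

def assumed_damage(data: bytes) -> bytes:
    # Hypothetical damage model: LF becomes CRLF unless a CR already precedes it.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)

def naive_repair(data: bytes) -> bytes:
    # Same effect as the perl s/\r\n/\n/sg one-liner.
    return data.replace(b"\r\n", b"\n")

original_a = b"abc\ndef"    # bare LF in the original data
original_b = b"abc\r\ndef"  # genuine CRLF (0D 0A) in the original data

# Both originals look the same once damaged ...
same_after_damage = assumed_damage(original_a) == assumed_damage(original_b)
# ... so the naive repair restores one of them and shortens the other.
a_restored = naive_repair(assumed_damage(original_a)) == original_a
b_restored = naive_repair(assumed_damage(original_b)) == original_b
print(same_after_damage, a_restored, b_restored)
```

Here the genuine CRLF comes back one byte short, which in a tar file also shifts everything after it.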
On 2020-11-17 23:55, I. Hope Nothing wrote:
On Tue, 17 Nov 2020 at 21:53, I. Hope Nothing
<ihopenothinghappens@gmail.com>
wrote:
3. Look up the tar file format specifications; it is actually a
relatively simple file format and you will need to understand it
to do the manual data rescue. In particular, you will need to
understand the PAX and GNU extensions to the format.
This is one of the first things that came to my mind. So far I know
of the following sources of information:
- The GNU Tar documentation and source code
- Schily's star documentation and source code
**If you know of any other source code or specifications I should be
aware of, please let me know.**
Try searching for the POSIX specification of the PAX format, on which
modern tars are based. For example, com.apple.acl.text is probably
a custom PAX attribute (PAX is an extensible tar variant that tries to
make peace between warring UNIX factions, each with its own tar
modifications, including the old GNU tar).
The PAX-2001 specification (maybe in draft form) is inside the manpage
at (scroll down to EXTENDED DESCRIPTION):
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html
A nice summary can be found at
https://en.wikipedia.org/wiki/Tar_(computing)
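The extended header records described in that specification have the form "LENGTH KEY=VALUE\n", where LENGTH is the decimal length of the whole record, including the length digits, the space and the trailing newline. A minimal parsing sketch follows; the function name is mine, and it assumes you have already cut out the payload of one extended header entry.

```python
def parse_pax_records(payload: bytes) -> dict:
    """Parse PAX extended header records of the form b"LEN KEY=VALUE\\n".

    Raises ValueError when a record does not end exactly where its
    declared length says it should.
    """
    records = {}
    pos = 0
    while pos < len(payload) and payload[pos:pos + 1] != b"\0":
        space = payload.index(b" ", pos)
        length = int(payload[pos:space].decode("ascii"))
        record = payload[space + 1:pos + length]
        if not record.endswith(b"\n"):
            raise ValueError(f"record at offset {pos} has a bad length")
        key, _, value = record[:-1].partition(b"=")
        records[key.decode("utf-8", "replace")] = value  # values kept as bytes
        pos += length
    return records

print(parse_pax_records(b"30 mtime=1350244992.023960108\n18 path=hello.txt\n"))
```

A side effect that may help with the rescue: if a \r was inserted in front of a record's final \n, the declared length no longer matches and the sketch raises ValueError at that record.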
Fundamentally, a PAX-2001 file is like this:
For each file:
1. Optionally, a file entry like #2-4 that stores PAX extended header
attributes of later files instead of actual file data.
2. A fixed-format "line" (a 512-byte header block) specifying the file
name, size and some traditional attributes.
3. The actual file bytes, exactly as many bytes as the header line said.
4. Zero padding up to the next 512-byte boundary.
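The layout above can be walked with a few lines of Python. This is a rough sketch, not a replacement for a real tar reader: it uses the classic ustar field positions (name at bytes 0-99, octal size at 124-135, type flag at 156), ignores the GNU base-256 size encoding and any size override from a PAX header, and does not verify checksums. The demo archive is built in memory with Python's tarfile module purely for illustration.

```python
import io
import tarfile

BLOCK = 512

def walk_tar(data: bytes):
    """Yield (offset, name, typeflag, size) for each header block in `data`."""
    offset = 0
    while offset + BLOCK <= len(data):
        header = data[offset:offset + BLOCK]
        if header == bytes(BLOCK):  # all-zero block marks the end of the archive
            break
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        size_field = header[124:136].strip(b"\0 ")
        size = int(size_field.decode("ascii"), 8) if size_field else 0
        typeflag = header[156:157].decode("ascii", "replace")
        yield offset, name, typeflag, size
        # Skip the header plus the data rounded up to a whole 512-byte block.
        offset += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK

# Demo: a two-file archive in plain ustar format, built in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    for member_name, payload in (("a.txt", b"hello\r\nworld\n"), ("b.txt", b"bye")):
        info = tarfile.TarInfo(member_name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

entries = list(walk_tar(buf.getvalue()))
for entry in entries:
    print(entry)
```

A type flag of "x" or "g" marks a PAX extended header entry (item 1 above); "0" is a regular file.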
Thus any file corruption that adds or removes \r bytes would cause the
next file to start at an offset that is not a multiple of 512 bytes,
thus messing up tar listings.
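One way to see this drift in practice is to search for the "ustar" magic string, which sits at byte 257 of every POSIX/GNU-style header, and check whether the implied header start is still a multiple of 512. The sketch below assumes the damage turned every \n into \r\n and builds a tiny demo archive in memory; names are my own. Note that file contents can also contain "ustar" by coincidence, so hits need a sanity check.

```python
import io
import tarfile

MAGIC = b"ustar"
MAGIC_OFFSET = 257  # position of the magic field inside a header block
BLOCK = 512

def find_header_drift(data: bytes):
    """Return (header_offset, header_offset % 512) for each "ustar" hit."""
    hits = []
    pos = data.find(MAGIC)
    while pos != -1:
        start = pos - MAGIC_OFFSET
        if start >= 0:
            hits.append((start, start % BLOCK))
        pos = data.find(MAGIC, pos + 1)
    return hits

def make_demo_tar() -> bytes:
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        for name, payload in (("a.txt", b"one\ntwo\n"), ("b.txt", b"three\n")):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tf.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

clean = make_demo_tar()
damaged = clean.replace(b"\n", b"\r\n")  # assumed damage model: LF -> CRLF

print(find_header_drift(clean))
print(find_header_drift(damaged))
```

In the damaged copy the second header no longer starts on a 512-byte boundary; the remainder tells you how many bytes were added (modulo 512) before that header.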
Small correction: It seems GNU tar still doesn't default to outputting
the PAX-based "posix" format unless requested.
P.S. I just realized something that is probably quite relevant. That
.tar file that has been corrupted was probably originally created with
Mac OS X Tar. This makes sense considering that this is the machine I
was backing up, and there are tell-tale signs and strings of "Mac OS X
formattedness" in the .tar file, such as "com.apple.acl.text", "Mac OS
X", and various permissions clues that indicate the original tar file
was created on a Mac OS X machine.
This means that the original implementation of Tar *may* have been
derived from BSD Tar (I have both GNU Tar and stock system Tar installed
on that machine). So this gives me yet another set of specifications to
look through.
There was a messy point in history, years ago, when BSD Tar was patched
on Darwin/Mac OS X to store resource forks. I think (hope) those days
are now behind us. I will ask around in Darwin/Mac OS X
hacker groups to see if I can get any further information.
Enjoy
Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded