Re: [Bug-cpio] cpio can't read large files


From: Sean Fulton
Subject: Re: [Bug-cpio] cpio can't read large files
Date: Sun, 15 Jan 2006 16:36:30 -0500
User-agent: Mozilla Thunderbird 1.0.7 (Windows/20050923)

Glitch: Directories read off of the tar archive have a trailing slash, while those read off of find don't. Any ideas?

sean


Sergey Poznyakoff wrote:

Hi Sean,

Apologies it took me so long to reply.

# Now back it up to the backup host
       find $filesys -mount $SKIPSTRING -print | \
       tee $LISTFILE | /bin/cpio -o | gzip | \
       ssh -l root $BU_HOST \
       "dd of=/backups/$MYHOST/$DAY/$c_filesys.cpio.gz"

I am not quite sure what $SKIPSTRING is; if I am not mistaken, the -mount
option to find does not take any arguments. Apart from this, the above command is roughly equivalent to the following:

tar --rsh-command=/usr/bin/ssh \
   --one-file-system \
   -c -z -f $BU_HOST:/backups/$MYHOST/$DAY/$c_filesys.cpio.gz $filesys

(Of course, the exact path to the ssh utility can differ.) If $BU_HOST has
the rmt command installed in an unusual location (tar --show-defaults will
show you where tar expects it to be located), you will also need to add
the following option: --rmt-command=/path/to/rmt

# Now send over the list for verification
       cat $LISTFILE | ssh -l root $BU_HOST \
       dd of=/backups/$MYHOST/$DAY/index.$c_filesys

# Now Verify what we sent (NOTE: we do the zcat/cpio on the localhost, not the BUHOST because the BUHOST handles six different machines at the same time. This reduces the workload while allowing us to send compressed data across the LAN).
       ssh -l root $BU_HOST \
"dd if=/backups/$MYHOST/$DAY/$c_filesys.cpio.gz" | zcat | cpio -it >$OUTFILE

# Compare OUTFILE/LISTFILE

       diff $LISTFILE $OUTFILE >$DIFILE

To reproduce exactly that, we will have to change the above tar
invocation by adding several new options:

tar --rsh-command=/usr/bin/ssh \
   --one-file-system \
   -c -z -f $BU_HOST:/backups/$MYHOST/$DAY/$c_filesys.cpio.gz \
   --verbose --index-file $LISTFILE --show-stored-names \
   $filesys

In the above invocation, --verbose prints a verbose file listing,
--index-file redirects it to the given file, and --show-stored-names shows
file names as stored in the archive, not the absolute pathnames. This
possibly requires a clarification: by default GNU tar will not store
absolute file names in the archive; instead it will strip the leading
"/" and store the resulting "stripped name" in the archive. This
is done to prevent accidental overwriting of vital data while extracting
from the archive. You can disable this feature using the -P option. In this
case you will not need to specify --show-stored-names. (The
--show-stored-names option appeared in the CVS version of GNU tar. Its
buildable snapshots are available from ftp://download.gnu.org.ua/pub/alpha/tar).
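
The name stripping described above is easy to check on a scratch directory. The following is only a sketch, assuming GNU tar and a temporary directory from mktemp; the names scratch and demo.txt are made up for illustration and are not part of the backup script.

```shell
#!/bin/sh
# Throwaway directory and file; both names are illustrative only.
scratch=$(mktemp -d)
echo "hello" > "$scratch/demo.txt"

# Default behaviour: the leading "/" is removed from the stored name.
# tar reports this on stderr, which is silenced here.
tar -c -f "$scratch/default.tar" "$scratch/demo.txt" 2>/dev/null
stored_default=$(tar -t -f "$scratch/default.tar")

# With -P the absolute name is stored unchanged. -P is given again
# when listing so that the name is shown exactly as stored.
tar -c -P -f "$scratch/absolute.tar" "$scratch/demo.txt"
stored_absolute=$(tar -t -P -f "$scratch/absolute.tar")

echo "default: $stored_default"
echo "with -P: $stored_absolute"
rm -rf "$scratch"
```

The first listing should show a relative name and the second an absolute one, which is why a listing made without -P will not match the output of find on an absolute path.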

Now, the following two commands will do the verification:

tar --rsh-command=/usr/bin/ssh \
   -t -z -f $BU_HOST:/backups/$MYHOST/$DAY/$c_filesys.cpio.gz \
   --index-file $OUTFILE
diff $LISTFILE $OUTFILE >$DIFILE

Note that GNU tar offers a verify mode, during which it will
compare not only file names but also file contents and meta-data. This
mode, however, currently works only for plain, non-compressed archives.
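
As a rough illustration of the verify mode, here is a sketch assuming it refers to the -W (--verify) option of GNU tar. It works on a local, uncompressed archive in a scratch directory; the directory and file names are invented for the example.

```shell
#!/bin/sh
# Scratch data; the names are illustrative only.
scratch=$(mktemp -d)
mkdir "$scratch/data"
echo "payload" > "$scratch/data/file.txt"

# -W (--verify) makes tar re-read the archive after writing it and
# compare each member with the file on disk. There is no -z here,
# because verification needs a plain, non-compressed archive.
if ( cd "$scratch" && tar -c -W -f plain.tar data ); then
    verify_status=ok
else
    verify_status=failed
fi
echo "verify: $verify_status"
rm -rf "$scratch"
```

For the gzip-compressed remote archive in the script above, this mode is not available, so the listing comparison remains the practical check.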

We've tweaked this over the years to compensate for different issues. The main thing we want is a list of what is getting backed up, then a list of what was backed up. The problem we run into by trusting the backup utility to make the LISTFILE is that if the backup utility doesn't see the file, it wouldn't show up in LISTFILE, whereas find is pretty thorough and gives us an independent check of what is on the "tape."

Well, if you prefer to use find, then the archive creation command should be
changed as follows:

find $filesys -mount $SKIPSTRING -print |
tee $LISTFILE |
tar --rsh-command=/usr/bin/ssh \
   --one-file-system \
   -c -z -f $BU_HOST:/backups/$MYHOST/$DAY/$c_filesys.cpio.gz \
   -P --no-recursion -T -
# `-T -' takes the list of files to be archived from stdin, so $filesys
# is not repeated on the command line. --no-recursion keeps tar from
# descending again into the directories that find already lists.

tar --rsh-command=/usr/bin/ssh \
   -t -z -f $BU_HOST:/backups/$MYHOST/$DAY/$c_filesys.cpio.gz \
   -P --index-file $OUTFILE
diff $LISTFILE $OUTFILE >$DIFILE

Here, we will have to use -P in both cases, to match the listing produced
by find.
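
Regarding the trailing slash on directories mentioned at the top of this message: one possible workaround is to normalize both lists before running diff. The sketch below is only an illustration. The helper name normalize_list and the sample lists are hypothetical, standing in for $LISTFILE and $OUTFILE.

```shell
#!/bin/sh
# Remove a single trailing "/" from each line (but leave a bare "/"
# alone) and sort, so that "/etc/" from tar and "/etc" from find
# compare equal regardless of ordering.
normalize_list() {
    sed 's,\(.\)/$,\1,' "$1" | sort
}

# Illustrative sample lists standing in for $LISTFILE and $OUTFILE.
scratch=$(mktemp -d)
printf '%s\n' /etc /etc/hosts /etc/passwd > "$scratch/list"
printf '%s\n' /etc/ /etc/hosts /etc/passwd > "$scratch/out"

normalize_list "$scratch/list" > "$scratch/list.norm"
normalize_list "$scratch/out" > "$scratch/out.norm"

# diff exits with status 0 only when the two files are identical.
if diff "$scratch/list.norm" "$scratch/out.norm" > "$scratch/diff"; then
    compare_status=same
else
    compare_status=differ
fi
echo "lists: $compare_status"
rm -rf "$scratch"
```

In the backup script, the same function could be applied to $LISTFILE and $OUTFILE before the existing diff step.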

BTW: I've been reading that bzip2 might be better than gzip for compression because you can recover from corrupt files and it compresses better. Any thoughts on this?

It surely compresses much better than gzip. However, it also takes much
longer to compress.
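
If you want to measure the difference on your own data, something along these lines may help. It is a sketch only: the sample data is generated and highly repetitive, so the numbers will not be representative of a real backup, and the bzip2 step is skipped when bzip2 is not installed.

```shell
#!/bin/sh
scratch=$(mktemp -d)

# Generate repetitive sample text; real data will compress differently.
i=0
while [ "$i" -lt 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > "$scratch/sample.txt"

# $(( ... )) strips the padding some wc implementations add.
raw_size=$(($(wc -c < "$scratch/sample.txt")))
gz_size=$(($(gzip -c "$scratch/sample.txt" | wc -c)))
echo "raw:   $raw_size bytes"
echo "gzip:  $gz_size bytes"

# bzip2 may not be installed everywhere, so check before using it.
if command -v bzip2 >/dev/null 2>&1; then
    bz_size=$(($(bzip2 -c "$scratch/sample.txt" | wc -c)))
    echo "bzip2: $bz_size bytes"
fi
rm -rf "$scratch"
```

Replacing sample.txt with a representative slice of the real filesystem, and timing each compressor, would give a fairer picture of the trade-off.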

Regards,
Sergey


--
Sean Fulton
GCN Publishing, Inc.
Internet Design, Development and Consulting For Today's Media Companies
http://www.gcnpublishing.com
(914) 937-2451, x203





