
Re: [Bug-tar] GNU tar 1.22 creating USTAR not readable by other TAR impl


From: Kent Boortz
Subject: Re: [Bug-tar] GNU tar 1.22 creating USTAR not readable by other TAR implementations
Date: Sat, 19 Dec 2009 20:27:57 +0100
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.1 (darwin)

Kent Boortz <address@hidden> writes:
>> the resulting archive might not unpack correctly using some other TAR
>> implementations,

Sergey Poznyakoff <address@hidden> writes:
> Thanks for reporting. Looks like a bug in directory name splitting
> algorithm. Please try the attached patch.

Examining the USTAR package produced before your patch, I found three
TAR headers, the second one with an empty "name" field. I guess that
made the other TAR implementations think the archive had ended, so
they skipped everything after the first header.
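
For anyone who wants to repeat this kind of inspection, here is a
minimal sketch in Python that dumps the "name" and "prefix" fields of
each header. The field offsets (name at 0, size at 124, prefix at 345,
512-byte blocks) are those of the POSIX ustar header layout. The helper
names field, dump_headers and sample_archive are made up for this
example, and it assumes plain octal size fields with no GNU or pax
extensions.

```python
import io
import tarfile

BLOCK = 512  # tar headers and data are laid out in 512-byte blocks


def field(block, offset, length):
    """Return a string field; it ends at the first NUL or fills the field."""
    raw = block[offset:offset + length].split(b"\0", 1)[0]
    return raw.decode("utf-8", "replace")


def dump_headers(data):
    """Yield (name, prefix, size) for every header block in a tar image."""
    pos = 0
    while pos + BLOCK <= len(data):
        block = data[pos:pos + BLOCK]
        if block == b"\0" * BLOCK:  # all-zero block: end-of-archive marker
            break
        name = field(block, 0, 100)
        size = int(field(block, 124, 12).strip() or "0", 8)  # octal (assumed)
        prefix = field(block, 345, 155)
        yield name, prefix, size
        # Skip the header plus the file data, rounded up to whole blocks.
        pos += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK


def sample_archive():
    """Build a tiny in-memory ustar archive to try the dumper on."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w",
                      format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo("a" * 120 + "/" + "b" * 50)
        info.size = 3
        tar.addfile(info, io.BytesIO(b"xyz"))
    return buf.getvalue()


if __name__ == "__main__":
    for name, prefix, size in dump_headers(sample_archive()):
        print(len(prefix), len(name), size)
```

A reader that stops at the first header whose "name" starts with a NUL
byte, instead of checking for an all-zero block as above, would behave
the way I guessed: it would treat the empty-name header as the end of
the archive.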

After your patch, GNU TAR and Solaris TAR produce three identical
headers; the only difference is that the resulting files are padded
differently, making the GNU TAR one slightly larger. I have no idea
what this padding is for; it is just zero bytes.

But I think I might have found more bugs :(

This is how I interpret the USTAR format, but I could be wrong of course:

  - We have 155 characters in the "prefix" field

  - We have 100 characters in the "name" field

  - The strings in "prefix" and "name" can fill the fields, but
    if the string is shorter than the limit, it is null terminated

  - The field "name" can't be empty (is this stated in the standard?)

  - We split on dir/dir or dir/file boundaries (is this stated in the
    standard?)

  - If the path ends in a directory, "name" ends with a slash (is this
    stated in the standard?)
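
To make the interpretation above concrete, here is a minimal sketch of
the splitting rule in Python. It encodes only my reading of the format,
not what GNU TAR or Solaris TAR actually do. The name split_ustar is
made up, len() counts characters so ASCII paths are assumed, and taking
the leftmost slash that fits is my own choice, not something I know the
standard requires.

```python
PREFIX_MAX = 155  # size of the "prefix" field
NAME_MAX = 100    # size of the "name" field


def split_ustar(path):
    """Return (prefix, name) for path, or raise ValueError if none fits.

    A directory path is expected to carry its trailing slash already,
    so the slash counts against the 100 characters of "name".
    """
    if len(path) <= NAME_MAX:
        return "", path  # short enough: no prefix needed
    for i, ch in enumerate(path):
        if ch != "/":
            continue
        prefix, name = path[:i], path[i + 1:]  # the slash itself is dropped
        # Both parts must be non-empty and fit their fields.
        if 0 < len(prefix) <= PREFIX_MAX and 0 < len(name) <= NAME_MAX:
            return prefix, name
    raise ValueError("file name is too long (cannot be split)")


if __name__ == "__main__":
    prefix, name = split_ustar("a" * 155 + "/" + "c" * 100)
    print(len(prefix), len(name))
```

Under this rule a 154-character directory plus a 100-character file
should split fine, which is not what either implementation does in the
tests below.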

Some test results below with GNU TAR + the patch. I also added Solaris
results as reference. Note that I do expect some of the tests below to
fail; I am just trying out "border cases". In some cases only the error
message is a bit misleading.

  GNU directory 155 + file 100 => pass
  Sol directory 155 + file 100 => aaaa...aaaaa: filename is greater than 100

  GNU directory 155 + file  99 => pass
  Sol directory 155 + file  99 => aaaa...aaaaa: filename is greater than 100

  GNU directory 154 + file 100 => aaaa...aaaa/: file name is too long (cannot be split); not dumped
  Sol directory 154 + file 100 => aaaa...aaaaa: filename is greater than 100

  GNU directory 100 + file 100 => aaaa...aaaa/: file name is too long (cannot be split); not dumped
  Sol directory 100 + file 100 => aaaa...aaaaa: filename is greater than 100

  GNU directory  99 + file 100 => pass
  Sol directory  99 + file 100 => pass

So somehow both Solaris and GNU TAR have a 99-character limit on the
first directory part that is to be put into "prefix", EXCEPT that GNU
TAR seems to accept exactly 155 characters as well. I expected the
first directory part to be accepted at 155 characters or fewer, with
no 99-character limit.

Now the same test, but directory + subdirectory

  GNU directory 155 + subdir 100 => aaaa...aaaaa/bbbb...bbbb/: file name is too long (max 256); not dumped
  Sol directory 155 + subdir 100 => aaaa...aaaaa: filename is greater than 100

  GNU directory 155 + subdir  99 => pass
  Sol directory 155 + subdir  99 => aaaa...aaaaa: filename is greater than 100

  GNU directory 154 + subdir  99 => aaaa...aaaa/: file name is too long (cannot be split); not dumped
  Sol directory 154 + subdir  99 => aaaa...aaaaa: filename is greater than 100

  GNU directory 100 + subdir  99 => aaaa...aaaa/: file name is too long (cannot be split); not dumped
  Sol directory 100 + subdir  99 => aaaa...aaaaa: filename is greater than 100

  GNU directory  99 + subdir 100 => aaaa...aaaaa/bbbb...bbbb/: file name is too long (max 256); not dumped
  Sol directory  99 + subdir 100 => bbbb...bbbbb: filename is greater than 100

  GNU directory  99 + subdir 99 => pass
  Sol directory  99 + subdir 99 => pass

It seems that if the path ends with a directory, the "name" field has
to end with a slash. This is why the limit is different from when the
last part is a file.

Now, two directory parts and a file part

  GNU directory 100 + 55 + file 100 => aaaa...aaaaa/: file name is too long (cannot be split); not dumped
  Sol directory 100 + 55 + file 100 => aaaa...aaaaaa: filename is greater than 100

  GNU directory 100 + 55 + file  99 => aaaa...aaaaa/: file name is too long (cannot be split); not dumped
  Sol directory 100 + 55 + file  99 => aaaa...aaaaaa: filename is greater than 100

  GNU directory  99 + 55 + file 100 => pass (1)
  Sol directory  99 + 55 + file 100 => pass

  GNU directory  99 + 54 + file 100 => pass
  Sol directory  99 + 54 + file 100 => pass

  GNU directory 99 + 50 + file 50 => pass
  Sol directory 99 + 50 + file 50 => pass

The third test, marked (1), triggers the bug again, leaving the second
header with nothing in the "name" field.

Worse, in test (1) the third record is corrupt: "name" is
"cccc....bbbb...." with no "/", i.e. the file name is wrong. But
"prefix" is correct.

Finally, three directory parts and a file part

  GNU directory 99 + 54 + file 50 + 50 => aaa.../bbb.../ddd: file name is too long (cannot be split); not dumped
  Sol directory 99 + 54 + file 50 + 50 => aaa.../bbb.../ddd: prefix is greater than 155

  GNU directory 99 + 54 + file 49 + 50 => pass
  Sol directory 99 + 54 + file 49 + 50 => pass

The "name" field needs a "/" in it, and that takes one character:
50 + 1 + 50 = 101 does not fit in 100, while 49 + 1 + 50 = 100 does.

Maybe it would be a good idea to include some sort of test suite for
this with the GNU TAR sources? Maybe even a test generator that
generates all/most of the permutations that should pass, and border
cases for those that should fail? Including tests that verify that
the headers are correct?
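
As a rough idea of what such a generator could look like, here is a
sketch in Python. It uses Python's own tarfile module as a stand-in for
the implementation under test; a real suite would run GNU TAR and read
back the archive it wrote. The helper names (make_path, boundary_cases,
check) and the chosen lengths are only examples. For every accepted path
it verifies that "name" is not empty, that both fields fit, and that
prefix + "/" + name gives back the original path.

```python
import itertools
import string
import tarfile


def make_path(lengths):
    """Build a path whose i-th component is a run of the i-th letter."""
    letters = string.ascii_lowercase
    return "/".join(letters[i] * n for i, n in enumerate(lengths))


def boundary_cases():
    """Yield tuples of component lengths around the 99/100/155 limits."""
    yield from itertools.product((99, 100, 154, 155), (50, 99, 100))
    yield from itertools.product((99, 100), (54, 55), (99, 100))


def check(lengths):
    """Return "ok" or "rejected"; raise AssertionError on a bad header."""
    path = make_path(lengths)
    try:
        header = tarfile.TarInfo(path).tobuf(format=tarfile.USTAR_FORMAT)
    except ValueError:  # tarfile reports "name is too long" this way
        return "rejected"
    name = header[0:100].split(b"\0", 1)[0].decode("ascii")
    prefix = header[345:500].split(b"\0", 1)[0].decode("ascii")
    assert name, "empty name field"
    assert len(name) <= 100 and len(prefix) <= 155, "field overflow"
    rebuilt = prefix + "/" + name if prefix else name
    assert rebuilt == path, "prefix/name do not rebuild the path"
    return "ok"


if __name__ == "__main__":
    for case in boundary_cases():
        print(" + ".join(str(n) for n in case), "=>", check(case))
```

The cases that come out as "rejected" still need a human decision on
whether the rejection is right, so a real suite would also carry the
expected result for each case.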

In any case, we have two problems here

 - GNU TAR should do the splitting "correctly". To me it seems it is
   not working that well and likely needs a complete rewrite.

 - GNU TAR should produce USTAR packages that other USTAR
   implementations can read. Let's say USTAR allows the first directory
   part to be up to 155 characters; if most other USTAR
   implementations think the limit is 99 when unpacking, maybe that is
   the limit GNU TAR should use as well.

   No fun if what GNU TAR produces using --format=ustar is not
   readable by other USTAR implementations.

Just what I think, you are the experts, what do you think?

kent

Attachment: tarheader.c
Description: Binary data

-- 
Kent Boortz, Senior Production Engineer
Sun Microsystems Inc., the MySQL team
Office: +46 863 11 363
Mobile: +46 70 279 11 71
