Re: [Bug-tar] star <-> GNU tar interchange issue


From: Joerg Schilling
Subject: Re: [Bug-tar] star <-> GNU tar interchange issue
Date: Mon, 25 Mar 2013 13:57:42 +0100
User-agent: nail 11.22 3/20/05

Mark <address@hidden> wrote:

> Hi,
>
> I noticed a strange problem when using GNU tar to unpack an archive which
> I had created using star. I'm not sure whether this is a bug with star or
> GNU tar. The problem seems to be related to a file and/or directory name
> having an o-umlaut character (ö).
>
> I used this command to create the archive:
> star -c f=test_star.tar artype=xustar -numeric -sparse -force-hole
> "Sven-Göran Eriksson's World Challenge SLES-50852"

What kind of locale are you using?
From the attached tar archive, it looks like you are using a UTF-8 based 
locale, which on UNIX is still less probable than an ISO-8859-1 based 
locale. 

> However if I use GNU tar to list the archive:
> # tar -tf test_star.tar
> Sven-Göran Eriksson's World Challenge SLES-50852/
> Sven-Göran Eriksson's World Challenge
> SLES-50852/Sven-Göran_Eriksson's_World_Challenge_SLES-50852_A0100414717-0101___78_hashes.txt
> Sven-Göran Eriksson's World Challenge
> SLES-50852/Sven-Göran_Eriksson's_World_Challenge_SLES-50852_A0100414717-0101___78.toc
> Sven-GÃ?ran Eriksson's World Challenge
> SLES-50852/Sven-GÃ?ran_Eriksson's_World_Challenge_SLES-50852_A0100414717-0101___78.bin
> Sven-Göran Eriksson's World Challenge
> SLES-50852/Sven-Göran_Eriksson's_World_Challenge_SLES-50852_A0100414717-0101___78_TOC.bin
> Sven-Göran Eriksson's World Challenge
> SLES-50852/Sven-Göran_Eriksson's_World_Challenge_SLES-50852_A0100414717-0101___78.bin.sfv
> Sven-Göran Eriksson's World Challenge
> SLES-50852/Sven-Göran_Eriksson's_World_Challenge_SLES-50852_A0100414717-0101___78.bin.md5
>
> Notice that the ö character is replaced by Ã? in the line for the
> ...78.bin file. If I use GNU tar to unpack the archive, two directories
> are created, corresponding to the correct and bogus directory names in the
> GNU tar listing.
>
> I looked at a hex dump of the test_star.tar archive. For all files except
> the ...78.bin file, the o-umlaut character is represented by two bytes:
> 0xC3 0xB6. For the ...78.bin file the o-umlauts are represented by C3 83
> C2 B6 (see offsets 0x0C69 and 0x0C9D in the file).
>
> The archive is attached (1008 bytes compressed) if anyone feels like
> investigating this issue.

In case of a sparse file, the prefix path part of a POSIX.1-1988 tar header is 
reused for sparse information and for this reason, the filename does not fit 
into the simple tar header anymore.

For this reason, the filename is expressed in an extended tar header.

Filenames in extended tar headers need to be converted to UTF-8. As star 
still assumes ISO-8859-1, the most probable encoding on UNIX, it converts a 
name that is already UTF-8 a second time, which is an unneeded conversion.
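
The bytes Mark found in the hex dump are consistent with this. A minimal 
Python sketch (an illustration only, not star's code) that treats the UTF-8 
bytes of the name as ISO-8859-1 and converts them to UTF-8 again:

```python
# The name as stored on disk in a UTF-8 locale: "ö" is the two bytes C3 B6.
name_on_disk = "Sven-Göran".encode("utf-8")

# Wrong assumption: the bytes are ISO-8859-1, so every byte is one character.
misread = name_on_disk.decode("iso-8859-1")

# Converting those characters to UTF-8 for the extended header encodes each
# byte of the original sequence again: C3 B6 becomes C3 83 C2 B6.
double_encoded = misread.encode("utf-8")

print(name_on_disk.hex(" "))
print(double_encoded.hex(" "))
```

The second line printed contains c3 83 c2 b6 where the first has c3 b6, which 
matches what was reported for the ...78.bin entry.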

If people start using UTF-8, it seems to be the time to let star check for the 
real locale encoding....
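
What such a check could look like, sketched in Python rather than in star's C 
source. The function name to_header_utf8 is hypothetical and this is not how 
star implements it; the sketch only assumes that the locale's codeset is 
known before the name is converted:

```python
import codecs
import locale


def to_header_utf8(raw_name: bytes, codeset: str) -> bytes:
    """Return raw filename bytes converted from `codeset` to UTF-8.

    Hypothetical helper: names that are already UTF-8 are passed through
    unchanged instead of being converted a second time.
    """
    if codecs.lookup(codeset).name == "utf-8":
        return raw_name
    return raw_name.decode(codeset).encode("utf-8")


# Ask the C library which encoding the current locale really uses,
# instead of assuming ISO-8859-1.
print(locale.getpreferredencoding())
```

With an ISO-8859-1 codeset the single byte F6 becomes C3 B6, while in a UTF-8 
locale the existing C3 B6 is left alone.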

Jörg

-- 
 EMail:address@hidden (home) Jörg Schilling D-13353 Berlin
       address@hidden                (uni)  
       address@hidden (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily


