
Re: [Bug-tar] Wildcards do not match invalid characters


From: jlh
Subject: Re: [Bug-tar] Wildcards do not match invalid characters
Date: Thu, 07 Feb 2008 01:20:28 +0100
User-agent: Thunderbird 2.0.0.9 (X11/20071118)

Hello list!

jlh wrote:
> $ export LC_ALL=en_US.utf8
> $ touch $(echo -en 'file-\0344')
> $ tar -vcf my.tar file-*
> file-\344
> $ tar -tf my.tar
> file-\344
> $ tar -tf my.tar --wildcards '*'
> tar: *: Not found in archive
> tar: Error exit delayed from previous errors

Ok, here's an update.  I tracked down the cause of this
problem.  To match file names against patterns, tar uses
fnmatch(3), which is provided by glibc.  This happens in
lib/exclude.c:149:exclude_fnmatch().  fnmatch() is documented to
return 0 on a successful match, FNM_NOMATCH (defined to be 1) on a
non-match, and anything else on error.  exclude_fnmatch() only
compares the return value to 0 and thus treats a non-match and an
error the same way.  In my case fnmatch() hits an error: it
returns -1, and perror() says "Invalid or incomplete multibyte
or wide character".  The message is correct, since the byte is
invalid in UTF-8, but I was under the impression that a path
component may consist of any sequence of non-NUL, non-slash bytes.
Since fnmatch() is specifically aimed at matching paths, I would
think it should also handle path components that contain
arbitrary bytes.  I've been able to reproduce this error in a
small stand-alone test case that calls fnmatch(), so this is no
longer a tar problem (except that tar doesn't check for errors).
I will take it to the glibc list.
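
The three-way return value described above can be sketched as
follows.  This is only an illustration, not tar's code: the enum
and the helper name classify_fnmatch() are made up for this
example.  The point is that anything other than 0 or FNM_NOMATCH
is kept apart from an ordinary non-match.

```c
#include <fnmatch.h>

/* Hypothetical result type for this sketch; not part of tar or glibc. */
enum match_result { MATCH_YES, MATCH_NO, MATCH_ERROR };

/* Call fnmatch() and keep "no match" and "error" apart. */
static enum match_result
classify_fnmatch(const char *pattern, const char *name, int flags)
{
    int r = fnmatch(pattern, name, flags);

    if (r == 0)
        return MATCH_YES;       /* documented: 0 means a match */
    if (r == FNM_NOMATCH)
        return MATCH_NO;        /* documented: ordinary non-match */
    return MATCH_ERROR;         /* anything else is an error */
}
```

A caller that has selected a UTF-8 locale with setlocale() could
pass the name "file-\344" here and report MATCH_ERROR instead of
silently treating it as "not found".  Whether fnmatch() actually
returns an error for such a name may depend on the glibc version
and the locale in use.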

One other comment: I also noticed that tar calls fnmatch() with
the flag value 0x50000008 in this particular case.  The low-order
set bit (0x8) corresponds to the flag FNM_LEADING_DIR, but the two
high bits (0x40000000 and 0x10000000) have no meaning to fnmatch()
as far as I can see; tar only uses them internally.  Does it say
somewhere that one may set undefined bits in flags and expect
things to still work?  It seems to work here, but I thought I'd
comment on this.
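
One way to avoid handing undefined bits to fnmatch() would be to
mask them off first.  The sketch below is an assumption about how
that could look, not what tar does: KNOWN_FNM_FLAGS and the two
helper names are invented here, and the mask only lists the flags
this example chooses to treat as known.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* FNM_LEADING_DIR and FNM_CASEFOLD are GNU extensions */
#endif
#include <fnmatch.h>

/* Bits this sketch treats as meaningful to fnmatch() (an assumption). */
#define KNOWN_FNM_FLAGS \
    (FNM_NOESCAPE | FNM_PATHNAME | FNM_PERIOD | FNM_LEADING_DIR | FNM_CASEFOLD)

/* Keep only the bits that fnmatch() is expected to understand. */
static int fnmatch_bits(int options)
{
    return options & KNOWN_FNM_FLAGS;
}

/* Return the remaining bits, which belong to the caller alone. */
static int private_bits(int options)
{
    return options & ~KNOWN_FNM_FLAGS;
}
```

With the value from this report, fnmatch_bits(0x50000008) leaves
only FNM_LEADING_DIR (0x8), and private_bits(0x50000008) yields
0x50000000, the two high bits mentioned above.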

Thanks,
jlh



