Re: [Bug-tar] Wildcards do not match invalid characters


From: Micah Cowan
Subject: Re: [Bug-tar] Wildcards do not match invalid characters
Date: Thu, 07 Feb 2008 16:33:45 -0800
User-agent: Thunderbird 2.0.0.9 (X11/20071031)

Bruno Haible wrote:
> Any volunteer wants to write a 'mbsfnmatch' function that works like fnmatch
> but supports invalid byte sequences?

(I've removed bug-tar from the Cc list but left everyone else; I hope
that's as it should be.)

Wget is in need of such a facility as well:
http://article.gmane.org/gmane.comp.web.wget.patches/2233

Or, possibly, a "c-fnmatch" would suit our needs more. Wget is currently
locale/character set unaware; while we'd like to change that in the
future, in the meantime we need things to "work" :) ... in any case, it
could be a challenge to figure out the encoding used for remote
filenames on an FTP server.
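
To make the "c-fnmatch" idea concrete, here is a minimal sketch of a matcher that works purely on bytes and never consults the locale. The name c_glob_match is hypothetical, and it handles only '*' and '?' (no bracket expressions, no backslash escapes, no FNM_* flags), so it is an illustration of the approach rather than a drop-in for fnmatch():

```c
#include <stddef.h>

/* Hypothetical byte-oriented glob matcher (not an existing gnulib API).
   '*' matches any run of bytes, '?' matches exactly one byte, and every
   other pattern byte matches itself.  No locale functions are used, so
   bytes that are invalid in the current encoding match like any other. */
int c_glob_match(const char *pat, const char *str)
{
    const char *star = NULL;    /* pattern position just after the last '*' */
    const char *resume = NULL;  /* string position where that '*' stops */

    while (*str != '\0') {
        if (*pat == '*') {
            star = ++pat;       /* remember the '*' and first try an empty run */
            resume = str;
        } else if (*pat == '?' || *pat == *str) {
            pat++;
            str++;
        } else if (star != NULL) {
            pat = star;         /* backtrack: let the last '*' eat one more byte */
            str = ++resume;
        } else {
            return 0;
        }
    }
    while (*pat == '*')         /* trailing stars match the empty string */
        pat++;
    return *pat == '\0';
}
```

Note that '?' matches one byte here, not one character, so a two-byte UTF-8 sequence needs "??" or "*". That is the price of being locale-unaware.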

What would be involved in writing such a facility? I might be interested
in doing so, but need a clearer picture of what it would be.

It appears, from looking at the current code, that the current
mbs-handling fnmatch() simply converts the strings to wcs format, and
then passes them to internal_fnwmatch().
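
For reference, the conversion step presumably looks something like the sketch below. This is not the gnulib code; widen_strict is a hypothetical helper that only shows where a standard conversion with mbsrtowcs() gives up on an invalid byte sequence, which is the point at which a more tolerant facility would have to do something different:

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, for illustration only.  Converts the multibyte
   string s into out (room for cap wide characters, terminator included)
   using the current locale.  Returns the number of wide characters
   written, not counting the terminator, or (size_t)-1 if s contains an
   invalid byte sequence or does not fit. */
size_t widen_strict(const char *s, wchar_t *out, size_t cap)
{
    mbstate_t state;
    const char *src = s;
    size_t n;

    memset(&state, 0, sizeof state);    /* start in the initial shift state */
    n = mbsrtowcs(out, &src, cap, &state);
    if (n == (size_t)-1)
        return (size_t)-1;              /* invalid sequence (errno is EILSEQ) */
    if (src != NULL)
        return (size_t)-1;              /* stopped before the terminating NUL */
    return n;
}
```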

One dead-simple approach: whenever a byte that doesn't decode in the
current locale is found, map it to the wide character with the same
numeric value. This would end up doing the right thing if the locale is
UTF-8 but the input string is in ISO-8859-1, since the first 256 Unicode
code points coincide with ISO-8859-1. It would be less functional for
other encodings, including the other ISO-8859-* ones: such bytes would
still match * and ?, but character classification would be wrong for
them.
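
A rough sketch of that fallback, to show what I mean. To keep it independent of whatever locale is active, it uses a small hand-rolled UTF-8 decoder instead of mbrtowc(); real code would go through the locale's conversion functions. The names utf8_decode_one and lenient_decode are made up for this example:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available).  Returns the
   number of bytes consumed and stores the code point in *cp, or returns
   0 if the bytes at s are not a valid sequence. */
static size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t min, v;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; min = 0x80;    v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; min = 0x800;   v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; min = 0x10000; v = s[0] & 0x07; }
    else return 0;                      /* stray continuation or bad lead byte */

    if (n < len)
        return 0;                       /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                   /* not a continuation byte */
        v = (v << 6) | (s[i] & 0x3F);
    }
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;                       /* overlong, out of range, or surrogate */
    *cp = v;
    return len;
}

/* Decode s[0..n) into out (at most cap code points); returns the count.
   A byte that does not start a valid sequence is passed through with its
   own numeric value, i.e. it is read as if it were ISO-8859-1. */
size_t lenient_decode(const char *s, size_t n, uint32_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0, k = 0;

    while (i < n && k < cap) {
        uint32_t cp;
        size_t used = utf8_decode_one(p + i, n - i, &cp);
        if (used == 0) {                /* fallback: one byte, same value */
            cp = p[i];
            used = 1;
        }
        out[k++] = cp;
        i += used;
    }
    return k;
}
```

With this, "caf" followed by the single byte 0xE9 and "caf" followed by the UTF-8 pair 0xC3 0xA9 both decode to the same four code points. The weakness shows with, say, ISO-8859-2 input: byte 0xB1 is a lowercase a-ogonek there, but the fallback yields U+00B1 (plus-minus sign), so an alphabetic test on it gives the wrong answer.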

OTOH, perhaps it's better not to let such characters be mapped to real
wide characters at all, so that they'll work fine for * and ?, but fail
all character-classification tests (or perhaps succeed at one specific
one we've chosen for such cases). Perhaps the WEOF value (where
available) could be used for this purpose (but care might be needed to
ensure we don't pass it on to standard library functions).
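
Something along these lines, as a sketch only. It assumes the text has already been decoded into an array of wint_t in which every undecodable byte was replaced by WEOF; the names INVALID_UNIT, unit_is_alpha and match_units are hypothetical, and the only pattern syntax handled is '*', '?', the single class [[:alpha:]], and literals:

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical marker for "a byte that could not be decoded". */
#define INVALID_UNIT WEOF

/* Classification wrapper: the marker is filtered out here, so it is
   never handed to the C library and belongs to no character class. */
static int unit_is_alpha(wint_t c)
{
    return c != INVALID_UNIT && iswalpha(c);
}

/* Match text[0..n) against the NUL-terminated pattern pat. */
int match_units(const wchar_t *pat, const wint_t *text, size_t n)
{
    static const wchar_t alpha_class[] = L"[[:alpha:]]";
    const size_t class_len = sizeof alpha_class / sizeof alpha_class[0] - 1;
    size_t skip;

    if (*pat == L'\0')
        return n == 0;
    if (*pat == L'*') {
        /* Try every possible length for the run matched by '*'. */
        for (skip = 0; skip <= n; skip++)
            if (match_units(pat + 1, text + skip, n - skip))
                return 1;
        return 0;
    }
    if (n == 0)
        return 0;
    if (wcsncmp(pat, alpha_class, class_len) == 0)
        return unit_is_alpha(text[0])
               && match_units(pat + class_len, text + 1, n - 1);
    if (*pat == L'?')               /* matches anything, marker included */
        return match_units(pat + 1, text + 1, n - 1);
    /* A literal pattern character never equals the marker. */
    return text[0] != INVALID_UNIT && (wint_t)*pat == text[0]
           && match_units(pat + 1, text + 1, n - 1);
}
```

So for the text { 'a', WEOF, 'b' }, the patterns "a?b" and "*b" match, while "a[[:alpha:]]b" does not.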

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/