bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: bugs in dirname module


From: Eric Blake
Subject: Re: bugs in dirname module
Date: Mon, 07 Nov 2005 06:42:20 -0700
User-agent: Mozilla Thunderbird 1.0.2 (Windows/20050317)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Paul Eggert on 11/6/2005 3:59 AM:
> 
>>My approach was to add a dependency on the c-ctype module to use c_isalpha
> 
> That seems a bit overkill here, since you can assume ASCII and DOS.
> Why not use this instead?
> 
> #define c_isalpha(c) (((unsigned int) (c) | ('a' - 'A')) - 'a' <= 'z' - 'a')

Agreed - all platforms with drive letter prefixes are based on ASCII
encodings, so I can resubmit without a dependency on c-ctype.  Should I
add a dependency on the verify module to ensure that drive letter prefixes
are only used on ASCII platforms?  Should I still move
FILE_SYSTEM_PREFIX_LEN out of config.h and into dirname.h?

> 
>>Second, cygwin currently normalizes all drive letters as absolute paths,
>>which is different from djgpp
> 
> 
> Sigh.  This stuff will probably break a lot of code, not just dirname.
> Do we really want to bother with it?  Perhaps we should just tell
> users not to mess with names like "c:foo".  I assume they're pretty
> rare.  Will that simplify the code here and elsewhere?

That was the intent of tests/test-dirname.c - a good listing of all sorts
of problematic filenames, and their division into base_name and dir_name
on the three styles of platforms (unix with no drive letters, cygwin with
drive letters always starting an absolute path, and DOS with drive letters
potentially starting a relative path).  I made the code as simple as I
could while making it correct for all three cases.

> 
> 
>>+             {
>>+               if (prefix_len)
>>+                 base = p;
>>+               else if (! DOUBLE_SLASH_IS_DISTINCT_ROOT || 2 < p - base)
>>+                 base = p - 1;
>>+             }
> 
> 
> This code doesn't seem to match the new comments for basename.

Agreed that the comment does not quite match the code.  But it is doing
exactly what I intended - this block is only invoked when the filename
consists of just a prefix and a sequence of slashes.  If there was a
prefix, it results in the empty string; otherwise, it results in the final
one or two slashes (turning c:/, c://, c:///, etc into ''; turning ///,
////, etc into the final /, leaving / alone, and leaving // alone only if
DOUBLE_SLASH_IS_DISTINCT_ROOT).

> Furthermore, I don't see why the basename of "A:/" should be "/".  or
> the basename of "//" should be "//".  The point of basename and
> dirname is that if you concatenate them, with a slash between them,
> then you get the same file.  But I don't see how this rule generalizes
> to DOS under the proposed code.

Your formulation is not quite correct.  POSIX poses the rule as: if
"$string" is a valid filename, then 'cd "$(dirname "$string")"; ls
"$(basename "$string")"' lists the same file (although POSIX fails to
mention that this will fail if the filename contains a trailing newline).
 Note that POSIX requires basename("/") to return "/", and basename("//")
to return either "/" or "//".  ALL other names subjected to basename()
will never contain a slash.  Likewise, POSIX requires dirname("/") to
return "/" and dirname("//") to return "//", and all other names subjected
to dirname() will never contain a trailing slash.  On platforms where //
is special, your formulation of dirname("//")/basename("//") results in
"/////", which is certainly not the same as "//".  But "cd //; ls //" does
indeed list the same file as the original "ls //".

Now, on applying that to systems with drive letters.  If basename("c:/")
returns "/", we have violated the POSIX rule of thumb: "cd c:/; ls /" is
much different than "ls c:/".  Yet without my patch, this is what
base_name was doing.  Part of the issue here is that POSIX lets
basename("c:/") modify its argument, but that our definition of base_name
is that we can only return a pointer to somewhere inside the passed
argument without modification.  My patch makes base_name("c:/") return "".
 The other alternative is to treat c:/ like a root, and just like
base_name("/") returns "/", have base_name("c:/") return "c:/".  But by
your earlier rule of appending a slash between the dir_name and base_name,
my patch would produce "c://" on DOS, which is still a valid spelling of
c:/, whereas having base_name return c:/ would create the invalid name
"c://c:/".

For that matter, would it help to redefine our base_name to ALWAYS return
the empty string if the argument is a root directory?  Then the rule of
recreating a valid filename becomes simpler - concatenate dir_name (less a
trailing slash), a slash, and the base_name.  Then for "//" on platforms
where it is special, you get "//" less one slash, /, and "", resulting in
"//".  Yes, this is different than POSIX basename(), but we already know
that base_name is not an exact match to POSIX.  My only worry about
changing semantics like that is that it may break existing uses of base_name.

Another thing to think about.  Currently, strip_trailing_slashes("///")
currently calls base_name("///"), gets a single "/", skips past that
slash, then returns false (leaving the user with "///" still).  But it
might be more intuitive if it changed the input string to "/\0/" and
returned true.

> 
> Many of the rest of the changes look pretty complicated, and I didn't
> take the time to review them, since I want to know what dirname and
> basename are supposed to do before worrying about the details.

I guess the first thing to review, then, would be my testsuite (which did
pass in all three configurations - unix, cygwin, and DOS).  If you can
agree that every one of those proposed filename divisions into a directory
and basename makes sense, then your review becomes a matter of validating
the codepaths to ensure that the testsuite is always passed, and that the
testsuite is comprehensive enough.

I am also working on a followup patch to coreutils that fixes basename(1)
(POSIX requires that "basename // /" return "//" on platforms where // is
special), as well as improving the documentation and testsuite of basename
and dirname.

- --
Life is short - so eat dessert first!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDb1m884KuGfSFAYARApbUAKCnBiXIpLOF4eLc4aknfRSbjJabDwCgz+zu
Fe3X89Qb2GvODfEWZIaPAmI=
=348K
-----END PGP SIGNATURE-----




reply via email to

[Prev in Thread] Current Thread [Next in Thread]