
default quoting style

From: Eric Blake
Subject: default quoting style
Date: Wed, 29 Mar 2006 07:37:12 -0700
User-agent: Thunderbird 1.5 (Windows/20051201)

A question was recently raised on cygwin about why 'ls
--show-control-chars' could show accented characters (8-bit characters in
the Latin1 encoding), but filenames listed in error messages or
interactive prompts for all the other coreutils (e.g. rm -i) only list
octal escapes for the same characters.  I traced it to the fact that ls
defaults to the 'literal' quoting style, but most other printouts of
filenames go through quote() or quote_n(), which are hard-coded to the
'locale' quoting style.  In literal mode, non-ASCII characters are
passed through untouched, but in the 'locale' style, anything that fails
isprint() is converted to an octal escape.  Unfortunately, since cygwin
uses newlib and newlib does not support locales other than "C", cygwin's
isprint() is hard-coded to the range 0x20 through 0x7e, which explains why
all the non-ASCII characters are converted.
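
To make the effect concrete, here is a minimal sketch of the behavior
described above.  It is not the gnulib quotearg code: c_locale_isprint()
and escape_nonprintable() are hypothetical names, and the real 'locale'
style also adds surrounding quotes and escapes other characters.  The
sketch only shows why a Latin1 byte such as 0xe9 comes out as \351 when
the printable test covers just 0x20 through 0x7e.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an isprint() that only knows the "C" locale:
   printable means 0x20 through 0x7e.  */
int c_locale_isprint(unsigned char c)
{
    return c >= 0x20 && c <= 0x7e;
}

/* Copy SRC to DST, replacing every byte that fails the printable test
   with a backslash and three octal digits.
   Returns 0 on success, -1 if DST is too small.  */
int escape_nonprintable(const char *src, char *dst, size_t dstsize)
{
    size_t used = 0;

    assert(dstsize > 0);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char) *src;
        size_t need = c_locale_isprint(c) ? 1 : 4;

        /* Keep one byte in reserve for the terminating NUL.  */
        if (used + need >= dstsize)
            return -1;

        if (need == 1) {
            dst[used++] = (char) c;
        } else {
            snprintf(dst + used, dstsize - used, "\\%03o", (unsigned int) c);
            used += 4;
        }
    }
    dst[used] = '\0';
    return 0;
}
```

Feeding it the Latin1 bytes for "caf" followed by 0xe9 yields the seven
characters caf\351, which matches the kind of output reported for the
other coreutils.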

This raises several questions.  First, when MB_CUR_MAX is 1, the
replacement that 'ls --show-control-chars' turns off currently keys off of
!isprint() for deciding whether to replace characters with '?'.  But based
on the option name, wouldn't it be better if this used iscntrl() instead,
letting 8-bit characters through to the console so that 'ls' would list
accented characters without having to use --show-control-chars?  Note that
--show-control-chars is independent of the quoting style, and that it
is dangerous in the presence of filename characters 0x7f and in the range
0x0 through 0x1f (although these characters are rare in cygwin - they are
forbidden by Windows, so they only exist in cygwin managed mounts).
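
The difference between the two tests can be sketched as follows.  This is
an illustration only, not the code in ls.c: mask_name() and the two
predicates are hypothetical, and the ranges are written out by hand
(assuming a single-byte "C"-locale view) so the result does not depend on
the C library in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hides everything outside 0x20..0x7e, like !isprint() in the "C" locale.  */
int hide_nonprintable(unsigned char c)
{
    return c < 0x20 || c > 0x7e;
}

/* Hides only the ASCII control characters: 0x00..0x1f and 0x7f.  */
int hide_control(unsigned char c)
{
    return c < 0x20 || c == 0x7f;
}

/* Copy SRC to DST, writing '?' in place of each byte that HIDE rejects.
   Output is truncated if DST is too small.  */
void mask_name(const char *src, char *dst, size_t dstsize,
               int (*hide)(unsigned char))
{
    size_t i = 0;

    assert(dstsize > 0);
    for (; src[i] != '\0' && i + 1 < dstsize; i++) {
        unsigned char c = (unsigned char) src[i];
        dst[i] = hide(c) ? '?' : src[i];
    }
    dst[i] = '\0';
}
```

With a name made of "caf", the byte 0xe9, and the byte 0x07,
hide_nonprintable masks both trailing bytes, while hide_control masks only
the 0x07 and lets the accented character through.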

Second, ls honors the environment variable QUOTING_STYLE to change its
default.  Should quote_n() in lib/quote.c do likewise, rather than
hardcoding a default of 'locale'?  If this were done, then cygwin users
could set QUOTING_STYLE=literal and see accented characters instead of
octals in the various coreutils messages (although in the rare case of
file names with ASCII control characters, this could mess up the terminal).
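
As a rough sketch of what such a lookup might look like (this is not the
existing lib/quote.c or quotearg.c code; the enum, the name table, and
both functions are hypothetical, and only a subset of style names is
listed), the default could be taken from the environment with 'locale' as
the fallback:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of quoting styles, in the same order as the names.  */
enum style { STYLE_LITERAL, STYLE_SHELL, STYLE_C, STYLE_ESCAPE, STYLE_LOCALE };

static const char *const style_names[] = {
    "literal", "shell", "c", "escape", "locale"
};

/* Map VALUE to a style; return FALLBACK if VALUE is NULL or unrecognized.  */
enum style parse_style(const char *value, enum style fallback)
{
    size_t i;

    assert(fallback <= STYLE_LOCALE);
    if (value == NULL)
        return fallback;
    for (i = 0; i < sizeof style_names / sizeof style_names[0]; i++)
        if (strcmp(value, style_names[i]) == 0)
            return (enum style) i;
    return fallback;
}

/* Consult QUOTING_STYLE, keeping today's 'locale' default otherwise.  */
enum style default_style_from_env(void)
{
    return parse_style(getenv("QUOTING_STYLE"), STYLE_LOCALE);
}
```

Falling back silently on an unrecognized value is one possible choice
here; a real implementation might prefer to warn about it.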

Third, should we add a way to make quoting styles pass non-printing but
non-control characters through untouched, converting only control
characters to octal?  'literal' is too permissive, but 'locale' currently
filters based on isprint() instead of !iscntrl().  In some cases, you
really do want all non-ASCII characters converted to octal, but when
printing filenames to a terminal, the terminal font generally has distinct
glyphs for all of the 8-bit values, and no 8-bit value is interpreted as a
terminal control character (this doesn't help if the glyphs don't match
the encoding of the filename, but this is no different than the current
displayed name of 'ls --show-control-chars').  I don't know if this would
mean adding new quoting styles, or adding yet another option to struct

Finally, the comments in quotearg.h mention that the default quoting
style, if not otherwise specified, is literal, but that someday it may
switch to shell.  Is it time to make that swap?

--
Life is short - so eat dessert first!

Eric Blake             address@hidden

