
bug#18051: [Emacs-diffs] trunk r117726: Add string collation.


From: Eli Zaretskii
Subject: bug#18051: [Emacs-diffs] trunk r117726: Add string collation.
Date: Mon, 25 Aug 2014 18:03:32 +0300

> From: Michael Albinus <michael.albinus@gmx.de>
> Date: Mon, 25 Aug 2014 08:41:03 +0200
> Cc: Paul Eggert <eggert@cs.ucla.edu>, 18051@debbugs.gnu.org
> 
> > BTW, I think that collation functions with 3rd optional argument
> > to specify locale settings will be a bit more versatile, e.g.
> >
> > (string-collate-lessp a b "es_ES.UTF-8")
>
> We discussed this already, see 
> <http://lists.gnu.org/archive/html/bug-gnu-emacs/2014-08/msg00623.html>
>
> My major reservation about this approach is that it doesn't fit well
> with using string-collate-lessp as the predicate of sort. That's why I
> proposed a global variable as an alternative, which could be let-bound.

I think that binding a variable will indeed be cleaner.  Using
process-environment for that purpose should be reserved for the
application level.  Also, what if LC_COLLATE is not set in the
environment, but 'setlocale' does return some value for it?  Shouldn't
we use that?
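
To illustrate the difference between the environment and what the C
library reports, here is a minimal sketch.  The helper names
collate_locale_name and report_collate_locale are hypothetical, not
anything in the Emacs sources; the only library behavior relied on is
that 'setlocale' with a NULL second argument queries the current
setting without changing it.

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return the collation locale the C library is
   currently using.  This is defined even when LC_COLLATE is not set
   in the environment.  */
static const char *
collate_locale_name (void)
{
  /* A NULL locale argument only queries; it changes nothing.  */
  const char *name = setlocale (LC_COLLATE, NULL);
  return name ? name : "C";
}

/* Hypothetical helper: print both views side by side on OUT.  */
static void
report_collate_locale (FILE *out)
{
  const char *env = getenv ("LC_COLLATE");
  fprintf (out, "environment LC_COLLATE: %s\n", env ? env : "(unset)");
  fprintf (out, "setlocale LC_COLLATE:   %s\n", collate_locale_name ());
}
```

Calling report_collate_locale (stdout) after setlocale (LC_ALL, "")
would show whether the two disagree on a given system, e.g. when only
LANG or LC_ALL is set.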

Here are a few more thoughts about related issues:

1. Why does str_collate return a ptrdiff_t value?  AFAIK, wcscoll
   etc. return an int, and the values are rather small.

2. Should we signal an error if the input strings are neither
   pure-ASCII nor multibyte?  Unibyte strings will at best cause incorrect
   results.  And what about strings with invalid codepoints,
   e.g. those outside of the Unicode range, which can happen inside
   Lisp strings?

3. What about errors in wcscoll?  The current code ignores them;
   however, the value returned by wcscoll in case of an error is not
   documented, so it could be random.  Should we signal an error if
   errno gets set by wcscoll?

4. How to control the optional features of the collating sequence?  I
   mean, for example, the fact that punctuation characters are ignored
   in the .UTF-8 locales on glibc hosts (or so it seems).  At least on
   Windows, a somewhat higher degree of control is available, but it
   must be specified separately from the locale ID.  E.g., the
   comparison function accepts flags to ignore punctuation and
   symbols, width differences, diacritics, etc. Should we have another
   variable, perhaps w32-specific, to request these features?
   Alternatively, we could use .UTF-8 on Windows to communicate that,
   although that sounds like a kludge.

5. The locale names on Windows are different from Posix: Windows uses
   3-letter abbreviations of the country and the language,
   e.g. "fra_FRA" instead of the Posix "fr_FR".  Do we want the locale
   string values used for let-binding the above-mentioned variable to
   be portable across systems?  Then we'd need some conversion
   database on MS-Windows.

6. I think we will want a case-insensitive version of this function.
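
For points 1 and 3, the kind of check I have in mind can be sketched as
follows.  This is only an illustration: collate_checked is a
hypothetical wrapper, not the actual str_collate.  Since wcscoll
reserves no return value for errors, the only way to detect one is to
clear errno before the call and inspect it afterwards.

```c
#include <errno.h>
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical wrapper, not the str_collate from the Emacs sources.
   On success, store the comparison result in *RESULT and return true.
   Return false if wcscoll reported an error through errno, in which
   case *RESULT is left untouched.  */
static bool
collate_checked (const wchar_t *a, const wchar_t *b, int *result)
{
  /* wcscoll has no reserved error value, so errno is the only
     indication of failure.  */
  errno = 0;
  int r = wcscoll (a, b);   /* An int is all wcscoll ever returns.  */
  if (errno != 0)
    return false;
  *result = r;
  return true;
}
```

A caller that gets false back could then signal a Lisp error instead of
using a comparison result whose value is unspecified.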




