From: Eli Zaretskii
Subject: bug#20316: 24.5; `string-lessp' doesn't respect value of LC_COLLATE
Date: Tue, 14 Apr 2015 17:57:32 +0300

> From: Alexis <flexibeast@gmail.com>
> Cc: Eli Zaretskii <eliz@gnu.org>, michael.albinus@gmx.de
> Date: Tue, 14 Apr 2015 10:55:53 +1000
>
> So by default, Emacs sorts disregarding locale-specific ordering,
> basically using the Unicode codepoints of the characters to order them.
>
> This makes sense given what you've said above, but can this still be referred
> to as 'lexicographic' ordering? To me, 'lexicographic ordering' is ordering
> as per a dictionary for the relevant language, not by codepoint for an
> arbitrary encoding. Is this wrong?

I think we use "lexicographic" for lack of a more accurate word. We
could use something like "code point (binary) order", but would that
be clear enough to be useful?

Note that we are not alone in this; at least this page:

http://en.cppreference.com/w/cpp/string/byte/strcoll

says that the C function 'strcmp' does a "lexicographical comparison".
So do a few other similar pages; google for "difference between strcmp
and strcoll".

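
To make "code point (binary) order" concrete, here is a minimal sketch in Python rather than Emacs Lisp. It only illustrates the idea and is not how `string-lessp' or 'strcmp' is implemented; the sample names are made up. Python's built-in string comparison goes code point by code point, so it shows the same effect: every letter outside ASCII lands after 'Z'.

```python
# Hypothetical sample data, not taken from the bug report.
names = ["Zoran", "Čedo", "Ana", "Šime", "Cvita"]

# sorted() on str compares code point by code point, with no
# knowledge of any language's alphabet.
by_code_point = sorted(names)
print(by_code_point)
# ['Ana', 'Cvita', 'Zoran', 'Čedo', 'Šime']

# The first letters' code points explain the result:
# A=0x41, C=0x43, Z=0x5a, Č=0x10c, Š=0x160.
print([hex(ord(name[0])) for name in by_code_point])
```

"Čedo" ends up after "Zoran" only because U+010C is numerically larger than U+005A, which is the behaviour the word "lexicographic" glosses over.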
> One of the package's users had imported a set of contacts, then expected to
> be able to sort those contacts according to Croatian rules, using `org-sort'
> (from `org.el'). However, to quote the user, this resulted in the contacts
> being sorted according to the English alphabet rules where the contact
> entries which start with Croatian characters (Č,Ć,Đ,Š,Ž) are at the end of
> the list, iow. after 'Z' entries, although it should go like this:
>
> A,B,C,Č,Ć,D,Dž,Đ,..S,Š,..Z,Ž

That's "collation order" in action; note that the diacritic order is
applied _after_ the alphabetic order of the base characters. That's
what string-collate-lessp does.
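
The ordering the user expected can be approximated with a hand-written sort key. The following Python sketch is an assumption-laden illustration, not the mechanism behind string-collate-lessp or LC_COLLATE: the alphabet table, the function name `croatian_key`, and the sample names are all hypothetical, and a real collation would come from the system's locale data rather than a table like this.

```python
# Croatian alphabet in dictionary order (illustrative table);
# the digraphs "dž", "lj" and "nj" count as single letters.
CROATIAN_ALPHABET = [
    "a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f", "g", "h", "i", "j",
    "k", "l", "lj", "m", "n", "nj", "o", "p", "r", "s", "š", "t", "u", "v",
    "z", "ž",
]
RANK = {letter: i for i, letter in enumerate(CROATIAN_ALPHABET)}


def croatian_key(text):
    """Hypothetical sort key: one (group, rank) pair per letter."""
    lowered = text.lower()
    key = []
    i = 0
    while i < len(lowered):
        pair = lowered[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            # Digraph: consume two characters as one letter.
            key.append((0, RANK[pair]))
            i += 2
        elif lowered[i] in RANK:
            key.append((0, RANK[lowered[i]]))
            i += 1
        else:
            # Anything outside the table sorts after all letters,
            # ordered by code point.
            key.append((1, ord(lowered[i])))
            i += 1
    return key


# Hypothetical contact names.
names = ["Žarko", "Zoran", "Čedo", "Cvita", "Ćiril", "Đuro",
         "Dario", "Džemal", "Šime", "Stipe", "Ana"]
by_alphabet = sorted(names, key=croatian_key)
print(by_alphabet)
# ['Ana', 'Cvita', 'Čedo', 'Ćiril', 'Dario', 'Džemal', 'Đuro',
#  'Stipe', 'Šime', 'Zoran', 'Žarko']
```

The table treats each accented letter as its own position in the alphabet, which matches the A,B,C,Č,Ć,D,Dž,Đ sequence quoted above. It ignores case differences and everything else a full collation algorithm handles.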