Re: strip accents and sorting [was: BibTeX issues]
From: Roland Winkler
Subject: Re: strip accents and sorting [was: BibTeX issues]
Date: Fri, 30 Aug 2019 11:27:33 -0500
On Thu Aug 29 2019 martin rudalics wrote:
> > But (string-lessp "ä-umlaut" "ö-combine") gives nil
>
> But (string-collate-lessp "ä-umlaut" "ö-combine") gives t
...not for me, which is likely due to my locale setting LC_COLLATE=C.
I could instead use, say, LC_COLLATE=en_US.utf8. Then the above
call of string-collate-lessp yields t. But this also implies case
folding and ignoring dots in directory listings, which is not what I
want. In other words, these locales have too many features bundled
together.
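For what it is worth, `string-collate-lessp' takes an optional LOCALE
argument, so the collation locale can be chosen per call instead of
through LC_COLLATE for the whole session. The following is only a
sketch: the helper name `my-collate-lessp-in' is made up for
illustration, the results depend on which locales the system provides,
and this does not unbundle the features of a given locale; it merely
confines them to one call.

```elisp
;; Hypothetical helper: compare A and B under an explicitly named locale.
;; Only `string-collate-lessp' is a real Emacs function here.
(defun my-collate-lessp-in (locale a b)
  "Return non-nil if A sorts before B under LOCALE."
  (string-collate-lessp a b locale))

;; The "C" locale is expected to behave like plain `string-lessp'.
(my-collate-lessp-in "C" "ä-umlaut" "ö-combine")

;; A named locale may be missing on a given system, so guard the call.
(condition-case nil
    (my-collate-lessp-in "en_US.UTF-8" "ä-umlaut" "ö-combine")
  (error 'locale-unavailable))
```

This keeps directory listings and other locale-dependent behavior
untouched, because the environment of the session is not changed.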
Maybe these feature sets of different locales are documented
*somewhere* in a neat way, and there is a locale with a feature set
that does exactly what I want. But to the best of my knowledge this
documentation resides outside Emacs, so things get rather
complicated when a locale affects an Emacs session in important or
possibly subtle ways.
> so it should be fairly easy to fix `sort-lines' and friends
> accordingly.
In that sense, I am not sure I would like to see `sort-lines' and
friends fixed "accordingly". If anything, I'd vote for a user
option, which I would likely use to disable such behavior.
On the other hand, Eli pointed out in his reply that an accented
character represented by a single character and the same character
built from combining characters should be treated alike:
> The Unicode Standard mandates that they be handled identically,
> including in searching and sorting. We don't yet implement that
> 100%, but see char-fold.el for a partial (and not very efficient)
> implementation during search.
So I would assume that the locale should not matter at all in the
context of Unicode combining characters. (Or there should be a way
to control exactly this aspect of Unicode combining characters with
no additional (mis)features bundled with it.)
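One locale-independent way to get there, sketched here under the
assumption that normalizing both strings to NFC before comparing is
acceptable, is to use ucs-normalize.el, which ships with Emacs. The
predicate name `my-nfc-string-lessp' is made up for illustration; it is
not something `sort-lines' uses today.

```elisp
(require 'ucs-normalize)

;; Hypothetical predicate: compose combining sequences (NFC) first, then
;; compare by character code with `string-lessp', ignoring the locale.
(defun my-nfc-string-lessp (a b)
  "Return non-nil if A sorts before B after NFC normalization."
  (string-lessp (ucs-normalize-NFC-string a)
                (ucs-normalize-NFC-string b)))

;; "\u00e4" is precomposed a-umlaut; "o\u0308" is o plus COMBINING DIAERESIS.
(string-lessp "\u00e4-umlaut" "o\u0308-combine")        ; compares U+00E4 with "o"
(my-nfc-string-lessp "\u00e4-umlaut" "o\u0308-combine") ; compares U+00E4 with U+00F6
```

This only makes the two representations of the same character compare
alike. How accented characters sort relative to each other still
follows plain character codes here, not any locale's rules.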
I understand that it is a different matter how accented characters
are sorted relative to each other and relative to unaccented
characters. So it can make a lot of sense to have different locales
for that aspect.
Maybe I am missing something here. (And I have not yet looked in
more detail at char-fold.el mentioned by Eli, which could be a
better way to go within the Emacs world.)
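As a first look at what char-fold.el offers, the following sketch uses
`char-fold-to-regexp', which turns a plain string into a regexp that is
meant to match its accented variants as well. The helper name
`my-char-fold-match-p' is made up for illustration, and this covers
searching only, not sorting.

```elisp
(require 'char-fold)

;; Hypothetical helper: does HAYSTACK contain NEEDLE, ignoring accents?
(defun my-char-fold-match-p (needle haystack)
  "Return the match position of NEEDLE in HAYSTACK under char folding."
  (string-match-p (char-fold-to-regexp needle) haystack))

;; Try the precomposed and the combining representation of o-umlaut.
(my-char-fold-match-p "o" "\u00f6-umlaut")
(my-char-fold-match-p "o" "o\u0308-combine")
```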
Roland