bug-texinfo

Re: library for unicode collation in C for texi2any?


From: Eli Zaretskii
Subject: Re: library for unicode collation in C for texi2any?
Date: Fri, 13 Oct 2023 08:51:39 +0300

> Date: Thu, 12 Oct 2023 20:30:47 +0000 (UTC)
> Cc: pertusus@free.fr, bug-texinfo@gnu.org
> From: Werner LEMBERG <wl@gnu.org>
> 
> >> > I don't recommend to tailor index sorting for the language
> >> > indicated by @documentlanguage, either.
> >> 
> >> This surprises me.  Why not?  For some languages, the alphabetical
> >> order differs enormously from English.
> > 
> > Because indices in a Texinfo document should not depend on details
> > of how the manual was produced.
> 
> Well, if I write a book in German, say, I most definitely want an
> index sorted with a German collation (there is more than a single one,
> BTW).  This collation should be used regardless of the input encoding.
> However, ...
> 
> > And note that I said "tailoring", which is minor adjustments to the
> > general collation, which is based on character Unicode codepoints.
> 
> ... there is probably a misunderstanding on my side.  I don't know
> what you mean with 'tailoring', please give an example.

This subject is too large and complicated for me to answer here, so
I will refer you to the relevant Unicode spec:

  https://unicode.org/reports/tr10/

Section 8 "Tailoring" there will probably answer your question.
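To make the idea concrete, here is a toy two-level comparison in the
spirit of UTS #10.  Strings are compared by primary weights (base
letters) first, and only on a tie by secondary weights (diacritics).
The weight table, the names weigh() and toy_collate(), and the choice
of letters are hypothetical illustrations, not DUCET or CLDR data.
The untailored ("root") order treats ä as an a with a diacritic, as
German does.  The tailored order makes ä a separate letter sorting
after z, as Swedish does.

```c
#include <stddef.h>

/* Hypothetical two-level weights for a toy alphabet: a-z plus U+00E4 (ä).
   These are NOT real DUCET/CLDR weights. */
typedef struct { unsigned primary; unsigned secondary; } weights;

static weights weigh(unsigned cp, int swedish)
{
    weights w = { 0, 0 };
    if (cp >= 'a' && cp <= 'z') {
        w.primary = cp - 'a' + 1;       /* a=1 ... z=26, no diacritic */
    } else if (cp == 0xE4) {            /* ä */
        if (swedish) {
            w.primary = 27;             /* tailored: own letter after z */
        } else {
            w.primary = 1;              /* root: same base letter as a... */
            w.secondary = 1;            /* ...distinguished by the diaeresis */
        }
    } else {
        w.primary = 100 + cp;           /* anything else: after the letters */
    }
    return w;
}

/* Compare two code point strings level by level: all primary weights
   first, then all secondary weights.  SWEDISH selects the tailoring.
   Returns <0, 0 or >0 like strcmp(). */
static int toy_collate(const unsigned *a, size_t na,
                       const unsigned *b, size_t nb, int swedish)
{
    for (int level = 1; level <= 2; level++) {
        size_t n = na < nb ? na : nb;
        for (size_t i = 0; i < n; i++) {
            weights wa = weigh(a[i], swedish);
            weights wb = weigh(b[i], swedish);
            unsigned x = level == 1 ? wa.primary : wa.secondary;
            unsigned y = level == 1 ? wb.primary : wb.secondary;
            if (x != y)
                return x < y ? -1 : 1;
        }
        /* Equal prefix: the shorter string sorts first. */
        if (na != nb)
            return na < nb ? -1 : 1;
    }
    return 0;
}
```

With code point arrays for "äpfel" and "zebra", toy_collate() returns
a negative value without tailoring and a positive value with swedish
set.  The same two index entries swap places, while "apfel" and
"äpfel" differ only at the secondary level in the root order.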

The main reason why I think we should not use language-specific
tailoring is that it is implemented differently by different system
libraries, and therefore the manuals produced using it will differ
depending on the platform on which they were produced.  That is
undesirable from our POV, IMO.  As an example, I suggest comparing
the collation of file names in GNU 'ls', as implemented by glibc
(which basically implements the entire Unicode UTS#10 mentioned above
and uses the Unicode CLDR data set, http://unicode.org/cldr/), with
the corresponding MS-Windows API documented here:

  https://learn.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-comparestringex
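On a GNU system, the locale dependence is visible through the standard
C functions setlocale() and strcoll().  The same code orders strings
according to whatever collation data the platform ships for the named
locale.  The helper sort_in_locale() below is a hypothetical sketch,
not code from texi2any or 'ls'.  Only the "C" locale has a portable
meaning (there strcoll() orders strings like strcmp()).  Other names,
such as "de_DE.UTF-8", may or may not be installed.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() passes pointers to the array elements, i.e. char ** here. */
static int cmp_coll(const void *a, const void *b)
{
    return strcoll(*(char *const *)a, *(char *const *)b);
}

/* Hypothetical helper: sort V[0..N) using the collation of LOCALE.
   Returns 0 on success, -1 if the locale cannot be set or memory runs
   out.  The previous LC_COLLATE setting is restored afterwards. */
static int sort_in_locale(char **v, size_t n, const char *locale)
{
    /* The string returned by setlocale() may be overwritten by the next
       call, so keep a private copy of the current setting. */
    const char *cur = setlocale(LC_COLLATE, NULL);
    size_t len = strlen(cur) + 1;
    char *saved = malloc(len);
    if (!saved)
        return -1;
    memcpy(saved, cur, len);

    if (!setlocale(LC_COLLATE, locale)) {
        free(saved);
        return -1;
    }
    qsort(v, n, sizeof *v, cmp_coll);

    setlocale(LC_COLLATE, saved);
    free(saved);
    return 0;
}
```

Calling it with "C" sorts { "b", "B", "a" } as "B", "a", "b" (byte
order).  With a locale such as de_DE.UTF-8, if installed, "a" would
typically come first.  The relative order of "b" and "B" is then
decided by the locale's data, which is exactly where different C
libraries and CompareStringEx() can disagree.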

The results of collation using these disparate implementations are
similar, but not identical.  My point here is that Texinfo should IMO
try to avoid these subtle differences as much as possible.  Using code
that is independent of the current locale is a large step in that
direction, but there are additional smaller steps that we should take
after that, and avoiding too strong a dependence on language-specific
collation, as implemented by the underlying libraries, is one of them.
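Here is a minimal sketch of what "independent of the current locale"
can mean at the lowest level.  It uses a hypothetical index_compare()
helper, not texi2any's actual code.  The comparison never calls
strcoll() or tolower(), whose results depend on LC_COLLATE and
LC_CTYPE, so it gives the same order on every platform.  It is far
cruder than UTS #10: only ASCII case is folded.

```c
#include <stddef.h>

/* ASCII-only case folding; unlike tolower(), it ignores LC_CTYPE. */
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Hypothetical locale-independent comparison of NUL-terminated UTF-8
   strings.  Primary level: ASCII letters compared case-insensitively.
   Tie-break: raw byte order, which for valid UTF-8 equals Unicode code
   point order, so distinct strings never compare equal. */
static int index_compare(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    for (size_t i = 0;; i++) {
        unsigned char x = fold_ascii(p[i]), y = fold_ascii(q[i]);
        if (x != y)
            return x < y ? -1 : 1;   /* also makes a prefix sort first */
        if (x == '\0')
            break;                   /* both ended: equal ignoring case */
    }
    for (size_t i = 0;; i++) {
        if (p[i] != q[i])
            return p[i] < q[i] ? -1 : 1;
        if (p[i] == '\0')
            return 0;
    }
}
```

With this comparator, "apple" sorts before "Banana" and "Apple" sorts
just before "apple" on every system.  However, any non-ASCII letter
such as ä (bytes 0xC3 0xA4) lands after all of ASCII.  That gap is
what a locale-independent implementation of the UTS #10 default order
would fill.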


