bug-gnu-emacs


bug#59275: Unexpected return value of `string-collate-lessp' on Mac


From: Eli Zaretskii
Subject: bug#59275: Unexpected return value of `string-collate-lessp' on Mac
Date: Sat, 26 Nov 2022 11:22:29 +0200

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: 59275@debbugs.gnu.org
> Date: Sat, 26 Nov 2022 08:47:13 +0000
> 
> > 'downcase' uses the buffer-local case table if such is defined for the
> > buffer that happens to be the current when you invoke 'downcase', and that's
> > another cause of inconsistency and user surprises, especially when the
> > strings you compare don't really "belong" to the current buffer.
> 
> Interesting. Is there any reason why this is not mentioned in the
> docstring for `downcase'?

Yes: because we are ashamed of that and hope to change it at some point, if
we ever figure out how to do that.  The way to avoid this caveat is simple:
bind the case table explicitly (e.g., with the 'with-case-table' macro) when
you call 'downcase'.
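A minimal sketch of that advice follows. It assumes the goal is to downcase with the standard (global) case table rather than whatever table the current buffer happens to use. It relies on the 'with-case-table' macro from subr.el, which temporarily installs a table as the current buffer's case table and restores the previous one afterwards. The helper name 'my-downcase-standard' is hypothetical.

```elisp
;; Hypothetical helper: downcase STRING with the standard case table,
;; so a buffer-local case table in the current buffer has no effect.
(defun my-downcase-standard (string)
  "Return STRING downcased according to `standard-case-table'."
  (with-case-table (standard-case-table)
    (downcase string)))

;; Usage: the result no longer depends on which buffer is current.
(my-downcase-standard "Hello World")
```

Binding the table around the call keeps the fix local to the comparison code, instead of changing the case table of whatever buffer the strings happen to be processed in.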

> I now see 4.10 The Case Table section of the manual, and it looks like
> case tables should be set mostly automatically (by Emacs?) according to
> the language environment.

Yes.  But a buffer can have its own local case table.

> Are details about this process documented anywhere?

No.  But see characters.el and the function I mention below.

> Are these case conversion tables independent of glibc?

Yes.  We build them completely separately and from scratch, as you will see
in characters.el.

> https://nullprogram.com/blog/2014/06/13/ that mentioned something
> similar about caveats with composition.

I don't see there anything about sorting or collation.  What did I miss?

> Just mentioning it for your reference. (I am not sure if the caveats
> discussed have been raised on Emacs devel).

What did you think ought to be discussed?

Btw, that blog fails to distinguish between display-time features and
processing of text without displaying it.  On display, Emacs combines
characters that are combining, so equivalent character sequences should look
the same.  But Emacs doesn't by default consider equivalent character
sequences as equal in all situations, leaving this to the Lisp program.
Considering them always as equal looks sexy in a blog post, because it
raises some eyebrows and has the "whoah!" effect, but isn't a good policy in
general, since some applications definitely need to know about the original
decomposed sequence.  We cannot conceal this from Lisp programs by hiding
the original sequence on some low level that is not exposed to Lisp.  Yes,
this makes Lisp programs more complicated, but that comes with the
territory: you cannot have power without complexity.
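To illustrate what "leaving this to the Lisp program" means in practice: a precomposed "é" (U+00E9) and the decomposed sequence "e" followed by U+0301 COMBINING ACUTE ACCENT look the same on display, yet 'string=' treats them as different strings. A program that wants them to compare equal has to normalize explicitly. This sketch assumes the bundled ucs-normalize library and its 'ucs-normalize-NFC-string' function are an acceptable tool for that; the message itself does not name any particular function. The names prefixed with 'my-' are hypothetical.

```elisp
(require 'ucs-normalize)

;; Precomposed vs. decomposed forms of "é".
(defconst my-precomposed "\u00e9")   ; U+00E9 LATIN SMALL LETTER E WITH ACUTE
(defconst my-decomposed "e\u0301")   ; "e" + U+0301 COMBINING ACUTE ACCENT

;; Plain comparison sees two different character sequences
;; (they do not even have the same length).
(string= my-precomposed my-decomposed)   ; => nil

;; Hypothetical helper: compare after NFC normalization.
(defun my-string-equal-nfc (a b)
  "Return non-nil if A and B are equal after NFC normalization."
  (string= (ucs-normalize-NFC-string a)
           (ucs-normalize-NFC-string b)))

(my-string-equal-nfc my-precomposed my-decomposed)   ; => t
```

Because the normalization happens only where the program asks for it, code that needs the original decomposed sequence still sees it unchanged.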

> I feel that I'm missing something. Doesn't Emacs provide Unicode case
> conversion tables?

The case tables we provide are based on Unicode, but are tweaked by the
language-environment.  See, for example, turkish-case-conversion-enable,
which is run when the Turkish language-environment is turned on.

> Why plain ASCII rules?

Your logic is.  What you suggest breaks down if you consider various
complications in some locales.

> > And we are talking about a single system where these problems happen, which
> > is macOS, right?  Wouldn't it be better for "Someone" who uses macOS to just
> > bite the bullet and write a proper collation function, or find a free
> > software implementation of one, and include it in Emacs?  This is what I did
> > for MS-Windows at the time string-collate-lessp was added to Emacs.  Why
> > cannot macOS users do the same?
> 
> It would be. But how can we ask for this? etc/TODO? Or maybe re-open
> this bug report?

Anything will be fine with me, but unless the people who are asking you to
do these workarounds are motivated enough to sit down and do the job, we
will never get there.  And guess what effect these workarounds have on their
motivation.




