bug-bash

Re: Bug: Ligatures are removed as one character


From: Martin D Kealey
Subject: Re: Bug: Ligatures are removed as one character
Date: Sun, 25 Feb 2024 23:37:18 +1300 (NZDT)
User-agent: Alpine 2.21 (DEB 202 2017-01-01)

On Fri, 23 Feb 2024, Chet Ramey wrote:
> On 2/19/24 9:26 PM, Avid Seeker wrote:
> > When pressing backspace on Arabic ligatures (including characters with
> > diacritics), they are removed as if they are one character.
>
> As you might guess, readline doesn't know much about Arabic, per se. In a
> UTF-8 locale, for example, it knows base characters and combining
> characters.
>
> The idea is simple: when moving backwards, move one multibyte character at
> a time, ignoring combining characters, until you get to a character for
> which wcwidth(x) > 0, and move point there. The algorithm for moving
> forward is similar.
>
> How should this be modified to support Arabic in a portable way?
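
For reference, the backward step quoted above can be sketched roughly as
follows. This is a minimal illustration, not readline's actual code: the
names move_backward() and prev_char_start() are made up for the example, the
buffer is assumed to hold valid UTF-8, and the caller is assumed to have
selected a UTF-8 locale with setlocale() beforehand.

```c
#define _XOPEN_SOURCE 700   /* needed for wcwidth() on some systems */
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: byte offset of the start of the multibyte character
   that ends just before `point`. Assumes buf holds valid UTF-8. */
static size_t prev_char_start(const char *buf, size_t point)
{
    if (point == 0)
        return 0;
    point--;
    /* UTF-8 continuation bytes look like 10xxxxxx; step back over them. */
    while (point > 0 && ((unsigned char)buf[point] & 0xC0) == 0x80)
        point--;
    return point;
}

/* Move point backward one multibyte character at a time until it sits on a
   character for which wcwidth() > 0. Requires a UTF-8 LC_CTYPE locale. */
size_t move_backward(const char *buf, size_t point)
{
    while (point > 0) {
        size_t start = prev_char_start(buf, point);
        mbstate_t st;
        wchar_t wc;
        size_t n;

        memset(&st, 0, sizeof st);
        n = mbrtowc(&wc, buf + start, point - start, &st);
        point = start;
        if (n == (size_t)-1 || n == (size_t)-2)
            break;              /* undecodable bytes: treat as one unit */
        if (wcwidth(wc) > 0)
            break;              /* reached a character that takes up space */
    }
    return point;
}
```

With "e", U+0301 and "x" in the buffer, moving back from the end lands on
"x", and moving back once more skips the combining accent and lands on "e".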

Unicode has categories for "modifiers" (especially "modifier letters", Lm)
and for "combining characters" (the mark categories Mn, Mc and Me). Note that
each code point has exactly one General Category, although it can carry
several other properties.

Modifiers change how another character is displayed. They may or may not be
considered to have their own separate semantic meaning. In the simple cases
they simply over-print an additional mark, but more complex adjustments are
possible. They don't normally change the overall size of the modified
character, so wcwidth(ch) will report zero.

What matters is that "combining characters" do not have stand-alone semantic
meaning; they should be erased along with the principal character. Accents
in European languages (and Thai) tend to be in this category.

To a first approximation, backspace should skip over combining characters
(erasing them together with their base character) but not over modifiers.
However, if you've just removed a zero-width element, it would be advisable
to either re-render the whole line, or backspace over the last full glyph,
erase it, and re-render it with all its (remaining) modifiers.
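
A rough sketch of that rule, with the classification left to a caller-supplied
predicate so that the question of which characters count as "combining" stays
open. Everything here is hypothetical (backspace_start(), decode_prev(), and
especially demo_is_combining(), which only knows one Unicode block); the
buffer is assumed to hold valid UTF-8. Re-rendering is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Caller-supplied test: nonzero if cp should be erased with its base. */
typedef int (*cp_pred)(uint32_t cp);

/* Decode the UTF-8 character that ends just before `end` (end > 0) and
   store the offset of its first byte in *start. Assumes valid UTF-8. */
static uint32_t decode_prev(const char *buf, size_t end, size_t *start)
{
    size_t i = end - 1;
    size_t k;
    unsigned char lead;
    uint32_t cp;

    while (i > 0 && ((unsigned char)buf[i] & 0xC0) == 0x80)
        i--;                                /* skip continuation bytes */
    lead = (unsigned char)buf[i];
    if (lead < 0x80)      cp = lead;        /* 1-byte sequence */
    else if (lead < 0xE0) cp = lead & 0x1F; /* 2-byte sequence */
    else if (lead < 0xF0) cp = lead & 0x0F; /* 3-byte sequence */
    else                  cp = lead & 0x07; /* 4-byte sequence */
    for (k = i + 1; k < end; k++)
        cp = (cp << 6) | ((unsigned char)buf[k] & 0x3F);
    *start = i;
    return cp;
}

/* Offset at which a backspace at `point` should start erasing: skip back
   over combining characters, stop at the first character that is not one
   (a base character or a modifier). */
size_t backspace_start(const char *buf, size_t point, cp_pred is_combining)
{
    while (point > 0) {
        size_t start;
        uint32_t cp = decode_prev(buf, point, &start);
        point = start;
        if (!is_combining(cp))
            break;
    }
    return point;
}

/* Illustrative predicate only: the Combining Diacritical Marks block. */
int demo_is_combining(uint32_t cp)
{
    return cp >= 0x0300 && cp <= 0x036F;
}
```

With this predicate, "e" followed by U+0301 is erased as a unit, while "a"
followed by the modifier letter U+02B0 loses only the modifier. Whether the
Arabic diacritics from the original report belong on the combining side or
the modifier side is exactly the policy decision the predicate encodes.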

https://stackoverflow.com/questions/54450823/what-is-the-difference-between-combining-characters-and-modifier-letters

On systems that need to dynamically load a shared library (libunicode.so?)
to support this, I suggest delaying doing so until it's needed -- after
setlocale(LC_CTYPE, "something.UTF-8") returns success, or some equivalent
test. (I hope there's a check that can be done against the already-loaded
locale, rather than inspecting the locale name as a string.)
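
One candidate for such a check is POSIX nl_langinfo(CODESET), which reports
the character encoding of the locale currently in effect. A minimal sketch of
the lazy approach, assuming that check is good enough; load_unicode_support()
is a hypothetical stand-in for whatever dlopen() call would really be needed,
and the accepted codeset spellings are a guess at what platforms return:

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Return 1 if the LC_CTYPE locale currently in effect uses UTF-8. */
int locale_is_utf8(void)
{
    const char *cs = nl_langinfo(CODESET);
    return cs != NULL &&
           (strcmp(cs, "UTF-8") == 0 || strcmp(cs, "utf8") == 0);
}

static int unicode_loaded;      /* set once the support code is loaded */

/* Hypothetical stand-in for dlopen()ing the real library. */
static int load_unicode_support(void)
{
    return 1;
}

/* Load the support code at most once, and only in a UTF-8 locale. */
int ensure_unicode_support(void)
{
    if (!unicode_loaded && locale_is_utf8())
        unicode_loaded = load_unicode_support();
    return unicode_loaded;
}
```

The caller would invoke ensure_unicode_support() after setlocale() and fall
back to the existing wcwidth-based behaviour when it returns 0.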

-Martin


