bug-bash

Re: Bug: Ligatures are removed as one character


From: Martin D Kealey
Subject: Re: Bug: Ligatures are removed as one character
Date: Tue, 20 Feb 2024 19:52:58 +1000

It's been a long time since I looked into Unicode, but this is what I
remember.

Depending on the Unicode normalisation level, backspace is *supposed* to
remove a letter and all its associated combining marks.
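As a rough illustration of that rule (a minimal sketch only, not what bash or readline actually does), the following Python strips a base character together with any combining marks that follow it. The function name backspace_cluster is made up for this example, and using a non-zero canonical combining class as the test for "combining mark" is a simplification of full grapheme-cluster segmentation.

```python
import unicodedata

def backspace_cluster(text: str) -> str:
    """Remove the last base character together with its trailing combining marks.

    Simplifying assumption: a "combining mark" is any code point with a
    non-zero canonical combining class. Real grapheme-cluster rules are broader.
    """
    end = len(text)
    # Walk backwards over the trailing combining marks.
    while end > 0 and unicodedata.combining(text[end - 1]) != 0:
        end -= 1
    # Drop the base character the marks were attached to, if there is one.
    return text[:max(end - 1, 0)]

# The word from the report, spelled out as code points (assumed order):
# alef, lam, seen, fatha, shadda, lam, alef, meem, damma
word = "\u0627\u0644\u0633\u064E\u0651\u0644\u0627\u0645\u064F"

step = word
while step:
    step = backspace_cluster(step)
    print(len(step), "code points left")
```

Under these assumptions the lam and the alef are removed by separate backspaces, because neither is a combining mark, so it takes four presses rather than three to get back to the first two letters.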

The root problem seems to be that some Arabic letters change from
"non-combining" to "combining" depending on the language in which they're
used. Unicode also has a problem distinguishing a combining letter (vowel
points in Arabic or Hebrew) from a combining diacritic (accents in Latin
script).
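To make that concrete, here is a small sketch that asks Python's unicodedata module how it classifies a few marks. The helper name mark_info and the choice of sample characters are mine; the point is only that Arabic and Hebrew vowel points and a Latin accent all report the same general category and differ only in their combining class.

```python
import unicodedata

def mark_info(ch: str) -> tuple[str, str, int]:
    """Return (name, general category, canonical combining class) for one code point."""
    return (unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch))

# Arabic fatha, Arabic shadda, Hebrew hiriq, Latin combining acute accent
for ch in ("\u064E", "\u0651", "\u05B4", "\u0301"):
    name, category, ccc = mark_info(ch)
    print(f"U+{ord(ch):04X} {name}: category={category}, combining class={ccc}")
```

All four come back as category Mn (nonspacing mark), so code that relies on the category alone cannot tell a vowel point from an accent.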

If you think that's a bug in Unicode, you're not alone; the Unicode
Consortium has been struggling with this for at least ten years. See
https://unicode.org/L2/L2014/14109-inline-chars.pdf

There's been some progress; Unicode version 12 has at least admitted
there's a problem (see section 7.9, page 327, of
https://www.unicode.org/versions/Unicode12.1.0/ch07.pdf).

I'll leave it to others to survey the current state of play with Unicode,
but historically it's been a mess.

-Martin


On Tue, 20 Feb 2024 at 12:26, Avid Seeker <avidseeker7@protonmail.com>
wrote:

> When pressing backspace on Arabic ligatures (including characters with
> diacritics), they are removed as if they are one character.
>
> Example:
>
> السَّلامُ
>
> Pressing 3 backspaces leaves the word at ال. It removed لا which is a
> ligature combining "ل" and "ا", and removed "م" with diacritics. Compare
> this with the behavior of zsh.
>
> For non-Arabic speakers, this is like typing: fi (U+0066 U+0069), but when
> pressing backspace it is removed as if it were the single character: ﬁ (U+FB01).
>
>
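The analogy in the quoted report can be checked with a short sketch. It assumes the text was typed as separate letters and only rendered as a ligature; code_points is a helper invented for this example. U+FB01 and U+FEFB are the precomposed presentation forms, which NFKC normalisation maps back to the separate letters.

```python
import unicodedata

def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

typed = "fi"                  # two separate letters, as typed
ligature = "\uFB01"           # LATIN SMALL LIGATURE FI, a single code point
lam_alef = "\u0644\u0627"     # Arabic lam followed by alef, as typed
lam_alef_ligature = "\uFEFB"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM

print(code_points(typed))
print(code_points(ligature))
# Compatibility normalisation turns the presentation forms back into letters.
print(unicodedata.normalize("NFKC", ligature) == typed)
print(unicodedata.normalize("NFKC", lam_alef_ligature) == lam_alef)
```

If the input really is two code points, a per-character backspace would be expected to remove one letter at a time, which is the behaviour the report asks for.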

