bug-libunistring
From: Bruno Haible
Subject: Re: [bug-libunistring] Hangul Jamo vowels and trailing consonants should probably be 0 width
Date: Tue, 28 Dec 2021 11:36:10 +0100

Hello Luis,

> I've been looking at widths reported for Hangul Jamo in wcwidth 
> implementations.

Thanks for bringing up this issue. I wasn't aware of it.

> glibc gave width 0 to conjoining jungseong and jongseong at:

https://sourceware.org/bugzilla/show_bug.cgi?id=21750
https://sourceware.org/bugzilla/show_bug.cgi?id=22074

Ouch. As Egmont Koblinger wrote in the first of these glibc tickets, every
change to the commonly accepted wcwidth has the potential to cause trouble.

> In glibc and MirBSD xterm, U+1160..U+11FF and U+D7B0..U+D7FF have 0 width.

I agree that U+D7B0..U+D7FF (Hangul Jamo Extended-B) should be treated like
U+1160..U+11FF (Hangul Jamo medial and final), per Unicode standard, chapter 18
https://www.unicode.org/versions/Unicode14.0.0/ch18.pdf .
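
A minimal sketch of that rule, assuming a hypothetical helper named
jamo_width (this is not the actual glibc or libunistring code): both ranges
get width 0, while the leading consonants U+1100..U+115F keep width 2.
Anything outside these ranges is left undecided here.

```python
# Hypothetical helper for illustration only; the names are assumptions.
ZERO_WIDTH_JAMO = (
    (0x1160, 0x11FF),  # Hangul Jamo: medial vowels and final consonants
    (0xD7B0, 0xD7FF),  # Hangul Jamo Extended-B
)
LEADING_JAMO = (0x1100, 0x115F)  # Hangul Jamo: leading consonants


def jamo_width(ch):
    """Return 0 or 2 for the Jamo ranges above, None for anything else."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in ZERO_WIDTH_JAMO):
        return 0
    if LEADING_JAMO[0] <= cp <= LEADING_JAMO[1]:
        return 2
    return None  # outside the ranges discussed here
```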

However, I don't think people have been looking at the right spot.

1) People (esp. Thorsten Glaser) have been arguing from the behaviour of xterm.
But xterm is rarely used nowadays. I evaluated the popularity of terminal
emulators in August 2019, and here are the results:

  * measured through Debian popularity contest:

    https://qa.debian.org/popcon.php?package=konsole     11%
    https://qa.debian.org/popcon.php?package=emacs        7%
    https://qa.debian.org/popcon.php?package=lxterminal   6%
    https://qa.debian.org/popcon.php?package=guake        1.3%
    https://qa.debian.org/popcon.php?package=yakuake      1.1%
    https://qa.debian.org/popcon.php?package=rxvt         0.9%
    https://qa.debian.org/popcon.php?package=termit       0.7%
    https://qa.debian.org/popcon.php?package=lilyterm     0.1%

  * https://opensource.com/life/17/10/top-terminal-emulators

    1. gnome-terminal
    2. terminator
    3. konsole
    4. xterm
    5. guake
    6. yakuake
    7. tilda

The conclusion is that
  - GNOME vte based terminal emulators probably have a 50% share today,
  - konsole comes second,
  - xterm is not important (because who still wants to use a program
    with Athena widgets in an environment based on Gtk and/or Qt widgets?)

2) People argue about the use of these Hangul Jamo characters when
they form a complete Hangul syllable: in this case the total width
should be 2, and therefore, since 2 = 2 + medial + final, the
medial and final parts should have width 0.
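
The arithmetic can be checked with a small sketch. The function
sequence_width and its trailing_width parameter are hypothetical names
introduced only for this illustration; the leading consonant is assumed
to have width 2.

```python
def sequence_width(text, trailing_width):
    """Sum per-character widths of a conjoining Jamo sequence.

    trailing_width is the width assigned to medial vowels and final
    consonants (U+1160..U+11FF and U+D7B0..U+D7FF).
    """
    total = 0
    for ch in text:
        cp = ord(ch)
        if 0x1100 <= cp <= 0x115F:
            total += 2  # leading consonant
        elif 0x1160 <= cp <= 0x11FF or 0xD7B0 <= cp <= 0xD7FF:
            total += trailing_width  # medial vowel or final consonant
        else:
            raise ValueError("not a conjoining Jamo: U+%04X" % cp)
    return total


syllable = "\u1112\u1161\u11ab"  # leading + medial + final
print(sequence_width(syllable, 0))  # 2: matches one syllable block
print(sequence_width(syllable, 2))  # 6: three times too wide
```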

But in this case people would be using a precomposed Hangul syllable.
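
For reference, here is a sketch of how a modern leading/medial/final
sequence maps to a single precomposed syllable, following the arithmetic
composition from the Unicode standard. The function name compose is an
assumption for this example; it covers only the modern Jamo (leading
U+1100..U+1112, medial U+1161..U+1175, final U+11A8..U+11C2) and does
no range checking.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28


def compose(l, v, t=None):
    """Compose modern Jamo into one precomposed Hangul syllable."""
    l_index = ord(l) - L_BASE
    v_index = ord(v) - V_BASE
    t_index = ord(t) - T_BASE if t else 0  # index 0 means no final
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


jamo = "\u1112\u1161\u11ab"
precomposed = compose(*jamo)
# NFC normalization performs the same composition.
assert precomposed == unicodedata.normalize("NFC", jamo) == "\ud55c"
print(len(jamo), len(precomposed))  # 3 1
```

A text stored in this precomposed form contains no medial or final Jamo
at all, so their width never enters the calculation.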

What I am more concerned about: When you look at the code charts
https://www.unicode.org/charts/PDF/U1100.pdf
https://www.unicode.org/charts/PDF/UD7B0.pdf
you see that these characters have glyphs of their own.
- In which circumstances are these characters used individually?
  Maybe in a text book for Korean children?
- How are they supposed to be rendered in these situations? Surely
  as glyphs of width 2, no?

In the end, it comes down to: What is the more frequent context for
these characters?

Bruno





