bug#39799: 28.0.50; Most emoji sequences don’t render correctly

bug-gnu-emacs

From:	Mike FABIAN
Subject:	bug#39799: 28.0.50; Most emoji sequences don’t render correctly
Date:	Sat, 29 Feb 2020 12:14:28 +0100
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

Eli Zaretskii <eliz@gnu.org> wrote:

>> From: Mike FABIAN <mfabian@redhat.com>
>> Cc: rpluim@gmail.com,  39799@debbugs.gnu.org
>> Date: Sat, 29 Feb 2020 08:59:49 +0100
>> 
>> Eli Zaretskii <eliz@gnu.org> wrote:
>> 
>> > If Gedit selects a font by looking at more than one codepoint (and I'm
>> > not sure this is how it works in Gedit), then Emacs doesn't work that
>> > way.
>> 
>> Yes, Gedit does this somehow with pango. It tries to avoid switching
>> fonts in places where it would look bad. For example, if you have a
>> default font supporting only ASCII and then there is a word containing
>> some non-ASCII character like “grün” it chooses a font containing the
>> “ü” for the whole word to avoid the “ü” looking out of place.
>
> Well, "somehow" is not enough to see whether we have any additional
> work to do in Emacs, because Emacs also tries to achieve that same
> goal.  There are many different ways to achieve it, though; for
> example, Emacs will AFAIK by default not even use a font that could
> support ASCII, but not Latin-1 blocks as the default face's font.
>
> What you say about Gedit makes sense in general, but questions
> immediately pop up: how does Gedit define a "word" (Emacs, as you
> know, has a very flexible definition that can be controlled from
> Lisp), how does it "know" that a word like "grün" belongs to the same
> script (otherwise displaying a character from another script using a
> different font, as in, say, "grאn" might make sense), etc.

Yes, “word” is already too simplified.


> IOW, what we need is a detailed description of what Pango does here,
> and how Gedit affects that by configuring its default fonts.  Only
> then can we reason about the differences between that and what Emacs
> does.

Yes, you are right, and I think this is very difficult.

I don’t know the details, but Pango seems to “cut” text into “runs”,
where each “run” is rendered with a single font. It tries to place the
cuts so that the overall result looks as nice as possible. This is
really difficult and doesn’t always work well: sometimes the results
are ugly, although overall it seems to do a good job.
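
To make the idea of “runs” concrete, here is a rough Python sketch. It
is not Pango’s algorithm, which I don’t know in detail. The font names,
the coverage tables and the rule “one font per whitespace-separated
chunk” are all assumptions made up for illustration:

```python
from itertools import groupby

# Hypothetical coverage tables; real fonts cover far more code points.
COVERAGE = {
    "AsciiOnly": set(range(0x20, 0x7F)),   # printable ASCII
    "Latin1": set(range(0x20, 0x100)),     # ASCII plus Latin-1
}
PREFERENCE = ["AsciiOnly", "Latin1"]       # assumed preference order


def font_for(chunk):
    """Return the first preferred font covering every character of chunk."""
    for name in PREFERENCE:
        if all(ord(ch) in COVERAGE[name] for ch in chunk):
            return name
    return None  # no single font covers the whole chunk


def itemize(text):
    """Split text into (font, substring) runs, one font per chunk."""
    runs = []
    # groupby on str.isspace yields alternating whitespace and
    # non-whitespace chunks, a crude stand-in for "words".
    for _, group in groupby(text, key=str.isspace):
        chunk = "".join(group)
        font = font_for(chunk)
        if runs and runs[-1][0] == font:
            # Same font as the previous chunk: extend that run.
            runs[-1] = (font, runs[-1][1] + chunk)
        else:
            runs.append((font, chunk))
    return runs


print(itemize("a grün b"))
# [('AsciiOnly', 'a '), ('Latin1', 'grün'), ('AsciiOnly', ' b')]
```

With this rule the whole word “grün” lands in one run, so the “ü” does
not come from a different font than its neighbours.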

>> > In any case, are these sequences displayed as composed characters?
>> > Does "C-u C-x =" tell that the base character U+24C2 was composed with
>> > the following variation selector?  According to the setup in
>> > japanese.el, they should compose, if the font used for U+24C2 also
>> > supports the variation selectors.
>> 
>> Yes, it does tell that it was composed with the following character:
>
> And the resulting display is what you expect?  If not, then I think
> you need to find a font which supports Emoji presentation of
> characters such as Ⓜ, and make Emacs use it for those sequences.

Yes, in the case of Ⓜ️ U+24C2 U+FE0F the result in Emacs is perfect
when using “Noto Color Emoji” or “Joypixels”. It is displayed in colour
and behaves as a single character in the buffer, and the variation
selector is not displayed as a box.

But when using Symbola for the same sequence one sees U+FE0F as an ugly
box.

And when displaying the text representation sequence Ⓜ︎ U+24C2 U+FE0E
one always sees U+FE0E as a box no matter whether using “Symbola”,
“Noto Color Emoji” or “Joypixels”.
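
For anyone who wants to check which of the two sequences they are
looking at, a small Python sketch using only the standard `unicodedata`
module lists the code points. The helper name `describe` is my own:

```python
import unicodedata


def describe(seq):
    """Return 'U+XXXX NAME' for every code point in seq."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in seq
    ]


# Emoji presentation sequence: base character followed by U+FE0F.
print(describe("\u24C2\uFE0F"))
# Text presentation sequence: base character followed by U+FE0E.
print(describe("\u24C2\uFE0E"))
```

The two sequences differ only in the trailing variation selector
(VARIATION SELECTOR-16 versus VARIATION SELECTOR-15).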

I am not sure whether this is wrong. Maybe it is OK to require a font
which can handle this? I am really not sure...

But what about # U+0023 NUMBER SIGN ?

This does have an emoji representation.

I.e. U+0023 U+FE0F displays in color as an emoji in pango-view and
gedit.

How could this ever work in Emacs? If you have to decide on a single
font to render U+0023 in Emacs, you would need to set a “capable” emoji
font for an ASCII character like #. One probably does not want to do
that: # in text representation would then come as the text
representation glyph from some emoji font, and its style would
probably not match the other ASCII characters coming from a font like
“DejaVu Sans Mono”. So one probably wants to set something like
“DejaVu Sans Mono” for # as well, otherwise normal text won’t look
nice. But how can one display U+0023 U+FE0F as an emoji then?

This seems very messy, I don’t know how this can be solved.
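
Just to illustrate what “looking at more than one codepoint” could mean
for the # case, here is a Python sketch that picks the font per cluster
(base character plus variation selector) instead of per code point.
This is not how Emacs or Pango work internally; the function names are
hypothetical, and the sketch ignores whether the chosen font actually
has a glyph for the character:

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15, requests text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16, requests emoji presentation


def clusters(text):
    """Attach each variation selector to the character before it."""
    out = []
    for ch in text:
        is_vs = ch in (VS15, VS16)
        if is_vs and out and out[-1][-1] not in (VS15, VS16):
            out[-1] += ch      # selector joins the preceding base
        else:
            out.append(ch)     # ordinary character or stray selector
    return out


def font_for_cluster(cluster,
                     text_font="DejaVu Sans Mono",
                     emoji_font="Noto Color Emoji"):
    """Choose the emoji font only when the cluster ends in VS16."""
    return emoji_font if cluster.endswith(VS16) else text_font


for c in clusters("#\uFE0F #"):
    print([f"U+{ord(ch):04X}" for ch in c], font_for_cluster(c))
# ['U+0023', 'U+FE0F'] Noto Color Emoji
# ['U+0020'] DejaVu Sans Mono
# ['U+0023'] DejaVu Sans Mono
```

With such a rule a plain # keeps the normal text font, and only
U+0023 U+FE0F would be sent to the emoji font.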

> If you think this Emacs requirement for a capable font is incorrect, I
> suggest to post a question about this to the HarfBuzz mailing list,
> harfbuzz@lists.freedesktop.org, maybe HarfBuzz has capabilities in
> this regard that we somehow don't yet utilize.

Yes, I’ll try that; maybe it will help me understand this better.

-- 
Mike FABIAN <mfabian@redhat.com>
Lack of sleep is the enemy of good work.
