Re: Ligature support

From: tomas
Subject: Re: Ligature support
Date: Sat, 6 Nov 2021 10:16:25 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, Nov 05, 2021 at 10:30:47PM +0200, Eli Zaretskii wrote:
> > Date: Fri, 5 Nov 2021 20:52:45 +0100
> > From: tomas@tuxteam.de
> > Cc: emacs-devel@gnu.org
> > 
> > > > it would have to know (or guess?) the language it is treating.
> > > 
> > > We do pass the language to HarfBuzz when we think we know it, but the
> > > problem is Emacs itself has no good notion of the "current language".
> > 
> > This is what I was pointing at.
> Well, don't just point to the obvious: better sit down and code some
> features that we can use to be smarter ;-)
> > If the text itself is multilingual, your best bet is to ask the user
> Asking the user during redisplay is a non-starter.


More constructively, that's what happens while typesetting text.
That's what TeX has \/ for.

We have two classes of language: those where ligatures are
essential (Arabic, Hangul -- I must admit that I know very little
about the latter). For those, there is no choice.

Then we have those where ligatures are rather a "decoration", an
accident of old handwriting further fashioned by the introduction
of movable type.

And a decoration you sometimes downright don't want (in TeX, last
time I looked, most German writers just disabled ligatures: the
"wrong ligatures" are so much more disturbing, and the prospect
of proofreading the thing for wrong ligatures and sprinkling your
source with \/ just isn't worth it).

In short, there are languages where "asking the user" is just the
only option; that means that the feature only makes sense while
typesetting (where you /can/ ask the user) and not while rendering
dynamically (the "redisplay" case we are treating here).

The problem is compounded by TeX's legacy, which used its ligature
mechanism for things which strictly aren't ligatures: think of ---
for an em dash. It's a nice hack, and people perceive that as a
ligature, too (you can see that elsewhere in this huge thread) but
it ain't.

I still think: there isn't a general solution. Me? I'd prefer to
disable all ligatures unless I'm writing Arabic.

> > and your second-best bet is to do some statistical heuristics, which
> > only will "work" for a longer stretch of text.
> That's a waste of CPU cycles: when we don't know the language, we ask
> HarfBuzz to guess, and I trust HarfBuzz that it can guess as well as
> or better than we can.

I haven't looked into it, but I wonder what magic it uses, if it
isn't some variation of "maximum likelihood over n-gram statistics".
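
To make concrete what I mean by "maximum likelihood over n-gram
statistics", here is a minimal sketch. It says nothing about what
HarfBuzz actually does: the function names, the two tiny training
sentences and the add-one smoothing are all my own assumptions,
chosen only for illustration.

```python
import math
from collections import Counter

# Toy training text per language (hypothetical; a real model needs far more).
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat "
          "sat on the mat with the other things",
    "de": "der schnelle braune fuchs springt über den faulen hund und "
          "die katze sitzt auf der matte mit den anderen sachen",
}

def trigrams(text):
    """Return the character trigrams of text, padded with spaces."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train(samples):
    """Count trigrams for each language."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}

def log_likelihood(text, counts, vocab_size):
    """Log-probability of text under one model, with add-one smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[gram] + 1) / (total + vocab_size))
        for gram in trigrams(text)
    )

def guess_language(text, models):
    """Pick the language whose trigram model makes text most likely."""
    vocab = set().union(*models.values())
    vocab_size = len(vocab) + 1  # one extra slot for unseen trigrams
    return max(
        models,
        key=lambda lang: log_likelihood(text, models[lang], vocab_size),
    )

models = train(SAMPLES)
print(guess_language("the other dog", models))
print(guess_language("der faule hund", models))
```

With a word or two of input, the sum has only a handful of terms and
the guess is correspondingly shaky, which is why I said this only
"works" for a longer stretch of text.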

> > > Such a notion is problematic in a multilingual editor such as Emacs.
> > > It is something we still need to figure out, and after that implement
> > > the necessary infrastructure.  What we have now is rudimentary and
> > > very insufficient.
> > 
> > I think that will always be an approximation.
> Maybe, maybe not.  I hope at least sometimes we could do better:
> there are various hints in the form of the encoding, the source of the
> text, etc.  We just need to figure out which means we have for
> gleaning the language that is not obvious from the characters
> themselves (because HarfBuzz does the latter already), and provide the
> features for Lisp programs and users to use them.

Some day I'll peek into HarfBuzz's source code. Perhaps next year.

 - t
