emacs-devel

From: Pip Cet
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Wed, 27 May 2020 18:42:07 +0000

On Wed, May 27, 2020 at 5:13 PM Eli Zaretskii <eliz@gnu.org> wrote:
> > From: Pip Cet <pipcet@gmail.com>
> > Date: Wed, 27 May 2020 09:36:52 +0000
> > Cc: emacs-devel@gnu.org
> >
> > > Any measurements to back that up?
> >
> > Yes. With a regexp of "....", the composite.c code takes 175 billion
> > cycles to display every line of composite.c. My code takes 144 billion
> > cycles, with a lookahead/lookbehind each set to 128 but limiting it as
> > described.
>
> What did you compare, exactly?  On the one hand, the code you posted
> here, which took 128 characters around each character to be displayed?

No. Not anything like that code.

> any other changes in the code you posted here?  And what does
> "limiting it as described" mean here?

I described the algorithm for selecting context.

> And on the other hand, the existing automatic composition machinery?
> With what setup of composition-function-table, exactly?

As I said, a regexp of "....".

> And finally, which code was included in the count of cycles?

All of it.

There's no reason to believe the composite.c regexp design will
perform adequately. It doesn't.

> > > > > and others, including (but not limited to) the dreaded bidi thing.
> > > >
> > > > Looking for "bidi" in composite.c, the only relevant thing I see is
> > > > a FIXME.
> > >
> > > That's because you look in the wrong place.
> >
> > What's the right place? I'm using all the code in bidi.c, of course,
>
> No, actually you don't.
> Your make_context copies characters in strict
> logical order, bypassing bidi.c

My current code doesn't.

> , and by that also potentially crossing
> boundaries of different directionality (and even line and paragraph
> boundaries), which is a no-no in text shaping.  Then, after you call
> the shaper, you don't reorder the glyphs it delivers, so they will
> look on display in the wrong order.

I do now.

> And there may be other subtle
> issues as well -- this stuff was finalized so long ago that I'm not
> even sure I remember all the details of what needed to be done to get
> it right.

(It's not enough. Open emacs -Q etc/HELLO, place point on the lam in
"aleikum", and hit control-space. The shape changes to something
incorrect.)

> > > > The code shouldn't break horribly for RTL text (it doesn't).
> > >
> > > It _will_ break for RTL text, you just didn't yet see it because you
> > > only tested it in simple use cases.  UAX#9 defines a lot of optional
> > > features, including multi-level directional overrides and embeddings,
> > > it isn't just right-to-left vs left-to-right.
> >
> > I assume bidi.c handles that, as it does for composite.c?
>
> Yes, but only _if_you_use_them_correctly_!  If you bypass them, then
> all bets are off.

Obviously.

> > > > We have something that superficially results in a similar screen
> > > > layout to what I want, but that actually represents display elements
> > > > in a way that makes them unusable for my purposes.
> > >
> > > Then please describe what doesn't fit your purpose, and let's focus on
> > > extending the existing code to do what's missing.
> >
> > The three main things are:
> >  - "entering" glyphs, instead of treating them as atomic
>
> Why is that needed?  A ligature is a single display entity, that's why
> fonts ligate.

"ffi" is not. When I enter "official" C-a C-f C-f, point MUST be on
the second f.
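
To illustrate what "entering" a ligature could mean for cursor placement, here is a minimal sketch. It assumes the simplest possible rule: the ligature's advance width is divided evenly among the characters it covers. The function name `caret_offsets` and the 18-pixel advance are hypothetical, and nothing here is taken from composite.c or from the patch under discussion; a font may also supply its own caret positions, which this sketch ignores.

```python
def caret_offsets(ligature_advance, n_chars):
    """Return glyph-relative x offsets for the caret before each character
    covered by a ligature, plus the offset after the last one.

    Assumption: the advance is split evenly among the characters.
    """
    if n_chars <= 0:
        raise ValueError("a ligature covers at least one character")
    return [ligature_advance * i / n_chars for i in range(n_chars + 1)]


# "official": the "ffi" ligature starts at buffer index 1 and covers 3 chars.
ligature_start = 1
point = 2                      # after C-a C-f C-f, point is on the second "f"
offsets = caret_offsets(18.0, 3)
cursor_x = offsets[point - ligature_start]
print(offsets, cursor_x)
```

With this rule the cursor for point 2 is drawn one third of the way into the "ffi" glyph rather than at its left edge.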

> Why would we want to break ligatures when we wrap
> lines?

Who said we do? I personally like it, but it's obviously not something
we should do by default.

> >  - providing context automatically rather than by providing specific
> > regexps for it in advance
>
> That's a separate part of the problem; I wasn't talking about it.  It
> needs a separate solution (which was not yet presented), but the
> solution doesn't have to be based on regexps if a better or smarter or
> faster way is available.  Extending composition-function-table to
> support context definition by means other than regexp is easy and
> doesn't disrupt the way the code works.
>
> >  - kerning, which requires context for every character
>
> That's again about that separate part of the problem, because once the
> context was determined correctly, the shaper will perform the kerning
> for you.

> >  - ligatures that come partly from a display property and partly from
> > the buffer (composite.c doesn't allow for those, as far as I can tell)
>
> It doesn't and it shouldn't!  Text of display strings and overlay
> strings is completely isolated from buffer text, and is even
> bidi-reordered independently.  This is by design.

Unacceptable design for my use case, then.

I don't see how revealing buffer text that has a replacing display
property, rather than the replacement, is good design.

The results of putting display properties on autocompositions
are...entertaining, in any case. I've now got an "x" character that
C-x = tells me is an "i"...

> These strings are
> more akin to images than to a part of buffer text, so mixing them with
> buffer text on display would be a grave mistake.

No, it wouldn't be. If two letters appear with no intervening space,
they need to be kerned and ligated if appropriate, no matter where
they come from. If people want a ZWNJ, that's perfectly available to
them.
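
A toy illustration of that argument, under loud assumptions: real shaping substitutes glyphs inside the font, whereas this sketch replaces character sequences with Unicode presentation forms (U+FB01, U+FB03) purely to make the effect visible. The `ligate` function and the two-entry table are hypothetical and have nothing to do with HarfBuzz or composite.c. It shows adjacent text from two sources ligating once concatenated, and a ZWNJ (U+200C) at the boundary preventing that.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical ligature table; longest sequences are tried first.
LIGATURES = {"ffi": "\ufb03", "fi": "\ufb01"}


def ligate(text):
    """Replace known sequences with ligature characters, left to right."""
    out = []
    i = 0
    sequences = sorted(LIGATURES, key=len, reverse=True)
    while i < len(text):
        for seq in sequences:
            if text.startswith(seq, i):
                out.append(LIGATURES[seq])
                i += len(seq)
                break
        else:
            # No ligature starts here; copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


buffer_text = "of"        # stands in for text that comes from the buffer
display_string = "fice"   # stands in for text that comes from a display property

joined = ligate(buffer_text + display_string)
separated = ligate(buffer_text + ZWNJ + display_string)
```

Here `joined` contains the "ffi" ligature spanning both sources, while `separated` keeps the first "f" on its own and only ligates "fi" inside the display string.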

> > > Please note: I'm not talking about the regexp part -- that part you
> > > anyway will need to decide how to extend or augment.  I'm telling you
> > > right here and now that blindly taking a fixed amount of surrounding
> > > text will not be acceptable.  You can either come up with some smarter
> > > regexp (and you are wrong: the regexps in composition-function-table
> > > do NOT have to match only fixed strings, you can see that they don't
> > > in the part of the table we set up for the Arabic script);
> >
> > Again, I think the limits are fixed: 4 characters of history and 500
> > characters of look-ahead. What am I missing?
>
> Fixed limits and fixed strings are two different things.  You can
> match strings of many different lengths up to a limit.

Which effectively means you can match strings of that limited length.

> The 3 previous characters are rarely needed, certainly not for English
> ligatures, because you can detect the sequence by the first character.

Precisely the same argument applies to my 16-character limit. A script
in which a glyph depends on something happening 16 codepoints onwards,
or back, is extremely unlikely.
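
A minimal sketch of a bounded context window of that kind follows. It is not the context-selection algorithm referred to earlier in the thread, which is not reproduced in this message. It assumes a symmetric limit of 16 characters and clips the window at line boundaries; the name `context_window` is hypothetical, and directional-run boundaries are not handled at all.

```python
def context_window(text, pos, limit=16):
    """Return (start, end) such that text[start:end] is the shaping context
    for the character at text[pos].

    Assumptions: text[pos] is not a newline, the window extends at most
    `limit` characters in each direction, and it never crosses a newline.
    """
    line_start = text.rfind("\n", 0, pos) + 1   # 0 when there is no earlier newline
    line_end = text.find("\n", pos)
    if line_end == -1:
        line_end = len(text)
    start = max(line_start, pos - limit)
    end = min(line_end, pos + limit + 1)
    return start, end


text = "abc\n" + "x" * 40 + "\ndef"
near_line_start = context_window(text, 10)   # clipped on the left by the newline
mid_line = context_window(text, 24)          # full 16 + 1 + 16 characters
```

In the middle of a long line the window is 33 characters wide; near a line boundary it shrinks instead of reaching into the neighbouring line.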

> Anyway, you again focus on the (separate) issue of determining the
> context.  Whereas I was mainly talking about what happens _after_ you
> determine the context: how do you collect the characters to pass to
> the shaper, how you present to the layout code the glyphs returned by
> the shaper, and how you lay out those glyphs by inserting them into
> the glyph rows of the glyph matrix.  It is this code that I see no
> reason to modify, definitely not significantly.

It needs to be modified, significantly, to support entering glyphs, to
support kerning, and to support things like ligating across a buffer
text / display string boundary.

> > > or you can
> > > decide on something more complex, like a function.  Either way, the
> > > amount of text that this will pick up and pass to the shaper should be
> > > reasonable and should be determined by some understandable rules.  And
> > > those rules must be controllable from Lisp.
> >
> > That last part isn't true for the composite.c code, which imposes a
> > limit of 4 characters of history and 500 characters of look-ahead
>
> How do those limits violate the above requirement?  The 3-char
> prev-chars limit is "reasonable" because we currently don't need more,

It's hardcoded in C, though. A 16-character limit, as explained above,
is perfectly "reasonable" for determining the shape of a single glyph.

> and the other limit doesn't exist AFAICT -- you could make a regexp
> that matched very long strings, if needed.

Hmm. I thought I saw weirdness around the 500th character, but it's
probably one of the other bugs.

But, seriously, you're still willing to argue that point shouldn't be
able to enter the "ffi" glyph? Not even if the user wants it? Because
if so, I suggest we interrupt the discussion here.


