emacs-devel

From: Eli Zaretskii
Subject: Re: Ligatures (was: Unify the Platforms: Cairo+FreeType+Harfbuzz Everywhere (except TTY))
Date: Wed, 27 May 2020 20:13:36 +0300

> From: Pip Cet <address@hidden>
> Date: Wed, 27 May 2020 09:36:52 +0000
> Cc: address@hidden
> 
> > Any measurements to back that up?
> 
> Yes. With a regexp of "....", the composite.c code takes 175 billion
> cycles to display every line of composite.c. My code takes 144 billion
> cycles, with a lookahead/lookbehind each set to 128 but limiting it as
> described.

What did you compare, exactly?  On the one hand, the code you posted
here, which took 128 characters around each character to be displayed?
Were there any other changes to the code you posted here?  And what
does "limiting it as described" mean here?

And on the other hand, the existing automatic composition machinery?
With what setup of composition-function-table, exactly?

And finally, which code was included in the count of cycles?

> > > > and others, including (but not limited to) the dreaded bidi thing.
> > >
> > > Looking for "bidi" in composite.c, the only relevant thing I see is a 
> > > FIXME.
> >
> > That's because you look in the wrong place.
> 
> What's the right place? I'm using all the code in bidi.c, of course,

No, actually you don't.  Your make_context copies characters in strict
logical order, bypassing bidi.c, and by that also potentially crossing
boundaries of different directionality (and even line and paragraph
boundaries), which is a no-no in text shaping.  Then, after you call
the shaper, you don't reorder the glyphs it delivers, so they will
appear on display in the wrong order.  And there may be other subtle
issues as well -- this stuff was finalized so long ago that I'm not
even sure I remember all the details of what needed to be done to get
it right.
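
To make the boundary issue concrete, here is a minimal sketch in Python
(not Emacs code, and far simpler than what UAX#9 or bidi.c do).  It
assumes that only strong characters decide direction and that neutral
characters simply join the current run; the names directional_runs and
shaping_units are hypothetical.  The point is only that text handed to
a shaper is cut at direction changes and at newlines, instead of being
copied as a fixed window in logical order.  Reordering the glyphs the
shaper returns is still needed and is not shown.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left bidi classes


def directional_runs(paragraph):
    """Split one paragraph into (direction, text) runs in logical order.

    Simplification: only strong characters decide direction; neutral
    and weak characters (spaces, digits, punctuation) join the run
    that is currently open.
    """
    runs = []
    current_dir, current = None, []
    for ch in paragraph:
        cls = unicodedata.bidirectional(ch)
        if cls in RTL_CLASSES:
            ch_dir = "rtl"
        elif cls == "L":
            ch_dir = "ltr"
        else:
            ch_dir = current_dir  # neutral: stay in the current run
        if current_dir is not None and ch_dir != current_dir:
            # Direction changed: close the run so it is shaped on its own.
            runs.append((current_dir, "".join(current)))
            current = []
        current_dir = ch_dir
        current.append(ch)
    if current:
        # A run with no strong character defaults to left-to-right here.
        runs.append((current_dir or "ltr", "".join(current)))
    return runs


def shaping_units(text):
    """Return runs for TEXT; no run crosses a newline or a direction change."""
    units = []
    for paragraph in text.split("\n"):
        units.extend(directional_runs(paragraph))
    return units
```

Each returned run is what would be passed to the shaper as one unit in
this simplified model; a fixed 128-character window would instead mix
several of these runs together.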

> > > The code shouldn't break horribly for RTL text (it doesn't).
> >
> > It _will_ break for RTL text, you just didn't yet see it because you
> > only tested it in simple use cases.  UAX#9 defines a lot of optional
> > features, including multi-level directional overrides and embeddings,
> > it isn't just right-to-left vs left-to-right.
> 
> I assume bidi.c handles that, as it does for composite.c?

Yes, but only _if_you_use_them_correctly_!  If you bypass them, then
all bets are off.

> > > We have something that superficially results in a similar screen
> > > layout to what I want, but that actually represents display elements
> > > in a way that makes them unusable for my purposes.
> >
> > Then please describe what doesn't fit your purpose, and let's focus on
> > extending the existing code to do what's missing.
> 
> The three main things are:
>  - "entering" glyphs, instead of treating them as atomic

Why is that needed?  A ligature is a single display entity, that's why
fonts ligate.  Why would we want to break ligatures when we wrap
lines?

>  - providing context automatically rather than by providing specific
> regexps for it in advance

That's a separate part of the problem; I wasn't talking about it.  It
needs a separate solution (which was not yet presented), but the
solution doesn't have to be based on regexps if a better or smarter or
faster way is available.  Extending composition-function-table to
support context definition by means other than regexp is easy and
doesn't disrupt the way the code works.
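
As a rough illustration of what "context definition by means other
than regexp" could look like, here is a sketch in Python.  It is a
model, not the composition-function-table API: the table is a plain
dict keyed by the first character, and each rule is either a compiled
regexp or a function returning the length of the context.  All names
(match_length, word_run, context_at, EXAMPLE_TABLE) are hypothetical.

```python
import re


def match_length(matcher, text, pos):
    """Return how many characters starting at POS form the context, or 0."""
    if callable(matcher):
        return matcher(text, pos)
    m = matcher.match(text, pos)  # compiled regexp, anchored at POS
    return m.end() - pos if m else 0


def word_run(text, pos, limit=64):
    """Function-based rule: the run of letters starting at POS, capped at LIMIT."""
    end = pos
    while end < len(text) and end - pos < limit and text[end].isalpha():
        end += 1
    return end - pos


def context_at(table, text, pos):
    """Return the text to hand to the shaper for the character at POS, or None."""
    for matcher in table.get(text[pos], ()):
        n = match_length(matcher, text, pos)
        if n > 0:
            return text[pos:pos + n]
    return None


# Rules are looked up by the character at POS; the two kinds can be mixed.
EXAMPLE_TABLE = {
    "-": [re.compile(r"-+>")],  # regexp rule: arrows such as "->" and "-->"
    "f": [word_run],            # function rule: the rest of the word
}
```

The code that consumes the result does not need to know which kind of
rule produced it, which is why such an extension would not have to
disturb the rest of the machinery.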

>  - kerning, which requires context for every character

That's again about that separate part of the problem, because once the
context is determined correctly, the shaper will perform the kerning
for you.

>  - ligatures that come partly from a display property and partly from
> the buffer (composite.c doesn't allow for those, as far as I can tell)

It doesn't and it shouldn't!  Text of display strings and overlay
strings is completely isolated from buffer text, and is even
bidi-reordered independently.  This is by design.  These strings are
more akin to images than to a part of buffer text, so mixing them with
buffer text on display would be a grave mistake.

> > Please note: I'm not talking about the regexp part -- that part you
> > anyway will need to decide how to extend or augment.  I'm telling you
> > right here and now that blindly taking a fixed amount of surrounding
> > text will not be acceptable.  You can either come up with some smarter
> > regexp (and you are wrong: the regexps in composition-function-table
> > do NOT have to match only fixed strings, you can see that they don't
> > in the part of the table we set up for the Arabic script);
> 
> Again, I think the limits are fixed: 4 characters of history and 500
> characters of look-ahead. What am I missing?

Fixed limits and fixed strings are two different things.  You can
match strings of many different lengths up to a limit.

The 3 previous characters are rarely needed, certainly not for English
ligatures, because you can detect the sequence by the first character.
So this is rarely a limitation; but again, it can be expanded if
needed with little if any effect on the code.
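
The difference between a fixed limit and a fixed string can be shown
with a small Python sketch.  It is only a model of a rule made of a
pattern plus a number of previous characters; the function name
match_rule and the constant MAX_LOOKBACK are hypothetical, and the
value 3 is taken from the "3 previous characters" mentioned above.

```python
import re

MAX_LOOKBACK = 3  # assumption: mirrors the "3 previous characters" above


def match_rule(pattern, prev_chars, text, pos):
    """Match PATTERN starting PREV_CHARS characters before POS.

    Returns the matched string, or None if the rule does not apply.
    """
    if not 0 <= prev_chars <= MAX_LOOKBACK:
        raise ValueError("prev_chars out of range")
    start = pos - prev_chars
    if start < 0:
        return None
    m = re.compile(pattern).match(text, start)
    # The match must extend past POS to cover the character there.
    if m and m.end() > pos:
        return m.group(0)
    return None


# One pattern, one upper limit (10), many possible match lengths.
ARROW = r"={1,10}>"
```

With this model, match_rule(ARROW, 0, "a => b", 2) and
match_rule(ARROW, 0, "a ====> b", 2) both succeed with matches of
different lengths, while a run of eleven "=" before the ">" is rejected
when matching from its first character.  A rule such as
match_rule("fi", 1, "office", 3) shows the look-back case, where the
pattern starts one character before the position being examined.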

(And where did you see the 500-character limitation of look-ahead?)

Anyway, you again focus on the (separate) issue of determining the
context, whereas I was mainly talking about what happens _after_ you
determine the context: how you collect the characters to pass to the
shaper, how you present to the layout code the glyphs returned by the
shaper, and how you lay out those glyphs by inserting them into the
glyph rows of the glyph matrix.  It is this code that I see no reason
to modify, definitely not significantly.

> > or you can
> > decide on something more complex, like a function.  Either way, the
> > amount of text that this will pick up and pass to the shaper should be
> > reasonable and should be determined by some understandable rules.  And
> > those rules must be controllable from Lisp.
> 
> That last part isn't true for the composite.c code, which imposes a
> limit of 4 characters of history and 500 characters of look-ahead

How do those limits violate the above requirement?  The 3-char
prev-chars limit is "reasonable" because we currently don't need more,
and the other limit doesn't exist AFAICT -- you could make a regexp
that matched very long strings, if needed.  And the rules to use to
set up the regexp are definitely "understandable" and can be
controlled from Lisp.


