[bug #65601] [troff] bogus 'bogus composite' errors introduced by commit
From: G. Branden Robinson
Subject: [bug #65601] [troff] bogus 'bogus composite' errors introduced by commit 6008b6b7aa
Date: Wed, 1 May 2024 23:00:20 -0400 (EDT)
Follow-up Comment #4, bug #65601 (group groff):
[comment #3:]
> It is very simple!
> [derij@pip build (master)]$ echo "This is the arabic alef with a madda above -> آ"|groff -Tutf8 -Kutf8 -z
> troff:<standard input>:1: error: cannot format glyph: 'u0627_0653' is not a valid composite character
Yes, thank you--that's helpful.
> It is no longer possible to output this particular utf-8 character (along
> with hundreds of others), all perfectly valid utf-8 to the utf-8 device.
Apparently. Bummer.
> I can't believe that is your intention, to police which particular parts of
> unicode are acceptable to you.
But you can contemplate it seriously enough to put it into words.
It would help if you didn't assume autocratic motives behind my code changes.
This is the complete repertoire of composite components known to _groff_ for
quite a while now.
87909b1715 (Werner LEMBERG 2003-03-01 07:34:52 +0000  1) .\" composite.tmac
87909b1715 (Werner LEMBERG 2003-03-01 07:34:52 +0000  2) .
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000  3) .do composite ga u0300
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000  4) .do composite ` u0300
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000  5) .do composite aa u0301
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000  6) .do composite ' u0301
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000  7) .do composite a^ u0302
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000  8) .do composite ^ u0302
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000  9) .do composite a~ u0303
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000 10) .do composite ~ u0303
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000 11) .do composite a- u0304
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000 12) .do composite - u0304
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000 13) .do composite ab u0306
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000 14) .do composite a. u0307
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000 15) .do composite . u0307
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000 16) .do composite ad u0308
94c91fca8c (Werner LEMBERG 2006-02-28 13:04:27 +0000 17) .do composite : u0308
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000 18) .do composite ao u030A
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000 19) .do composite a" u030B
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000 20) .do composite " u030B
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000 21) .do composite ah u030C
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000 22) .do composite ac u0327
be90ad7557 (Werner LEMBERG 2003-12-19 23:30:02 +0000 23) .do composite , u0327
77d9af6df8 (Werner LEMBERG 2003-03-12 23:00:25 +0000 24) .do composite ho u0328
87909b1715 (Werner LEMBERG 2003-03-01 07:34:52 +0000 25) .
47fc0a18b8 (G. Branden Robinson 2017-11-18 17:49:36 -0500 26) .\" Local Variables:
47fc0a18b8 (G. Branden Robinson 2017-11-18 17:49:36 -0500 27) .\" mode: nroff
47fc0a18b8 (G. Branden Robinson 2017-11-18 17:49:36 -0500 28) .\" fill-column: 72
47fc0a18b8 (G. Branden Robinson 2017-11-18 17:49:36 -0500 29) .\" End:
47fc0a18b8 (G. Branden Robinson 2017-11-18 17:49:36 -0500 30) .\" vim: set filetype=groff textwidth=72:
If a policeman's baton is being swung here, it's not mine. (Though Keith
Marshall might disagree as regards editor settings.)
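To make the connection between the reported error and the listing above
concrete, here is a minimal Python sketch. It is an illustration only, not how
troff performs its check: it assumes that canonical (NFD) decomposition is a
fair stand-in for how the input character becomes 'u0627_0653', and the helper
names (`groff_style_name`, `unknown_components`) are hypothetical. The set of
code points is copied from the composite.tmac listing above.

```python
import unicodedata

# Combining code points that appear in the composite.tmac listing above.
KNOWN_COMPOSITE_COMPONENTS = {
    0x0300, 0x0301, 0x0302, 0x0303, 0x0304, 0x0306, 0x0307,
    0x0308, 0x030A, 0x030B, 0x030C, 0x0327, 0x0328,
}


def groff_style_name(ch):
    """Hypothetical helper: NFD-decompose ch and join the code points
    in the uXXXX_XXXX form seen in the error message."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "u" + "_".join(f"{ord(c):04X}" for c in decomposed)


def unknown_components(ch):
    """Hypothetical helper: return the code points after the base
    character that are absent from the composite.tmac listing."""
    decomposed = unicodedata.normalize("NFD", ch)
    return [ord(c) for c in decomposed[1:]
            if ord(c) not in KNOWN_COMPOSITE_COMPONENTS]


if __name__ == "__main__":
    alef_madda = "\u0622"  # ARABIC LETTER ALEF WITH MADDA ABOVE
    print(groff_style_name(alef_madda))
    print([f"u{cp:04X}" for cp in unknown_components(alef_madda)])
```

Under these assumptions, the alef with madda decomposes to a base character
plus U+0653, which has no entry in the listing, whereas a Latin letter with an
acute accent decomposes to a base plus U+0301, which does.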
> Just in case the character in the above example gets mangled by savannah,
> here's the equivalent:
It came through fine for me, fortunately.
> I hope this gives you enough information to fix this bug.
Yup, I'll revert the change. At some point I would like to validate the
segregated/spacey/whatever form of escape sequence differently.
Or, if it's too hard, not at all. That also would be a bummer.
Some notes, probably mainly for my own benefit. Not particularly
PDF-relevant.
This problem has clarified my thinking around what the formatter needs to know
to delegate grapheme cluster composition to the device,[1][2] and strengthens
my suspicion that HTML output should take place in nroff mode. This is
because font families and type sizes are less HTML's job than CSS's, and we
might as well communicate such things to the document via different means
("tags", but not exactly as the Mulley/Lemberg solution has applied them).
Moreover, stylesheet selection should probably be an option in the output
driver, with a stock one generated or embedded in the absence of a user's
choice. (This isn't too different from how _grops_ uses a PostScript
prologue, for example.)
[1] We won't have problems if grapheme cluster composition can't turn a
half-width character into a full-width one or vice versa (mostly an issue for
terminals), and moreover the formatter doesn't need to know how wide a
grapheme cluster is if it's not responsible for line breaking decisions. And
for HTML, it isn't.
[2] Typesetting devices will still have to get such information back to the
formatter. If a composite character has different metrics from the base
character, the formatter **must** know this; hence the warnings already
produced.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?65601>