[bug #65601] [troff] bogus 'bogus composite' errors introduced by commit
From: G. Branden Robinson
Subject: [bug #65601] [troff] bogus 'bogus composite' errors introduced by commit 6008b6b7aa
Date: Wed, 1 May 2024 19:22:47 -0400 (EDT)
Update of bug #65601 (group groff):
Status: None => Need Info
Summary: Bogus 'bogus composite' errors introduced by commit
6008b6b7aa => [troff] bogus 'bogus composite' errors introduced by commit
6008b6b7aa
_______________________________________________________
Follow-up Comment #2:
[comment #0 original submission:]
> If preconv produces a valid composite character groff should not reject it.
> Of course if the composite is not available in any available font
Unfortunately that's not the way GNU _troff_ works. (Or I'm not understanding
the bug report.)
The list of composite characters is global.
Here's what our Texinfo manual says in Git HEAD.
-- Escape sequence: \[base-glyph combining-component ...]
...
GNU 'troff' resolves '\[...]' with more than a single component as
follows:
* Any component that is found in the GGL [groff glyph list --GBR]
is converted to the 'uXXXX' form.
* Any component 'uXXXX' that is found in the list of
decomposable glyphs is decomposed.
* The resulting elements are then concatenated with '_' in
between, dropping the leading 'u' in all elements but the
first.
No check for the existence of any component (similar to 'tr'
request) is done.
Examples:
'\[A ho]'
'A' maps to 'u0041', 'ho' maps to 'u02DB', thus the final
glyph name would be 'u0041_02DB'. This is not the expected
result: the ogonek glyph 'ho' is a spacing ogonek, but for a
proper composite a non-spacing ogonek (U+0328) is necessary.
Looking into the file 'composite.tmac', one can find
'.composite ho u0328', which changes the mapping of 'ho' while
a composite glyph name is constructed, causing the final glyph
name to be 'u0041_0328'.
'\[^E u0301]'
'\[^E aa]'
'\[E a^ aa]'
'\[E ^ ']'
'^E' maps to 'u0045_0302', thus the final glyph name is
'u0045_0302_0301' in all forms (assuming proper calls of the
'composite' request).
It is not possible to define glyphs with names like 'A ho' within a
'groff' font file. This is not really a limitation; instead, you
have to define 'u0041_0328'.
...
-- Request: .composite c1 c2
Map ordinary or special character name C1 to C2 when C1 is a
combining component in a composite character. See above for
examples. This is a strict rewriting of the special character
name; no check is performed for the existence of a glyph for
either. Typically, 'composite' is used to map a spacing character
to a combining one. A set of default mappings for many accents can
be found in the file 'composite.tmac', loaded by the default
'troffrc' at startup.
You can obtain a report of mappings defined by 'composite' on the
standard error stream with the 'pcomposite' request. *Note
Debugging::.
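
To make the quoted resolution steps concrete, here is a minimal Python sketch of how I read them. It is a model, not the code in input.cpp. The three tables are tiny illustrative excerpts that I am assuming for the example (in particular, how the real data is split between the GGL and the decomposition list is not modeled exactly), and the function name is hypothetical. I also assume that `composite` mappings are consulted by component name before the GGL lookup.

```python
# Illustrative excerpts only; these tables are assumptions, not groff's data.
GGL = {"A": "u0041", "E": "u0045", "^E": "u00CA", "ho": "u02DB"}
DECOMPOSABLE = {"u00CA": "u0045_0302"}
# Mappings of the kind established by `.composite ho u0328` and friends.
COMPOSITE = {"ho": "u0328", "aa": "u0301", "'": "u0301",
             "a^": "u0302", "^": "u0302"}


def composite_glyph_name(components, composite=COMPOSITE):
    """Build a 'uXXXX_YYYY...' glyph name from the components of \\[...]."""
    elements = []
    for index, comp in enumerate(components):
        if index > 0:
            # Only combining components (not the base) are rewritten.
            comp = composite.get(comp, comp)
        name = GGL.get(comp, comp)            # convert to 'uXXXX' form
        name = DECOMPOSABLE.get(name, name)   # decompose if possible
        if index > 0 and name.startswith("u"):
            name = name[1:]                   # drop leading 'u' after the first
        elements.append(name)
    # No check for the existence of any component is done.
    return "_".join(elements)


print(composite_glyph_name(["A", "ho"]))                # with composite.tmac
print(composite_glyph_name(["A", "ho"], composite={}))  # without any mappings
print(composite_glyph_name(["E", "^", "'"]))
```

With these assumed tables, the first call yields the non-spacing ogonek form and the second the spacing one, matching the manual's '\[A ho]' discussion.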
> Personally I see little value in this error,
I do find value in it; in the ChangeLog entry, I provided my rationale. In
the commit message I even provided exhibits of cases that should have produced
a diagnostic but did not.
[troff]: Diagnose bogus composite character escape sequences. That is,
when a composite character escape sequence like \[a ~] has a bogus
modifier (as opposed to base) character, meaning one that has not been
defined as the source _or_ destination of a `composite` request, warn
about it. For instance, \[a $] is nonsense, barring a request like
`.composite $ \[uFF00]`, which would map `$`, when used as a modifier
character in a composite special character escape sequence, to U+FF00,
which would be a modifier form of the dollar sign in an alternate
universe.
...
Input:
.nf
\[A a~]
\[A ~]
\[u0041_0301]
\[u0041_007E] \" should fail because 007E is explicitly spacing
\[u0041_0041] \" same reason, more obviously
\[u0041_0301_0301] \" should fail, would have a different meaning
\[u0041_007E_0301] \" both problems above
groff 1.23.0 and earlier:
$ groff -T ps -z EXPERIMENTS/composite_character_construction.groff
troff:...:5: warning: special character 'u0041_007E' not defined
troff:...:6: warning: special character 'u0041_0041' not defined
troff:...:7: warning: special character 'u0041_0301_0301' not defined
troff:...:8: warning: special character 'u0041_007E_0301' not defined
$ groff -Tutf8 -z EXPERIMENTS/composite_character_construction.groff
[no output due to Savannah #65109]
Now:
$ ./build/test-groff -T ps -z
EXPERIMENTS/composite_character_construction.groff
troff:...:5: warning: special character 'u0041_007E' not defined
troff:...:6: error: cannot format glyph: 'u0041_0041' is not a valid
composite character
troff:...:7: warning: special character 'u0041_0301_0301' not defined
troff:...:8: warning: special character 'u0041_007E_0301' not defined
$ ./build/test-groff -T utf8 -z
EXPERIMENTS/composite_character_construction.groff
troff:...:6: error: cannot format glyph: 'u0041_0041' is not a valid
composite character
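
The rule stated in the commit message (a modifier is bogus unless it is the source or destination of some `composite` request) can be sketched in Python as follows. This is a model of the rule as worded above, not the formatter's implementation; the mapping table is an assumed excerpt in code-point form and the function name is hypothetical.

```python
# Assumed excerpt of composite mappings, keyed by code point:
# source (spacing) -> destination (combining).
COMPOSITE_MAP = {
    "007E": "0303",  # ~ -> combining tilde
    "00B4": "0301",  # acute accent -> combining acute
    "0027": "0301",  # ' -> combining acute
    "005E": "0302",  # ^ -> combining circumflex
}


def bogus_modifiers(glyph_name, mappings=COMPOSITE_MAP):
    """Return the modifier code points that no composite mapping mentions."""
    known = set(mappings) | set(mappings.values())
    # 'u0041_007E' -> base '0041', modifiers ['007E']
    _base, *modifiers = glyph_name[1:].split("_")
    return [m for m in modifiers if m not in known]


for name in ("u0041_0301", "u0041_007E", "u0041_0041",
             "u0041_0301_0301", "u0041_007E_0301"):
    print(name, bogus_modifiers(name))
```

Under these assumptions only 'u0041_0041' is flagged, since 0041 is neither the source nor the destination of any mapping, whereas 007E is a source and 0301 a destination. That agrees with the "Now" exhibit above, where only line 6 draws the new error.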
> the existing error reporting of a special character not defined is more
> helpful since if you find a font which contains the correct glyph, the error
> will be gone.
Is this true in full generality? Does it also apply to output devices that
don't even have a "charset" section in their fonts because they're "unicode"
[sic] devices?
groff_font(5):
unicode
The output device supports the complete Unicode repertoire.
This directive is useful only for devices which produce
character entities instead of glyphs.
If unicode is present, no charset section is required in
the font description files since the Unicode handling built
into groff is used. However, if there are entries in a
font description file’s charset section, they either
override the default mappings for those particular
characters or add new mappings (normally for composite
characters).
The utf8, html, and xhtml output devices use this
directive.
(I feel that that's a badly named directive. As I understand it, it more
precisely means that a different glyph resolution mechanism is used--or none
at all, instead assuming that the device is happy to attempt to combine any
sequence of Unicode code points as a grapheme cluster.)
> I'm sure there are users capable of creating a font with all sorts of weird
> composite glyphs, why should we police what they can do?
Because we have no mechanism for defining font-specific composite character
*components*. (Meaning: "foo" in `\[a foo]`; contrast with the composed
composite characters contemplated by the second paragraph of the "unicode"
directive description quoted above.) Maybe we should, but that in turn would
mean having font-specific macro files that users' documents would need to
load.
And we'd probably need a tool to generate them.
It might be better and more scalable to ask authors of such documents to issue
the `composite` requests themselves. We can add commonly used ones that we are
presently missing to "composite.tmac".
Anticipating this problem is why I added (or rather, stopped discouraging use
of) an existing mechanism to delete composite character mappings, and added a
new request for reporting the ones the formatter knows about.
Or people can bypass this escape sequence syntax entirely and spell their
grapheme clusters in Unicode directly as is already supported.
Our Texinfo manual again:
* A glyph representing more than a single input character is named
'u' COMPONENT1 '_' COMPONENT2 '_' COMPONENT3 ...
Example: 'u0045_0302_0301'.
There may be an opportunity for some terminological revision here. This
section of the manual is one of those I haven't finished my first revision
pass on yet. I still have things to learn. Maybe you can shed some light
where things are dark for me.
commit 2c76a931b81b1e22dd419c7027d3517325c23193
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
Date: Wed Jan 17 14:02:28 2024 -0600
[troff]: Fix Savannah #64937 (del composite char).
* src/roff/troff/input.cpp (map_composite_character): Stop throwing
diagnostic message when `composite` request invoked with only one
argument. This has long worked just fine to delete a composite
character mapping. That is something a (rare) user might conceivably
want to do.
Fixes <https://savannah.gnu.org/bugs/?64937>.
commit e958bb4fc65326dd9cd0d775e96aff15e944795e
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
Date: Wed Jan 17 13:49:40 2024 -0600
[troff]: Implement new `pcomposite` request.
* src/roff/troff/input.cpp (report_composite_characters): Add.
(init_input_requests): Wire up `pcomposite` request name to
`report_composite_characters()`.
* doc/groff.texi (Colors, Debugging):
* man/groff.7.man (Request short reference, Debugging):
* man/groff_diff.7.man (New requests, Debugging):
* NEWS: Document it.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?65601>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/