[bug #59442] [PATCH] groff.cpp: correct the order of preprocessors in th
From: G. Branden Robinson
Subject: [bug #59442] [PATCH] groff.cpp: correct the order of preprocessors in the pipeline
Date: Wed, 11 Nov 2020 02:37:27 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0
Update of bug #59442 (project groff):
Item Group: New feature => Incorrect behaviour
Status: None => Need Info
Assigned to: None => gbranden
_______________________________________________________
Follow-up Comment #1:
Bjari's point seems sound to me. The whole point of soelim is to get
preprocessors to run on `.so`ed files (`.mso`, `.pso`).
I'm trying to think of cases where this would break things where they aren't
already broken (we have no test cases here, of course...yet).
Some questions to ponder, regarding both current behavior and what happens if
we accept the proposed patch:
1. What happens if a .so(urced) file has a non-ASCII character in its
filename? soelim(1) does not seem to speak to this question. Maybe because it
is assuming preconv(1) has already been run, or, more likely I think, the
issue was not considered when the page was written, since it long predates
preconv itself.
2. What about EBCDIC hosts? Even plain ASCII code points are completely
disarranged in EBCDIC. soelim uses standard C library functions for character
comparison and handling (see src/preproc/soelim/soelim.cpp:do_file()). There
are no EBCDIC worries today because preconv has already busted the input down
to 7-bit chars. What about after this patch? We can't assume that's been
done, so chars with the 8th bit set might be in the input. But I'm not seeing
any problems; anything that soelim doesn't care about as documented in its man
page, it passes through without alteration, amounting to `putchar(getc());`.
And the chars that soelim DOES care about should not be spuriously matched in
a UTF-8 sequence, because every byte of a multibyte UTF-8 sequence, lead and
continuation bytes alike, has the high bit set. soelim will, therefore, not
spuriously match them and munge them.
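To make the UTF-8 argument in point 2 concrete, here is a minimal sketch. It is
NOT the code in soelim.cpp:do_file(); `count_so_requests` and
`all_high_bit_set` are hypothetical helpers invented for illustration. The
first is a stand-in for any byte-wise scanner that only reacts to ASCII bytes
at the start of a line; the second checks the property the argument relies on.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a byte-wise scanner (not soelim's real logic):
// counts input lines that begin with the four ASCII bytes ".so ".
static std::size_t count_so_requests(const std::string &input)
{
  std::size_t count = 0;
  bool at_line_start = true;
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (at_line_start && input.compare(i, 4, ".so ") == 0)
      ++count;
    at_line_start = (input[i] == '\n');
  }
  return count;
}

// Hypothetical check: true if every byte in `bytes` has the 8th bit set,
// which holds for each byte of a multibyte UTF-8 sequence.
static bool all_high_bit_set(const std::string &bytes)
{
  for (unsigned char c : bytes)
    if ((c & 0x80) == 0)
      return false;
  return true;
}
```

Feeding the scanner a line such as `text €.so x`, with the euro sign encoded as
the bytes E2 82 AC, should not produce a match: none of those three bytes can
compare equal to `.`, `s`, `o`, or a space, and the literal `.so` that follows
is not at the start of the line.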
Can anyone else think of any objections?
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?59442>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/