[bug #60566] gpinyin: puts tone mark on the wrong vowel in syllabic vowel clusters
From: G. Branden Robinson
Subject: [bug #60566] gpinyin: puts tone mark on the wrong vowel in syllabic vowel clusters
Date: Sun, 9 May 2021 16:15:50 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0
URL:
<https://savannah.gnu.org/bugs/?60566>
Summary: gpinyin: puts tone mark on the wrong vowel in
syllabic vowel clusters
Project: GNU troff
Submitted by: gbranden
Submitted on: Sun 09 May 2021 08:15:48 PM UTC
Category: Preprocessor - others
Severity: 3 - Normal
Item Group: Incorrect behaviour
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Planned Release: None
_______________________________________________________
Details:
gpinyin appears to always put the tone mark on the last vowel in a syllabic
vowel cluster.
Sadly the rule isn't that simple.
https://web.mit.edu/jinzhang/www/pinyin/spellingrules/index.html
For instance, gpinyin produces "chaōshì" when "chāoshì" is correct.
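As a rough illustration of what "not that simple" means, here is a minimal
sketch of the commonly taught placement rule: "a" or "e" takes the mark if
present, "o" takes it in "ou", and otherwise the last vowel does. It is
written in Python rather than gpinyin's Perl, the names tone_vowel_index and
add_tone are hypothetical, and the rule as coded here is an assumption that
has not been checked line by line against the page linked above.

```python
import unicodedata

# Combining diacritics for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiouü"


def tone_vowel_index(syllable):
    """Return the string index of the vowel that should carry the tone mark."""
    # "a" and "e" never share a syllable, so whichever is present wins.
    for vowel in "ae":
        if vowel in syllable:
            return syllable.index(vowel)
    # In "ou" the mark goes on the "o".
    if "ou" in syllable:
        return syllable.index("ou")
    # Otherwise the last vowel takes the mark (e.g. "liu", "gui").
    for i in range(len(syllable) - 1, -1, -1):
        if syllable[i] in VOWELS:
            return i
    raise ValueError("no vowel in syllable: " + syllable)


def add_tone(syllable, tone):
    """Attach the mark for tone 1-4; any other tone number is left unmarked."""
    if tone not in TONE_MARKS:
        return syllable
    i = tone_vowel_index(syllable)
    marked = syllable[: i + 1] + TONE_MARKS[tone] + syllable[i + 1 :]
    # Compose base letter and combining mark into one precomposed character.
    return unicodedata.normalize("NFC", marked)


print(add_tone("chao", 1) + add_tone("shi", 4))  # chāoshì
```

Vowelless interjections such as "ng" raise ValueError in this sketch and
would need separate handling.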
At first I thought this would be a big pain in the ass to fix, requiring a
refactor of the code to track more state, but then I noticed--Bernd laid a
useful foundation for a different approach.
gpinyin's subs.pl has a big list of "all" of the Mandarin syllables (without
tone marks). Puzzlingly, this is in fact a hash rather than a list, but the
value of _every_ key is simply the integer 1. (Maybe Bernd assumed a hash
would be faster for lookups--all he ever does is an existence test for a
key.)
Had I been reviewing his code at the time, I'd have suggested that
%syllables was overdesigned or prematurely optimized. Today, though, it means
we can adapt it to a useful purpose: storing an indicator of the vowel to
which the tone mark should be applied.
So my proposed solution is a grind through the ~411 hash keys, applying the
rules from the site above, and recording the finding in the hash values
somehow. Many of the syllables have only one vowel, so they can be skipped or
left with some default value.
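To get a feel for how much of the table actually needs attention, the
single-vowel syllables can be separated out mechanically. The following is
only a sketch in Python (gpinyin itself is Perl); the dictionary is a
hypothetical seven-entry sample standing in for the real table in subs.pl,
with every value set to 1 as described above.

```python
VOWELS = set("aeiouü")

# Hypothetical sample; the real table has roughly 411 keys, all valued 1.
syllables = {
    "ba": 1, "chao": 1, "shi": 1, "liu": 1,
    "xue": 1, "zhong": 1, "guai": 1,
}


def vowel_count(syllable):
    """Count the vowel letters in an unmarked syllable."""
    return sum(ch in VOWELS for ch in syllable)


# Only syllables with two or more vowels need a placement decision.
needs_review = sorted(s for s in syllables if vowel_count(s) > 1)
can_skip = sorted(s for s in syllables if vowel_count(s) <= 1)

print(needs_review)  # ['chao', 'guai', 'liu', 'xue']
print(can_skip)      # ['ba', 'shi', 'zhong']
```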
I'm not decided yet on how to encode the requisite information. One method
would be simply to record a string offset into the syllable key for where the
tone should go. This would affect all of the syllables. Bernd already has
logic for locating vowels within syllables, however. The
interesting/challenging parts of the problem are the syllables with multiple
vowels. So, instead of a "string offset", maybe a "vowel offset" should be
recorded.
More ambitiously, but perhaps excessively so, the syllables could be
categorized according to the rules (some encoding of "first vowel medial", for
example), and then the correct thing done in logic later. I do suspect this
is overkill.
I'm not working on this yet. Apparently most readers of Pinyin not only work
out the correct reading of a syllable when the tone mark is misplaced, but
often do so instinctively, through familiarity (much as one overlooks typos).
gpinyin also has much worse problems that need to be solved first, for which
I have fixes in various stages of progress.
Nevertheless, typography is an exacting art and the thought of our beloved
groff system serving up the equivalent of a child's scrawl in Pinyin is
repugnant to me.
Our output should be exemplary.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?60566>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/