bug-groff


RE: minor hyphenation issue


From: Barbara Beeton
Subject: RE: minor hyphenation issue
Date: Wed, 24 May 2017 12:30:03 +0000

   comments from werner:

    [...]

    Yes.  But as mentioned above, you usually have to regenerate the
    patterns completely, since adding patterns manually is a black art and
    can have unwanted side effects for other words.[*] It's *far* easier
    to add another hyphenation exception file that holds words with
    non-ASCII characters – I assume there aren't that much, right?

    [*] I speak from experience with the German hyphenation patterns that
        are generated from a list of words with about 465000 entries.  It
        really surprises me that there doesn't exist a similar effort
        (i.e., basing the patterns on a known word list) for US
        English...

if I remember correctly, kuiken's patterns were generated
using the additions from the then-current exception list that
I maintain, and nothing more than that and the original list.
the really effective additions were largely place names, which
tend to be reasonably regular in their patterns but the patterns
don't really show up anywhere else.  later additions include
names of chemical compounds, which are also usually regular;
frequently these aren't present in the unabridged (paper)
dictionary that I use as the model but are easily parsed by
analogy to a shorter word that *is* present there.  (why don't
I use an online dictionary?  I've observed that the hyphenation
of some words has changed from the print version, and I think
it's better to be consistent with what was current when the
original list was prepared.  a successor may think differently.)

unless I'm missing something important, I believe that, if words
with non-ascii letters are added to the base corpus, the original
patgen won't suffice; however, appropriate adaptations would
certainly have been made to allow german to be handled
successfully, and that version should be used.  (werner -- you
probably have more knowledge about this than I do.)
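
as a rough illustration of the split discussed above, the sketch below
separates a hyphenated word list into ascii-only entries (which the
original patgen could take) and entries with non-ascii letters (which
would go to a separate exception file or an adapted patgen).  the
function name `split_by_ascii` and the sample words are hypothetical;
the hyphenation points shown are illustrative and have not been vetted
against any dictionary.

```python
def split_by_ascii(words):
    """Partition hyphenated entries into (ascii_only, non_ascii) lists."""
    ascii_only, non_ascii = [], []
    for word in words:
        entry = word.strip()
        if not entry:
            continue  # skip blank lines in the input
        # str.isascii() (Python 3.7+) is True only if every character,
        # including the hyphens marking break points, is ASCII.
        if entry.isascii():
            ascii_only.append(entry)
        else:
            non_ascii.append(entry)
    return ascii_only, non_ascii


# Illustrative entries only; not taken from any published exception list.
sample = ["ta-ble", "na-ïve", "ca-fé", "hy-phen-ation", ""]
plain, accented = split_by_ascii(sample)
```

here `plain` would feed the existing ascii-only workflow, while
`accented` holds the candidates for a separate section or file.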

I do have a stack of proposed additions to the exception list
that I intend to vet for publication of the next edition in the
last tugboat issue for this year.  but, as usual, they won't contain
any non-ascii letters.  so that material will have to come from
somewhere else.  I'm not unwilling to collect such a list -- and
add a new section to the exception list, with appropriate
commentary -- but don't have time to dig for entries myself.
                                                -- bb


