RE: minor hyphenation issue
From: Barbara Beeton
Subject: RE: minor hyphenation issue
Date: Wed, 24 May 2017 12:30:03 +0000
comments from werner:
[...]
> Yes. But as mentioned above, you usually have to regenerate the
> patterns completely, since adding patterns manually is a black art and
> can have unwanted side effects for other words.[*] It's *far* easier
> to add another hyphenation exception file that holds words with
> non-ASCII characters – I assume there aren't that many, right?
>
> [*] I speak from experience with the German hyphenation patterns that
>     are generated from a list of words with about 465000 entries. It
>     really surprises me that there doesn't exist a similar effort
>     (i.e., basing the patterns on a known word list) for US
>     English...
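Werner's contrast between touching the patterns and adding an exception file can be made concrete with a minimal sketch of Liang-style (TeX-style) pattern hyphenation in which the exception list is consulted before the patterns. This is an illustration under stated assumptions, not groff's or TeX's actual code: the function names are hypothetical, the pattern set is the small one commonly used to demonstrate Liang's method on "hyphenation", and the single exception entry is toy data. An exception entry affects only the exact word it lists, whereas a new pattern matches substrings and can change the breaks in many other words.

```python
import re

# Hypothetical toy data, for illustration only: the small pattern set often
# used to demonstrate Liang's method, plus a one-word exception list.
TOY_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
TOY_EXCEPTIONS = "fa-çade"


def parse_pattern(pat):
    """Split a TeX-style pattern such as 'hen5at' into its letters and the
    list of inter-letter values (one more value than there are letters)."""
    letters = re.sub(r"\d", "", pat)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pat:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def parse_exceptions(text):
    """Map each exception word (hyphens removed) to its hyphenated form."""
    return {tok.replace("-", "").lower(): tok.lower() for tok in text.split()}


def hyphenate(word, patterns, exceptions, left_min=2, right_min=3):
    """Split a word at permitted break points (returned in lowercase).

    The exception list is checked first; the patterns are applied only
    when the word is not listed there."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key].split("-")

    w = "." + key + "."            # '.' marks the word boundaries
    points = [0] * (len(w) + 1)    # points[p]: value just before w[p]
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            values = patterns.get(w[i:j])
            if values is not None:
                for k, v in enumerate(values):
                    points[i + k] = max(points[i + k], v)

    # An odd value permits a break before key[m], which sits at w[m + 1].
    pieces, start = [], 0
    for m in range(left_min, len(key) - right_min + 1):
        if points[m + 1] % 2 == 1:
            pieces.append(key[start:m])
            start = m
    pieces.append(key[start:])
    return pieces


patterns = dict(parse_pattern(p) for p in TOY_PATTERNS)
exceptions = parse_exceptions(TOY_EXCEPTIONS)
print("-".join(hyphenate("hyphenation", patterns, exceptions)))  # hy-phen-ation
print("-".join(hyphenate("façade", patterns, exceptions)))       # fa-çade
```

The defaults `left_min=2` and `right_min=3` loosely mirror TeX's usual minimums for English. Because "façade" matches no toy pattern, it would come out unhyphenated without its exception entry, while adding that entry leaves the result for "hyphenation" unchanged.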
if I remember correctly, kuiken's patterns were generated
using the additions from the then-current exception list that
I maintain, and nothing more than that and the original list.
the really effective additions were largely place names, which
tend to be reasonably regular in their patterns, but those patterns
don't really show up anywhere else. later additions include
names of chemical compounds, which are also usually regular;
frequently these aren't present in the unabridged (paper)
dictionary that I use as the model but are easily parsed by
analogy to a shorter word that *is* present there. (why don't
I use an online dictionary? I've observed that the hyphenation
of some words has changed from the print version, and I think
it's better to be consistent with what was current when the
original list was prepared. a successor may think differently.)
unless I'm missing something important, I believe that, if words
with non-ascii letters are added to the base corpus, the original
patgen won't suffice; however, appropriate adaptations would
certainly have been made to allow german to be handled
successfully, and that version should be used. (werner -- you
probably have more knowledge about this than I do.)
I do have a stack of proposed additions to the exception list
that I intend to vet for publication of the next edition in the
last tugboat issue for this year. but, as usual, they won't contain
any non-ascii letters. so that material will have to come from
somewhere else. I'm not unwilling to collect such a list -- and
add a new section to the exception list, with appropriate
commentary -- but don't have time to dig for entries myself.
-- bb