
[groff] The hyphenation algorithm produces wrong results


From: Bjarni Ingi Gislason
Subject: [groff] The hyphenation algorithm produces wrong results
Date: Sun, 4 Mar 2018 00:45:44 +0000
User-agent: Mutt/1.5.20 (2009-12-10)

  With

.ll 1n
.hy 48
.hla us
.hpf hyphen.us
.hpfa hyphenex.us

  the algorithm

1) applies patterns in the wrong places: at the beginning of a word,
although the pattern contains no period;

2) splits off a single letter at the end of a word, although I found no
corresponding pattern in the "hyphen.us" file.

Word            hyphenation     pattern

algorithm       al-go-rith-m
exists          ex-ist-s        ?
finding         find-in-g       ?
hyphenations    hy-phen-a-tion-s
missing         miss-in-g       ?
off             of-f
results         re-sult-s
splits          s-plit-s
splitting       s-plit-t-in-g
since           s-ince          s1in
subject         sub-jec-t       ?
you             y-ou            y1o

  The values '16' and '32' (for .hy) must not add hyphenation points;
they should only permit points that the patterns have already found and
that would otherwise be forbidden.

  The examples "since" and "you" are from bug #52457.  Here it is the
algorithm that creates the problems (no strict adherence to the
pattern), not the value of .hy; the value 8 simply hides the bug!
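
  To illustrate the "since" and "you" cases, here is a minimal sketch of
Liang-style pattern matching (the scheme the patterns in "hyphen.us"
are written for).  It is not groff's implementation.  It assumes that a
pattern without a period may match anywhere in the word, including the
start, and it loads only the two patterns quoted in the table, so the
real pattern file may raise or cancel these values.  The function names
are hypothetical.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 's1in' into its letters and inter-letter values."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)  # value for the gap before letters[pos]
        else:
            pos += 1
    return letters, values


def hyphenation_points(word, patterns):
    """Return offsets into `word` where an odd value permits a break."""
    padded = "." + word.lower() + "."  # periods mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[i] is the gap before padded[i]
    for pattern in patterns:
        letters, values = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for k, value in enumerate(values):
                scores[start + k] = max(scores[start + k], value)
            start = padded.find(letters, start + 1)
    # Only gaps between two letters of the word count as candidates.
    return [i - 1 for i in range(2, len(padded) - 1) if scores[i] % 2 == 1]


def show(word, points):
    """Render `word` with a hyphen at each offset in `points`."""
    bounds = [0] + list(points) + [len(word)]
    return "-".join(word[a:b] for a, b in zip(bounds, bounds[1:]))


print(show("since", hyphenation_points("since", ["s1in"])))  # s-ince
print(show("you", hyphenation_points("you", ["y1o"])))       # y-ou
```

  Under that assumption "s1in" and "y1o" alone are enough to produce the
breaks after the first letter.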

  If the line length is increased to "3n", "finding" becomes "find-ing"
(the same happens with "missing").

  The hyphenation process should strictly follow the patterns, not the
value of the .hy request (which currently decides how "-ing" is split).

  The original values of .hy (4 and 8) just remove hyphenations in the
corresponding places.

  There is no '.x[13579]' pattern in the English hyphenation files (that
is, no pattern that allows a break after the first letter of a word)!

  The value 8 should just remove hyphenation points of the form '^xy-z',
that is, a break after the first two characters of a word.
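
  The behaviour asked for here can be sketched as a pure filter over the
points that the patterns have already produced.  This is an illustration
of the proposal, not groff code.  The meaning of 8 is taken from the
paragraph above; the meaning of 4 (a break before the last two
characters) and the default ban on breaks after the first and before the
last character are assumptions.  All names are hypothetical.

```python
HY_NO_LAST_TWO = 4       # assumed: remove a break before the last two characters
HY_NO_FIRST_TWO = 8      # remove a break after the first two characters ('^xy-z')
HY_ALLOW_LAST_ONE = 16   # keep an already found break before the last character
HY_ALLOW_FIRST_ONE = 32  # keep an already found break after the first character


def filter_points(word, points, hy):
    """Return the subset of `points` that the .hy value `hy` lets through.

    `points` are offsets into `word` found by the patterns.  The filter
    never adds an offset; it can only drop some.
    """
    n = len(word)
    kept = []
    for p in sorted(points):
        if p == 1 and not hy & HY_ALLOW_FIRST_ONE:
            continue  # forbidden unless 32 is set
        if p == n - 1 and not hy & HY_ALLOW_LAST_ONE:
            continue  # forbidden unless 16 is set
        if p == 2 and hy & HY_NO_FIRST_TWO:
            continue  # removed by 8
        if p == n - 2 and hy & HY_NO_LAST_TWO:
            continue  # removed by 4
        kept.append(p)
    return kept


print(filter_points("degrade", [2], 8))   # []   'de-grade' is lost with 8
print(filter_points("finding", [4], 48))  # [4]  'find-ing', no 'in-g' added
print(filter_points("since", [1], 1))     # []   a wrong 's-ince' is hidden
print(filter_points("since", [1], 48))    # [1]  and exposed with 48
```

  With such a filter, 16 and 32 cannot create "find-in-g"; a break before
the final "g" would have to come from the patterns themselves.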

  Patterns that contain no period (.) must not be used to match the
beginning of a word (see "since" and "you").

  So the algorithm has to be fixed and tested with ".hy 1" (the current
stable version) and with ".hy 48" (development) to see whether it works
correctly according to the hyphenation pattern file in use.
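
  A small helper for producing such test input might look like this.  It
only builds the roff text, using the requests from the top of this
message and one word per input line; it does not run groff.  The function
name and its parameters are hypothetical.

```python
def build_test_input(words, hy, line_length="1n"):
    """Return roff input that sets up hyphenation and lists the test words."""
    header = [
        f".ll {line_length}",  # a very short line forces a break at every point
        f".hy {hy}",
        ".hla us",
        ".hpf hyphen.us",
        ".hpfa hyphenex.us",
    ]
    return "\n".join(header + list(words)) + "\n"


words = ["since", "you", "finding", "degrade"]
for hy in (1, 48):
    print(build_test_input(words, hy))
```

  The two outputs could then be formatted (for example with nroff) and
the resulting breaks compared word by word against the patterns.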

  ".hy 8" removes a lot of valid hyphenation points, like 'de-grade'.

  The new '.hy 1' hides the bug in the algorithm, as it eliminates the
wrong hyphenations caused by applying patterns at the start of a word
that were not written for that position.

-- 
Bjarni I. Gislason


