The hyphenation algorithm produces wrong results
From: Bjarni Ingi Gislason
Subject: The hyphenation algorithm produces wrong results
Date: Sun, 4 Mar 2018 00:04:43 +0000
User-agent: Mutt/1.5.20 (2009-12-10)
With the following settings:
.ll 1n
.hy 48
.hla us
.hpf hyphen.us
.hpfa hyphenex.us
The algorithm
1) applies patterns in the wrong places, namely at the beginning of a
   word, although the pattern contains no period;
2) splits off a single letter at the end of a word, although I found no
   corresponding pattern in the "hyphen.us" file.
Word            Hyphenation         Pattern
algorithm       al-go-rith-m
exists          ex-ist-s            ?
finding         find-in-g           ?
hyphenations    hy-phen-a-tion-s
missing         miss-in-g           ?
off             of-f
results         re-sult-s
splits          s-plit-s
splitting       s-plit-t-in-g
since           s-ince              s1in
subject         sub-jec-t           ?
you             y-ou                y1o
The values 16 and 32 (for .hy) must not add hyphenation points; they
should only permit points that were already found but would otherwise
be forbidden.
The examples "since" and "you" are from bug #52457. Here it is the
algorithm that creates the problems (no strict adherence to the
pattern), not the value of .hy; the value 8 simply hides the bug!
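
To make the "since" and "you" cases concrete, here is a minimal sketch
of TeX-style (Liang) pattern matching, in which a digit between letters
is a priority and an odd priority permits a break. It is not groff's
implementation. The function names are hypothetical, and only the two
patterns quoted in the table are used, so it does not predict what the
full "hyphen.us" file yields. It shows only where a break lands when a
pattern without a period is matched starting at the first letter.

```python
import re


def parse_pattern(pattern):
    """Split a pattern such as 's1in' into its letters and slot priorities."""
    letters = re.sub(r"\d", "", pattern)
    priorities = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            priorities[pos] = int(ch)  # priority of the slot before letters[pos]
        else:
            pos += 1
    return letters, priorities


def break_points(word, patterns):
    """Return every index i where a break is allowed before word[i]."""
    dotted = "." + word.lower() + "."  # periods mark the word boundaries
    scores = [0] * (len(dotted) + 1)   # scores[k] is the slot before dotted[k]
    for pattern in patterns:
        letters, priorities = parse_pattern(pattern)
        start = dotted.find(letters)
        while start != -1:
            for offset, value in enumerate(priorities):
                scores[start + offset] = max(scores[start + offset], value)
            start = dotted.find(letters, start + 1)
    # word[i] is dotted[i + 1], so its preceding slot is scores[i + 1].
    return [i for i in range(1, len(word)) if scores[i + 1] % 2 == 1]


def hyphenate(word, patterns):
    """Insert '-' at every break point found by break_points()."""
    points = set(break_points(word, patterns))
    return "".join(("-" if i in points else "") + ch for i, ch in enumerate(word))


print(hyphenate("since", ["s1in"]))  # prints s-ince
print(hyphenate("you", ["y1o"]))     # prints y-ou
```

In this sketch nothing but an explicit period ties a pattern to a word
boundary, so "s1in" is free to match at the first letter of "since".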
If the line length is increased to "3n", "finding" becomes "find-ing"
(likewise "missing" becomes "miss-ing").
The hyphenation process should strictly follow the pattern and not
the value of the .hy request (splitting "-ing" according to the value).
The original values of .hy (4 and 8) just remove hyphenations in the
corresponding places.
There is no '.x[13579]' pattern in the English hyphenation files!
The value 8 should just remove the hyphenation points '^xy-z'.
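
The behaviour asked for here, where the .hy value only removes or
permits break points that the patterns already produced, can be
sketched as a pure filter. This is an illustration under assumptions,
not groff code: filter_points is a hypothetical name, and the meaning
given to each flag (4 and 8 forbid a break two characters from either
end, 16 and 32 permit a break one character from either end) is my
reading of the description in this message.

```python
def filter_points(word_length, points, hy):
    """Keep only the pattern-derived break points that the .hy value allows.

    points holds indices i meaning "break before character i" (0-based).
    The function never adds a point; it can only drop one.
    """
    kept = []
    for i in sorted(points):
        if i == 1 and not hy & 32:                # after the first character
            continue
        if i == word_length - 1 and not hy & 16:  # before the last character
            continue
        if i == 2 and hy & 8:                     # after the first two characters
            continue
        if i == word_length - 2 and hy & 4:       # before the last two characters
            continue
        kept.append(i)
    return kept


# "degrade" has 7 letters; the break in 'de-grade' sits at index 2.
print(filter_points(7, [2], 1))   # prints [2]
print(filter_points(7, [2], 8))   # prints []
# With no pattern-derived points, .hy 48 has nothing to allow.
print(filter_points(7, [], 48))   # prints []
```

Under this model a break such as "s-ince" can only appear with
".hy 48" if the pattern stage has already produced it.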
Patterns that contain no period (.) must not be used to match the
beginning of a word (see "since" and "you").
So the algorithm has to be fixed and tested with ".hy 1" (the current
stable version) and with ".hy 48" (development) to see whether it works
correctly according to the hyphenation pattern file in use.
".hy 8" removes a lot of valid hyphenation points, like 'de-grade'.
The new '.hy 1' hides the bug in the algorithm, as it eliminates wrong
hyphenations caused by applying patterns at the start of a word that
were not written for that position.
--
Bjarni I. Gislason