
Re: @Eq, column-width variable, CJK/unicode support?

From: Tobias Gerdin
Subject: Re: @Eq, column-width variable, CJK/unicode support?
Date: Mon, 27 Sep 2004 21:45:24 +0400 (MSD)

On 2004-09-23, at 12.37, David Kuehling wrote:
Hmm, something like

   {} @Scale @IncludeGraphics { ... }

should always scale to fill the currently available width (user guide
p171).  Never try this within a @Display, though.

Worked, thanks!

OK, now the biggest problem. What about CJK support?? Pretty please
with sugar on top! I'd need all three of them! Just left-to-right CJK
support would be quite enough. With CJK support I guess unicode would
come in handy as well...

I asked these questions in my first mail to this list (actually only
about the "J" in "CJK"); it seems that nothing like this is available
currently.  I also need it, but not too urgently (the project I need it
for won't have anything to print before 2006, I guess; until then
everything is XML and the jTeX output module works most of the time).
So I am currently considering implementing this myself.

At first I thought about implementing this via Lout's filters,
generating a Lout symbol for every CJK character, which would in turn
generate PostScript code, character by character.  Probably with some
scaling like `1.0f @Wide @Scale' -- per character.  That would be *very*
inefficient, though, and might not work well in all situations (e.g.
getting this into the databases that are used for translating words
like "Figure", "Appendix" etc.).  In particular, I'm not sure whether
the default CJK PostScript fonts (Ryumin-Light and GothicBBB-Medium)
should be typeset non-proportionally, with all characters the same
width.  I frequently read Japanese texts that seem to be typeset with
proportional fonts.

Proportional typesetting would make the filter script much more
difficult, and getting it to work nicely with the current Lout font
size handling etc. is even more of a hurdle.

I also considered doing this the CJK-TeX way: splitting Japanese fonts
into subfonts, each with some 96 characters (like, e.g., one JIS code
plane row).  Referencing a character (with the right width from the
font metrics) could then be done via some font-switching code.
Decoding the Japanese input coding would still be difficult.  Two
methods seem possible: (1) again use some filter script; (2) define
each Japanese character under its, say, EUC-JP code via `def'.  Lout's
`def' allows names with multiple non-alphanumeric characters, which
should do the job.  The problem is that Lout may classify some
character codes > 128 as letters (Latin-1 accented characters etc.,
expert guide, p. 13), which would interfere with the Japanese
character-code definitions.  I don't know whether that can be disabled.
This is also quite a hack and will interfere with Lout's font-handling
code.  It will also again lead to problems with getting Japanese
characters into those standard language-dependent strings.  Another
problem is Lout's limit on the total number of fonts (256, I think).
Chinese, Japanese and Korean won't fit into 256 subfonts, at least not
when using multiple font styles (mincho vs. gothic etc.).
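The subfont split itself is simple arithmetic.  A minimal sketch,
assuming a JIS X 0208 character (a 94x94 grid, both bytes in the range
0x21..0x7E) and 94 characters per subfont -- the exact chunk size (94
vs. 96) and the subfont naming are open choices:

```python
def jis_to_subfont(hi: int, lo: int, per_font: int = 94):
    """Map a JIS X 0208 code (two bytes, each 0x21..0x7E) to a
    (subfont number, slot) pair, so every slot fits an 8-bit
    font encoding."""
    # linear index into the 94x94 ku-ten grid
    index = (hi - 0x21) * 94 + (lo - 0x21)
    return index // per_font, index % per_font
```

With 94 characters per subfont, each subfont is exactly one row of the
grid -- but a full JIS X 0208 font already needs 94 subfonts, which is
where the 256-font limit starts to hurt.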

Another problem is Japanese line breaking.  A simple algorithm seems
to be applicable here (with modern, proportional Japanese typesetting;
the older style with full stops hanging outside the right margin etc.
might be more difficult to achieve): just define a list of characters
that are not allowed to remain as the last character of a line, and a
list of characters that must not start a line.  This seems to be
sufficient, at least for Japanese.

That algorithm can be implemented both with a filter script and with
the `def'-style decoding: just define all characters that mustn't
start a line as operators that bind with the previous character into
one unbreakable compound.
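The two-list rule can be sketched in a few lines.  The character lists
below are illustrative, not complete (a real implementation would take
them from a kinsoku table):

```python
# characters that must not start a line: closing punctuation, small kana
NO_LINE_START = set("、。，．）」』ゃゅょっァィゥェォ")
# characters that must not end a line: opening brackets
NO_LINE_END = set("（「『")

def can_break(text: str, i: int) -> bool:
    """May the line be broken between text[i-1] and text[i]?"""
    if i <= 0 or i >= len(text):
        return False
    return text[i] not in NO_LINE_START and text[i - 1] not in NO_LINE_END

def break_points(text: str):
    """All legal break positions in text."""
    return [i for i in range(1, len(text)) if can_break(text, i)]
```

The `def'-style version is the same rule turned inside out: every
character in NO_LINE_START becomes an operator gluing itself to its
left neighbour.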

After all those considerations I'm almost at the point where I want to
badly hack the Lout source code: make everything Unicode (32 bits per
character), allow UTF-8 as the only input coding system, add
Unicode-to-whatever transcoding tables for fonts, and maybe add some
method for defining fontsets consisting of multiple PostScript fonts
(so that, e.g., one can typeset Latin, Japanese, Chinese and Korean
with the default Roman font).  The hyphenation engine would also need
to be hacked to support those primitive Japanese line-breaking rules.
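The fontset idea amounts to routing each code point to one of several
PostScript fonts by Unicode block.  A toy sketch -- the font names and
block ranges here are placeholders, not anything Lout actually does:

```python
# (start, end, font) ranges; names and boundaries are assumptions
FONTSET = [
    (0x0000, 0x024F, "Times-Roman"),         # Latin
    (0x3040, 0x30FF, "Ryumin-Light"),        # kana
    (0x4E00, 0x9FFF, "Ryumin-Light"),        # CJK ideographs
    (0xAC00, 0xD7A3, "HYSMyeongJo-Medium"),  # hangul (name is a guess)
]

def font_for(cp: int) -> str:
    """Pick a font for a code point; fall back to the Roman font."""
    for lo, hi, name in FONTSET:
        if lo <= cp <= hi:
            return name
    return "Times-Roman"

def runs(text: str):
    """Split text into (font, substring) runs, as a formatter might."""
    out = []
    for ch in text:
        f = font_for(ord(ch))
        if out and out[-1][0] == f:
            out[-1] = (f, out[-1][1] + ch)
        else:
            out.append((f, ch))
    return out
```

The transcoding tables would then sit behind `font_for': once a run's
font is known, each code point still has to be mapped to that font's
own encoding.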

I'm not sure whether vertical typesetting could be implemented easily.
Well, one simple method would be rotating the font and rotating the
page in the opposite direction.  Heck, that's simple :).  Hacking
Lout's galley flushing algorithm is definitely one of the things I do
*not* want to do.  :)

Sorry for that lengthy vapourware description.  It might help motivate
me to know that at least one other person requires CJK in Lout, and
whether my implementation ideas seem sensible or like nonsense to
others.  If you have some time, I would definitely need help on the
"C" and "K" sides of CJK (typesetting rules, PostScript font encodings
etc.).

Well, the non-Unicode approaches seem rather complicated and very
quick-hack-ish, so the Unicode solution definitely seems the most
attractive one to me, though I guess it might also end up being the
most time-consuming route (it being the "right" way).  It depends on
how many assumptions the current Lout source makes regarding the
number of bytes per character.  UTF-8 input seems sufficient to me,
and I wouldn't bother with vertical writing.
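For what it's worth, the "bytes per character" question is exactly
what UTF-8 settles up front: the lead byte alone tells you the
sequence length, so a scanner that currently assumes one byte per
character mainly needs this one change.  A minimal decoding sketch (no
error recovery or validation):

```python
def utf8_len(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("continuation or invalid lead byte")

def decode_utf8(data: bytes):
    """Decode to a list of code points; assumes well-formed input."""
    cps, i = [], 0
    while i < len(data):
        n = utf8_len(data[i])
        # mask off the length-prefix bits of the lead byte
        cp = data[i] & (0x7F >> n) if n > 1 else data[i]
        for b in data[i + 1:i + n]:
            cp = (cp << 6) | (b & 0x3F)  # six payload bits per byte
        cps.append(cp)
        i += n
    return cps
```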

I don't know many details about proportional vs. non-proportional
typesetting and line breaking.  Proportional would be preferable, even
though I can't say I've seen a lot of Japanese texts typeset this way.
Japanese line breaking shouldn't be that complicated (I never realized
there actually were rules); your algorithm should be OK.

So, if you feel you have the time to work on these, please do!
Nonpareil looks like what I'm looking for, but it seems to be still a
few years off.

/ Tobias
