lynx-dev

Re: lynx-dev hyphenation (was tech. question: translating strings)


From: Klaus Weide
Subject: Re: lynx-dev hyphenation (was tech. question: translating strings)
Date: Mon, 6 Sep 1999 12:41:05 -0500 (CDT)

[ I am only replying to some portions now; maybe more later ]

On Tue, 7 Sep 1999, Vlad Harchev wrote:
> On Mon, 6 Sep 1999, Klaus Weide wrote:
> 
> > [ last part of a series of replies ]
> > On Sun, 5 Sep 1999, Vlad Harchev wrote:
> > And that is one good reason why translation to the d.c.s. should be
> > deferred to a later stage, i.e. it should be done as late as possible
> > (GridText.c instead of SGML.c) so that various pieces of code that look
> > at the data stream can assume it is in a standard encoding.
> 
>  Better have it in wide characters rather than in utf8 then. 

Yes, that's one possibility.  And "wide characters" could be either
2 or 4 bytes.  And the format for passing text around (SGML.c ->
HTML.c ->GridText.c) doesn't have to be the same that's used for
"storing it" where memory usage is important (i.e. mostly HTLine).
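
To make the size trade-off concrete, here is a minimal sketch (not lynx code) that compares how many bytes the same text needs as UTF-8, as 2-byte units, and as 4-byte units. The function name `encoded_sizes` is made up for this illustration, and Python's built-in codecs stand in for whatever wide-character type lynx would actually use.

```python
def encoded_sizes(text):
    """Return the byte length of `text` in three Unicode encoding forms."""
    return {
        "utf-8": len(text.encode("utf-8")),          # variable width, 1-4 bytes
        "utf-16-le": len(text.encode("utf-16-le")),  # 2 bytes per BMP character
        "utf-32-le": len(text.encode("utf-32-le")),  # always 4 bytes per character
    }


if __name__ == "__main__":
    # Five ASCII characters plus CYRILLIC CAPITAL LETTER IO (U+0401).
    for name, size in encoded_sizes("Lynx \u0401").items():
        print(name, size)
```

A fixed-width form is convenient for code that inspects the stream character by character, while the compact form is the natural candidate where many lines are kept in memory.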

> But I don't see
> any use of it, really (it would be useful for generalized 'isalpha()',
> 'tolower()', etc, but this IMO is used only in searching for strings).

There are lots of things that could become simpler if a Unicode
representation were used throughout.

> >[...]
> > > > You're using linux.  Give --enable-font-switch a try!
> 
>  I had the following problems:
>  When exiting from lynx, something went wrong with the console driver: each
> letter is doubled in height (ie each letter occupied 2 rows). When I invoke
> 'reset', the height of each letter returned to 1 row, but only the upper half
> of the display was used, while lower was also changing with some strange
> stuff. I had to reboot linux to fix this (I didn't try to set the console
> dimensions to match real). And I have no reason to change fonts: Russian,
> pseudographics and ascii symbols fit in one font.

What size is your screen (in terms of character cells)?
What is the normal size of the fonts you are using on the console?
Are you using "svgatextmode" or something similar?

Anyway all lynx does is invoke the "setfont" command with various
arguments (well that, and some escape sequences).  If that breaks your
system in the way you describe, then your font size is unusual (you'd
have to adjust the hardwired font filenames in UCAuto.c) or something
is wrong with your "setfont" command.

What I don't understand is why this happens (only?) on _exiting_ lynx,
which should just restore the original state.  You could try to run the
"setfont" invocations by hand.


> > I never believe claims that such-and-such people will not have any 
> > problems.
> 
>  But seems my statement is correct.

So is my statement that I don't believe it. :)


>  I plan to support "lang" attribute.

Ok, I thought so far you wouldn't.

> > >  And IMO, as long as UTF8 is not widely used _in_documents_ (not on
> > > terminals), the problem with documents mixing several, say, latin-1
> > > encoded languages will remain.
> > 
> > What does UTF-8 in documents have to do with mixing several languages
> > that use the same repertoire in one document?  Nothing as far as I
> > can tell.  UTF-8 is just a transmission format.  And its slow rate
> > of adoption in the outside world has not kept lynx from using it
> > internally.
> 
>  I'm glad that you understand that UTF-8 (and UCS*) doesn't have anything to do
> with "mixing several languages that use the same repertoire in one document"
> (I thought I thought that this was a solution).

Huh?  It was you who seemed to somehow see a connection between "UTF8
in documents" (i.e. externally) and "mixing languages".  Now you seem
to change the topic to something else completely.

> The 'lang=' is for solving 
> this. Why do you push "unicode" everywhere?

It is already used in lynx for the character translations.  Whether you
know it or not, when you view a cp<something> Russian text with KOI8-R
you are using it.  Using it as a common lingua franca allows translation
between N charsets with O(N) instead of O(N**2) tables.  That alone
should be a good enough reason for using it internally.
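
As a rough illustration of the lingua-franca argument, the sketch below translates between two 8-bit charsets by going through Unicode, and counts the tables each approach needs. It is not lynx's implementation: Python's built-in `cp1251` and `koi8_r` codecs stand in for lynx's own translation tables, and the names `translate` and `tables_needed` are invented for this example.

```python
def translate(data, src, dst):
    """Translate bytes from charset `src` to charset `dst` via Unicode."""
    text = data.decode(src)                    # src charset -> Unicode
    return text.encode(dst, errors="replace")  # Unicode -> dst charset


def tables_needed(n):
    """Count translation tables required for n charsets under each scheme."""
    return {
        "pivot": 2 * n,           # one table to and one from Unicode per charset
        "pairwise": n * (n - 1),  # one table per ordered pair of charsets
    }


if __name__ == "__main__":
    cp1251_bytes = "Привет".encode("cp1251")
    print(translate(cp1251_bytes, "cp1251", "koi8_r"))
    print(tables_needed(10))
```

With ten charsets the pivot scheme needs 20 tables where direct pairwise translation needs 90, and adding an eleventh charset costs two new tables instead of twenty.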

> > >  As I said, the hyrules for these particular languages can be
> > > concatenated to get hyrules for Cyrillic and German - they have disjoint
> > > set of character codes.
> > 
> > Merely an accident (as said elsewhere), and does it really work in your
> > approach unless you have a display character set with both LATIN
> > CAPITAL LETTER A WITH DIAERESIS and CYRILLIC CAPITAL LETTER IO?
> 
>  I assume you mean these letters have equal char.codes in d.c.s.

No, not at all!

>  If I was encountering such documents, I'd compose or choose another font -
> that means that these 2 chars will have different character codes in that
> d.c.s. 

The point was that *there is no 8-bit charset* that has them both.
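
One way to probe this is to ask each candidate charset whether it can encode both letters. The sketch below does that for a small sample of 8-bit codecs shipped with Python; the sample list and the helper names `covers` and `charsets_with_both` are assumptions made for the illustration, so an empty result here is a spot check and not a proof about every 8-bit charset.

```python
A_DIAERESIS = "\u00c4"  # LATIN CAPITAL LETTER A WITH DIAERESIS
CYRILLIC_IO = "\u0401"  # CYRILLIC CAPITAL LETTER IO


def covers(codec, chars):
    """Return True if every character in `chars` can be encoded in `codec`."""
    try:
        chars.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def charsets_with_both(codecs):
    """Return the codecs from `codecs` that contain both test letters."""
    return [c for c in codecs if covers(c, A_DIAERESIS + CYRILLIC_IO)]


if __name__ == "__main__":
    sample = ["latin_1", "koi8_r", "cp1251", "iso8859_5"]  # illustrative sample
    print(charsets_with_both(sample))
```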

>  "and like" means CJK texts (hyphenation doesn't make sense for J, but for C
> and K I don't know). As for utf8-encoded hyrules  - the hyphenation simply
> won't work or dictionary won't load by libhnj. In other words, each single
> byte in  hyrules denotes a single "human letter", each single byte in d.c.s.
> denotes a single "human letter" (and not part of letter) - to make direct
> table-driven translation possible.

You could change it to operate on shorts instead of bytes, right?
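
To sketch what "shorts instead of bytes" could look like, the example below keeps the direct table-driven translation but indexes the table by 16-bit units, so one unit is still one letter. It says nothing about how libhnj actually works; `to_shorts`, `translate_shorts` and the `FOLD` table are hypothetical names, and the sketch assumes all text is within the 16-bit range (the Basic Multilingual Plane).

```python
from array import array


def to_shorts(text):
    """Pack `text` into unsigned 16-bit units, one unit per character.

    Characters above U+FFFF do not fit and raise OverflowError.
    """
    return array("H", (ord(ch) for ch in text))


def translate_shorts(units, table):
    """Map each 16-bit unit through `table`; unmapped units pass unchanged."""
    return array("H", (table.get(u, u) for u in units))


# Hypothetical table: fold two capitals to lowercase before pattern matching.
FOLD = {0x00C4: 0x00E4, 0x0401: 0x0451}

if __name__ == "__main__":
    units = to_shorts("\u00c4\u0401x")
    print(translate_shorts(units, FOLD).tolist())
```

Because both letters now have distinct 16-bit codes, Latin and Cyrillic rules can share one table without relying on the two 8-bit charsets happening to use disjoint byte values.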


   Klaus

