Re: lynx-dev Emacs + UTF-8 + Lynx

From: Klaus Weide
Subject: Re: lynx-dev Emacs + UTF-8 + Lynx
Date: Thu, 25 May 2000 13:05:09 -0500 (CDT)

On 25 May 2000, Sergei Pokrovsky wrote:

> There is a problem in using Lynx inside Emacs for the Unicode package.
> Actually I am on Solaris 2.7, but I could not make its Unicode locale
> to work satisfactorily (actually, even the Unicode fonts are a
> problem).
> So I've installed the UCS package modified with oc-unicode in order to
> use the rich Unicode font by Markus Kuhn.  This works well in Emacs,

I have no experience with running lynx in emacs, or with the packages
you mention (still using Emacs 19...).  But supposedly I understand
the UTF-8 stuff in lynx (because I wrote it...).

> and also I am able to call Lynx in a UTF-8 environment via
> (let ((coding-system-for-write 'utf-8) (coding-system-for-read 'utf-8))
>         (browse-url-lynx-emacs
>          "file:///export/html/home.html")
> )

What is browse-url-lynx-emacs?  With which command line options does
it call lynx?

If you call lynx non-interactively, i.e. with -dump, and then cat
the results in a similar environment - are the characters correct?
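One way to inspect such a dump without relying on any terminal is to look at the raw bytes. The following is only a sketch in Python: the function name is made up for illustration, and it assumes the output of `lynx -dump` has been captured as bytes (for example, redirected to a file and read back in binary mode).

```python
def first_invalid_utf8_offset(data):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None

# Hypothetical sample data: a well-formed Cyrillic small er (0xD1 0x80),
# and the same sequence with the 0x80 byte lost along the way.
intact = b"\xd1\x80 text"
damaged = b"\xd1 text"

print(first_invalid_utf8_offset(intact))   # None means it decodes cleanly
print(first_invalid_utf8_offset(damaged))  # offset of the offending byte
```

A clean result only shows that the bytes are well-formed UTF-8, not that they are the right characters: the corrupted em-dash quoted below (0xE2 0x94 0xA0) is itself a valid sequence.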

> and initially it looks almost okay.
> But there are three problems (as I see them at
> 1) The em-dash looks like a pseudographic character (┠) or some other rubbish

Correct (what I get from lynx, shows correctly in an 'xterm -u8' window):
 (—) = (\342\200\224) = (0xE2 0x80 0x94)

Corrupted (as received here in your mail):
 (┠) = (\342\224\240) = (0xE2 0x94 0xA0)

> 3) The Cyrillic letter SMALL R (р) always spoils the rendering
>    of itself and the following text, as at
>    <>.

&#x440; = &#1088;  =  correct UTF-8: (0xD1 0x80) 

Again, I get the correct output.

Note the common pattern: in both cases, the UTF-8 encoding contains
a 0x80 byte.  Possibly this byte value triggers some error, either
in emacs' terminal emulator or in the curses or slang library your
lynx binary is using.  I don't think the error is in the lynx code:
it's hard to see why it would get just that byte value wrong (and
if it were lynx itself, I should see the error, too).
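The byte values quoted above can be double-checked with a few lines of Python. This is only an illustrative sketch: the helper names and the labels are made up, and it says nothing about where in the chain the byte gets damaged.

```python
# Characters discussed in this thread, keyed by a short label.
samples = {
    "em dash (correct)": "\u2014",
    "box drawing (as received)": "\u2520",
    "Cyrillic small er": "\u0440",
}

def utf8_hex(ch):
    """Format the UTF-8 encoding of ch as space-separated hex bytes."""
    return " ".join(f"0x{b:02X}" for b in ch.encode("utf-8"))

def contains_0x80(ch):
    """True if the UTF-8 encoding of ch includes a 0x80 byte."""
    return 0x80 in ch.encode("utf-8")

for label, ch in samples.items():
    print(f"{label}: {utf8_hex(ch)}  contains 0x80: {contains_0x80(ch)}")
```

Both characters that fail to display have a 0x80 byte in their encoding, while the box-drawing character that showed up in place of the em-dash does not.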

> 2) When the cursor passes through an anchor, some two-byte characters
>    are doubled; e.g. the line
>                   E, F, G, Ĝ, H, Ĥ, I, J, Ĵ   R, rf..rv                 
>    (where all the words except R are anchors) becomes, if I pass
>    through it with [down]
>                   E, F, G,ĜĜ, H,ĤĤ, I, J,ĴĴ   R, rf..rv                 

> I've set
> CHARACTER_SET:utf-8                                                           
> ASSUME_LOCAL_CHARSET:iso-8859-3                                               
> SHOW_CURSOR:TRUE                                                              
> (Lynx Version 2.8.2rel.1) 

The first thing to try is a newer lynx version.  Get 2.8.3rel.
There are substantial changes in how UTF-8 output is handled; they might
solve your problem 2).

> It is quite possible that the problem is not in Lynx but in the Emacs
> term emulation.  In this respect learning that Lynx does not suffer
> from the problems I've mentioned in a UTF-8 aware terminal (and
> renders the indicated URL correctly) could help me in finding the
> culprit.

I tried your example URLs in xterm -u8 and did not observe any of
your three problems.  However, that may depend on the terminal description
for the $TERM, as well as the curses or slang library.  (Both ncurses and
slang work for me though, under Linux.)

Some glitches are nearly unavoidable.  The fundamental problem is that
lynx is using a non-UTF-8-aware library for display output, and the
library's idea of where each character sits differs from where it
physically ends up on the screen.  Lynx tries to work around that, but
that goes only so far.
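The mismatch can be illustrated with the line from problem 2).  The Python sketch below is not how lynx, curses or slang actually keep track of positions; it simply assumes a library that counts one screen column per byte, and a terminal that shows each of these characters in one column.  The function names are invented for the example.

```python
line = "E, F, G, Ĝ, H, Ĥ, I, J, Ĵ"

def byte_offset(text, column):
    """Position a byte-counting library would assign to a given screen column."""
    return len(text[:column].encode("utf-8"))

def drift(text, column):
    """How far the byte-based position has run ahead of the real column."""
    return byte_offset(text, column) - column

print(len(line), len(line.encode("utf-8")))  # characters vs. bytes
for column, ch in enumerate(line):
    if ord(ch) > 0x7F:
        # The drift grows by one just after each two-byte character.
        print(ch, "at column", column, "drift afterwards:", drift(line, column + 1))
```

Every two-byte character pushes the library's notion of the cursor one position further from the real one, so the error accumulates along the line.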

Please let us know whether 2.8.3 improves things for you.

If your problems remain, a possible workaround is a modified terminfo /
termcap description.  Basically one with fewer escape sequences for cursor
positioning, so curses / slang has less opportunity to optimize cursor
movement (and thereby mess up UTF-8 output).  If you are interested in
pursuing that, send your terminal description (for whatever $TERM lynx
sees when invoked your way), and I can suggest some changes.

