
lynx-dev Line breaks and Double-byte Charsets


From: Erik Peterson
Subject: lynx-dev Line breaks and Double-byte Charsets
Date: Tue, 27 Jul 1999 16:57:22 -0700

Hello,

   I write CGI programs involving Chinese.  I use lynx for some of
these programs to download a Chinese web page in a nice format
and dump it to a text file for further processing.  I call lynx like
this:

lynx -assume_charset=gb2312 -dump some_url

   I've recently noticed that when lynx formats the text and
inserts line breaks, it will sometimes insert a line break in the
middle of a double-byte character.  This garbles the text that
follows, up to the next ASCII-range character.

   I've tried this on both DOS and Unix and with the latest
release version of lynx (2.8.2) and get the same results.

One example site that has text that gets mangled is:
 http://www.voa.gov/chinese/news/mon/072699chinatradevotedivideshouse.htm

This only occurs with HTML files; plain text files, which lynx
does not reformat, are unaffected.
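For anyone who wants to check a dump for this problem, here is a rough Python sketch that flags suspect lines.  It assumes the dump is pure GB2312 in EUC-CN form with no other 8-bit bytes, so every intact line must contain an even number of bytes at 0x80 or above.  The name find_split_lines is made up for this example.

```python
def find_split_lines(dump: bytes) -> list:
    """Return the 1-based numbers of lines that hold an odd number of
    high (>= 0x80) bytes, which suggests a double-byte character was
    split by a line break.

    Assumption: the input is EUC-CN encoded GB2312 with no other
    8-bit data mixed in.
    """
    suspect = []
    for number, line in enumerate(dump.split(b"\n"), start=1):
        high_bytes = sum(1 for b in line if b >= 0x80)
        if high_bytes % 2:
            suspect.append(number)
    return suspect
```

A break in the middle of a character normally flags two neighboring lines: the one that ends with the first byte and the one that begins with the second.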

  Getting this fixed would be a great help to me.  Or if I
am missing a necessary command-line switch or something,
please let me know.  If programming help is needed to fix it,
I could probably help there also.

Thank you,
Erik Peterson

