LYNX-DEV Fwd: webwatch-l Strange Numbers in Lynx
From: Laura Eaves
Subject: LYNX-DEV Fwd: webwatch-l Strange Numbers in Lynx
Date: Sat, 10 May 1997 00:00:52 -0400 (EDT)
I just received this from Lloyd Rasmussen, who said I could forward it on to
lynx-dev, in case anyone else is interested.
Thanks Lloyd!
--le
Date: Fri, 9 May 97 10:44:10 EDT
From: "Lloyd G. Rasmussen" <address@hidden>
Subject: Fwd: Re: webwatch-l Strange Numbers in Lynx
Dear Laura: Here's some stuff I dug up last week about &#146;. Since
it's not part of the HTML-sanctioned character set, but appears to be
mostly a Microsoft invention, it falls into the category of "do we
make this a browser that can read everything, or do we make it an HTML
validator." Discussion here on Lynx-dev last week also indicated that
these codes, when flattened from 8 bits to 7, land in the range of
control characters, which scares some programmers, I guess. I hope
you have time to check this out a little. I use Vocal-Eyes under DOS
and Window-Eyes under Win 3.1. I don't have Linux. I work in the
braille and talking book program of the Library of Congress.
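To see why the flattening worries programmers: dropping the eighth bit from byte 146 (0x92) leaves 18 (0x12), the ASCII control character DC2. The following minimal Python sketch assumes that "flattening" means simply masking off the high bit; the function names are hypothetical.

```python
# Sketch: what happens when an 8-bit CP1252 byte is "flattened" to 7 bits.
# Assumption: flattening means masking off the high bit (b & 0x7F).

def flatten_to_7bit(byte_value: int) -> int:
    """Drop the eighth bit, as a naive 8-to-7-bit conversion would."""
    return byte_value & 0x7F

def is_ascii_control(code: int) -> bool:
    """ASCII control characters are 0-31 and 127 (DEL)."""
    return code < 0x20 or code == 0x7F

# The 128-159 range discussed in this thread.
for b in range(0x80, 0xA0):
    low = flatten_to_7bit(b)
    print(f"{b:3d} (0x{b:02X}) -> {low:3d} (0x{low:02X})  control={is_ascii_control(low)}")
```

Every value from 128 to 159 lands between 0 and 31, so each one turns into a control character.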
----- Forwarded message begins here -----
From: Lloyd G. Rasmussen <address@hidden>
To: address@hidden
Date: Fri, 2 May 97 10:31:03 EDT
Subject: Re: webwatch-l Strange Numbers in Lynx
On Fri, 2 May 1997 05:29:28 -0700 (PDT),
Kelly Ford <address@hidden> wrote:
>If I understand you correctly, the characters I'm asking about such as
>0146 won't change no matter what character set I choose. The MSNBC site
>at http://www.msnbc.com is full of these characters. Is this something we
>should ask Microsoft to correct or is an improvement in Lynx necessary?
>
I suspect we won't be able to get Microsoft to change these. I asked
your question over on lynx-dev and didn't get much of a response.
Perusal of the comp.text.sgml newsgroup turned up a recent large
thread on this subject. Indulge me with the following two newsgroup
messages, inserted below; otherwise, you can hit the Delete key now.
Basically, if we are running on code page 1252 or have the proper
graphics browser, we will see these characters properly. You will
also see these characters in some kinds of ASCII saves from MS Word.
There are a couple of web pages referenced for testing these character
entities. If the developers of Lynx can be convinced to support these
"extensions" of ISO char-sets, the problem could be fixed.
------ Forwarded message ends here ------
sholarp wrote:
> A coincidence. Just tonight I have questioned the webmaster
> at the MSNBC web site by e-mail about the use of the encoded
> character &#146 (absent the semicolon) throughout that site's
> web pages.
>
> This character appears where an apostrophe, or right single
> quote, should appear. For some reason, the author of MSNBC's
> text consistently uses a nonstandard encoding (with respect to
> both ISO 8859-1 and HTML v3.2 (?)) for this character.
You are absolutely right, and they definitely should fix this quickly!!!
Positions 128-159 hold no printable characters in ISO 8859-1 and
Unicode, the character sets of HTML. MS-Windows uses a superset of
ANSI/ISO 8859-1, known to experts as "Code Page 1252 (CP1252)",
a Microsoft-specific character set with additional characters
in the 128-159 range (also known as the C1 range).
All the CP1252 characters are also available in Unicode.
For example the CP1252 character 146 that you mentioned
(RIGHT SINGLE QUOTATION MARK) has the Unicode number 8217,
therefore you should use this number in order to conform to
the HTML standard. Modern HTML browsers like Netscape 4.0
understand Unicode and will automatically convert the Unicode
character reference &#8217; back into character 146 on MS-Windows
machines, and into the suitable character on other systems.
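Kuhn's point can be checked with Python's built-in codecs (an illustration, not something from the original thread): read as ISO 8859-1, byte 146 is a C1 control character, while read as CP1252 it is U+2019, whose standard numeric reference is &#8217;.

```python
import unicodedata

raw = bytes([146])  # the byte that MSNBC's pages referenced as &#146

# Read as ISO 8859-1 ("latin-1"), byte 146 is U+0092, a C1 control character.
as_latin1 = raw.decode("latin-1")
print(hex(ord(as_latin1)), unicodedata.category(as_latin1))  # 0x92 Cc

# Read as CP1252 (Python's built-in "cp1252" codec), it is U+2019.
as_cp1252 = raw.decode("cp1252")
code_point = ord(as_cp1252)
print(code_point, unicodedata.name(as_cp1252))  # 8217 RIGHT SINGLE QUOTATION MARK

# The standard-conforming numeric character reference for HTML:
print(f"&#{code_point};")  # &#8217;
```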
The official CP1252<->Unicode conversion table is printed in
the Unicode 2.0 standard and for instance available on
<ftp://ftp.informatik.uni-erlangen.de/pub/doc/ISO/charsets/>.
MS-Windows HTML authoring software definitely should implement the
conversion table below! Please forward this mail to the developers
of your HTML authoring tool if it currently gets this wrong.
The CP1252 characters that are not part of ANSI/ISO 8859-1 and
that should therefore always be encoded as Unicode characters >255
are the following:
0x82 0x201a #SINGLE LOW-9 QUOTATION MARK
0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
0x84 0x201e #DOUBLE LOW-9 QUOTATION MARK
0x85 0x2026 #HORIZONTAL ELLIPSIS
0x86 0x2020 #DAGGER
0x87 0x2021 #DOUBLE DAGGER
0x88 0x02c6 #MODIFIER LETTER CIRCUMFLEX ACCENT
0x89 0x2030 #PER MILLE SIGN
0x8a 0x0160 #LATIN CAPITAL LETTER S WITH CARON
0x8b 0x2039 #SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8c 0x0152 #LATIN CAPITAL LIGATURE OE
0x91 0x2018 #LEFT SINGLE QUOTATION MARK
0x92 0x2019 #RIGHT SINGLE QUOTATION MARK
0x93 0x201c #LEFT DOUBLE QUOTATION MARK
0x94 0x201d #RIGHT DOUBLE QUOTATION MARK
0x95 0x2022 #BULLET
0x96 0x2013 #EN DASH
0x97 0x2014 #EM DASH
0x98 0x02dc #SMALL TILDE
0x99 0x2122 #TRADE MARK SIGN
0x9a 0x0161 #LATIN SMALL LETTER S WITH CARON
0x9b 0x203a #SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9c 0x0153 #LATIN SMALL LIGATURE OE
0x9f 0x0178 #LATIN CAPITAL LETTER Y WITH DIAERESIS
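The table above is enough to repair such pages mechanically. Here is a minimal Python sketch of what an authoring tool (or, hypothetically, a text browser such as Lynx) could do: rewrite decimal numeric references in the 128-159 range to the Unicode values from the table, accepting a missing semicolon as on the MSNBC pages. The names `CP1252_EXTRAS` and `fix_cp1252_refs` are hypothetical, and this is not Lynx's actual code.

```python
import re

# The CP1252 -> Unicode table above, transcribed (byte value -> code point).
# It mirrors the 1997 table; Python's own cp1252 codec also maps 0x80, 0x8E
# and 0x9E, which the table does not list.
CP1252_EXTRAS = {
    0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E, 0x85: 0x2026,
    0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02C6, 0x89: 0x2030,
    0x8A: 0x0160, 0x8B: 0x2039, 0x8C: 0x0152, 0x91: 0x2018,
    0x92: 0x2019, 0x93: 0x201C, 0x94: 0x201D, 0x95: 0x2022,
    0x96: 0x2013, 0x97: 0x2014, 0x98: 0x02DC, 0x99: 0x2122,
    0x9A: 0x0161, 0x9B: 0x203A, 0x9C: 0x0153, 0x9F: 0x0178,
}

# Decimal numeric references, with or without the trailing semicolon.
NUMREF = re.compile(r"&#(\d+);?")

def fix_cp1252_refs(html: str) -> str:
    """Rewrite &#130; .. &#159; references to their Unicode equivalents."""
    def repl(m):
        n = int(m.group(1))
        if n in CP1252_EXTRAS:
            return f"&#{CP1252_EXTRAS[n]};"
        return m.group(0)  # leave every other reference untouched
    return NUMREF.sub(repl, html)

print(fix_cp1252_refs("MSNBC&#146s story"))  # MSNBC&#8217;s story
```

References outside the table, including the unassigned positions such as &#129;, pass through unchanged, so the sketch never guesses at a character CP1252 does not define.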
Hope this helped ...
Markus
--
Markus Kuhn, Computer Science grad student, Purdue
University, Indiana, US, email: address@hidden
In article <address@hidden>,
Markus Kuhn <address@hidden> wrote:
> The characters 128-159 are not used in ISO 8859-1 and Unicode, the
> character sets of HTML. MS-Windows uses a superset of ANSI/ISO
> 8859-1, known to experts as "Code Page 1252 (CP1252)", a Microsoft
> specific character set with additional characters in the 128-159
> range. All the CP1252 characters are also available in Unicode.
> For example the CP1252 character 146 that you mentioned (RIGHT
> SINGLE QUOTATION MARK) has the Unicode number 8217, therefore you
> should use this number in order to conform to the HTML standard.
> Modern HTML browsers like Netscape 4.0 understand Unicode and will
> automatically convert the Unicode character &#8217; back into the
> character 146 on MS-Windows machines, and into the suitable
> character on other systems.
> 0x82 0x201a #SINGLE LOW-9 QUOTATION MARK
> 0x83 0x0192 #LATIN SMALL LETTER F WITH HOOK
> 0x84 0x201e #DOUBLE LOW-9 QUOTATION MARK
etc.
Here's a translation of this table into more HTML-author friendly
terms (I've also added this table to the Web page at
http://uts.cc.utexas.edu/~churchh/latin1.html , where you can test
whether your browser understands these entities):
Windows    Unicode
char.      HTML code   Character Description
--------   ---------   ---------------------
ALT-0130   &#8218;     Single Low-9 Quotation Mark
ALT-0131   &#402;      Latin Small Letter F With Hook
ALT-0132   &#8222;     Double Low-9 Quotation Mark
ALT-0133   &#8230;     Horizontal Ellipsis
ALT-0134   &#8224;     Dagger
ALT-0135   &#8225;     Double Dagger
ALT-0136   &#710;      Modifier Letter Circumflex Accent
ALT-0137   &#8240;     Per Mille Sign
ALT-0138   &#352;      Latin Capital Letter S With Caron
ALT-0139   &#8249;     Single Left-Pointing Angle Quotation Mark
ALT-0140   &#338;      Latin Capital Ligature OE
ALT-0145   &#8216;     Left Single Quotation Mark
ALT-0146   &#8217;     Right Single Quotation Mark
ALT-0147   &#8220;     Left Double Quotation Mark
ALT-0148   &#8221;     Right Double Quotation Mark
ALT-0149   &#8226;     Bullet
ALT-0150   &#8211;     En Dash
ALT-0151   &#8212;     Em Dash
ALT-0152   &#732;      Small Tilde
ALT-0153   &#8482;     Trade Mark Sign
ALT-0154   &#353;      Latin Small Letter S With Caron
ALT-0155   &#8250;     Single Right-Pointing Angle Quotation Mark
ALT-0156   &#339;      Latin Small Ligature OE
ALT-0159   &#376;      Latin Capital Letter Y With Diaeresis
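The translation is mechanical. The ALT code is the CP1252 byte value in decimal with a leading zero, and the HTML code is the Unicode code point in decimal. The following sketch regenerates the table with Python's built-in `cp1252` codec and `unicodedata` module. It is an illustration, not Churchyard's method, and the descriptions come out as the official uppercase Unicode names. The function name `html_author_table` is hypothetical.

```python
import unicodedata

# Byte values listed in the 1997 table: 0x82-0x8C, 0x91-0x9C and 0x9F.
# Python's cp1252 codec also maps 0x80, 0x8E and 0x9E, which the table
# does not list, so they are left out here.
TABLE_BYTES = list(range(0x82, 0x8D)) + list(range(0x91, 0x9D)) + [0x9F]

def html_author_table():
    """Return one 'ALT-0nnn  &#dddd;  NAME' line per listed CP1252 byte."""
    lines = []
    for b in TABLE_BYTES:
        ch = bytes([b]).decode("cp1252")  # CP1252 byte -> Unicode character
        lines.append(f"ALT-{b:04d}  &#{ord(ch)};  {unicodedata.name(ch)}")
    return lines

for line in html_author_table():
    print(line)
```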
--
"You know they've reintroduced the death penalty for insurance company
directors?" "Really?" said Arthur, "No, I didn't. For what offense?"
Trillian frowned. "What do you mean, offense?" "I see." -- _Mostly
Harmless_ || Henry Churchyard || http://uts.cc.utexas.edu/~churchh
-- Lloyd Rasmussen
Senior Staff Engineer, Engineering Section
National Library Service for the Blind and Physically Handicapped
Library of Congress 202-707-0535
(work) address@hidden www.loc.gov/nls/
(home) address@hidden