lynx-dev

LYNX-DEV ISO-8859-2 HTML entities for HTMLDTD.c


From: Hynek Med
Subject: LYNX-DEV ISO-8859-2 HTML entities for HTMLDTD.c
Date: Sun, 9 Mar 1997 18:43:05 +0100 (MET)

I have added ISO-8859-2 entities like &ccaron; to Klaus' charset patches.
I hope I got it all right: is &udie; a 'u' with diaeresis, and is 'a'
with double accents &adouble;? Where can I get these standards?

It's strange anyway. &cacute; works, while &tacute; doesn't. &Aogon;
works, &aogon; doesn't. Either I forgot something (like sorting the
table) or there's something wrong with the code.

An example of this: 

Document with &aogon; &Aogon; &Ccaron; &uring; &udie; produces:

  &aogon; Ą &Ccaron; &uring; ü

(Aogon and udie are OK, others not.) 

Trace output shows:

SGML: Unknown entity Aogon so far, checking extra...
SGML: Unknown entity Ccaron so far, checking extra...
SGML: Unknown entity Ccaron
SGML: Unknown entity uring so far, checking extra...
SGML: Unknown entity uring
SGML: Unknown entity udie so far, checking extra...
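
On the sorting guess: if the lookup in extra_entities is a binary search
(an assumption, the trace alone does not prove it), then entries appended
out of strcmp() order would be found or missed depending on where they
happen to sit, which would fit the symptoms. A minimal C sketch of keeping
such a table sorted follows. It uses a stand-in struct rather than the real
UC_entity_info from SGML.h, and entity_info, cmp_entity, sort_entities and
find_entity are hypothetical names, not Lynx functions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for UC_entity_info (the real one is in SGML.h). */
typedef struct {
    const char *name;
    unsigned    code;
} entity_info;

/* Order entries by name using strcmp(), i.e. plain byte order. */
static int cmp_entity(const void *a, const void *b)
{
    const entity_info *ea = a;
    const entity_info *eb = b;
    return strcmp(ea->name, eb->name);
}

/* Sort the table once, before the first lookup. */
static void sort_entities(entity_info *table, size_t n)
{
    qsort(table, n, sizeof table[0], cmp_entity);
}

/* Binary search by name; returns the code point, or 0 if not found.
   Only reliable when the table is sorted with cmp_entity. */
static unsigned find_entity(const entity_info *table, size_t n,
                            const char *name)
{
    entity_info key;
    const entity_info *hit;

    key.name = name;
    key.code = 0;
    hit = bsearch(&key, table, n, sizeof table[0], cmp_entity);
    return hit ? hit->code : 0;
}
```

Note that strcmp() order puts every uppercase name before every lowercase
one in ASCII, so "Zcaron" sorts before "aogon". A table kept sorted by hand
would have to follow the same order.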


Attached to this mail is my patch, relative to the lynx2-7 subdirectory
(i.e. cd to lynx2-7, then run patch). I can produce other entities, such
as those for ISO-8859-3, or the entities still missing from ISO-8859-2
whose HTML names I don't know (DOUBLE ACUTE ACCENT, MULTIPLICATION SIGN,
DIVISION SIGN, DOT ABOVE..). I have a Perl script that does this from
Unicode's 8859-x.TXT; I only need to know relationships like
"X WITH CIRCUMFLEX -> &xcirc;".
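
The Perl script is not included here. As a rough illustration of the same
idea in C, the sketch below turns one line of a Unicode 8859-x.TXT mapping
file (byte, code point, '#', character name, separated by tabs) into a
table entry. The accents table and the function name entry_from_line are
hypothetical; the suffixes are simply the ones used in the patch, not
names taken from any standard.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical accent-to-suffix table, following the names in the patch. */
static const struct { const char *accent; const char *suffix; } accents[] = {
    {"ACUTE", "acute"},     {"CARON", "caron"},    {"OGONEK", "ogon"},
    {"CEDILLA", "cedil"},   {"BREVE", "breve"},    {"STROKE", "strok"},
    {"CIRCUMFLEX", "circ"}, {"RING ABOVE", "ring"}, {"DOT ABOVE", "dot"},
    {"DIAERESIS", "die"},   {"DOUBLE ACUTE", "double"},
};

/* Parse one mapping line such as
     0xA1<TAB>0x0104<TAB>#<TAB>LATIN CAPITAL LETTER A WITH OGONEK
   and write a table entry into out.  Returns 1 on success, 0 for any
   line this sketch does not understand (signs, unknown accents, ...). */
static int entry_from_line(const char *line, char *out, size_t outlen)
{
    unsigned byte, code;
    char caseword[16], letter[8], accent[32];
    size_t i;
    int ch;

    if (sscanf(line, "0x%x 0x%x # LATIN %15s LETTER %7s WITH %31[A-Z ]",
               &byte, &code, caseword, letter, accent) != 5)
        return 0;
    if (strlen(letter) != 1)
        return 0;

    /* Drop trailing blanks that the scanset may have picked up. */
    i = strlen(accent);
    while (i > 0 && accent[i - 1] == ' ')
        accent[--i] = '\0';

    if (strcmp(caseword, "CAPITAL") == 0)
        ch = letter[0];
    else if (strcmp(caseword, "SMALL") == 0)
        ch = tolower((unsigned char)letter[0]);
    else
        return 0;

    for (i = 0; i < sizeof accents / sizeof accents[0]; i++) {
        if (strcmp(accent, accents[i].accent) == 0) {
            snprintf(out, outlen, "  {\"%c%s\",  0x%04x},",
                     ch, accents[i].suffix, code);
            return 1;
        }
    }
    return 0;
}
```

Lines that are not "LATIN ... LETTER x WITH accent" are rejected rather
than guessed at, which is exactly where the missing name relationships
would have to be filled in.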

Hynek



--
Hynek Med, address@hidden
--- WWW/Library/Implementation/HTMLDTD.c.orig   Sun Mar  9 14:45:10 1997
+++ WWW/Library/Implementation/HTMLDTD.c        Sun Mar  9 15:10:02 1997
@@ -154,14 +154,101 @@
 
 /* UC_entity_info structure is defined in SGML.h. */
 static CONST UC_entity_info extra_entities[] = {
-  {"Aogon",   0x0104}, /* TEST */ 
-  {"ccaron",  0x010d}, /* c with caron */ 
+
+/* Klaus' tests */ 
+
   {"comma",    44},    /* TEST */ 
   {"lrm",      8206},  /* left-to-right mark */ 
   {"rlm",      8207},  /* right-to-left mark */ 
-  {"zcaron",  0x017e}, /* z with caron */ 
   {"zwnj",     8204},  /* zero width non-joiner */ 
   {"zwj",      8205},  /* zero width joiner */ 
+
+/* ISO-8859-2 entities added by address@hidden
+   I'm not sure if &udie; is right for 'u' with 
+   diaeresis, and whether 'a' with double accents 
+   is really &adouble;
+*/
+
+  {"Aogon",  0x0104},  /* A with ogonek */
+  {"Lstrok",  0x0141},  /* L with stroke */
+  {"Lcaron",  0x013d},  /* L with caron */
+  {"Sacute",  0x015a},  /* S with acute */
+  {"Scaron",  0x0160},  /* S with caron */
+  {"Scedil",  0x015e},  /* S with cedilla */
+  {"Tcaron",  0x0164},  /* T with caron */
+  {"Zacute",  0x0179},  /* Z with acute */
+  {"Zcaron",  0x017d},  /* Z with caron */
+  {"Zdot",  0x017b},  /* Z with dot above */
+  {"aogon",  0x0105},  /* a with ogonek */
+  {"lstrok",  0x0142},  /* l with stroke */
+  {"lcaron",  0x013e},  /* l with caron */
+  {"sacute",  0x015b},  /* s with acute */
+  {"scaron",  0x0161},  /* s with caron */
+  {"scedil",  0x015f},  /* s with cedilla */
+  {"tcaron",  0x0165},  /* t with caron */
+  {"zacute",  0x017a},  /* z with acute */
+  {"zcaron",  0x017e},  /* z with caron */
+  {"zdot",  0x017c},  /* z with dot above */
+  {"Racute",  0x0154},  /* R with acute */
+  {"Aacute",  0x00c1},  /* A with acute */
+  {"Acirc",  0x00c2},  /* A with circumflex */
+  {"Abreve",  0x0102},  /* A with breve */
+  {"Adie",  0x00c4},  /* A with diaeresis */
+  {"Lacute",  0x0139},  /* L with acute */
+  {"Cacute",  0x0106},  /* C with acute */
+  {"Ccedil",  0x00c7},  /* C with cedilla */
+  {"Ccaron",  0x010c},  /* C with caron */
+  {"Eacute",  0x00c9},  /* E with acute */
+  {"Eogon",  0x0118},  /* E with ogonek */
+  {"Edie",  0x00cb},  /* E with diaeresis */
+  {"Ecaron",  0x011a},  /* E with caron */
+  {"Iacute",  0x00cd},  /* I with acute */
+  {"Icirc",  0x00ce},  /* I with circumflex */
+  {"Dcaron",  0x010e},  /* D with caron */
+  {"Dstrok",  0x0110},  /* D with stroke */
+  {"Nacute",  0x0143},  /* N with acute */
+  {"Ncaron",  0x0147},  /* N with caron */
+  {"Oacute",  0x00d3},  /* O with acute */
+  {"Ocirc",  0x00d4},  /* O with circumflex */
+  {"Odouble",  0x0150},  /* O with double acute */
+  {"Odie",  0x00d6},  /* O with diaeresis */
+  {"Rcaron",  0x0158},  /* R with caron */
+  {"Uring",  0x016e},  /* U with ring above */
+  {"Uacute",  0x00da},  /* U with acute */
+  {"Udouble",  0x0170},  /* U with double acute */
+  {"Udie",  0x00dc},  /* U with diaeresis */
+  {"Yacute",  0x00dd},  /* Y with acute */
+  {"Tcedil",  0x0162},  /* T with cedilla */
+  {"racute",  0x0155},  /* r with acute */
+  {"aacute",  0x00e1},  /* a with acute */
+  {"acirc",  0x00e2},  /* a with circumflex */
+  {"abreve",  0x0103},  /* a with breve */
+  {"adie",  0x00e4},  /* a with diaeresis */
+  {"lacute",  0x013a},  /* l with acute */
+  {"cacute",  0x0107},  /* c with acute */
+  {"ccedil",  0x00e7},  /* c with cedilla */
+  {"ccaron",  0x010d},  /* c with caron */
+  {"eacute",  0x00e9},  /* e with acute */
+  {"eogon",  0x0119},  /* e with ogonek */
+  {"edie",  0x00eb},  /* e with diaeresis */
+  {"ecaron",  0x011b},  /* e with caron */
+  {"iacute",  0x00ed},  /* i with acute */
+  {"icirc",  0x00ee},  /* i with circumflex */
+  {"dcaron",  0x010f},  /* d with caron */
+  {"dstrok",  0x0111},  /* d with stroke */
+  {"nacute",  0x0144},  /* n with acute */
+  {"ncaron",  0x0148},  /* n with caron */
+  {"oacute",  0x00f3},  /* o with acute */
+  {"ocirc",  0x00f4},  /* o with circumflex */
+  {"odouble",  0x0151},  /* o with double acute */
+  {"odie",  0x00f6},  /* o with diaeresis */
+  {"rcaron",  0x0159},  /* r with caron */
+  {"uring",  0x016f},  /* u with ring above */
+  {"uacute",  0x00fa},  /* u with acute */
+  {"udouble",  0x0171},  /* u with double acute */
+  {"udie",  0x00fc},  /* u with diaeresis */
+  {"yacute",  0x00fd},  /* y with acute */
+  {"tcedil",  0x0163},  /* t with cedilla */
 };
 #endif /* EXP_CHARTRANS */
 
