qexo-general

Re: [Qexo-general] How to get special characters in a query results


From: Per Bothner
Subject: Re: [Qexo-general] How to get special characters in a query results
Date: Mon, 30 Jun 2003 09:10:05 -0700
User-agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4) Gecko/20030612

Rami RIFAIEH wrote:

I am using the Qexo implementation with an XML document containing characters
such as (é, à, ê, ...) with ISO-8859-1 encoding.
 ...
Is there any special method (function) to conserve these characters in the
result document?

Preserving the non-ASCII characters in the sense of emitting 'é' as 'é' but emitting '&#233;' as '&#233;' is not possible, because the data model does not distinguish them. It might be possible to distinguish them in the TreeList representation, but it would be difficult to make that consistent. I think we're stuck with 'é' and '&#233;' being treated the same.

So the alternative is to change the output encoding. The new "serialization" spec discusses an "encoding" parameter.

A complication is this recommendation:

  It is possible that the data model will contain a character that
  cannot be represented in the encoding that the processor is using
  for output. In this case, if the character occurs in a context where
  XML recognizes character references (that is, in the value of an
  attribute node or text node), then the character should be output
  as a character reference.

The problem is determining this.  We can use FileWriter's getEncoding
method to get the encoding name and then check whether a character is
supported, but I believe the check requires code that is specific to
JDK 1.4.x, which I'm trying to avoid.  Plus it may be a bit complicated.
But we can hardwire a few common encoding names.
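
To make the two options concrete, here is a rough sketch, not Qexo code.
The class and method names are made up for illustration.  The first method
hardwires a few common encoding names and falls back to ASCII-only for
anything it does not recognize.  The second uses java.nio.charset, which
is the JDK 1.4 dependency mentioned above.  Lone surrogates are ignored
in the hardwired version.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Locale;

// Hypothetical helper; nothing here is part of gnu.xml.
final class EncodableCheck {
    // Hardwired approach: recognize a few common encoding names and
    // conservatively assume ASCII-only for everything else.
    static boolean canRepresentHardwired(String encoding, int ch) {
        // Normalize historical Java names such as "ISO8859_1" and "UTF8".
        String name = encoding.toUpperCase(Locale.ROOT).replace('_', '-');
        if (name.equals("UTF-8") || name.equals("UTF8") || name.startsWith("UTF-16"))
            return true;
        if (name.equals("ISO-8859-1") || name.equals("ISO8859-1") || name.equals("LATIN1"))
            return ch <= 0xFF;
        return ch < 0x80;
    }

    // java.nio approach (needs JDK 1.4 or later): ask the encoder directly.
    static boolean canRepresentNio(String encoding, char ch) {
        CharsetEncoder enc = Charset.forName(encoding).newEncoder();
        return enc.canEncode(ch);
    }
}
```

The hardwired version will escape more than strictly necessary for an
encoding it does not know (for example Cp1252), but it never claims a
character is safe when it might not be.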

In the short term, you can just remove these two lines in the
writeChar method in gnu.xml.XMLPrinter:

    else if (v >= 127)
      super.write("&#"+v+";");

However, if we take this out, then people who use a character
not supported by the FileWriter's encoding will get nasty errors.
Hence the existing code: it's simple, correct, and safe.
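
For comparison, here is a small sketch of the two behaviours side by side.
It is an illustration under assumptions, not the actual XMLPrinter code:
the class and method names are invented, output goes to a StringBuilder
rather than a Writer, and escaping of markup characters such as '<' and
'&' is left out.  The first method mirrors the existing rule; the second
emits a character reference only when the target encoding cannot
represent the character (this one relies on java.nio.charset, so it
needs JDK 1.4 or later).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch; names are not taken from gnu.xml.XMLPrinter.
final class CharRefSketch {
    // Existing rule: every character >= 127 becomes a numeric reference.
    static void appendAlwaysEscaped(StringBuilder out, int v) {
        if (v >= 127)
            out.append("&#").append(v).append(';');
        else
            out.append((char) v);
    }

    // Encoding-aware rule: escape only what the encoder cannot represent.
    static void appendEscapedIfNeeded(StringBuilder out, char c, CharsetEncoder enc) {
        if (enc.canEncode(c))
            out.append(c);
        else
            out.append("&#").append((int) c).append(';');
    }

    // Render a whole string for the named output encoding.
    static String render(String text, String encoding) {
        CharsetEncoder enc = Charset.forName(encoding).newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++)
            appendEscapedIfNeeded(out, text.charAt(i), enc);
        return out.toString();
    }
}
```

With this sketch, rendering for ISO-8859-1 would leave 'é' alone and
escape only characters outside Latin-1, while rendering for US-ASCII
would escape both.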
--
        --Per Bothner
address@hidden   http://per.bothner.com/





