Re: [DotGNU] Gzipped XML (was: Disadvantages of XML)
From: Gopal.V
Date: Wed, 2 Jan 2002 20:07:07 +0530
User-agent: Mutt/1.2.5i
Hi Rhys,
> Essentially, they chose 1-byte numbers for each tag type,
> and then replaced the tag structure with those numbers when
> transmitting the data over the wireless network. i.e., instead
> of sending <TABLE>, you would send "3" (or whatever the
> tag value was). This compacted the XML quite well.
That works very well as long as the tags carry no attributes.
Add Huffman coding on top and you have a real optimisation.
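A minimal Java sketch of the single-byte tag idea, assuming a hypothetical token table (TABLE=3, TR=4, TD=5) and two hypothetical marker bytes (0x01 for an end tag, 0x02 for inline text). This is not the encoding the quoted standard actually used. The encoder only accepts attribute-free markup, which is the limitation noted in the reply.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class TagTokenizer {
    // Hypothetical token table: each known tag name maps to a single byte.
    static final Map<String, Integer> TOKENS = Map.of("TABLE", 3, "TR", 4, "TD", 5);
    static final int END = 0x01; // closes the most recently opened element
    static final int STR = 0x02; // introduces inline UTF-8 text, terminated by 0x00

    // Encodes attribute-free XML such as "<TABLE><TR><TD>42</TD></TR></TABLE>".
    static byte[] encode(String xml) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < xml.length()) {
            if (xml.charAt(i) == '<') {
                int close = xml.indexOf('>', i);
                String tag = xml.substring(i + 1, close);
                if (tag.startsWith("/")) {
                    out.write(END); // the name is implied by nesting, so it is not sent
                } else {
                    Integer token = TOKENS.get(tag);
                    if (token == null) {
                        // Unknown names, and any tag carrying attributes, have no token.
                        throw new IllegalArgumentException("no token for <" + tag + ">");
                    }
                    out.write(token);
                }
                i = close + 1;
            } else {
                // Character data between tags is sent as-is, framed by STR ... 0x00.
                int next = xml.indexOf('<', i);
                if (next < 0) next = xml.length();
                byte[] text = xml.substring(i, next).getBytes(StandardCharsets.UTF_8);
                out.write(STR);
                out.write(text, 0, text.length);
                out.write(0);
                i = next;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<TABLE><TR><TD>42</TD></TR></TABLE>";
        byte[] packed = encode(xml);
        System.out.println(xml.length() + " chars -> " + packed.length + " bytes"); // 35 chars -> 10 bytes
    }
}
```

Huffman-coding the resulting byte stream would be a separate pass and is not shown.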
>
> IMHO, it was a big mistake. There is a massive version bug in
> how they did it. Because version 1 clients don't understand
> version 2 tag numbers, it creates migration problems when
> moving to a new version of the standard. It also lost data:
> DTDs, comments, stylesheets, and other meta information
> were stripped from the input, which made it difficult for
> clients that may want that information.
Backward compatibility is not only a good idea, it's a
requirement.
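To make the version bug concrete, here is a sketch of a hypothetical version-1 decoder for a single-byte tag scheme. The token numbers (3 to 5 for TABLE, TR, TD; 0x01 for an end tag) and the "version 2" token 6 are assumptions for illustration. Because only the number travels over the wire, a v1 client handed a token it has never seen cannot even recover the tag name; a gzipped document, by contrast, still carries every tag name in full.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

public class V1Decoder {
    // Hypothetical version-1 token table; a later version might add 6 = CAPTION.
    static final Map<Integer, String> V1_TAGS = Map.of(3, "TABLE", 4, "TR", 5, "TD");
    static final int END = 0x01; // closes the most recently opened element

    // Rebuilds the tag structure (text content omitted for brevity).
    static String decode(byte[] data) {
        StringBuilder sb = new StringBuilder();
        Deque<String> open = new ArrayDeque<>();
        for (byte b : data) {
            int token = b & 0xFF;
            if (token == END) {
                sb.append("</").append(open.pop()).append('>');
            } else {
                String tag = V1_TAGS.get(token);
                if (tag == null) {
                    // The name was never transmitted, so there is nothing to fall back on.
                    throw new IllegalStateException("unknown tag token " + token);
                }
                open.push(tag);
                sb.append('<').append(tag).append('>');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {3, 4, 5, 1, 1, 1})); // <TABLE><TR><TD></TD></TR></TABLE>
        try {
            decode(new byte[] {3, 6, 1, 1}); // token 6 exists only in "version 2"
        } catch (IllegalStateException e) {
            System.out.println("v1 client failed: " + e.getMessage());
        }
    }
}
```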
> At the end of the day, it is easier to just gzip it and forget about
> the problem. No data loss, and roughly the same level of
> compaction. Highly redundant data like XML compresses
> very well. For example, the 6 MB All.xml file for the C#
> library specification compresses to ~630k using gzip: about
> 10% of the original size.
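The "just gzip it" approach needs nothing beyond java.util.zip. A minimal sketch, assuming the whole document fits in memory; the XML built in `main` is synthetic and deliberately repetitive, so its ratio says nothing about All.xml itself. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipXml {
    static byte[] gzip(String xml) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        } // closing the stream finishes deflate and writes the gzip trailer
        return buf.toByteArray();
    }

    static String gunzip(byte[] data) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Synthetic, highly repetitive XML standing in for a real document.
        StringBuilder sb = new StringBuilder("<Types>");
        for (int i = 0; i < 1000; i++) {
            sb.append("<Type Name=\"System.Type").append(i)
              .append("\"><Docs><summary>Placeholder</summary></Docs></Type>");
        }
        sb.append("</Types>");
        String xml = sb.toString();

        byte[] packed = gzip(xml);
        System.out.printf("%d bytes -> %d bytes%n", xml.length(), packed.length);
        System.out.println("round trip ok: " + gunzip(packed).equals(xml));
    }
}
```

Nothing is stripped: DTDs, comments and processing instructions come back byte for byte, which is the "no data loss" point made above.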
I would have picked bzip2 if given the choice (All.xml.bz2 is
416k), but it does not seem to be well enough supported on
other platforms (e.g. Win32, Mac). I have been using Java's
GZIPInputStream in my Java+XML programs. Also, the CRC-32
check built into gzip lets the reader detect corrupted data.
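gzip ends every stream with an 8-byte trailer holding the CRC-32 of the uncompressed data and its length modulo 2^32 (both little-endian), and GZIPInputStream compares them with what it actually inflated. The sketch below recomputes that CRC with java.util.zip.CRC32, then flips one byte in the compressed body to show the read failing. The helper names are made up for illustration. A CRC catches accidental damage; it is not protection against deliberate tampering.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCrcCheck {
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data);
        }
        return buf.toByteArray();
    }

    // Reads the CRC-32 stored little-endian in the first 4 of the last 8 bytes.
    static long storedCrc(byte[] gz) {
        int p = gz.length - 8;
        return (gz[p] & 0xFFL) | (gz[p + 1] & 0xFFL) << 8
             | (gz[p + 2] & 0xFFL) << 16 | (gz[p + 3] & 0xFFL) << 24;
    }

    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // True if the stream decompresses and passes the trailer check.
    static boolean readsCleanly(byte[] gz) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            in.readAllBytes(); // the CRC and length are verified on reaching the trailer
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = ("<Types>" + "<Type Name=\"System.Object\"/>".repeat(50) + "</Types>")
                .getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(xml);
        System.out.println("trailer CRC matches: " + (storedCrc(packed) == crcOf(xml)));
        System.out.println("intact stream reads: " + readsCleanly(packed));
        packed[packed.length / 2] ^= 0x55; // damage one byte of the compressed body
        System.out.println("corrupt stream reads: " + readsCleanly(packed));
    }
}
```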
Well, at the end of the day, I'd rather be sleeping than
coding an XML compression scheme that no one is going to use.
So we have reached a consensus that gzipped XML is our standard,
and now all that remains is to use that standard somewhere ;-).
Gopal.V
--
The difference between insanity and genius is only measured by success
//===<=>===\\
|| GNU RULEZ ||
\\===<=>===//