lmi

From: Václav Slavík
Subject: Re: [lmi] an xml schema for (single|multiple)_cell_document file XML format
Date: Mon, 12 Mar 2012 15:46:19 +0100

Hi,

On 9 Mar 2012, at 19:27, Václav Slavík wrote:
>>> We can try to produce a RELAX NG schema for them to see how it goes.
>>> Should we?
>> 
>> Yes, please.
> 
> I have some questions about the format:
> 
> (1) Are the elements under <cell> optional or required? As far as I can tell, 
> the reading code is permissive and will use defaults if a value is missing, 
> but should that be considered a valid file?
> 
> (2) Does the order of elements under <cell> matter? (The output is always 
> alphabetically sorted with current code, reading code doesn't care. It's 
> marginally simpler to write the grammar if the order matters, but both are 
> easily possible.) 
> 
> (3) Are empty class_defaults and particular_cells allowed, or do they have to 
> contain at least one cell each?

Attached are RELAX NG schemas (in the more readable Compact Syntax) for .cns 
and .ill files. They cover only the latest version of the format and assume the 
following answers to my questions: (1) required, (2) order is significant, and 
(3) at least one child <cell> must exist. All three assumptions can be changed: 
(2) easily, (1) and (3) trivially.

I had a closer look at several RELAX NG tools; in the end, I settled on Jing 
(http://www.thaiopensource.com/relaxng/jing.html, by the same folks as Trang). 
Of the tools I tried, it has the most complete implementation of the RELAX NG 
Compact Syntax and the best error messages, and it supports other schema 
languages too.
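
For reference, validating a file against one of the attached schemas with Jing 
could be scripted roughly as follows. This is only a sketch: the jing.jar 
location and the helper names are assumptions, and it assumes Jing reports an 
invalid document through a nonzero exit status. The -c option tells Jing that 
the schema is written in the Compact Syntax.

```python
import subprocess


def jing_command(schema, xml_file, jing_jar="jing.jar"):
    """Build the command line that validates xml_file against an .rnc schema."""
    # -c selects RELAX NG Compact Syntax for the schema file.
    # jing_jar is a placeholder path; adjust it to the local installation.
    return ["java", "-jar", jing_jar, "-c", schema, xml_file]


def validate(schema, xml_file, jing_jar="jing.jar"):
    """Run Jing and return True when it exits with status 0."""
    # Assumption: a nonzero exit status means validation failed.
    result = subprocess.run(jing_command(schema, xml_file, jing_jar))
    return result.returncode == 0
```

With the attached schema, a call would look like validate("census.rnc", 
"sample.cns"), where sample.cns stands for any census file.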

Other than Jing, I tried:

1. xmllint: doesn't handle RELAX NG Compact Syntax at all, only the rather 
verbose XML one. 

2. rnv: a Compact-Syntax-only validator, implemented in C. It doesn't recognize 
all of the language (it couldn't handle the "grammar" keyword; fortunately, 
it's optional). Its error messages were either cryptic or amounted to little 
more than, paraphrased, "syntax error" or "invalid value". It didn't even 
provide useful source file locations (the worst offender was that any issue 
inside cell.rnc was reported at illustration.rnc:5, i.e. at the place where 
cell.rnc was included).


I am also attaching an example census.xsd file with XML Schema converted from 
census.rnc. It's rather large (126kB compared to <19kB of .rnc files), although 
not as large as its corresponding RELAX NG XML file (409kB). It's much less 
human-readable than the .rnc files, though. For one thing, it's heavily 
structured, verbose XML, which is hard for humans to read in itself. But to 
make matters worse, Trang doesn't support the RELAX NG external references that 
I rely on. So I had to run the .rnc files through jing -s to produce simplified 
versions without them (which is how I ended up with a 409kB .rng file) and 
convert that to .xsd. The simplification step removed custom data types (by 
expanding them) and duplicated the schema parts corresponding to <cell>, making 
the result a poor choice for human reading.
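
The two-step conversion described above can be sketched as a small script. The 
jar paths and the function name are placeholders, and the sketch assumes that 
jing -s prints the simplified schema to standard output and that Trang infers 
the input and output formats from the file extensions.

```python
import subprocess


def xsd_pipeline(rnc, jing_jar="jing.jar", trang_jar="trang.jar"):
    """Return (simplify_cmd, rng_path, convert_cmd) for one .rnc schema."""
    base = rnc.rsplit(".", 1)[0]
    rng, xsd = base + ".rng", base + ".xsd"
    # Step 1: simplify the Compact Syntax schema, expanding external references.
    simplify = ["java", "-jar", jing_jar, "-c", "-s", rnc]
    # Step 2: convert the simplified RELAX NG XML file to XML Schema.
    convert = ["java", "-jar", trang_jar, rng, xsd]
    return simplify, rng, convert


def run(rnc):
    """Execute both steps, capturing Jing's output in the intermediate file."""
    simplify, rng, convert = xsd_pipeline(rnc)
    with open(rng, "wb") as out:
        # Assumption: the simplified schema is written to standard output.
        subprocess.run(simplify, stdout=out, check=True)
    subprocess.run(convert, check=True)
```

Calling run("census.rnc") would then leave census.rng and census.xsd next to 
the original schema.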

The results aren't that bad if the simplification step is omitted; see the 
attached illustration.xsd. I had to modify illustration.rnc to produce it, by 
removing the external reference and inlining cell.rnc. That wouldn't be a good 
idea for maintenance, because cell.rnc is shared by both .cns and .ill files. 
If you think you'll need nice XML Schema files, then we can either write a 
custom script to merge cell.rnc into the other two files before passing them to 
trang, or implement externalRef support directly in Trang.

Regards,
Vaclav

