From: Greg Chicares
Subject: Re: [lmi] an xml schema for (single|multiple)_cell_document file XML format
Date: Mon, 27 Feb 2012 15:34:30 +0000
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20111105 Thunderbird/8.0
On 2010-08-09 16:28Z, Vadim Zeitlin wrote:
> On Sun, 08 Aug 2010 16:28:55 +0000 Greg Chicares <address@hidden> wrote:
>
> GC> On 2007-12-27 12:43Z, Evgeniy Tarassov wrote:
> GC> >
> GC> > The newer version of XML Schema files for cns/ill files could be downloaded
> GC> > from lmi project download area at savannah:
> GC> > http://download.savannah.nongnu.org/releases/lmi/cell_document.tar.bz2
> GC>
> GC> If we were doing this all over today, would XML Schema still be a good choice,
> GC> or is something else like RELAX NG or Schematron clearly better now?
>
> I'm not aware of any dramatic changes in the XML validation area since the
> last 3 years so I'd be tempted to say no, i.e. that XML Schema still
> remains a decent choice because even though RELAX NG has its advantages
> over it (notably relative simplicity) it's still less standard/supported by
> various tools than it. As for Schematron, I believe it's mostly used in
> addition to either XML Schema or RELAX NG and not solely on its own anyhow.
>
> I could look more into recent developments in this area but, frankly, I
> doubt that we're going to find any earth shattering revelations. IMHO it
> would make sense to stick with XML Schema even if subjectively I like RELAX
> NG "compact syntax" (http://en.wikipedia.org/wiki/RELAX_NG#Compact_syntax)
> a lot.

I wonder whether we should reconsider that, and perhaps use RELAX NG
instead.

I imagine XML Schema is still more widely supported, but I'm not sure
that matters much to us. We can use 'xmllint' with
  --relaxng schema : do RelaxNG validation against the schema
and distribute an msw 'xmllint' binary to any vendor that needs it
(they all have the ability to run msw).

I think we shouldn't routinely validate files that were created by lmi
itself: they're presumptively valid, so any accidental mistake in the
schema would be a nuisance for end users; and an extra validation step
would make files load more slowly. We should use it for files that are
created by an external vendor system (e.g., admin-system extracts that
supply information for policies already in force). Probably that could
be done by adding an 'xmllint' command to the script that retrieves the
"extract"; if we wanted to embed it in lmi for this use-case, then we
could ask for a RELAX NG capability to be added to xmlwrapp, for which
this public-domain extension might be useful:
  http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/schema_8hpp_source.html

Our practical reason for a schema is to prevent misunderstandings. If a
schema is part of the specification for externally-supplied files, then
validation failure is presumptive failure to fulfill the specification.

For example, if an element such as
  <Gender>Male</Gender>
is required but missing, a schema can detect that; and it can also flag
  <Gender/>
or
  <Gender>None</Gender>
as invalid if allowable values are given as {"Female","Male","Unisex"}.
The schema wouldn't need to express the complex rules lmi enforces, like
  if <StateOfJurisdiction> is "MT" then <Gender> must be {"Unisex"}
because such rules are manifold and complex, and in practice unlikely to
be violated.
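
To make the <Gender> example concrete, here is a rough RELAX NG sketch (XML
syntax), not a proposed schema. It assumes <Gender> is a direct child of
<cell>, and it treats every other cell item as an unconstrained text-only
element, which is a placeholder rather than lmi's real item list.

```xml
<?xml version="1.0"?>
<!-- Sketch only: models <Gender>; every other cell item is a placeholder. -->
<element name="cell" xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Any version value is accepted in this fragment. -->
  <attribute name="version"/>
  <interleave>
    <!-- Not wrapped in <optional>, so a cell without <Gender> is invalid. -->
    <element name="Gender">
      <!-- <Gender/> and <Gender>None</Gender> match none of these values. -->
      <choice>
        <value>Female</value>
        <value>Male</value>
        <value>Unisex</value>
      </choice>
    </element>
    <!-- Assumption: all other items are simple elements with text content. -->
    <zeroOrMore>
      <element>
        <anyName><except><name>Gender</name></except></anyName>
        <text/>
      </element>
    </zeroOrMore>
  </interleave>
</element>
```

If that were saved as, say, 'cell.rng' (a made-up name), then something like
'xmllint --noout --relaxng cell.rng some_cell.xml' should check a file whose
root element is <cell>. A real '.cns' or '.ill' file has a different root,
so an actual schema would have to wrap this pattern accordingly.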
For now, only '.cns' and '.ill' files need be covered. Obsolete historical
versions needn't be supported; we'd want schema support only for current
and future versions. (Note that '.cns' files have a container-version and
an item-version:
  <?xml version="1.0"?>
  <multiple_cell_document version="1"> <!-- CONTAINER OF CELLS -->
    <case_default>
      <cell version="6"> <!-- CELL ITEMS -->
which we would expect to increment separately.)
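
One way a schema might pin the two versions independently is sketched below
in RELAX NG XML syntax. Only the element names and version numbers shown
above come from the actual file format. The rest is assumed for illustration:
that <case_default> holds exactly one <cell>, the 'cell' and 'anything'
pattern names, and the wildcard that accepts all other content unchecked.

```xml
<?xml version="1.0"?>
<!-- Sketch only: everything except the two version attributes is a placeholder. -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="multiple_cell_document">
      <!-- Container version: would change only with the container layout. -->
      <attribute name="version"><value>1</value></attribute>
      <interleave>
        <element name="case_default">
          <ref name="cell"/>
        </element>
        <!-- Other children of the container are accepted but not modeled. -->
        <zeroOrMore>
          <element>
            <anyName><except><name>case_default</name></except></anyName>
            <ref name="anything"/>
          </element>
        </zeroOrMore>
      </interleave>
    </element>
  </start>

  <define name="cell">
    <element name="cell">
      <!-- Item version: would change only with the set of cell items. -->
      <attribute name="version"><value>6</value></attribute>
      <zeroOrMore>
        <element><anyName/><ref name="anything"/></element>
      </zeroOrMore>
    </element>
  </define>

  <!-- Common RELAX NG idiom for "any well-formed content". -->
  <define name="anything">
    <zeroOrMore>
      <choice>
        <element><anyName/><ref name="anything"/></element>
        <attribute><anyName/></attribute>
        <text/>
      </choice>
    </zeroOrMore>
  </define>
</grammar>
```

With this layout, a new item version would touch only the 'cell' definition
and a new container version only the 'start' pattern. Cells nested inside
the unmodeled children are not version-checked in this sketch.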