From: Greg Chicares
Subject: Re: [lmi] an xml schema for (single|multiple)_cell_document file XML format
Date: Mon, 27 Feb 2012 15:34:30 +0000
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:8.0) Gecko/20111105 Thunderbird/8.0

On 2010-08-09 16:28Z, Vadim Zeitlin wrote:
> On Sun, 08 Aug 2010 16:28:55 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> On 2007-12-27 12:43Z, Evgeniy Tarassov wrote:
> GC> > 
> GC> > The newer version of XML Schema files for cns/ill files could be
> GC> > downloaded from lmi project download area at savannah:
> GC> > | http://download.savannah.nongnu.org/releases/lmi/cell_document.tar.bz2
> GC> 
> GC> If we were doing this all over today, would XML Schema still be a good
> GC> choice, or is something else like RELAX NG or Schematron clearly better now?
> 
>  I'm not aware of any dramatic changes in the XML validation area since the
> last 3 years so I'd be tempted to say no, i.e. that XML Schema still
> remains a decent choice because even though RELAX NG has its advantages
> over it (notably relative simplicity) it's still less standard/supported by
> various tools than it. As for Schematron, I believe it's mostly used in
> addition to either XML Schema or RELAX NG and not solely on its own anyhow.
> 
>  I could look more into recent developments in this area but, frankly, I
> doubt that we're going to find any earth shattering revelations. IMHO it
> would make sense to stick with XML Schema even if subjectively I like RELAX
> NG "compact syntax" (http://en.wikipedia.org/wiki/RELAX_NG#Compact_syntax)
> a lot.

I wonder whether we should reconsider that, and perhaps use RELAX NG
instead.

I imagine XML Schema is still more widely supported, but I'm not sure
that matters much to us. We can use 'xmllint' with
  --relaxng schema : do RelaxNG validation against the schema
and distribute an msw 'xmllint' binary to any vendor that needs it
(they all have the ability to run msw).
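
A minimal sketch of how a retrieval script might call 'xmllint' for this,
assuming an 'xmllint' binary is on the PATH. The file names 'cell.rng' and
'extract.ill' are hypothetical; '--noout' and '--relaxng' are standard
'xmllint' options, and a zero exit status is taken to mean the file validated.

```python
import subprocess


def relaxng_command(schema_path, document_path, xmllint="xmllint"):
    """Build the argument list for a RELAX NG validation run."""
    # --noout suppresses the dump of the parsed document;
    # --relaxng names the schema to validate against.
    return [xmllint, "--noout", "--relaxng", schema_path, document_path]


def validate(schema_path, document_path, xmllint="xmllint"):
    """Run xmllint; return (ok, diagnostics)."""
    result = subprocess.run(
        relaxng_command(schema_path, document_path, xmllint),
        capture_output=True,
        text=True,
    )
    # xmllint exits with status 0 only when the document validates.
    return result.returncode == 0, result.stderr


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(relaxng_command("cell.rng", "extract.ill")))
```

Keeping the command construction separate from the subprocess call makes it
easy to log the exact command that a vendor could rerun by hand.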

I think we shouldn't routinely validate files that were created by lmi
itself: they're presumptively valid, so any accidental mistake in the
schema would be a nuisance for end users; and an extra validation step
would make files load more slowly. We should use it for files that are
created by an external vendor system (e.g., admin-system extracts that
supply information for policies already in force). Probably that could
be done by adding an 'xmllint' command to the script that retrieves the
"extract"; if we wanted to embed it in lmi for this use-case, then we
could ask for a RELAX NG capability to be added to xmlwrapp, for which
this public-domain extension might be useful:
  
http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/doxyhtml/schema_8hpp_source.html

Our practical reason for a schema is to prevent misunderstandings. If a
schema is part of the specification for externally-supplied files, then
validation failure is presumptive failure to fulfill the specification.
For example, if an element such as
    <Gender>Male</Gender>
is required but missing, a schema can detect that; and it can also flag
    <Gender/>
or
    <Gender>None</Gender>
as invalid if allowable values are given as {"Female","Male","Unisex"}.
The schema wouldn't need to duplicate the cross-field rules that lmi
enforces, like
  if <StateOfJurisdiction> is "MT" then <Gender> must be {"Unisex"}
because such rules are manifold and complex, and in practice unlikely to
be violated.
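
To make the <Gender> example concrete, here is a rough sketch. The schema
text is a RELAX NG fragment (XML syntax) that assumes, purely for
illustration, a <cell> containing only <Gender>; a real cell has many more
elements. The Python around it is not a RELAX NG validator: it only reads the
enumeration back out of the schema and mimics the three failures described
above. In practice 'xmllint --relaxng' would do the actual checking.

```python
import xml.etree.ElementTree as ET

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Illustrative fragment only: <Gender> is required and must be one of
# three enumerated values.
GENDER_SCHEMA = """\
<element name="cell" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="Gender">
    <choice>
      <value>Female</value>
      <value>Male</value>
      <value>Unisex</value>
    </choice>
  </element>
</element>
"""


def allowed_values(schema_text, element_name):
    """Collect the <value> alternatives listed for one element pattern."""
    root = ET.fromstring(schema_text)
    for elem in root.iter(f"{{{RNG_NS}}}element"):
        if elem.get("name") == element_name:
            return {v.text.strip() for v in elem.iter(f"{{{RNG_NS}}}value")}
    raise KeyError(element_name)


def gender_problem(cell_text, allowed):
    """Return a description of the problem, or None if <Gender> is fine."""
    cell = ET.fromstring(cell_text)
    gender = cell.find("Gender")
    if gender is None:
        return "missing"
    value = (gender.text or "").strip()
    if value not in allowed:
        return f"invalid value {value!r}"
    return None


if __name__ == "__main__":
    allowed = allowed_values(GENDER_SCHEMA, "Gender")
    for sample in ("<cell><Gender>Male</Gender></cell>",
                   "<cell/>",
                   "<cell><Gender/></cell>",
                   "<cell><Gender>None</Gender></cell>"):
        print(sample, "->", gender_problem(sample, allowed))
```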

For now, only '.cns' and '.ill' files need be covered. Obsolete historical
versions needn't be supported; we'd want schema support only for current
and future versions. (Note that '.cns' files have a container-version and
an item-version:
  <?xml version="1.0"?>
  <multiple_cell_document version="1">  <!-- CONTAINER OF CELLS -->
    <case_default>
      <cell version="6">                <!-- CELL ITEMS -->
which we would expect to increment separately).
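
Because the two version numbers move independently, a validating script would
presumably need to read both before choosing a schema. A small sketch of that
lookup follows; the function name and the idea of selecting a schema by
version pair are assumptions, not anything lmi does today, and the sample
document is just the fragment above with its closing tags added.

```python
import xml.etree.ElementTree as ET


def read_versions(cns_text):
    """Return (container_version, sorted distinct cell versions)."""
    root = ET.fromstring(cns_text)
    container_version = root.get("version")
    # Cells may appear at several levels (e.g. under <case_default>),
    # so search the whole tree rather than only direct children.
    cell_versions = sorted({cell.get("version") for cell in root.iter("cell")})
    return container_version, cell_versions


if __name__ == "__main__":
    sample = """<?xml version="1.0"?>
<multiple_cell_document version="1">
  <case_default>
    <cell version="6"/>
  </case_default>
</multiple_cell_document>
"""
    print(read_versions(sample))
```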


