bug-gnu-pspp



From: Müller, Andre
Subject: Re: PSPP-BUG: [bug #40864] Implement machine-parseable data definition format
Date: Thu, 23 Jan 2014 18:04:54 +0000

Dear Ben, all,

As announced, I have now written a converter to DDI-2.5 (that is, DDI
Codebook) XML.
I can therefore provide a spec for writing DDI-2.5 that validates.
An example file is attached.

The skeleton looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<codeBook ID="ZA2141_v1-1-0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="ddi:codebook:2_5
http://www.ddialliance.org/Specification/DDI-Codebook/2.5/XMLSchema/codebook.xsd"
        xmlns="ddi:codebook:2_5">
        <stdyDscr>
                <citation>
                        <titlStmt>
                                <titl/>
                        </titlStmt>
                </citation>
        </stdyDscr>
        <dataDscr>
                <var ID="v1" name="v1">
                        <labl>STUDY NUMBER</labl>
                </var>
                <var ID="v2" name="v2">
                        <labl>EDITION NUMBER</labl>
                        <catgry missing="N">
                                <catValu>1</catValu>
                                <labl>PRELIMINARY EDITION</labl>
                        </catgry>
                        <catgry missing="N">
                                <catValu>2</catValu>
                                <labl>1ST CODEBOOK EDITION - release as of May 2, 2007</labl>
                        </catgry>
                </var>
        </dataDscr>
</codeBook>
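
For illustration, here is a minimal Python sketch that produces a skeleton
like the one above with the standard library's xml.etree.ElementTree. The
function name build_codebook and the (name, label, categories) input layout
are hypothetical, chosen only for this example. Writing missing="Y" for
missing categories is likewise an assumption (the skeleton above only shows
missing="N"). It is not the converter mentioned above, just a sketch.

```python
import xml.etree.ElementTree as ET

DDI_NS = "ddi:codebook:2_5"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
SCHEMA_URL = ("http://www.ddialliance.org/Specification/"
              "DDI-Codebook/2.5/XMLSchema/codebook.xsd")


def build_codebook(codebook_id, variables):
    """Return a DDI-2.5 codebook skeleton as a string.

    `variables` is a hypothetical list of (name, label, categories) tuples;
    `categories` is a list of (value, label, is_missing) tuples.
    """
    # Namespace declarations are written as plain attributes so the output
    # uses a default namespace, as in the skeleton.
    root = ET.Element("codeBook", {
        "ID": codebook_id,
        "xmlns:xsi": XSI_NS,
        "xsi:schemaLocation": f"{DDI_NS} {SCHEMA_URL}",
        "xmlns": DDI_NS,
    })

    # <stdyDscr> is required because <titl/> must be present, even if empty.
    stdy = ET.SubElement(root, "stdyDscr")
    citation = ET.SubElement(stdy, "citation")
    titl_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(titl_stmt, "titl")

    data = ET.SubElement(root, "dataDscr")
    for name, label, categories in variables:
        var = ET.SubElement(data, "var", {"ID": name, "name": name})
        ET.SubElement(var, "labl").text = label
        for value, cat_label, is_missing in categories:
            # Assumption: "Y" marks a missing category, "N" a valid one.
            flag = "Y" if is_missing else "N"
            cat = ET.SubElement(var, "catgry", {"missing": flag})
            ET.SubElement(cat, "catValu").text = str(value)
            ET.SubElement(cat, "labl").text = cat_label

    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


xml_text = build_codebook("ZA2141_v1-1-0", [
    ("v1", "STUDY NUMBER", []),
    ("v2", "EDITION NUMBER", [(1, "PRELIMINARY EDITION", False)]),
])
```

ElementTree escapes &, < and > in text content on its own, so labels can be
passed in unescaped. The result still has to be validated against the schema
separately.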


Notes:
- The <stdyDscr> block must be included because the <titl/> element needs to be
present, even if empty.
- All metadata goes into the <dataDscr> section.
- The var ID can legally be derived from the variable's index (as in v1, v2
above), effectively numbering the variables.
- Missing values are defined as discrete values in the optional <catgry>
elements.
  (The column's data entries have to be checked against the SPSS-style missing
range definition,
  since the values in that range are not necessarily labeled. An empty label
field ought to be legal.)
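- The following characters should be replaced outside CDATA sections. Strictly
speaking, XML only requires escaping < and & in text (plus the quote character
that delimits an attribute value), but replacing all five is always safe:
        > to &gt;
        < to &lt;
        & to &amp;
        ' to &apos;
        " to &quot;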
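
The replacements listed above can be sketched in Python with the standard
library's xml.sax.saxutils.escape, which handles &, < and > by default and
accepts extra entities for the two quote characters. The helper name
escape_ddi_text is hypothetical and only serves this example.

```python
from xml.sax.saxutils import escape

# Extra replacements for the two quote characters; &, < and > are
# handled by escape() itself, with & replaced first to avoid
# double-escaping.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_ddi_text(text):
    """Escape the five predefined XML characters in a label or value."""
    return escape(text, QUOTE_ENTITIES)


label = escape_ddi_text("1 < 2 & \"x\" > 'y'")
```

This is only needed when the XML is written by hand (for example with string
formatting). An XML library that serializes a tree normally escapes text
itself, and escaping twice would corrupt the labels.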

Hope that helps,
Andre Müller


Attachment: Example-DDI2_5.xml.gz

