[Fenfire-dev] PEG: TString schema

From: Tuomas Lukka
Subject: [Fenfire-dev] PEG: TString schema
Date: Tue, 30 Sep 2003 14:29:47 +0300
User-agent: Mutt/1.5.4i

PEG refstring_dtd--tjl: A DTD for refstrings

:Author:   Tuomas J. Lukka
:Last-Modified: $Date: 2002/11/14 15:40:07 $
:Revision: $Revision: 1.5 $
:Status:   Current
:Affects-PEGs: alph_lite--tjl

With Alph lite, we need to stabilize at least that data format.
There are several problems with the current XML format:

- for RICC (URN5) text spans and fake text spans, the actual
  text is not written into the XML inside them. This would be

- the element names are less than clear


- What is the name for this DTD? RefString is what it started
  as, but later it was realized that these are *not* referential
  strings but *idded* strings. 

    RESOLVED: Transcludable String, or TString for short.
    Spans are Transcludable Spans or TSpans 

- Should we have an element that surrounds a whole TString? 
  What about elements for fake spans?
    RESOLVED: We should have an **optional** surrounding
    element, to allow easy integration in different ways.  
    Using elements for fake spans is pointless
    as they are best modeled by plain strings: consider::


    In *all* semantics, these two lines should be equivalent.

- How should we define the TString DTD/Schema? DTD or Schema or other?

    RESOLVED: XML Schemas seem the best option, due to proper namespace
    support &c.

- What should be the URI for use with XML namespaces?

    RESOLVED: The URI should be, analogous to the RDF vocab

The Transcludable String XML DTD

Define a Transcludable String XML schema as follows::

    <schema xmlns="http://www.w3.org/2001/XMLSchema"

      <documentation xml:lang="en">
        Transcludable String schema v1.0.

     *    Copyright (c) 2003, Tuomas J. Lukka
     *    This file is part of Alph.
     *    Alph is free software; you can redistribute it and/or modify it under
     *    the terms of the GNU Lesser General Public License as published by
     *    the Free Software Foundation; either version 2 of the License, or
     *    (at your option) any later version.
     *    Alph is distributed in the hope that it will be useful, but WITHOUT
     *    ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
     *    or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General
     *    Public License for more details.
     *    You should have received a copy of the GNU Lesser General
     *    Public License along with Alph; if not, write to the Free
     *    Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
     *    MA  02111-1307  USA

     * Written by Tuomas J. Lukka

     * Designed by Tuomas J. Lukka and Benja Fallenstein

    <element name="tstring" type="alph:TStringType"/>
    <element name="tspan" type="alph:TSpanType"/>

    <complexType mixed="true" name="TStringType">
            <documentation xml:lang="en">

                A transcludable string, consisting of transcludable spans
                and also text content (which will not be 

                This is just a container element - the magic is in the spans.

            <element ref="alph:tspan" minOccurs="0" maxOccurs="unbounded"/>

    <complexType name="TSpanType">
            <documentation xml:lang="en">

                A transcludable span.

                Transcludable spans are spans of text that identify themselves
                through a URI and an offset.

                Basic model

                The basic model for TSpans is that there exists a single, unique
                block of letters denoted by the URI, and a TSpan contains a
                span of letters from that block.

                However, to allow practical, non-centralized implementations,
                the restrictions are relaxed: the ids only need
                to be unique *with a high probability*.

                Creating TSpans

                There are two possible situations for creating TSpans:
                creating TSpans from text being typed in by a user,
                or creating TSpans from text that already exists somewhere.

                Creating TSpans while the user types

                For the URIs, we recommend "urn-5" random IDs, or UUIDs.
                The TSpans can be generated by creating a single random id
                for the entire session and simply increasing the current offset
                by one whenever the user types a new character.

                In the resulting text, adjacent length-1 spans that have
                contiguous ids should be combined.

                Creating TSpans from text that already exists somewhere

                This is a more difficult situation, as this is a case of adding
                extra information where there used to be none. If two people
                separately do this to the same text, it can happen that 
                will not be found.

                If the text is stable and unique, we recommend using some 
                URI scheme, such as urn:sha-1 or urn:x-storm, or a permanent 
                identifier for exactly those characters, if that exists.

                If the text is changing, **in no case** should something like
                the URL of a webpage be used for the URI, as this will cause 

                Editing operations

                TSpans should never be edited except by splitting or by
                removing: changes to the text inside the span are not permitted.
                For inserting text, split the span first, then insert the text
                between the spans. For removing text, split the span 
                and remove one of the resulting spans.

                The span-splitting operation works as follows: a TSpan with
                uri X, offset Y, and N characters of content,

                    (tspan uri="X" offs="Y")N chars(/tspan)

                can be split into

                    (tspan uri="X" offs="Y")S chars(/tspan)(tspan uri="X" offs="Y+S")N-S chars(/tspan)

                for some S between 0 and N, exclusive.

                Identifying transclusions

                (Regions of) spans are considered to be transclusions of each
                other if the URI attributes match exactly and the text at the
                same offsets matches.

                The simplest way to explain the idea of "same offset" is to
                split both spans into one-character spans: the offsets in the
                resulting spans will be consecutive, and if **all** the
                one-character spans with the same offsets match,
                the two spans *overlap*.  If even one one-character span does
                not match, the spans will not be considered overlapping.


                The tspan element is defined through TSpanType in order to allow
                other elements to take on this type: for instance, SVG ignores
                text inside "foreign elements", unlike HTML, where the default
                is to show it. In HTML, using tspan thus works out all right,
                but in SVG the text would not be shown. The solution is to use
                the alph:uri and offset attributes on the SVG span element.


                The idea of TSpans is to provide a simple way to get some of
                the benefits of Referential Fluid Media (see Nelson,
                "Xanalogical structure, now more than ever: parallel documents,
                deep links to content, deep versioning, and deep re-use",
                ACM Computing Surveys, 31(4es), 1999) by providing an
                *identity* for text.

                TSpans carry their own content and thus need no central servers
                to "resolve" the text from, and can be added to normal 
                with minimal effort.

            <extension base="string">
                <attribute name="uri" type="anyURI" use="required"/>
                <attribute name="offs" type="nonNegativeInteger" 
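
The editing rules in the schema documentation (create length-1 spans while typing, combine contiguous spans, split before inserting or removing) can be sketched in a few lines of Python. This is only an illustration, not part of the PEG: the ``TSpan`` class, the function names and the ``urn:example:...`` identifiers are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TSpan:
    """Hypothetical in-memory form of a tspan element."""
    uri: str   # identifies the block of letters
    offs: int  # offset of the first character within that block
    text: str  # the characters carried by the span


def type_text(session_uri: str, start: int, typed: str) -> List[TSpan]:
    """One length-1 span per typed character, offsets increasing by one."""
    return [TSpan(session_uri, start + i, ch) for i, ch in enumerate(typed)]


def merge_adjacent(spans: List[TSpan]) -> List[TSpan]:
    """Combine neighbouring spans with the same uri and contiguous offsets."""
    merged: List[TSpan] = []
    for span in spans:
        last = merged[-1] if merged else None
        if (last is not None and last.uri == span.uri
                and last.offs + len(last.text) == span.offs):
            merged[-1] = TSpan(last.uri, last.offs, last.text + span.text)
        else:
            merged.append(span)
    return merged


def split(span: TSpan, s: int) -> Tuple[TSpan, TSpan]:
    """Split a span of N characters after its first s characters, 0 < s < N."""
    if not 0 < s < len(span.text):
        raise ValueError("s must lie strictly between 0 and the span length")
    return (TSpan(span.uri, span.offs, span.text[:s]),
            TSpan(span.uri, span.offs + s, span.text[s:]))
```

Under these assumptions, inserting text means calling ``split`` and placing the newly typed spans between the two halves; since the new spans carry a different uri, ``merge_adjacent`` leaves all three pieces separate. Applying ``merge_adjacent`` to the two halves of a split alone gives back the original span.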


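The rule for identifying transclusions can be sketched the same way, by comparing the one-character pieces of two spans offset by offset. Again this is a rough sketch rather than a reference implementation: ``TSpan``, ``chars_by_offset`` and ``overlap`` are hypothetical names, and treating spans that share no offset at all as non-overlapping is an assumption the text above does not spell out.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TSpan:
    """Hypothetical in-memory form of a tspan element."""
    uri: str
    offs: int
    text: str


def chars_by_offset(span: TSpan) -> Dict[int, str]:
    """Split a span into one-character pieces, keyed by their offsets."""
    return {span.offs + i: ch for i, ch in enumerate(span.text)}


def overlap(a: TSpan, b: TSpan) -> bool:
    """True if the uris match exactly and every shared offset has the same character."""
    if a.uri != b.uri:
        return False
    chars_a, chars_b = chars_by_offset(a), chars_by_offset(b)
    shared = chars_a.keys() & chars_b.keys()
    # Assumption: spans with no offset in common are not considered overlapping.
    return bool(shared) and all(chars_a[o] == chars_b[o] for o in shared)
```

For instance, a span with offs 10 and text "hello" overlaps a span with the same uri, offs 12 and text "llo w", because the characters at offsets 12 to 14 agree; changing a single one of those characters makes ``overlap`` return False.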