[Gzz] an over/review of bootstrap-OHS work: data models and interoperability
Thu, 27 Feb 2003 18:35:21 +0200 (EET)
As promised a while ago on irc, here's a brief overview or review of
potentially relevant work that I've found from bootstrap.org and looked
First a little background, then about the requirements for systems that
have been specified there and a look at implementation candidates. In the
end I try to conclude some thoughts about interoperability in the future.
Background: the line of work of mr. Engelbart
Douglas Engelbart is mostly known as the inventor of the mouse, but also
and more substantially for the whole hypermedia-groupware system around
it, called NLS (for oN-Line System), which was in use by the early '70s.
So along with Bush's earlier work on Memex and Nelson's quite simultaneous
work on Xanadu, he is a true pioneer and may be credited for the
implementation of the first computer-based hypermedia system (as Memex was
not computer based and Xanadu was not implemented in the late '60s, right?)
if interested, the career is outlined in
It should be noted, that besides deep technological inventions mr.
Engelbart has also focused on the human side since the beginning, talked
about the co-evolution of the human-tool system etc. More about this
framework for augmentation (including the idea of bootstrapping) in
Starting in the '70s the world developed quite differently: e-mail became
popular among Arpanet users and is now the dominant collaboration tool.
The integrated nature of NLS was lost there, so we live with this
disintegration. I guess Engelbart and Nelson share a disappointment, even
if not an identical one, with the unsatisfying situation we have arrived at.
More recently, in the 1980s mr. Engelbart was working for the air
industry, where the organisation of huge document repositories is essential.
By that time NLS had evolved into a new system, Augment. In 1989, he and his
daughter founded the Bootstrap Institute which has since been the home for
designing a prototype open-hyperdocument system (OHS, but different from
what the open hypermedia ppl later started to refer to as open hypermedia
Now bootstrap.org is the Bootstrap Alliance, which shares the vision that
solving the complex global problems in the world requires
unprecedented cooperation and tools to support that. The vision and mission
are defined in http://bootstrap.org/ba/index.jsp
My understanding is that they are now devoted to free software development
and that, despite perhaps great differences in background and approach
compared with the work of mr. Nelson and the ex-gzz project, there are
fundamental similarities in the goals too. So a closer look at that
Requirements / specifications for the open hyperdocument system (OHS)
"The Open Hyperdocument System (OHS) is a standards-based, open source
framework for developing collaborative, knowledge management applications.
Its primary objective is to support the creation, organization, and
maintenance of Dynamic Knowledge Repositories (DKR)." .. so begins the
explanation of what the OHS is at http://bootstrap.org/ohs/index.jsp
I haven't been able to find an actual complete specification, and don't know
if one exists, but the plan at http://bootstrap.org/augment/BI/2120.html
is quite concrete.
There are several steps, of which the first is the HyperScope, which would
be used to enhance existing systems. The idea there is to have a sort of
proxy that converts legacy documents (e.g. Word docs) into intermediary
files (I-Files), which are XML and can be viewed e.g. as HTML with a
Web browser. I imagine you guys can't think of anything more boring than
that, but let's look at it a bit closer anyhow, ok? Further requirements
are high-resolution addressability (granularity) & having different views
-- nothing new to you there, no problems either I guess? Then there's more
about legacy files&systems that I'll skip here. Back-Link Management is,
again, known and addressed by you. So for this 1st phase, there is
basically a lot of work with legacy systems and file formats that I've
understood is not in your focus, but otherwise I'm unable to find
requirements that Fenfire (here: Storm and Alph, and the I-Files specified
in RDF?) wouldn't meet. Of course I might well be wrong (inaccurate).
In phase 2, Hyperscope would be developed into a standalone user interface
providing "a basic range of functions for moving, viewing and editing"
<http://bootstrap.org/augment/BI/2120.html#3B>, with archiving, version
control etc. Sounds to me like what you have been doing all along, except that
there is the requirement for enabling interlinking material both in the
system and in legacy files. Here Engelbart emphasizes the importance of
community development, <http://bootstrap.org/augment/BI/2120.html#3E>
The technical side of Phase 3 is about supporting multiple UI options,
which I think Fenfire again enables, and then there is more about the
human side (teams).
So this is where I found myself last autumn: knowing that I would be
collaborating with the bootstrap-OHS from the beginning of 2003, and
seeing (perhaps in error, but I still haven't noticed :) that the Gzz
project was working to meet similar goals. There is a great difference in
the attitude towards legacy systems, but as we hopefully see in the
following, it might not matter that much.
OHS candidate implementations
One central thing has been implemented and is in regular use: the purple
numbers, which are simply Augment-style addressing in HTML: each element
(e.g. heading, paragraph) is followed by a number that links to that
element's URIref. See http://bootstrap.org/ohs/research.jsp#nid026 for
links to the software.
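
To make the purple-number idea concrete, here is a minimal Python sketch.
It is not the actual purple numbers software: the function name, the
"nidNNN" id format (modelled on the #nid026 fragment above) and the
restriction to attribute-free <p> and <h1>..<h6> elements are all my own
assumptions for illustration.

```python
import re

# Matches attribute-free <p> and <h1>..<h6> elements (an assumption of this
# sketch; a real tool would use an HTML parser instead of a regex).
ELEMENT = re.compile(r"<(p|h[1-6])>(.*?)</\1>", re.DOTALL)


def add_purple_numbers(html: str) -> str:
    """Give each matched element an id and append a link to that id."""
    counter = 0

    def number(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        nid = f"nid{counter:03d}"
        tag, body = match.group(1), match.group(2)
        # The trailing link is the "purple number": it addresses the element.
        return f'<{tag} id="{nid}">{body} <a href="#{nid}">({counter})</a></{tag}>'

    return ELEMENT.sub(number, html)


page = "<h1>Overview</h1>\n<p>First paragraph.</p>"
numbered = add_purple_numbers(page)
print(numbered)
```

With this, every heading and paragraph can be linked to directly by
appending #nid001, #nid002 and so on to the page URL.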
More fundamentally and coming closer to the recent decisions on the new
structure for Fenfire, there are data modelling languages that have been
proposed as standard for hyperdocuments (i.e. the I-Files):
http://www.eekim.com/ohs/papers/graphmodel/ demonstrates graph-based
models, such as RDF, concluding that SGML/HyTime Groves is the best
candidate. I haven't found the reasoning behind that, other than that Word
documents have been successfully represented with Groves. Anyhow, Groves
is proprietary (and Fenfire has settled on RDF). Oh, Benja noted on irc about
this paper that the RDF critique is related to the serialization format,
i.e. when it's said that the syntax is too complicated for general use
<http://www.eekim.com/ohs/papers/graphmodel/#hid5B>. So when using editors
(or some other serialisation/syntax) this criticism does not apply.
Otherwise granularity / size&meaning of nodes is identified as an issue.
NODAL (http://nodal.sourceforge.net/) is what I've studied most closely,
as it's a recent and complete specification with some beginnings of an
implementation too. NODAL is a 'Network-Oriented Document Abstraction
Language' or a 'Filesystem for Ubiquitous Collaboration', as the white
paper is titled, <http://nodal.sourceforge.net/NODAL-WhitePaper.html>.
That paper is quite a good review of Engelbart's work and the problems of
the computer world today (like the *tyranny of format*,
http://nodal.sourceforge.net/NODAL-WhitePaper.html#h7A), in such a
definite tone that even Nelson might approve? We might well agree on what
the barriers are, so I try to summarize the actual system here:
- designed as client-server, can be implemented as peer-to-peer
- proposed as a standard for implementing Engelbart's OHS
- a document modelled as a graph of typed nodes (w/ (de)serialization)
- nodes or contents addressable within doc. context or via GUIDs
- plain text = sequence of character strings, each a *line* in the doc
- security & privacy inherent, sharing under user's control
- NODAL apps use the API, may also work as a simple file system
data model: <http://nodal.sourceforge.net/NODAL-WhitePaper.html#d16>
- individually addressable, typed nodes in a document graph
- *atomic* objects (typical primitives such as int, char, float .. name)
- *nodes* are collections of objects, w/ interfaces for accessing components
- *facets* represent restrictions of value ranges of atomic types
- node types:
* struct: a set of mappings from names of fields to data types
* sequence: an ordered set of values of like type (e.g. string)
* map: unordered mapping of values of one data type to values of another
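
As a rough illustration of the data model above, here is a minimal Python
sketch. It is not NODAL's API: the class names (Facet, StructNode,
SequenceNode, MapNode), their methods and the integer-only facet are all
hypothetical, chosen only to mirror the terms in the white paper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Facet:
    """Restricts the value range of an atomic type (only ints in this sketch)."""
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def accepts(self, value: int) -> bool:
        above = self.minimum is None or value >= self.minimum
        below = self.maximum is None or value <= self.maximum
        return above and below


@dataclass
class StructNode:
    """A set of mappings from field names to values."""
    fields: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SequenceNode:
    """An ordered collection of values of like type."""
    item_type: type
    items: List[Any] = field(default_factory=list)

    def append(self, value: Any) -> None:
        if not isinstance(value, self.item_type):
            raise TypeError(f"expected {self.item_type.__name__}")
        self.items.append(value)


@dataclass
class MapNode:
    """An unordered mapping from values of one type to values of another."""
    key_type: type
    value_type: type
    entries: Dict[Any, Any] = field(default_factory=dict)

    def put(self, key: Any, value: Any) -> None:
        if not isinstance(key, self.key_type) or not isinstance(value, self.value_type):
            raise TypeError("key or value has the wrong type")
        self.entries[key] = value


# A plain-text document: a sequence of lines plus a map, wrapped in a struct.
lines = SequenceNode(item_type=str)
lines.append("first line")
lines.append("second line")
word_counts = MapNode(key_type=str, value_type=int)
word_counts.put("line", 2)
document = StructNode(fields={"title": "notes", "lines": lines, "counts": word_counts})
percent = Facet(minimum=0, maximum=100)
```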
For direct addressing, every object is addressable via a URI, which is
formed by assigning a unique node identifier (NID) to every object in a
repository, for which there is a URL in the example
For relative addressing, there are three basic path operators -- one for
each node type. For structs, the *field* operator returns the value of the
named field for that node. For sequences, the *index* operator returns the
value of the nth element in the sequence. For a map node, the *select*
operator returns the value associated with the given value in the map.
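
The three path operators can be sketched in a few lines of Python. This is
only my reading of the description, not NODAL's interface: plain dicts and
lists stand in for struct, map and sequence nodes, the path is a list of
(operator, argument) pairs that I made up, and sequences are 0-indexed
here, which the white paper summary above does not specify.

```python
def field_op(struct, name):
    """*field*: value of the named field of a struct node."""
    return struct[name]


def index_op(sequence, n):
    """*index*: value of the nth element of a sequence node (0-based here)."""
    return sequence[n]


def select_op(mapping, key):
    """*select*: value associated with the given key value in a map node."""
    return mapping[key]


OPERATORS = {"field": field_op, "index": index_op, "select": select_op}


def resolve(root, path):
    """Follow a relative path, one (operator, argument) step at a time."""
    node = root
    for operator, argument in path:
        node = OPERATORS[operator](node, argument)
    return node


doc = {
    "title": "notes",                           # struct field
    "lines": ["first line", "second line"],    # sequence node
    "translations": {"fi": "muistiinpanot"},   # map node
}

second = resolve(doc, [("field", "lines"), ("index", 1)])
finnish = resolve(doc, [("field", "translations"), ("select", "fi")])
```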
To implement security and privacy, there is the notion of user identity
encapsulated in a User object which is identified within a repository.
Permissions are associated not with documents, but with all the actual
nodes. The minimum set of capabilities is: visibility, content access,
pedigree access, editing, permission management. Permissions can be
grouped to roles. The details of this design are yet partly open.
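
Since that design is partly open, the following Python sketch is only one
way the idea could look, not what NODAL specifies. The five capability
names come from the list above; the role names, the NodePermissions class
and its methods are hypothetical.

```python
from enum import Flag, auto


class Capability(Flag):
    """The minimum set of capabilities named in the white paper."""
    VISIBILITY = auto()
    CONTENT_ACCESS = auto()
    PEDIGREE_ACCESS = auto()
    EDITING = auto()
    PERMISSION_MANAGEMENT = auto()


# Roles group capabilities (the role names are invented for this sketch).
ROLES = {
    "reader": Capability.VISIBILITY | Capability.CONTENT_ACCESS,
    "editor": Capability.VISIBILITY | Capability.CONTENT_ACCESS | Capability.EDITING,
}


class NodePermissions:
    """Permissions attached to individual nodes, not to whole documents."""

    def __init__(self):
        self._grants = {}  # (node_id, user) -> Capability

    def grant_role(self, node_id, user, role):
        current = self._grants.get((node_id, user), Capability(0))
        self._grants[(node_id, user)] = current | ROLES[role]

    def allows(self, node_id, user, capability):
        granted = self._grants.get((node_id, user), Capability(0))
        return (granted & capability) == capability


acl = NodePermissions()
acl.grant_role("node-1", "alice", "editor")
acl.grant_role("node-1", "bob", "reader")
```

Here alice may edit node-1 while bob may only see and read it, and neither
has any rights on other nodes of the same document.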
For editing, there are again three interfaces, one for each node type.
A cursor represents context: the particular node, user state, document
through which the node was accessed, path from the document's root node
and, optionally, a date. The data mutation interfaces (for editing) are
accessed via the cursor interface, so that permissions can be checked and
an audit trail, which is called *pedigree*, maintained.
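
A small Python sketch of how I picture the cursor: every edit goes through
it, so the permission check and the pedigree entry cannot be skipped. The
class layout, the may_edit callback and the shape of a pedigree entry are
my assumptions, not taken from the NODAL sources.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, List, Optional, Tuple


@dataclass
class SequenceNode:
    items: List[str] = field(default_factory=list)
    # Audit trail of (user, operation, detail) entries: the *pedigree*.
    pedigree: List[Tuple[str, str, str]] = field(default_factory=list)


@dataclass
class Cursor:
    """Context for accessing a node: who, through which document, by what path."""
    node: SequenceNode
    user: str
    document: str
    path: Tuple[str, ...]
    may_edit: Callable[[str, SequenceNode], bool]  # hypothetical permission hook
    date: Optional[datetime] = None

    def insert(self, position: int, value: str) -> None:
        # Mutation is only reachable via the cursor, so the check always runs.
        if not self.may_edit(self.user, self.node):
            raise PermissionError(f"{self.user} may not edit this node")
        self.node.items.insert(position, value)
        self.node.pedigree.append((self.user, "insert", f"{position}: {value!r}"))


body = SequenceNode()
cursor = Cursor(
    node=body,
    user="alice",
    document="doc-1",
    path=("body",),
    may_edit=lambda user, node: user == "alice",
)
cursor.insert(0, "hello")
```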
A transclusion is expressed with a *proxy node*, which refers to a node in
a different repository. Links are built with the *range* type, "which
represents the collection of nodes between two paths that are the
inclusive end-points of the range" (didn't fully comprehend that yet).
Document types are identified with MIME types. Further, each such type is
associated with a NODAL schema with serialisation and de-serialisation
methods, supported by a system of server and client-side plugins.
Communication models (i.e. network protocols):
- storage interface for accessing, creating and modifying content
* a depth argument: value -1 represents node's full subgraph
- transactions: sequences of operations (atomic unit of change)
- queuable request/update requests instead of RPCs that block
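
To illustrate the depth argument and the all-or-nothing nature of
transactions, here is a Python sketch under my own assumptions: nodes are
plain dicts with "id" and "children" keys, the repository is a dict, and
the function names fetch and apply_transaction are invented. It says
nothing about NODAL's actual wire protocol.

```python
import copy


def fetch(node, depth):
    """Return `node` with children expanded `depth` levels deep.

    depth == -1 expands the full subgraph; at depth 0 the children are
    returned as bare ids (references) rather than expanded nodes.
    """
    children = node.get("children", [])
    result = {"id": node["id"]}
    if depth == 0:
        result["children"] = [child["id"] for child in children]
    else:
        next_depth = -1 if depth == -1 else depth - 1
        result["children"] = [fetch(child, next_depth) for child in children]
    return result


def apply_transaction(store, operations):
    """Apply a sequence of operations as one atomic unit of change."""
    working = copy.deepcopy(store)
    for operation in operations:
        operation(working)  # any exception aborts before `store` is touched
    store.clear()
    store.update(working)


tree = {"id": "root", "children": [{"id": "a", "children": [{"id": "b"}]}]}
shallow = fetch(tree, 0)
full = fetch(tree, -1)

store = {"title": "old"}
apply_transaction(store, [lambda s: s.update(title="new"),
                          lambda s: s.update(author="toni")])
```

If any operation in the list raises, the store keeps its previous state,
which is the property a transaction is meant to give.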
As examples of data models in the NODAL language, XML and image are
And there is the cvs <http://sourceforge.net/cvs/?group_id=29777>,
I couldn't make much of it though (interfaces do seem to be there).
So much for NODAL. There are related research efforts linked from
'What's in it for us?' (i.e. conclusions and future work)
If you have read this far, you might have wondered why I'm writing all
this here. Hopefully there have been interesting points, too. Trying to
gather some here:
Even though there may be many things in OHS related work that are outside
the scope of Fenfire, they may have addressed issues that you(/we) haven't
yet. For example, permissions and the possible need for transactions are
something that I, at least, don't know too much about as regards Fenfire
(Storm, I guess).
Was there something more fundamental..?-o
But more so, it seems to me that Fenfire addresses and has solved
promisingly many central issues that keep repeating in the ba-OHS
documents. OTOH there's an unending amount of work ahead.
Perhaps there, around the Bootstrap Alliance and OHS developers, is a
community that has exceptional mindshare and interest towards these
systems and that has developed a strong vision about them. So they might
be a good community that could evaluate Fenfire products both analytically
and through severe testing. And hopefully there would be talented
developers, maybe associated with right kind of organizations, that could
help in getting it all further.
There has been frustration among potential independent OHS contributors
that nothing has really happened there, also due to license issues (a
recent thread http://bootstrap.org/lists/ba-ohs-talk/0301/msg00010.html).
OTOH there are people working at Stanford to develop the system, who are
'coming out' at some point (working on XML based I-File specs right now).
So I hope the different efforts will be somehow in concert for the systems
to interoperate in the future. The fact that Fenfire uses RDF probably
makes it easier, but is not enough, as things will have to be built on
All that said, the job in which I collaborated with the OHS ppl, and within
which I've been evaluating Fenfire for those purposes, is coming to an end
now (at the end of February). So starting next week, I'll have to decide
how to use my time on a different basis. The responsibilities for the university
are mainly teaching, but otherwise I'll try to focus on research still.
Personal studies and writing the thesis etc. will hopefully play a
part. Otherwise I'll be involved in implementing prototype production
systems (for the Airguitar World Championships, organized by the local
music video festival, http://www.omvf.net :) for which we have some
funding now .. that will probably be Zope-based. So I don't know yet to what
extent I'll be involved with the OHS, or even Fenfire, efforts in the
near future (the coming spring/summer). Further along, hopefully a lot.
(sorry for the length, mistakes and misunderstandings, fuzziness etc.)