[Gnu-arch-users] Re: doc formats (Miles' Awiki overview request)
From: Thomas Lord
Subject: [Gnu-arch-users] Re: doc formats (Miles' Awiki overview request)
Date: Sun, 22 Jan 2006 12:01:34 -0800
Miles:
> Do you have a short overview of "awiki" somewhere?
No. So here's a crude one for an audience of hackers (i.e.,
not the way you'd teach it to a non-hacker).
* Docs are Trees of Typed, Attributed Nodes
Like DOM.
XML has a fully general but heavy syntax for such nodes:
<TYPE ATTRIBUTE=VALUE ...> ...subtrees... </TYPE>
A given Awiki grammar is a more baroque, equally general
syntax for such nodes.
* Recursive Decomposition
The Awiki parser engine makes multiple passes. The first
pass over a document fragment determines the type of the
root node for that fragment, the attributes for that root
node, the *sources* for the subtrees of the fragment, and
the parsing rule to apply to that source to generate
subtrees.
For example, this section would be parsed in the first pass
to produce:
.example
type: section
attributes: none specified
title source: "Recursive Decomposition"
body source: "The Awiki parser [...] `section' node."
Appropriate parsing rules are then applied to the sources
to generate the subtrees of the resulting `section' node.
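A rough Python sketch of what such a first pass could look like. This is only an illustration under assumptions: the names `FirstPass` and `first_pass_section`, and the rule names "inline" and "paragraphs", are hypothetical and not taken from the Awiki code.

```python
import re
from dataclasses import dataclass


@dataclass
class FirstPass:
    """Result of the first pass over one fragment (hypothetical shape)."""
    type: str
    attributes: dict
    # subtree name -> (raw source text, name of the rule to apply next)
    sources: dict


def first_pass_section(fragment: str) -> FirstPass:
    # Assumed rule: a line starting with "* " in column 0 opens a section;
    # everything after that line is the (still unparsed) body source.
    lines = fragment.split("\n")
    m = re.match(r"\* +(.*)$", lines[0])
    if m is None:
        raise ValueError("fragment does not start with a column-0 asterisk")
    body = "\n".join(lines[1:])
    return FirstPass(
        type="section",
        attributes={},
        sources={"title": (m.group(1), "inline"), "body": (body, "paragraphs")},
    )
```

Note that the body is not parsed here at all. The first pass only records the source and which rule should handle it in a later pass.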
* Grammar Abstraction
Awiki recognizes that one thing is a parsing rule like:
Tree nodes are divided by [a certain number of] asterisks
in column 0 [....]
and another thing is what kind of tree we mean:
The "divide by column 0 asterisks rule" is used to divide
(sub)sections.
With one exception (see below) there is a fixed number of built-in
parsing rules and those rules are reusable for many different
mappings to trees. In one case, the asterisks rule might divide
stuff up into `<section>' nodes. In another case, the asterisks
might divide stuff up in a completely different way (e.g., `<element>'
nodes in a grammar specialized for documenting the periodic table.)
The association between a parsing rule and which rules to apply to
subtree sources is variable. Normally, the asterisks rule would
next parse the source of the body text with, say, the "paragraphs and
subsections" rule but a grammar could say otherwise.
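The split between a reusable rule and a grammar's mapping might be sketched like this in Python. Everything named here (`split_on_asterisks`, `DOC_GRAMMAR`, `PERIODIC_GRAMMAR`, `apply_grammar`, and the rule name "properties") is made up for illustration and is not Awiki's actual interface.

```python
def split_on_asterisks(text, depth=1):
    """Reusable rule: divide text at lines that start with `depth`
    asterisks in column 0. Text before the first marker is ignored
    in this sketch."""
    marker = "*" * depth + " "
    chunks = []
    for line in text.split("\n"):
        if line.startswith(marker):
            chunks.append([line[len(marker):].strip(), []])
        elif chunks:
            chunks[-1][1].append(line)
    return [(title, "\n".join(body)) for title, body in chunks]


# Two hypothetical grammars reuse the same rule but map it to different
# node types and name a different rule for the body source.
DOC_GRAMMAR = {"node_type": "section", "body_rule": "paragraphs-and-subsections"}
PERIODIC_GRAMMAR = {"node_type": "element", "body_rule": "properties"}


def apply_grammar(grammar, text):
    return [
        {
            "type": grammar["node_type"],
            "title": title,
            "body_source": body,
            "body_rule": grammar["body_rule"],
        }
        for title, body in split_on_asterisks(text)
    ]
```

The same input run through `DOC_GRAMMAR` yields `section` nodes, and through `PERIODIC_GRAMMAR` yields `element` nodes, without touching the rule itself.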
* Specific Parsing Rules
What you might want in an overview is the more conventional kind
of:
*bold* => *bold*
_italics_ => _italics_
kind of table.
I don't have such a table for you and, in truth, where I left off I
was still playing around to find a nice mix of defaults.
More interesting are the handful of general principles that apply
across all rules and, alas, I don't have those written up either.
The rules I was working on do things like avoiding a need for
quoting, allowing arbitrary nesting, avoiding gratuitous whitespace
dependencies, etc. The general thrust is to make simple things
look completely natural and to make complex things easy to get
right. One example:
Let's suppose that `/foo/' means `<emphasize>foo</emphasize>'.
What if I want to emphasize the phrase `and/or'? Well, ok,
that's handled by whitespace rules so `/and/or/' parses just
fine if surrounded by whitespace or punctuation. What if you
want to emphasize part of an already emphasized text (nesting)?
I think repetition is a good disambiguator, at least to a
certain depth: `//the key thing is to be /really/ careful//'.
Of course that gets ugly if taken too far but, mostly, one
never needs to take it that far.
My little (nascent) set of rules like that add up to something
that (a) casual users don't really have to know in depth and,
anyway, (b) is simple enough you could teach it in a class on
touch typing as an extension to general rules for technical/business
typing.
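The emphasis example can be approximated in a few lines of Python. This is one possible reading of the whitespace and repetition rules, not the rule Awiki settled on. The regular expression `EMPH` and the function `parse_inline` are assumptions made for the sketch.

```python
import re

# A run of N slashes opens emphasis if it is not preceded by a word
# character and is followed by non-space text. The same run of N slashes
# closes it if preceded by non-space text and not followed by a word
# character. A slash inside a word (as in and/or) matches neither side.
EMPH = re.compile(r"(?<![\w/])(/+)(?=[^\s/])(.+?)(?<=[^\s/])\1(?![\w/])")


def parse_inline(text):
    """Return a list of plain strings and ("emphasize", children) tuples."""
    nodes, pos = [], 0
    for m in EMPH.finditer(text):
        if m.start() > pos:
            nodes.append(text[pos:m.start()])
        # Recurse so a shorter slash run inside a longer one nests.
        nodes.append(("emphasize", parse_inline(m.group(2))))
        pos = m.end()
    if pos < len(text):
        nodes.append(text[pos:])
    return nodes
```

With this sketch, `/and/or/` comes out as a single emphasized `and/or`, and the doubled slashes in the nesting example wrap an inner single-slash emphasis.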
* Error Propagation
If some source (sub)text just doesn't parse then the source itself
becomes the contents of a simple `<error>' node.
Grammars can say, of a given node type, whether the type does or
does not tolerate `<error>' nodes as subtrees. So, for example,
an error-tolerant `<section>' node might contain an unparsable
`<error>' subtree in lieu of a paragraph -- rendering could display
the section mostly normally but show the errant non-paragraph in
raw-source form. On the other hand, if a node can't tolerate error
subtrees, then the outer parse reverts and the whole thing becomes
an error -- instead of a `<bibliography-entry>' node in which some
element of the entry is unparsable, the whole thing would be replaced
with an `<error>' node.
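In Python, the tolerate-or-revert decision might look roughly like the following. The `Node` class, the `TOLERATES_ERRORS` table, and `build` are hypothetical names for this sketch. How a real grammar declares tolerance is not specified here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    type: str
    children: list = field(default_factory=list)
    text: str = ""


# Hypothetical grammar table: does a node type accept <error> subtrees?
TOLERATES_ERRORS = {"section": True, "bibliography-entry": False}


def build(node_type, source, child_results):
    """Assemble a node from already-parsed children.

    Unparsable child sources are assumed to arrive as
    Node("error", text=child_source)."""
    has_error = any(c.type == "error" for c in child_results)
    if has_error and not TOLERATES_ERRORS.get(node_type, False):
        # The outer parse reverts: the whole source becomes one error node.
        return Node("error", text=source)
    return Node(node_type, children=child_results)
```

A `section` built with one error child keeps that child in place, while a `bibliography-entry` with the same child collapses into a single error node carrying the entry's full source.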
That's about it. The art is in two places: the grammar abstraction
and the handful of principles for writing new parsing rules. Oh and:
* The Escape
At the "leaves" of the grammar you can escape into entirely
different parsing techniques. For example, if an Awiki grammar
were used as the input language for a Mathematica-like system
then certain sub-sources might be parsed by a conventional LALR
expression parser for mathematical expressions.
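As a small illustration of the escape, here is a Python sketch in which a leaf's source is handed to a separate expression parser. Python's own `ast.parse` stands in for the conventional LALR parser (it is not an LALR parser itself). The function name `parse_math_leaf` and the dictionary node shape are assumptions.

```python
import ast


def parse_math_leaf(source):
    """Hand a leaf's source text to a separate expression parser.

    Stand-in only: Python's expression grammar plays the role of the
    math-expression parser. A failed parse becomes an error node that
    keeps the raw source."""
    try:
        tree = ast.parse(source.strip(), mode="eval")
    except SyntaxError:
        return {"type": "error", "source": source}
    return {"type": "math", "expr": tree.body}
```

The surrounding grammar only needs to know which sub-sources to route to this function. What happens inside is entirely the other parser's business.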
-t