
## [Axiom-developer] crystal and the semantic web

From: root
Subject: [Axiom-developer] crystal and the semantic web
Date: Wed, 31 Dec 2003 00:35:16 -0500

Bill,

I realize that I'll have to "sell" the crystal idea as it is new
and unknown. I've been trying to organize Axiom in various ways
to make it more useful in the long term and this idea "crystalized"
so I'm going to run with it for a while. Feel free to ignore me or

More shower committee discussion leads to the following, somewhat
more "grounded" thoughts:

COLLECT COMPILER INFORMATION

Suppose we build a library of tools whose only purpose is to present
a slice of information.

In particular, suppose we "instrument" the compiler so that it will
give us data about a given domain like:

the domain name
the domain signature
the list of exports
the list of "required" domains
etc

then we can use the compiler to automatically build databases about
the algebra. Axiom already does this with spad code (in the NRLIB
dirs). We can instrument the boot compiler in the same way.
We can even instrument the lisp compiler to dump information.
Once these databases exist and are automatically maintained we can
automatically build cross-reference and index tables. This information
can easily be gotten by using the "asq" command line program which
currently exists and queries the databases. Most of this machinery
already exists at the spad level but needs to be built at the boot
and lisp levels.
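As a minimal sketch of what such automatically built tables could look like, the Python below turns per-domain records into a cross-reference ("who requires this domain") and an index ("who exports this operation"). The record layout, the function names, and the sample data are all assumptions made for illustration; they are not the NRLIB format or anything asq actually returns.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DomainRecord:
    """One compiler-emitted record; the field names are illustrative."""
    name: str
    signature: str
    exports: list = field(default_factory=list)
    requires: list = field(default_factory=list)


def build_cross_reference(records):
    """Map each domain name to the sorted names of domains that require it."""
    used_by = defaultdict(set)
    for rec in records:
        for dep in rec.requires:
            used_by[dep].add(rec.name)
    return {name: sorted(users) for name, users in used_by.items()}


def build_export_index(records):
    """Map each exported operation to the sorted domains that export it."""
    index = defaultdict(set)
    for rec in records:
        for op in rec.exports:
            index[op].add(rec.name)
    return {op: sorted(doms) for op, doms in index.items()}


# Made-up sample data, not taken from the Axiom databases.
SAMPLE = [
    DomainRecord("A", "A()", exports=["f", "g"], requires=[]),
    DomainRecord("B", "B(x)", exports=["g"], requires=["A"]),
    DomainRecord("C", "C(x, y)", exports=["h"], requires=["A", "B"]),
]

if __name__ == "__main__":
    print(build_cross_reference(SAMPLE))
    print(build_export_index(SAMPLE))
```

Both tables are derived purely from the records, so they could be regenerated whenever the compiler rewrites its output.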

A simple crystal facet would look at each of these lists (eg the list
of domain signatures). Related simple facets could (a) select a single
domain to inspect or (b) select domains by regular expression on the
signatures or (c) show the hierarchy of domains using this domain or
(d) show the hierarchy of domains used by this domain or (e) show the
exports, etc.
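The simple facets (b), (c) and (d) can be sketched as three small queries over a table of signatures and direct requirements. This is only an illustration: the table layout, the function names, and the sample domains are hypothetical, and a real facet would read its data from the compiler databases.

```python
import re

# Hypothetical data: domain name -> (signature, directly required domains).
DOMAINS = {
    "A": ("A()", []),
    "B": ("B(x)", ["A"]),
    "C": ("C(x, y)", ["B"]),
    "D": ("D(x)", ["A"]),
}


def select_by_signature(domains, pattern):
    """(b) Names of domains whose signature matches the regular expression."""
    rx = re.compile(pattern)
    return sorted(n for n, (sig, _) in domains.items() if rx.search(sig))


def domains_used_by(domains, name):
    """(d) Every domain that `name` requires, directly or indirectly."""
    seen = set()
    stack = list(domains[name][1])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(domains.get(dep, ("", []))[1])
    return sorted(seen)


def domains_using(domains, name):
    """(c) Every domain that requires `name`, directly or indirectly."""
    return sorted(
        n for n in domains
        if n != name and name in domains_used_by(domains, n)
    )
```

For example, `domains_used_by(DOMAINS, "C")` walks from C to B to A, while `domains_using(DOMAINS, "A")` collects everything that reaches A.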

Notice that all of this information is available from the compiler
databases and can be extracted by an asq command line program,
so we get
  src -> compiler -> databases -> asq -> specific facet display
and since this process is driven from the facet we can merge
the crystal front end with a makefile middle and a database back end
so we get:
  special information -> makefile -> specific facet display
So where does that get us? Well, now we have a pretty little browser
that can walk all over the algebra. Useful but not very novel. The
key flaw is that it doesn't connect to anything except machine-readable
code. It does have the feature that we can write makefile-style
connectors between the database support code (asq) and the facets.
In fact, what defines a facet is two pieces of information:
(a) the data that an asq command can extract
(b) the relationship of commands required to construct facet data
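One way to picture the "makefile middle" is a tiny make-style resolver: each target names its prerequisites and a function that builds it from them. The sketch below is an assumption about how that could look, not an existing Axiom tool. The rule names are invented, and the lambdas stand in for real asq queries, whose command-line form is deliberately not assumed here.

```python
def build(target, rules, cache=None):
    """Build `target` make-style: prerequisites first, then the target.

    `rules` maps a target name to (prerequisite names, builder function).
    Each builder receives the built prerequisites as positional arguments.
    Results are cached so a shared prerequisite is built only once.
    Cyclic rules are not detected in this sketch.
    """
    cache = {} if cache is None else cache
    if target not in cache:
        prereqs, builder = rules[target]
        inputs = [build(p, rules, cache) for p in prereqs]
        cache[target] = builder(*inputs)
    return cache[target]


# Hypothetical rules; the data sources are placeholders for asq calls.
RULES = {
    "names": ([], lambda: ["A", "B"]),
    "exports": ([], lambda: {"A": ["f"], "B": ["f", "g"]}),
    "export-facet": (
        ["names", "exports"],
        lambda names, exports: [f"{n}: {', '.join(exports[n])}" for n in names],
    ),
}

if __name__ == "__main__":
    for line in build("export-facet", RULES):
        print(line)
```

Here a facet is exactly the pair described above: the leaf rules are the data that can be extracted, and the prerequisite lists are the relationship of commands needed to construct it.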

LATEX STRUCTURE INFORMATION

Clearly we want more than just a fancy code-browser. Another need
is to connect the latex documentation to the code. Currently we
walk over a pamphlet document with noweb to create a stream which
can extract code from the document. And we have a higher level
function (booklet) that can assemble the pamphlet before extracting
information.

We can easily create a "notangle" facet to show the code, a
"noweave" facet to show the latex (possibly in an editor like texmacs),
and a dvi viewer to see the tex output facet. We can even construct
a "booklet" facet that connects parts or whole pamphlets.
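To make the "notangle" facet concrete, here is a rough sketch of the extraction step in Python. It assumes only the basic noweb conventions: a code chunk starts with a line of the form `<<name>>=` and ends at a line that is `@` or starts with `@ `. It does not expand `<<name>>` references inside chunks and is not a replacement for notangle itself.

```python
import re
from collections import defaultdict

# A chunk definition line looks like: <<chunk name>>=
CHUNK_START = re.compile(r"^<<(.+)>>=\s*$")


def collect_chunks(text):
    """Return {chunk name: list of code lines} from noweb-style text.

    Chunks that share a name are concatenated in document order.
    """
    chunks = defaultdict(list)
    current = None
    for line in text.splitlines():
        start = CHUNK_START.match(line)
        if start:
            current = start.group(1)
        elif line == "@" or line.startswith("@ "):
            current = None  # back in documentation
        elif current is not None:
            chunks[current].append(line)
    return dict(chunks)


# Made-up pamphlet fragment for illustration.
SAMPLE = """Some documentation.
<<*>>=
first line
@
More documentation.
<<*>>=
second line
@
"""

if __name__ == "__main__":
    print(collect_chunks(SAMPLE))
```

A facet built this way would show the code lines only, while a "noweave" facet would show everything outside the chunks.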

But this leaves several huge and important gaps. Pamphlet files have
several semantically different sections. We need to recognize, parse,
and treat these sections differently.

For example, in src/algebra/dhmatrix.spad.pamphlet we describe the
Axiom code with long sections of mathematics. We would like to
connect the mathematics with the Axiom algebra in more than an
incidental way. Having them in the same pamphlet file makes at
least a marginal connection but we need to do more.

If we look at the technologies available we could use several.
The easiest technique is to use latex tags such as \ref and \index
to mark terms that can be added to a database during search. More
generally we could invent tags that give us some connection between
the english text and the embedded algebra.
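As a small illustration of the tag-based approach, the sketch below scans document text for `\index{...}` entries and builds a term-to-document table. The file names and sample sentences are invented, and the pattern only handles index entries without nested braces.

```python
import re
from collections import defaultdict

# Matches \index{term} where the term contains no nested braces.
INDEX_TAG = re.compile(r"\\index\{([^{}]*)\}")


def index_terms(documents):
    """Map each indexed term to the sorted names of documents marking it."""
    table = defaultdict(set)
    for name, text in documents.items():
        for term in INDEX_TAG.findall(text):
            table[term.strip()].add(name)
    return {term: sorted(names) for term, names in table.items()}


# Hypothetical documents, for illustration only.
DOCS = {
    "one.pamphlet": r"A rotation\index{rotation} is a matrix\index{matrix}.",
    "two.pamphlet": r"Every matrix\index{matrix} has a transpose.",
}

if __name__ == "__main__":
    print(index_terms(DOCS))
```

The same loop would work for any invented tag, as long as the tag name in the pattern is changed to match.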

The next technique is to try to "read" the actual text by machine.
There are natural language parsing technologies (e.g. a chart
parser) that will give us some minimal information, but I don't think
the technology is good enough to be useful. I have used this idea
in the past to read english text, form a "concept" of the sentence
and automatically classify the concept in a semantic network. This
would be easier for mathematical sentences rather than general
english as you could pre-populate the semantic network with appropriate
concepts from mathematics.

The next technique is to require additional tags in the document.
This approach amounts to decorating the latex document with
type information. For example, instead of:
$x+y$
we could add the axiom types:
$\type{x}{UnivariatePolynomial} \type{+}{UnivariatePolynomial} \type{y}{UnivariatePolynomial}$
which we could use to connect back to Axiom. This assumes that we
know the Axiom types which might not be true for most mathematical
expressions encountered in text. Perhaps this might be a useful
place to consider embedding OpenMath expressions.
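The `\type{...}{...}` decoration above is simple enough that a short script can recover both the (symbol, type) pairs and the undecorated expression. The following is a sketch under the assumption that the tag takes exactly two brace groups with no nested braces; `\type` is the proposed macro from this message, not an existing LaTeX command.

```python
import re

# Matches \type{symbol}{AxiomType} with no nested braces in either group.
TYPE_TAG = re.compile(r"\\type\{([^{}]*)\}\{([^{}]*)\}")


def typed_symbols(latex):
    """Return the (symbol, type) pairs in the order they appear."""
    return TYPE_TAG.findall(latex)


def strip_types(latex):
    """Drop the type annotations, leaving the plain expression."""
    return TYPE_TAG.sub(lambda m: m.group(1), latex)


EXAMPLE = (
    r"$\type{x}{UnivariatePolynomial} "
    r"\type{+}{UnivariatePolynomial} "
    r"\type{y}{UnivariatePolynomial}$"
)

if __name__ == "__main__":
    print(typed_symbols(EXAMPLE))
    print(strip_types(EXAMPLE))
```

The pairs are what would be fed into a database to connect back to Axiom, while the stripped form is what a reader would normally see typeset.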

Another technique is to try to constrain the written expressions.  We
can require that Axiom be able to parse expressions in pamphlets. That
way we can use the compiler to tell us what types are in the text and
how they connect to Axiom. This lends itself to automation with the
current tools. It also constrains the kinds of expressions you can use
in documentation to well-formed expressions.  A fair number of math
expressions in books are not well-formed.

The struggle here is to figure out a way to move from the latex math
to the Axiom math and back again. Literate programs are a weak but
widely available technique.

At a different semantic level, that of the whole document, we can use
bibliographic references to connect the documents. This could range

SPECIAL INFORMATION STRUCTURE

I'm advocating building an information source (database is really a
poor choice of words). It should be searchable using a semantic query
(parse and form a concept, "classifying" the concept and returning
nearest-neighbor concepts), using hierarchical structure (standard
hierarchical databases) and by keyword/grep/hashtable kind of
walks. Graph-relationship walks (like nearest-neighbor) should also be
possible.  Standard relational queries are the least interesting way
to search this kind of information. I see this as a highly linked
graph object (a ball of string) with pointer tables to support other
views.
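A rough sketch of the "ball of string with pointer tables" idea: a linked graph as the primary structure, a keyword table as one supporting view, and a breadth-first walk as a crude stand-in for nearest-neighbor search. The class and method names are assumptions for illustration, and hop count is used as the only notion of distance, which is far simpler than real concept classification.

```python
from collections import defaultdict, deque


class ConceptGraph:
    """Undirected graph of concepts plus a keyword pointer table."""

    def __init__(self):
        self.edges = defaultdict(set)       # node -> linked nodes
        self.by_keyword = defaultdict(set)  # keyword -> nodes

    def add(self, node, keywords=()):
        self.edges[node]  # touching the defaultdict registers the node
        for word in keywords:
            self.by_keyword[word.lower()].add(node)

    def link(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def lookup(self, keyword):
        """Keyword view: nodes filed under `keyword`, sorted by name."""
        return sorted(self.by_keyword.get(keyword.lower(), ()))

    def neighbors_within(self, start, hops):
        """Graph view: {node: distance} for nodes within `hops` links."""
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if dist[node] == hops:
                continue  # do not expand past the hop limit
            for nxt in self.edges[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        del dist[start]
        return dist


# Illustrative content only.
g = ConceptGraph()
g.add("Ring", ["ring", "algebra"])
g.add("RING", ["ring"])
g.add("Field", ["field"])
g.link("Ring", "RING")
g.link("Ring", "Field")
```

With this data, `g.lookup("ring")` uses the pointer table, while `g.neighbors_within("RING", 2)` walks the links themselves.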

GENERAL MATHEMATICS STRUCTURE

Moving away from the specifics of a particular latex document we can
ask about facets which are "structural". We'd like to be able to see
the overall structure of parts of the world, usually in some sort of
a lattice.

Lattice-style facets can be constructed (by machine) that detail the
inheritance hierarchy in Axiom and show the relationship of types and
categories.  Lattice-style facets can also be constructed (by hand)
that show the relationship of the mathematics covered by the system.
Links could be constructed (again, by hand) that relate the real
concept (e.g. Ring) to the Axiom concept (e.g. RING). If these were done
as semantic network "concepts" you could walk about the system using
the lattice. The various lattices could form the skeleton of a concept
framework in a semantic network (our ball of string above).

Indeed, the CATS (Computer Algebra Test Suite) eventually must extract
some sort of taxonomic organization of the mathematics used in the
test suites. This would be very similar to the NIST taxonomy for
numeric suites and would provide a concrete connection between the
mathematics and the available system functions.

PERSONAL FACETS

Eventually all of this machinery is justified by the need to organize
a very large (I'm assuming a 100x expansion of math in axiom) pile of
information so it can be used by one or a group of mathematicians to
do research.

It needs to be able to accept dynamic changes (assume hundreds of
papers are published as literate programs every year and become
available at the rate of one per day) without human intervention.
I'm assuming that literate programs get published and become
available to a global computational math research pipeline which can
be reviewed and searched.

Crystal needs to be able to have several "research" threads available
at any given time. That is, I need to be able to highlight, remember,
collate, and organize existing work in multiple areas at the same
time. I want to be able to have several "papers" in progress and I
want Axiom to remember that I've visited certain areas of math
(e.g. cryptography using group theory). I want the system to strip
out this part of math and Axiom and present it to me on one or
several facets so I can work on it. Eventually I want MY Axiom to
"know" (in some mechanical sense) what kinds of math I find interesting,
to search for literate papers related to that kind of mathematics, and
to extend the system to cover published results by selecting and embedding
literate programs into my copy of Axiom.

The more I work with my copy of Axiom the more specific it should become
for me. The internal semantic network should eventually develop very
clear, specific meanings (concept clusters) for terms and ideas that
are important to me. Facets should be able to be dynamically created
based around these dynamically developed concept clusters.

(As an aside, eventually this clustering of concepts makes it difficult
for two Axiom systems to communicate. Consider an example of an electrical
engineer and a computer programmer. Both use the concept of a "register".
Eventually the EE's system has a concept cluster around the term "register"
that includes ideas of NAND gates, transistors, CMOS, ECL, etc. Eventually
the CS's system has a concept cluster around the term "register" that
includes ideas of register windows, base registers, stacks, etc. Notice
that the same "word" has completely different meanings to the two systems.
They will eventually become so specific that the concept nets cannot be
merged. Perhaps this is why it is so hard for geeks to talk to women :-) )

Well, I've wandered off into the weeds and it's time to work on something
useful.

t