|
From: | Tim Daly |
Subject: | Re: [Axiom-developer] Design Thoughts on Semantic Latex (SELATEX) |
Date: | Thu, 25 Aug 2016 06:17:10 -0400 |
at other areas of interest. The long term goal is to parsethe latex->axiom text available. Future test files will looktokens. They will be added as needed. The idea is to keepto disambiguate the input. The current file has no selatexImplicit is the idea that weaver will use the selatex tokensThe goal is to transform the latex into Axiom.Each line is a call of the formcontaining 620 integrals.http://axiom-developer.org/axiFor those of you at home wishing to play along, there is aselatex.test1 file atom-website/selatex.test1
weaver(latex-string,axiom-string) the problem simple by adding print-invisible sematics to thelatex-string. In the ideal case the weaver program is trivial,as is the markup. Any tradeoff should prioritize simplicity.Another priority is to align the semantic markup withAxiom domains in order to ground the semantics with code.Once all of these calls translate correctly the Axiom outputroutines need to output the latex-string with the addedsemantic markup so the mapping is bi-directional.The current file only looks at integration as I already haveNIST/CRC/etc formulas.TimOn Sun, Aug 21, 2016 at 11:02 PM, Tim Daly <address@hidden> wrote:associated with these invariants. Clearly they exist somewhere.It would be a vast improvement if there were algorithmsTables I landed on page 219, section 3.4.1.2 Graph Invariants.Dan,While paging through the CRC 31st Standard Mathematical
To "cross the gap" between tables and computational mathematicsit would be valuable to include implementations of these invariants.It is hard to walk away from that page. An Axiom implementationwould be fun to write, especially given the next section that listsdifferent kinds of graphs which, presumably, would all have theinvariants. Even better, the graph algorithms are likely goodcandidates for proof technology (ACL2 if done in Lisp, COQ ifdone in Spad). Lisp has the advantage of an ANSI standard.It seems worthwhile to take sections like this, expand themacross computational and proof tools, and publish them in aform that is generally useful. It is "nice to know" that a graphhas a radius but it would be even better if I could "just point andclick" to import the algorithm.Axiom has been pushing literate programming for years. Thetools exist to "make it so", as the saying goes.TimOn Sun, Aug 21, 2016 at 10:40 PM, Tim Daly <address@hidden> wrote:Timprototype that reproduces existing results.Anyway, at least there is a way to make a proof of conceptso it knows about the symbol as an operator. This falls underFor example, when writing y'(x)=0, Axiom needs y:=operator 'yIt is immediately obvious that this is more challenging than it seems.to write the necessary markup and weaver.matching input latex and output Axiom. The homework problem isDoing so and combining the results gives a set of examples withIt is not difficult to re-create the latex input for these examples.in various volumes and hand-translating them to Axiom input.The CATS tests in the past were created by reading the printed latextask quickly gets out of hand.Like any research problem it is a struggle to get a useful grip on this.Looking at G&R (I just ordered the latest, mine is 4th edition), the"Consideration 12: System Specific Commands"... which impliesthat the latex environment and quoting macros have to beimplemented. Sigh.There is no such thing as a simple job.On Sun, Aug 21, 2016 at 4:17 PM, Tim Daly <address@hidden> wrote:Axiom's results with respect to published texts. This has found bothAs you can see from the CATS work I've been trying to validatefor some of the work.course, that there IS an algorithmic base which does not yet existthe semantics based on some algorithmic base. This implies, ofkeeps the tables unchanged in print form. The grounded aspect keepsintended to be transparent and grounded. The transparent aspectI guess the difference in this effort is that the semantic markup isunderstatement of the decade.Dan,Welcome.Re: Howard Cohl. Yes, I'd like an introduction. It seems important tomake DLMF, CRC, and other sources contain enough semantics thatthey can be read by a computer algebra system. There are anenormous number of issues, such as what to do with functionsunknown to the CAS, which need to be thought through.I believe that NIST/CRC/G&R collections with semantic markup willhave a great normalizing effect on CAS development since it will raisecross-platform questions like "What percent of G&R do you handle?".Albert Rich (RUBI)[0] has been doing this for integration using patterns.This can only benefit computational mathematics in the long term.I've also campaigned for associating algorithms with published tables.It is important in the long term to have reference versions. The ACMused to do this years ago. I'd like to see a Gruntz algorithm for limitswhere it can be applied, for instance. It would also provide a focus on"missing algorithms" or edge cases. Davenport/Trager/Bronsteinalgorithms promise a decision procedure but there are no existingcomplete implementations. The tables could highlight missing cases,giving focus to efforts to complete the procedure.It will also put back-pressure on the tables to define different versionsof the same formulas based on domains (C, R, etc)."The GR work was more than I had anticipated"... wins the award for
The goal of this effort is to make it possible to read those
formulas directly into a CAS. Axiom is my primary target but
it should be done in a somewhat system agnostic form.
I've spent well over a year creating the computer algebra test suite.It would be so much easier and more useful if the original sourcescould be read directly.I read your paper. There is an interesting mix of syntax and semantics.Axiom bugs and misprints in published texts.The Kamke[1] suite was the first effort for differential equations.The Spiegel[2] chapter 14 on indefinite integrals for integration.The von Seggern[3] book on curves and surfaces for graphics.The Legendre and Grazini[4] on Pasta by Design for 3D graphics.The RUBI work on integration.and, currently I'm re-creating the numerics that were lost when NAGreleased the open source version, leaving me swimming throughLuke's[5] Algorithms book.which, to quote a famous phrase "was more than I had anticipated".Your Handbook of Integration[6] has a section on various known"Caveats, How an integration result may be incorrect". This raises thewonderful topic of branch cuts yet again. I did some testing and itseems that Axiom and Mathematica share one set while Maple andMaxima share another.All of which leads to a need to create better reference materials thatare generally available (unlike the ACM algorithms for non-payingcustomers) and directly useful for computational mathematics.The current plan is to take some tables, find or re-create the latex,invent a semantic markup, and then write the "weaver". At this pointthe research is still at the "proof of concept" stage. (tex formulasources are most welcome).Ultimately I'd really like to see a book of formulas and algorithmsthat I can just drag-and-drop into Axiom and be able to use themwithout lifetimes of work.Actually, that 's only the penultimate goal. I have augmentedAxiom to include proofs (ACL2,COQ) so I'd also like to see proofs,(this IS mathematics, after all) but maybe we'll leave that for
next month :-)Tim[1] Kamke. E. "Differentialgleichungen Losungsmethoden und Losungen"Chelsea Publishing Company, New York, 1959[2] Spiegel, Murray R. "Mathematical Handbook", Schaum's OutlineSeries; McGraw-Hill Book Company 1968[3] von Seggern, David "CRC Standard Curves and Surfaces",CRC Press, 1993 ISBN 0-8493-0196-3[4] Legendre, George L. and Grazini, Stefano "Pasta by Design",Thames and Hudson, 2001[5] Luke, Yudell "Algorithms for the Computation of MathematicalFunctions", Academic Press, 1977 ISBN 0-12-459940-6[6] Zwillinger, Daniel "Handbook of Integration" Jones and Bartlett,London, 1992On Sun, Aug 21, 2016 at 10:16 AM, Dan Zwillinger <address@hidden> wrote:All
I began reading this topic's emails when they first appeared, and then fell behind.
Sorry for my late comments.
I admire your efforts.
They may be somewhat related to what I have done in the past.
My experience is as follows:
(1) CRC SMTF (Standard Mathematical Tables and Formula)
I put the ~700 integrals in CRC's SMTF into a format from which
(A) they could be typeset in LaTeX
(B) they could be converted into Mathematica (which either did a symbolic or numeric computation) - and this was done
I let Richard Fateman use them for his experiments.
(2) Elsevier's GR (Gradshteyn and Ryzhik's "Table of Integrals, Series, and Products")
I put the ~12,000 (if my memory is correct) integrals into a format from which
(A) they could be beautifully typeset in LaTeX
(B) they could be converted into Mathematica - and this was NOT done
Enclosed is a PDF file describing the work and the resulting format.
A different format was used for SMTF and GR.
While the SMTF work was not too arduous, the GR work was more than I had anticipated.
The input (the previous version of GR) had little syntactic structure and it took much effort to get it into shape.
I used (many different) regular expressions (in perl) to translate the bulk of the book, and then lots of hand tuning.
While I think you are well beyond my thinking on these topics, please let me know if I can help.
I am friends with Howard Cohl at NIST, who may be the current lead for DLMF ("Digital Library of Mathematical Functions" at dlmf.nist.gov).
Let me know if you need an introduction.
Dan Zwillinger address@hidden 617-388-2382On 8/20/2016 11:30 PM, Tim Daly wrote:
$\INTEG{3x~dx}{x}$\VARIABLE to get a correct parse so the _expression_ could bemarkup can exist. Axiom, for example, would not need the \INT orparsing possible. Depending on the system, more or less parsingThe semantics markup makes the display pretty and the semantic_expression_ into an inputform for the CAS. In Axiom that would beA 'weaver' program would see the integration _expression_ asmore clever but that's not the point being made here.These trivial macros can be made less verbose and certainlybut notice that the variable of integration is in the semantic markup.(integralsign) 3x dxas $\int{3x~dx}$, that is, anAn integer 3 can be wrapped as \INT{3} but will still display as 3.without showing up in the output. This allows the semantics to carryThis allows the variable of integration to be passed in the semanticsreplaced by the first argument.delimites the characters that will occur during expansion with the #1defines the number of expected arguments. The brace argumentThis defines 4 new latex markups. The number in the square brackets\newcommand{\INTEG}[2]{\int{#1\newcommand{\POLY}[1]{#1}\newcommand{\VARIABLE}[1]{#1}\newcommand{\INT}[1]{#1}Some simple tests show that this works. Suppose selatex.sty contains:The game is to define latex markup that is transparent to the syntaxbut adds semantics for post processing.
}}
(As an aside, INT, VARIABLE, and POLY just happen to be valid
Axiom domain abbreviations, hence the name choice. This choice
of names gives grounding to the semantics.)
Notice that \INTEG takes two arguments but will display only one.
additional, non-display information needed by the CAS.
Some examples follow.
A variable x can be wrapped as \VARIABLE{x}, displayed as x.
$\POLY{\INT{3}\VARIABLE{x}}$ will display as 3*x
$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x} will be the same result
$\INTEG{\POLY{\INT{3}\VARIABLE{x}~dx}}{x}$
with all of the semantic tags. The weaver's job is to rewrite this
integrate(3*x,x)
This validates the fundamental idea.
The next step is to write a simple weaver program. The clever path
would be to embed a declarative form of the parser syntax (BNF?)
as comments in selatex.sty. That way the latex semantics and the
weaver syntax are kept in sync.
Weaver would read the BNF comments from selatex.sty and
the formula with semantic markup as input and parse the semantic
markup into inputforms. (Wish I thought of this homework problem
when I taught the compiler course :-) ).
Note that, depending on the BNF, weaver could be used to
generate output for Maxima's tree-based representation.
An alternative next step is to look at a CRC book, re-create the
syntactic latex and then create the selatex.sty entries necessary
to generate weaver input.
Infinitesimal progress, but progress non-the-less.
Tim
On Fri, Aug 19, 2016 at 12:45 AM, Tim Daly <address@hidden> wrote:
Suite" (CATS). Albert Rich has done this with RUBI and integration. That
One of the Axiom project goals is to develop a "Computer Algebra Test
work is already partially in the test suite and work has been done on the
pattern matching. Large datasets (like Kamke) are always welcome. Some,
such as Schaums were hand-developed. This is tedious.
As Axiom develops more explanations and documentation it would be
useful to execute the formulas directly so there is a local incentive to beclear about semantics.
In the long term the hope is that we can just grab formulas directly from
their sources (ala literate programming). Your work makes it plain that raw
latex does not carry sufficient semantics. There is a global question of
how to make this work. Unfortunately a general cross-platform solution
is difficult (cite Dewar/Davenport/et al. for OpenMath).
Since Axiom is literate and extracting formulas is trivial it seems that
literate markup is a natural goal. Since Axiom uses abstract algebra
as a scaffold the type tower already has a lot of axiomatic semantics.
The natural join of literate latex and abstract algebra is clearly
semantic markup, aka selatex.
===========
Consideration 10: semantic->inputform translation (weaver? :-) )
>x and not x has no particular meaning, but if x is explicitly true or false,
>Maxima simplifies it to false. If SEALATEX has a semantics -- are you
>defining yet another CAS? Or perhaps you should link it 100% to Axiom's
>semantics, which you presumably know about and can modify.
I am NOT defining another CAS. The goal is a "well-designed hack" using
universally understood latex, a latex package, and a translation program.
The selatex idea is only partially Axiom specific. \INT, for instance, seems
pretty generic. However, if the idea is to read formulas and disambiguate
a=b (boolean) vs a=b (equation) then the markup needs to be grounded
to have meaning. Axiom's domains (BOOLEAN) and (EQ) as the ground
\BOOLEAN(a=b)
\EQ(a=b)
are unambiguous relative to each other in Axiom. I don't know enough
about Maxima to understand how this might translate.
The extracted formulas with the decorated semantics still needs a
semantics->inputform (weaver) pre-processor which could be Maxima
specific. This would lead to debate about what "equality" means, of course.
Axiom has tried to create a first-order "rosetta stone" to translate between
systems (rosetta.pdf [1]) but it is too shallow to consider providing
cross-platform semantics.
=============
Consideration 11: \scope in selatex
>As far as recording stuff in DLMF -- there are presumably scope issues
>("in this chapter n,m are natural numbers....") and maybe even a need
>to make value assignments.
>I think you need to model these in SEALATEX too.
(See Consideration 6)
Clearly there are scoping issues. My current thinking is to create a
\scope markup that would manage the environment(s). This is not
a new issue (see "Lisp in Small Pieces" [0])
There seem to be three concerns.
First is the scope name, with something like 'global' as a keyword.Second is the "closure chain" of other scopes.
Third is the symbol being scoped.
\scope{name}{chain}{symbol}
The weaver program would walk this chain to create the proper
file syntax for system input.
============
Consideration 12: System specific commands \axiom
Along with the formulas it is clear that some system specific
input may be required, such as loading files, clearing workspaces,
etc. Some of these may be done in the weaver program, such as
between formulas. Others may need to be added to the semantics
block. So a markup that provides verbatim quoting per system
might be defined, e.g.
\axiom{)clear all} %clear the workspace
which would simply quote an input line.
==============
Note that so far all that is being suggested is transparent formula
markups which do not impact the presentation, some special tags
(\scope, \axiom,...) and a weaver program, along with the ability to
read the latex and extract named formulas (aka a literate program,
which Axiom already can do).
It ought to be possible (by design) to create a semantic version of
CRC that any system could import, assuming a "sufficiently clever
weaver".
On a more ambitious note, I am trying to find a way to keep the selatex
markup "hidden" in a pdf and use it as the clipboard paste when the
formula is selected. Anyone with a clue, please help.
===============
[0] Queinnec, Christopher, "Lisp in Small Pieces" ISBN 978-0521545662
(2003)
[1] Wester, Michael J. and Daly, TImothy "Rosetta"
On Thu, Aug 18, 2016 at 5:30 PM, Richard Fateman <address@hidden> wrote:
thanks for all the references :)
I'm not sure if I'm going to repeat comments I made already somewhere.
First, has Dan Zwillinger weighed in? I think that it would be useful
to see what he has done.
Next, there are ambiguities among CAS and even within a single CAS.
For example, in Macsyma/ Maxima there is generally no semantics
associated with "=" or ">". But in some contexts, there is some meaning.
x>2*y
is a tree _expression_. It is not associated with x or with y.
assume(x>2*y) does mean something ... it puts info in a database.
Somehow encoding the method to extract this information into SEALATEX
(SeLaTeX?) in a CAS-independent way -- that's quite a task. In
particular, it would seem to require an understanding of what assume()
does in Maxima, and what is() does also.
x and not x has no particular meaning, but if x is explicitly true or false,
Maxima simplifies it to false. If SEALATEX has a semantics -- are you
defining yet another CAS? Or perhaps you should link it 100% to Axiom's
semantics, which you presumably know about and can modify.
As far as recording stuff in DLMF -- there are presumably scope issues
("in this chapter n,m are natural numbers....") and maybe even a need
to make value assignments.
I think you need to model these in SEALATEX too.
Just musing about where you are heading.
RJF
On 8/18/2016 11:45 AM, Tim Daly wrote:
[1] Fateman, Richard J.some unit tests files on primitive domain markup. That should beTo validate these ideas Axiom will include an selatex.sty file and\cos, etc.Following that might be pre-existing latex functions like \int, \sum,such as \FRAC\INT or \POLY\COMPLEX\FLOAT.starts with "primitive" domains (e.g. INT), creating selatex I/O.with a stand-alone pre-processor from selatex->inputform.to expand \INT{3} into the correct domain. This could be doneinput, however, this means that the reader has to know howvariety of targets so this does not seem to be a problem. Forinput/output. Axiom allows output forms to be defined for awith CM semantics where currently we only output syntax.to change the syntactic display. They may, as noted before,information but these tags have to be carefully designed NOTIt is clear that there needs to be semantic tags that carry theit has computational mathematics (CM) semantics.Fateman [0] raised a set of issues with the OpenMath
approach. We are not trying to be cross-platform in this
effort. Axiom does provide an algebraic scaffold so it is
possible that the selatex markup might be useful elsewhere
but that is not a design criterion.
Fateman[1] also raises some difficult cross-platform issues
that are not part of this design.
Fateman[2] shows that parsing tex with only syntactic markup
succeeded on only 43% of 10740 inputs. It ought to be posible
to increase this percentage given proper semantic markup.
(Perhaps there should be a competition similar to the deep
learning groups? PhDs have been awarded on incremental
improvements of the percentage)
This is a design-by-crawl approach to the semantic markup
idea. The hope is to get something running this week that
'works' but giving due consideration to global and long-term
issues. A first glance at CRC/NIST raises more questions
than answers as is usual with any research.
It IS a design goal to support a Computer Algebra Test Suite
(http://axiom-developer.org/axiom-website/CATS ). It is very
tedious to hand construct test suites. It will be even more
tedious to construct them "second-level" by doing semantic
markup and then trying to use them as input, but the hope is
that eventually the CRC/NIST/G&R, etc will eventually be
published with semantics so computational mathematics can
stop working from syntax.
only formulas. We would like to be able to read this file so
===========
Consideration 4: I/O transparency
Assume for the moment that we take a latex file containing
require multiple semantic versions for a single syntax.
It is also clear that we would like to be able to output formulas
===========
Consideration 5: I/O isomorphism
An important property of selatex is an isomorphism with
It should be possible to read-then-write an selatex formula,
or write-then-read an selatex formula with identical semantics.
That might not mean that the I/O is identical though due to
things like variable ordering, etc.
===========
Consideration 6: Latex semantic macros
Semantic markup would be greatly simplified if selatex provided
a mechanism similar to Axiom's ability to define types "on the fly"
using either assignment
TYP:=FRAC(POLY(INT))
or macro form
TYP ==> FRAC(POLY(INT))
Latex is capable of doing this and selatex should probably include
a set of pre-defined common markups, such as
\FRINT ==> \FRAC\INT
===========
Consideration 7: selatex \begin{semantic} environment?
Currently Axiom provides a 'chunk' environment which surrounds
source code. The chunks are named so they can be extracted
individually or in groups
\begin{chunk}{a name for the chunk}
anything
\end{chunk}
We could provide a similar environment for semantics such as
\begin{semantics}{a name for the block}
\end{semantics}
which would provide a way to encapsulate markup and also allow
a particular block to be extracted in literate programming style.
===========
Consideration 8: Latex-time processing
Axiom currently creates specific files using \write to create
intermediate files (e.g. for tables). This technique can be used
to enhance latex-time debugging (where did it fail?).
It can be used to create Axiom files which pre-construct domains
needed when the input file with semantic markup is read.
This would help a stand-alone selatex->inputform preprocessor.
===========
Consideration 9: Design sketches
It is all well-and-good to hand-wave at this idea but a large
amount of this machinery already exists.
It would seem useful to develop an incremental test suite that
Once these are in place we could work on "type tower" markup
enough to start the bikeshed discussions.
Ideas? Considerations? Suggestions?
Tim[0] Fateman, Richard J.
"A Critique of OpenMath and Thoughts onEncoding Mathematics, January, 2001"https://people.eecs.berkeley.edu/~fateman/papers/openmathcri t.pdf
"Verbs, Nouns, and Computer Algebra, or What's Grammar Got to[2] Fateman, Richard J.
[Prev in Thread] | Current Thread | [Next in Thread] |