
From: daly
Subject: [Axiom-developer] aldorunit, junit, axiom testing
Date: Tue, 3 Apr 2007 16:18:01 -0500

Axiom test plans are evolving based on need but with an eye
to longer term plans to leverage the effort. I'll try to 
cover some of my thoughts on the subject here.

In the gold and prior releases the testing was based on the 
input files and detailed review of the console logs, as well
as comparisons of intermediate files after each change. This
is very time consuming and clearly I must have missed a bug
or two since the algebra in silver seems broken somehow.

Rather than releasing a fully merged gold/BI/WH, which is what
I was attempting, I've backed off and decided to build up the
testing harnesses so I can hopefully catch the bug automatically.

There are various levels of testing and I'm working most specifically
on regression testing since that's what this latest release requires.
I'll lay out some thoughts here but these are not firmly fixed, just
some musings about testing and how it matches various subgoals I have.


I have built a regression test harness that uses md5sum as a tool
to show me what might have changed in the build due to a change in
the source. Its only purpose is to focus attention on changes and
is not useful or informative in any other way.
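The harness amounts to checksumming the build outputs and flagging
anything that differs from a saved baseline. A minimal Python sketch of
the idea (the manifest layout and function names here are illustrative,
not the actual harness):

```python
import hashlib
import os

def md5_of(path):
    """Return the md5 hex digest of a file's contents."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(8192), b""):
            h.update(block)
    return h.hexdigest()

def changed_files(manifest, build_dir):
    """Compare a saved {filename: md5} manifest against the current
    build tree; return the names whose checksums differ or which
    have disappeared.  The list only focuses attention -- a changed
    checksum says nothing about whether the change is a bug."""
    changed = []
    for name, old_sum in manifest.items():
        path = os.path.join(build_dir, name)
        if not os.path.exists(path) or md5_of(path) != old_sum:
            changed.append(name)
    return changed
```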

I'm also in the process of rewriting the input files into a standard
form so they can be used for regression testing. Since regression
testing only tries to find things that differ, not to find
bugs, a simple compare mechanism is sufficient. Thus I wrote a
Lisp function called "regress" that compares the results.
(diff is insufficient since we need to ignore certain things.)

The input file rewrite involves a couple of steps. First I run the
input file in the "base case axiom" (this is the comparison
standard regardless of bugs). Running these input files gives 
me "stanzas" written to the output file. Thus if "foo.input"
contains the line:


and I do

  echo ")read foo.input" | axiom > foo.output

I'll get foo.output containing:


 (1) 5
                             Type: PositiveInteger

Next I can use this for regression by rewriting the foo.output
file into a foo.input.pamphlet file. This new pamphlet file contains
the lines:

--S 1 of 1
--R (1) 5
--R                             Type: PositiveInteger
--E 1

and what we see here is an "input file" with the input line
starting with "--S" and ending with the first "--R". The
expected results are prefixed by "--R", and the test ends
with the prefixed line "--E".
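Mechanically, the rewrite just takes each interpreter interaction and
wraps it in these markers. A small Python sketch of that wrapping step
(the function name and argument layout are mine, not part of the actual
rewriting tool):

```python
def make_stanza(number, total, input_line, output_lines):
    """Wrap one interpreter interaction in the --S/--R/--E pamphlet
    markers: the input line follows --S, each expected-output line
    is prefixed with --R, and --E closes the test."""
    stanza = ["--S %d of %d" % (number, total), input_line]
    stanza += ["--R " + line for line in output_lines]
    stanza.append("--E %d" % number)
    return "\n".join(stanza)
```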

Running this input file again will now give:

--S 1 of 1

 (1) 5
                             Type: PositiveInteger
--R (1) 5
--R                             Type: PositiveInteger
--E 1

and I then do:

  echo ')lisp (regress "foo.output")' | axiom > foo.regress

which gives either:

regression result passed 1 of 1 file foo

or:

regression result FAILED 1 of 1 file foo

depending on the result of the compare.
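The comparison itself is just a line-by-line match that skips the lines
known to vary. The real regress is a Lisp function inside Axiom; this
Python sketch only shows the shape of the idea, with a hypothetical
"Time:" prefix standing in for the things that must be ignored:

```python
def regress(expected, actual, ignore=("Time:",)):
    """Compare expected and actual output line by line, skipping any
    line that begins with one of the `ignore` prefixes.  A plain diff
    is insufficient because timings and the like always change."""
    def keep(lines):
        return [ln.rstrip() for ln in lines
                if not any(ln.lstrip().startswith(p) for p in ignore)]
    return "passed" if keep(expected) == keep(actual) else "FAILED"
```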


The regression tests show where the algebra that is executed
might fail from system to system. I've already uncovered a
minor library difference between redhat and fedora systems 
that manages to poke its head into the axiom output.

I'm working my way back through the gold versions in the arch archive
to find where things might have broken.

Since the regression tests are in standard pamphlet format they
contain explanations in the resulting .dvi files.

Since the regression tests are in standard input file format
and they contain the results, we have an executable file format
that also documents the expected results. This can be extended
in many ways in the future to give us better coverage.

There are also a lot of interesting examples of algebra in
these files that should be better explained. Hopefully we'll
go back and expand them.

UNIT TESTING...How it might work

In the recent past I posted a file called ffrac.spad. I have various
planned uses for this file as a test case for new facilities in
Axiom, including experiments in "drag-and-drop", "unit testing",
"cohen algebra", and some tutorial work in how to program.

You can see that I've "decorated" the functions with example
axiom input lines. These show the common and expected uses for
these functions. They will eventually evolve into unit tests.

The first constraint I have for unit tests is that they should
help explain and document the domain. It is very important that
we capture the developers insights into the code via the testing
strategy. What are reasonable input forms for the domain? What
are boundary tests? How can we coerce/convert to this domain or
from this domain to others? 

The unit testing might evolve along the lines of the regression
testing, for three reasons. First, the regression tests contain
the results so they document the expected response "inline".
Thus you can read the file and see what the results should look like.

Second, having a compatible format between unit tests and regression
tests means that the unit tests can be automatically used for regression.

Third, since both use the pamphlet file format we can evolve the 
"standard" (hah) for pamphlet files so that unit/regression test
chunks follow a common pattern. This would allow us to use the same
technique that Bill Page --foisted, ahem-- pioneered for selecting
algebra files automatically. :-)
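Selecting the test-bearing files automatically could be as simple as
scanning for the stanza markers. A hypothetical sketch of that
selection step:

```python
def list_test_files(paths):
    """Return the files that contain at least one --S test stanza
    marker, so regression candidates can be picked up automatically
    rather than listed by hand."""
    hits = []
    for path in paths:
        with open(path) as f:
            if any(line.startswith("--S ") for line in f):
                hits.append(path)
    return hits
```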

If we "standardize" the unit/regression testing then we can leverage
this for "drag-and-drop" extensions of the algebra, testing, and
documentation machinery.


It is unclear how AldorUnit will fit into testing in the short
term. It depends in part on "next week's announcement" from Aldor.
If we get a free Aldor and there is a push to write/extend/rewrite
the Axiom domains in Aldor then clearly this is the likely future.

Frankly I haven't spent sufficient time considering this in
detail because I'm trying to "get this release done". I managed
to let the release mechanism get out of control and that has
clearly cost me dearly.

THE FUTURE PLANS? .... guesses/musings/random nonsense

ZERO TESTS? ... the harder problem

Ideally unit tests would be written so axiom could verify
the results. Some tests might be written in the spirit of:

-- testing plus/minus inversion

but this requires a deeper knowledge of the various domains.
Yet even knowing that the expressions are equivalent and
having domain expertise doesn't guarantee that you can wrestle
axiom into admitting zero equivalence.
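Short of full symbolic zero equivalence, a weaker but useful check is
probabilistic: evaluate the difference of the two supposedly equal
expressions at random points and confirm it is numerically zero
everywhere. A Python sketch (the tolerance, range, and trial count are
arbitrary choices of mine):

```python
import random

def probably_zero(f, trials=20, lo=-100.0, hi=100.0, tol=1e-9):
    """Probabilistic zero test: evaluate f at random points; if it is
    numerically zero at every one, the expression is very likely the
    zero function.  This sidesteps the symbolic zero-equivalence
    problem, at the cost of a (small) chance of a wrong answer."""
    return all(abs(f(random.uniform(lo, hi))) < tol
               for _ in range(trials))

# plus/minus inversion: (x + a) - a should be identically x
assert probably_zero(lambda x: (x + 3.0) - 3.0 - x)
```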


I'd like pamphlet files to be pure latex files. That way we
don't have to "preprocess" the files to latex them and we can
submit pamphlet files as standard tex files to conferences.

One of my long-term goals is to have an open literature in computer
algebra that includes the source code for the paper within the 
paper in a literate style. So there is a push to eliminate any
non-latex format. In the long term that implies that the noweb
syntax has to go away.

I've already written a preliminary version of noweb-style chunks
in latex so at some point the plan is to change the chunk naming
mechanism to be pure, standard latex, e.g.:

  <<chunk name>>= becomes \begin{chunk}{chunk name}
  @               becomes \end{chunk}
  <<chunk name>>  becomes \use{chunk name}

and I hope to introduce this when time permits. Unfortunately
getting this release out has not been a simple job. Who knew? 
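The translation itself is mechanical. A rough Python sketch of the
three rewrites (it deliberately ignores noweb corner cases such as
quoted chunk names or an @ with trailing documentation on the same
line):

```python
import re

def noweb_to_latex(text):
    """Translate noweb chunk markers into the proposed pure-latex
    chunk environment, line by line."""
    out = []
    for line in text.splitlines():
        m = re.match(r"<<(.+?)>>=\s*$", line)   # chunk definition
        if m:
            out.append(r"\begin{chunk}{%s}" % m.group(1))
        elif line.strip() == "@":               # end of chunk
            out.append(r"\end{chunk}")
        else:                                   # inline chunk references
            out.append(re.sub(r"<<(.+?)>>", r"\\use{\1}", line))
    return "\n".join(out)
```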


Thus one possible idea is to just use a standard chunkname for
the test/regress stanzas and have them be verbatim blocks. A
more interesting idea is to introduce a new environment that
would enable the input and output to be captured for the testing
purposes. This would just be an extension of the above idea.
Something like:


 (1) 5
                             Type: PositiveInteger

This could be transformed to either print format or extraction format
for testing. Thus the unit test/regression test machinery has to fit
smoothly into the rest of the pamphlet machinery.

In fact, if this subgoal is met there will be no distinction 
between latex and pamphlet files, thus reducing the complexity
of axiom significantly.

Alternatively we could extend noweb to "know" about the testing
chunk format so that it will automatically prefix the chunk lines
with "--S", "--R", "--E", etc. when extracting for test purposes.
This would be reasonable to do in the short term and would
involve less work.
PAMPHLET -> DRAG-and-DROP subgoal

I want people to be able to fire up axiom (likely using Doyen at
a conference) and be able to drag-and-drop a literate paper from
the conference onto a running axiom. This should:

  1) unpack the algebra
  2) unit test the algebra
  3) unpack the paper
  4) add the paper to the documentation in a deep way
     including updating the front end
  5) add the algebra to the system

A person should be able to execute the paper. If the paper
contains something like a "speed comparison" the person 
should be able to execute the paper and see his own results
for his system reflected in the paper.

Thus the unit test/regression test machinery has to fit 
smoothly into the rest of the pamphlet machinery.


The test/regression stanzas have to fit fluidly into the whole
pamphlet file mechanism and be easy to recognize and extract.
Plus they must be "self-verifying" so that the results obtained
can be checked against the results expected. 

And we'd like the test cases to also serve as documentation of
correct use of the domain as well as its limits and assumptions.


I've mused about the long term goal of constructing a CATS
standard that could be used across all of the existing algebra
platforms. Such a standard would give credibility to the results
of existing systems and leverage the mathematical expertise toward
verifying results.

Consider a CATS document that explains some algorithm, such
as a GCD algorithm.

A CATS document would start with some area of computational 
mathematics, giving a mathematical explanation, a few math
problems that cover the area in question as well as boundary
cases. The document would then have the algorithm within it.
Ideally we'd be able to drag-and-drop this document and run
the tests. 

There are (at least) two challenges with this level of testing.

First we have the issue that the mathematics might cross 
several axiom domains. This implies that the testing is not
unit testing.

Second, we have the issue that the algorithm might cross
domains or have special cases in subdomains, etc. This would
mean that we'd have to make sure that the CATS examples 
include the special cases.


Ultimately we'd like the CATS test cases to "just run".
How they get structured within the CATS documentation, how
we capture the axiom output, and how we resolve the axiom
results with the CATS results are all things to think about.

"There is no such thing as a simple job" 
