[Axiom-developer] Literate programming thoughts


From: David Bindel
Subject: [Axiom-developer] Literate programming thoughts
Date: Sun, 15 Jul 2007 10:22:35 -0400
User-agent: Thunderbird 1.5.0.12 (Macintosh/20070509)

I happened upon the discussions about literate programming on this
mailing list while figuring out why so many people were looking at my
"dsbweb" entry on my blog.  I've had some e-mail exchanges with Tim,
who suggested that I might re-post our exchanges to the
list.  Rather than re-post without context, I've decided to summarize
my thoughts on the matter in one (long) post.  I will also put a
version of this note on my blog, though it may take a week or so.

I will say in advance that different software projects have different
cultures and needs, and I know only a little about the history and
development of Axiom.  So I recognize that not all of what I write
will be relevant.  For the pieces that are (or for those that
aren't, but that you find entertaining or irritating), I welcome
your comments and criticisms.

Cheers,
David

---

I.  A little personal history

A few years ago, I was involved in SUGAR, a simulator that was meant
to play the same role for micro-electro-mechanical system designers
that SPICE plays for IC designers.  The group I worked with included
engineers (as well as other mathematicians and computer scientists)
with widely varying levels of programming experience.  For version 3.0
of this code, I decided to use noweb (on top of a mix of C, MATLAB,
and Lua codes for which a language-specific tool would have been
inappropriate).  Part of my reason for choosing this tool was
curiosity: I'd enjoyed Knuth's book on literate programming, and I
really liked Hanson's /C Interfaces and Implementations/ (also a
literate program).  I was curious how these techniques would work for
me.  The other part of the reason was that I thought a literate
programming environment would help me better convey some of the
internal structure to the others in the group.

As an experiment in literate programming, I think my effort was a
modest success.  The program was clearer than it would have been
otherwise; and by writing in that style, I did catch problems that
I otherwise might have missed.  As an effort to convey the information
to others, the effort was a failure.  I think I was the only one to
ever read most of what I wrote or re-wrote in a literate style.

At a more concrete level, I had some problems with the structure of
noweb, with its layers of pipeline-connected filters.  I could usually
get noweb working on my system, but it broke if I moved it from the
original installation location, and it depended on the presence of
several other tools.  Getting noweb working for other people (or
getting it working for myself when I did builds on a Windows machine)
was a hassle.  With very few exceptions, it bothered me to check
generated sources into version control, or to send patches to
generated sources, so I wanted to have a very portable tool to take my
noweb files and produce the relevant C, MATLAB, and Lua files.
Consequently, I ended up writing a short C code to replace the tangle
phase in noweb.

I subsequently read some wry comments on various literate programming
sites to the effect that writing one's own LP tool seems like almost a
requirement for anyone who starts playing with literate programming.
Certainly this matches my experience, and it sounds as though the
Axiom group may have the same experience.


II.  Comments on documentation tools

I ultimately gave up on noweb (though if noweb 3 becomes a reality, I
may try it again).  I did try some other documentation tools, though.
I think Doxygen is cool, but it's very oriented toward C/C++.  I've
also found over time that when I read other people's C/C++ code, I
usually prefer to open the source in an Emacs session rather than open the
Doxygen-generated files in a web browser.  For whatever reason, I have
found the opposite when reading Java code; I suspect this is because
of the point I alluded to earlier, that different types of
documentation are appropriate to different types of codes.

I was aware of the literate Haskell efforts, and also of documentation
tools like Schmooz.  These were, once again, language-specific tools.
But the basic ideas are independent of the language, and they are
quite straightforward.  If one is willing to give up macro expansions
(chunks) and various indexing tools (which tend to be language
specific), then a literate programming tool need do little more than
scan a file to identify which pieces should be treated as code and
which as documentation in some markup language (TeX, HTML, something else).
dsbweb is my effort at such a system, and I've summarized my rationale
for its design both on my blog and in the code.
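
By way of illustration, here is a minimal sketch in C of such a tangle
phase.  The "%% begin"/"%% end" markers are invented for this example
(they are not dsbweb's actual syntax); the point is that a line-oriented
scanner with one bit of state is nearly enough:

#include <stdio.h>
#include <string.h>

/* Copy lines between hypothetical "%% begin" and "%% end" markers to
   stdout; everything else is documentation and is skipped.  A real tool
   must also route code to multiple output files, but this is the heart
   of the tangle phase. */
int main(void)
{
    char line[1024];
    int in_code = 0;

    while (fgets(line, sizeof line, stdin)) {
        if (strncmp(line, "%% begin", 8) == 0)
            in_code = 1;                 /* entering a code section */
        else if (strncmp(line, "%% end", 6) == 0)
            in_code = 0;                 /* back to documentation */
        else if (in_code)
            fputs(line, stdout);         /* code goes to the output */
    }
    return 0;
}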

Different documentation tools have different attributes, and I think
the ones I care about least often receive the most press.  I do not
particularly care about pretty-printing, and I mostly do not care
about cross-referencing of code names in the documentation (if such a
cross-reference is necessary, it may be a good idea to reorganize the
code).  I don't really care about automatically-generated class
diagrams unless I'm trying to get a feel for someone else's code, or
to reverse engineer a code that has been written obscurely.  Some
questions I *do* care about are these:

A.  What is the complexity of the meta-language?

This question really involves two separate issues.

First, there is a question about the documentation language -- is it
HTML, LaTeX, Texinfo, or something that is supposed to generate all of
the above?  I favor LaTeX, simply because it seems to make sense for
someone writing finite elements and eigenvalue solvers to have a
documentation tool that gracefully handles mathematical notation.
Some minimal extensions (like the [[ ]] construction in noweb) can
also be handy, though there is always a danger that such constructions
will conflict with something meaningful.  (Oh, and try writing C code
with bit-shift operations using noweb, and see if either the tool or
the emacs mode doesn't get confused!)
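
To make the conflict concrete: noweb treats << and >> appearing on one
line of a code chunk as a chunk reference, so an ordinary C shift
expression is misread unless the << is escaped as @<<.  A hypothetical
example:

<<scale the accumulator>>=
acc = acc << 4 >> 1;   /* noweb parses "<< 4 >>" as a chunk reference */
@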

Second, there is a question about how the documentation tool extends
the underlying programming language.  The main example of this is the
chunk system, which is ultimately a limited macro processing facility.
If you use a programming language like Pascal, a chunk-like macro
processing feature can be a blessing, because it allows one to
circumvent the strict ordering otherwise imposed by the programming
language.  I have not written Pascal code in about a decade, though,
and even then I wrote Delphi/Object Pascal -- a more forgiving dialect
than standard Pascal in so many ways.  When I write in C/C++, I have a
macro processor already, should I care to use it; and even when using
noweb, I found that if I referenced a chunk inside a function
definition, it usually meant that I should consider rewriting the
function to be less monolithic.  For programs written in Lisp,
Haskell, or similar languages, I see few reasons for chunks at all
(feel free to disagree!).
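
As a sketch of what I mean (all names hypothetical), a definition like

<<solve.c>>=
void solve(struct problem *p)
{
    <<assemble the system matrix>>
    <<apply boundary conditions>>
    factor_and_solve(p);
}
@

usually reads better in C as two short static helper functions, which
the compiler can then type-check and scope properly; the chunks buy
little beyond what the language already provides.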

There is a dark side to chunks, though I rarely see it discussed.  It
is the same as the dark side to other macro preprocessing systems.
The macro system typically does not enforce any of the scoping
mechanisms in the underlying language, so that a reorganization made
in the name of clarity can lead to maintenance headaches later on.

Also, there is no scoping of chunk definitions (except scoping to a
file, perhaps).  If, like me, you tend to use similar phrases to
describe similar things, this lack of scope can be a real headache.
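
noweb, for instance, concatenates repeated definitions of the same
chunk name: a feature when used deliberately, and a trap when two
descriptions merely happen to collide.  A hypothetical example:

<<set up the solver>>=
tol = 1e-6;
@ Much later, in a section describing an unrelated solver...
<<set up the solver>>=
tol = 1e-8;  /* silently appended to the earlier chunk; both lines tangle out */
@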

B.  What is the canonical source?

By the canonical source, I mean the source that the user edits (as
opposed to the files that are automatically generated).  There are
really three choices here.  The first choice, which is the
conventional choice for WEB and its successors, is to generate both
the compiler/interpreter input files and the documentation files.  The
second choice, which I favor and which is the choice made by Schmooz
and by Javadoc/Doxygen/DOC++ and company, is to treat the
compiler/interpreter input files as canonical, and to generate
documentation files from structured comments.  The third choice, which
I believe Tim has proposed for Axiom, is to make the documentation file
canonical.

There are lots of tools (editors, syntax checkers, debuggers, etc)
that work on program source files.  One reason I favor putting
documentation in embedded comments is that this practice allows me to
continue using such tools without constantly switching back and forth
between a generated file and a canonical file.  For example, I find it
tremendously handy to have meaningful line numbers in the original
source file when I debug (#line helps, but is not a universal way to
achieve this for generated sources -- not all the world's a C program,
nor are all the tools completely preprocessor-aware).
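
For instance, notangle's -L option emits #line directives into the
tangled C, so compiler errors and debugger breakpoints refer back to
the .nw source.  A sketch of the kind of output involved (file name and
line number illustrative):

#line 142 "solve.nw"
double residual_norm(const double *r, int n)
{
    /* the directive above makes tools report this function against
       solve.nw, line 142, rather than against the generated .c file */
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += r[i] * r[i];
    return sqrt(s);   /* the tangled file would also need <math.h> */
}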

There are also tools that understand LaTeX, which is itself a flavor
of programming language.  One argument I can see for putting
everything into a LaTeX file is that there would be tool support.  I
have colleagues who would undoubtedly write their code with LyX or
Scientific WorkPlace if they had a tool that would let them.  I'm not
convinced that this is a good thing, but to each his own.  I can
certainly see the value of having a meaningful documentation source
file line number when LaTeX tells me that I am missing a math-mode
terminator or have a runaway argument.

Another argument for making the LaTeX format canonical is that you
could arrange the documentation in some ways that would be
difficult to manage with noweb or related tools.  For example, if an
example or a graph that really belonged in a figure was produced by
some short code fragment, you could perhaps do something like

\begin{figure}
\input{somegraph}
\caption{
Graph produced by the function
\begin{chunk}{graphs.lisp}
  (some-example "somegraph.tex" ...)
\end{chunk}
}
\end{figure}

On the other hand, I'm not sure whether the above would count as a
good thing or an evil thing.

Using a completely separate canonical source file adds inconvenience
to using tools for either the programming language(s) or the
documentation language(s) used in the system.  On the other hand,
it also means that the meta-language provided by the documentation tool
can be more powerful.

C.  Will this tool complicate my life during the support phase?

I want people (not necessarily highly computer-literate people) to be
able to use my software without asking me a lot of questions about how
to configure, build, and use it.  Consequently, I value the fact that
dsbweb consists of a single file of portable C code, and found the
dependencies required by noweb to be a nuisance.

I'm willing to use a more complicated tool that treats the programming
language source files as canonical, but only because I figure that
someone who is just trying to get the code to run will probably be willing
to put off compiling the detailed documentation.

If I'm lucky enough to have users willing to contribute and/or
document contributions, I would like to give them a documentation
system that does not have a steep learning curve (which was one of the
things I liked about noweb).


III.  Comments on documentation habits

These thoughts are less organized and sound more self-important than I
would like, but...

Literate programming is no excuse not to strive for self-documenting
code (even if the ideal of completely self-documenting code is likely
never to be realized).  Indeed, I think one of the advantages of
literate programming practices is that the prospect of having to
explain my code makes me write simpler, clearer code.  This may be
self-evident.

English is richer than programming languages are.  It is also more
redundant and ambiguous.  Concisely written code fragments mean a lot
to me, and I think the goal of explaining code to humans should not be
confused with "explaining the code in English."  I mention this only
because some of the worst code I've ever read has also been the
subject of some of the most detailed English documentation (the
associated data structures were not so well documented).  Neither the
documentation nor the code for that library made much sense -- which
probably explains why it consistently computed incorrect answers.

Knuth wrote that he wants to be able to treat programs like essays.  I
think this is a good analogy, and a telling one.  An essay need not be
short -- throughout /The Structure of Scientific Revolutions/, Kuhn
refers to his book as "this essay."  But essays are typically
self-contained, and are typically written by one author on one theme.
This is not to say that large, varied literate programs are
intrinsically bad or difficult, but that there are bound to be
organizational challenges involved in going from a collection of
essays/modules to a fully integrated system.  Any large system poses
organizational challenges, but I think large literate systems pose some
special ones.  I have always found it more
difficult to organize disparate text fragments by different authors
than to organize disparate code fragments by different authors.  I
wonder sometimes whether others don't feel the same, and if this is
part of why I know so few multi-developer projects based on literate
programming techniques.



