[Axiom-developer] rhxtangle

From: Ralf Hemmecke
Subject: [Axiom-developer] rhxtangle
Date: Sun, 06 Aug 2006 02:04:59 +0200
Hi Gaby,

it turned out that mimicking the behaviour of notangle is not entirely easy, in particular if multiline chunks have to be indented correctly.

Below you find the script. I hope you find it useful for dropping the initial noweb dependency for the configuration phase of Axiom.


% Copyright (C) 06-Aug-2006  Ralf Hemmecke <address@hidden>
% noweave -delay > rhxtangle.tex
% latex rhxtangle.tex
% makeindex rhxtangle
% latex rhxtangle.tex




% Definition taken from allprose.sty
  \expandafter\def\csname x#1\endcsname{\@@xnamedef{#1}{#2}{#3}}}
    \footnote{\href{#3}{\useterm{#2}: \url{#3}}}%
    \expandafter\gdef\csname !x#1\endcsname{}%


\title{A Poor Man's NoTangle}
\author{Ralf Hemmecke}


  In order to drop the dependency of the configuration step of
  \xAxiom{} on \xNoweb{} we present a \xPerl{} program that basically
  behaves like a simple version of the \texttt{notangle} program from
  \xNoweb{}.


\xAxiom{} is written in a literate programming style using \xNoweb{}.
That means that (nearly) all source files of \xAxiom{} are actually
\LaTeX{} files with additional code chunks that are of the form
@<<code chunk name@>>=
some code comes here
These special files are known to \xAxiom{} developers as
\defineterm{pamphlet} files and come with the extension
\texttt{.pamphlet}. The source file of this text is one example of a
\useterm{pamphlet} file.

Since the configuration files for \xAxiom{} should also be written as
\useterm{pamphlet} files, it would be nice to depend on as few
prerequisites as possible. \xNoweb{} is not installed by default on
every computer. We replace the dependency on \xNoweb{} by a dependency
on \xPerl{}, since \xAutomake{} from the GNU Autotools depends on
\xPerl{}, too, and writing the script is relatively easy.

\section{What do we want?}

The script should be called via
perl file.ext.pamphlet > file.ext
in order to tangle the code in the same way \texttt{notangle} does.

The output of \texttt{rhxtangle} and \texttt{notangle} should be
identical modulo the translation of tabs to spaces done by
\texttt{notangle} and modulo spaces at the end of a line.

Currently \texttt{rhxtangle} does not translate tabulators to spaces,
but it does remove trailing spaces and tabulators.

In order to keep our script simple, we impose some restrictions on the
file format.

\item We only accept one file on the command line and no options.
\item Code chunks \emph{must} be ended by an \verb'@' sign in the
  first column.
\item Inside code chunks an \verb'@' sign is forbidden in the first
  column.
\item Double square brackets may not appear inside double angle
  brackets.
\item A code chunk name may not contain \verb'@<<' or \verb'@>>', not
  even if they are escaped by an \verb'@' sign.
\item Double angle brackets, i.e. \verb'@<<', that should not be
  considered as part of a code chunk use \emph{must} be escaped by
  an \verb'@' sign.
\item No translation of initial whitespace takes place, so it is
  important for a Makefile to already contain tabulator characters at
  the corresponding places.

The last item is a major difference between our implementation and
\texttt{notangle}.

\section{How to extract code chunks?}

The input file can basically be seen as a collection of code chunks.

Our script works in two passes.

<<*>>=
<<global variables>>
<<read the code chunks from stdin>>
<<write code chunks to stdout>>
@

\subsection{Reading the code chunks from standard input}

We are going to store every chunk into a hashtable that is indexed by
the chunk name.

<<global variables>>=
%chunks = ();
$chunkname = "";
@

Our script skips everything that is not between an initial code chunk
definition of the form \verb'@<<chunk definition@>>=' and the following
\verb'@' in the first column.

<<read the code chunks from stdin>>=
while (<>) {
    chomp; # strip off trailing newline character
    if (/^@<<(.+)@>>=\s*$/) { #chunk definition
        $chunkname = $1;
        if (! defined($chunks{$chunkname})) {$chunks{$chunkname} = []}
        next; #the definition line itself is not part of the chunk
    }
    if (/^@/) {$chunkname = ""; next} #chunk end
    if ($chunkname eq "") {next} #skip non-code-chunk lines
    push @{$chunks{$chunkname}}, $_; #append $_ at the end of the list
}
@

After executing the above code, the hash variable \verb'%chunks' %$
contains all code chunk lines; lines that came from chunks with
identical names have already been joined into one list.
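
As a small illustration of the resulting data structure, the following
stand-alone sketch runs the same kind of loop over an in-memory example
instead of standard input. It is not part of \texttt{rhxtangle}; the
sample text and the names \verb'@input', \verb'%demo' and \verb'$name'
are assumptions made for this illustration, and the chunk markers are
written without the escaping \verb'@' sign.

```perl
use strict;
use warnings;

# Hypothetical sample input: chunk "a" is defined twice, so its
# lines are expected to end up in one list, in order of appearance.
my @input = (
    'Some documentation text.',
    '<<a>>=',
    'first line',
    '@',
    'More documentation.',
    '<<b>>=',
    'only line of b',
    '@',
    '<<a>>=',
    'second line',
    '@',
);

my %demo;          # chunk name => reference to a list of lines
my $name = '';     # empty string means "outside of any code chunk"
for (@input) {
    if (/^<<(.+)>>=\s*$/) {            # chunk definition
        $name = $1;
        $demo{$name} = [] unless defined $demo{$name};
        next;
    }
    if (/^\@/) { $name = ''; next }    # chunk end
    next if $name eq '';               # skip documentation lines
    push @{ $demo{$name} }, $_;        # append the line to its chunk
}

print "$_: ", join(' | ', @{ $demo{$_} }), "\n" for sort keys %demo;
```

Both definitions of chunk \texttt{a} end up in the same list, in the
order in which they appear in the input.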

\subsection{Writing the code chunks to standard output}

Writing out the code is done recursively starting with the code chunk
\verb'@<<*@>>'. Since [[printCodeChunk]] does not add a final newline,
we do it afterwards explicitly and thus trigger a flushing of an
internal buffer.
<<write code chunks to stdout>>=
&printCodeChunk('*', '');
&printout("\n");
@

The function \verb'printCodeChunk' takes the name of the chunk and the
amount of whitespace that must be printed after each newline in a
multiline code chunk.

If \texttt{notangle} sees the use of a code chunk in a line, it
replaces that use with the text from that code chunk, where a trailing
newline in the code chunk text is removed.

For single line chunk text the replacement is simple. For multiline
text, \texttt{notangle} adds spaces after a newline so that the chunk
is basically shifted as a whole.

If the first \verb'<' of the code chunk use starts in column $n$ and
the corresponding code chunk represents multiple lines, then after
each newline $n$ spaces are added.
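
The rule can be written down in a few lines of \xPerl{}. The following
sketch is neither taken from \texttt{notangle} nor from
\texttt{rhxtangle}; the helper name \verb'expand_use' and the sample
data are assumptions. Columns are counted from 0 and tabulators are
ignored here.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the chunk use that starts at column $n
# of $line (and is $len characters long) by the lines in @text.
sub expand_use {
    my ($line, $n, $len, @text) = @_;
    my $before = substr($line, 0, $n);
    my $after  = substr($line, $n + $len);
    my $pad    = ' ' x $n;                 # n spaces after each newline
    return $before . join("\n$pad", @text) . $after;
}

my $line = 'x = <<body>>;';
my $n    = index($line, '<<');             # column of the first '<'
my $out  = expand_use($line, $n, length('<<body>>'), 'f(1,', '  2)');
print "$out\n";
```

Here the chunk use starts in column 4, so the second line of the chunk
text is shifted by four spaces in addition to its own indentation.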

Let us take the following file, which we name \file{example.nw}.

notangle example.nw > ex.1
yields the following output.

For a Makefile one usually adds the command line switch \verb'-t8'.
The command
notangle -t8 example.nw > ex.2
yields the following text, where we have replaced tabulator characters
by arrows in order to show them here. (Note that \emph{no} tabulator
character appears in the third line.)
|------> TextC22TextC21
|------>       TextC22
|------>       TextC21
|------>       TextC22TextC21
|------>|------>     TextC22
|------>   TextC12Text2

Our program behaves differently. Instead of adding a fixed number of
spaces, \texttt{rhxtangle} concatenates the initial whitespace that appears
in the input line and only adds spaces to account for positioning
the second use of \verb'@<<C1@>>' in the file \file{example.nw}.
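
The idea of keeping tabulators and turning everything else into spaces
can be sketched as follows. The helper name \verb'indentation_for' and
the sample prefix are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the text in front of a chunk use into the
# indentation for continuation lines; tabulators are kept as they are.
sub indentation_for {
    my ($prefix) = @_;
    (my $indent = $prefix) =~ s/[^\t]/ /g;
    return $indent;
}

my $indent = indentation_for("\tall: ");
(my $visible = $indent) =~ s/\t/|------>/g;   # show tabulators as arrows
print "[$visible]\n";
```

The tabulator at the beginning of the prefix survives, while the five
characters of \verb'all: ' are replaced by five spaces.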

perl example.nw > ex.3
results in a file that is identical to \file{ex.1}.

In the function [[printCodeChunk]] we use an auxiliary function
[[printout]] which delays the actual output and removes the escape
character \verb'@' and trailing spaces.


The function [[printCodeChunk]] is called recursively for each use of
a code chunk inside a line.
<<*>>=
sub printCodeChunk {
    my($chunkname, $indentation) = @_;
    my($nextIndentation, $chars, $rest) = ($indentation, "", "");
    my($chunkNameUse) = ("");
    my(@lines, $line);
    if (! defined($chunks{$chunkname})) {
        print STDOUT "\nrhxtangle: Undefined chunkname @<<$chunkname@>>\n";
        die "rhxtangle: Undefined chunkname @<<$chunkname@>>\n";
    }
    @lines = @{$chunks{$chunkname}};
    if (scalar(@lines) == 0) {return}

    <<handle first of the @lines array and prepare for next>>

    while (scalar(@lines) > 0) { #more than one line left
        &printout("$line\n$indentation"); #print leftover from last round
        <<handle first of the @lines array and prepare for next>>
    }
    &printout($line); #print leftover from last round
}
@ %def printCodeChunk
Note that there is no newline printed at the end of [[printCodeChunk]].

The following code chunk treats exactly one input line.
It scans for code chunk uses and replaces them by the corresponding
text by recursively calling the function [[printCodeChunk]].

<<handle first of the @lines array and prepare for next>>=
$line = shift @lines;
$nextIndentation = $indentation;
while ($line =~ /^(.*?)(.)?@<<(address@hidden)@>>(.*)/) {
    $chars = "$1$2";
    $chunkNameUse = $3;
    $rest = $4;

    &printout($chars); #print the text in front of the chunk use
    <<replace non-tabulator characters in 'chars' by spaces>>
    $nextIndentation .= $chars;

    if ($2 eq "@") { # the @<< is escaped --> no chunk use
        &printout("@<<"); #print the brackets themselves
        $nextIndentation .= "  "; # for @<<
        $line = "$chunkNameUse@>>$rest";
        next; #continue scanning behind the escaped brackets
    }
    $chars = "@<<$chunkNameUse@>>";
    <<replace non-tabulator characters in 'chars' by spaces>>
    &printCodeChunk($chunkNameUse, $nextIndentation);
    $nextIndentation .= $chars;
    $line = $rest;
}
@

<<replace non-tabulator characters in 'chars' by spaces>>=
$chars =~ s/[^\t]/ /g;
@

The [[printout]] function first buffers strings that it gets as input
until a newline character is detected. The newline character triggers
the actual output.

Before the line is actually written to standard output, trailing
whitespace is removed and escape sequences are resolved.
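
The buffering idea can be tried out in isolation with the following
sketch. It is not the code of \texttt{rhxtangle}: the names
\verb'emit', \verb'$buffer' and \verb'@written' are assumptions, the
finished lines are collected in a list instead of being written to
standard output, and the escaped brackets appear literally because the
sketch works on the already tangled form.

```perl
use strict;
use warnings;

my $buffer = '';   # text collected since the last newline
my @written;       # stands in for standard output in this sketch

# Hypothetical stand-in for the buffering idea: collect text and only
# emit a line once a newline has been seen.
sub emit {
    my ($str) = @_;
    while ($str =~ /^(.*?)\n(.*)$/s) {
        $buffer .= $1;
        $str = $2;
        $buffer =~ s/[ \t]+$//;            # remove trailing spaces and tabs
        $buffer =~ s/\@(<<|>>)/$1/g;       # resolve escaped angle brackets
        push @written, $buffer;
        $buffer = '';
    }
    $buffer .= $str;                       # keep the incomplete line
}

emit('print "@<<not a chunk@>>";  ');
emit("\t\nnext line");
emit("\n");
print "[$_]\n" for @written;
```

The first call only fills the buffer. The newline in the second call
triggers the clean-up of the first line, and the final call flushes the
second line.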

<<global variables>>=
$printoutBuffer = '';
@

<<*>>=
sub printout {
    my($str) = @_;
    while ($str =~ /(.*?)\n(.*)/) {
        $printoutBuffer .= $1;
        $str = $2;
        <<flush printoutBuffer>>
        print "\n";
    }
    $printoutBuffer .= $str;
}
@ %def printout

<<flush printoutBuffer>>=
$printoutBuffer =~ s/\s*$//;
$printoutBuffer =~ s/@(@<<|@>>)/$1/g;
print $printoutBuffer;
$printoutBuffer = '';
@


Testing \texttt{rhxtangle} basically means to compare its output with
the output of \texttt{notangle}. There should be only spacing
differences for relevant files.
So we produce below a short script that could be extracted via
notangle >
and run via

That program lists relevant files and compares the output with that
of \texttt{notangle}.

In fact, if the function [[tab2spc]] were applied in the function
[[printout]] before actually writing to stdout, the script
\texttt{rhxtangle} would be closer to \texttt{notangle} called without
the option \verb'-t8'.

We believe however that \texttt{rhxtangle} is good enough as a ``poor
man's replacement for \texttt{notangle}''.

@files=`find . -name '*.pamphlet'`;

for $f (@files) {
    chomp $f;
    $f =~ s/\.pamphlet$//;
    if ($f =~ /\.bib$/) {next}
    #print ":: $f\n";
    if ($f =~ /Makefile/) {$opt="-t8"} else {$opt=''}
    @no = `notangle $opt     $f.pamphlet`;
    @rh = `perl $f.pamphlet`;
    if (scalar(@no) != scalar(@rh)) {
        print "Different number of lines. [$f]\n";
    }
    while (scalar(@no) > 0) {
        $n = shift @no; $n =~ s/\s*$//;
        $r = shift @rh; $r =~ s/\s*$//;
        if ($n ne $r) {
            $nn = &tab2spc($n); $n =~ s/[\t]/_/g;
            $rr = &tab2spc($r); $r =~ s/[\t]/_/g;
            if ($nn ne $rr) {
                $nn =~ s/[\t]/_/g;
                $rr =~ s/[\t]/_/g;
                print "[[[$f]]]\n";
                print "n $nn\n";
                print "r $rr\n";
            }
        }
    }
}

sub tab2spc {
    my($s) = @_;
    my($p) = index($s, "\t");
    while($p != -1) {
        # $q = (8 - ($p % 8)); print "$q <-- $p\n";
        $sp = ''; for (1 .. (8 - ($p % 8))) {$sp .= " "}
        $s =~ s/\t/$sp/;
        $p = index($s, "\t");
    }
    return $s; #the string with all tabulators expanded
}
@ %def tab2spc

% We want the name of the section and the hypertarget command on the
% same page, but \printindex issues \clearpage. Thus we do it by hand
% here and redefine \clearpage.

