[Axiom-developer] rhxtangle


From:	Ralf Hemmecke
Subject:	[Axiom-developer] rhxtangle
Date:	Sun, 06 Aug 2006 02:04:59 +0200
User-agent:	Thunderbird 1.5.0.5 (X11/20060719)

Hi Gaby,

it turned out that it is not totally easy to mimic the behaviour of notangle, in particular if multiline chunks have to be indented correctly.

Below you find rhxtangle.pl.pamphlet. I hope you find it useful to drop the initial noweb dependency for the configuration phase of Axiom.


Ralf

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% rhxtangle.pl
% Copyright (C) 06-Aug-2006  Ralf Hemmecke <address@hidden>
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% noweave -delay rhxtangle.pl.pamphlet > rhxtangle.tex
% latex rhxtangle.tex
% makeindex rhxtangle
% latex rhxtangle.tex
% 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\documentclass{article}
\usepackage{axiom}
\usepackage{noweb}
\usepackage{makeidx}
\makeindex
\usepackage{hyperref}

\newcommand{\file}[1]{\textsf{#1}}
\newcommand{\email}[1]{\url{#1}}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Definition taken from allprose.sty
\makeatletter
\def\xnamedefstyle{\textsc}
address@hidden
  address@hidden
  address@hidden@\xnamedefstyle{#1}]}}
address@hidden
  \expandafter\def\csname x#1\endcsname{\@@xnamedef{#1}{#2}{#3}}}
\def\@@xnamedef#1#2#3{%
  address@hidden
    \defineterm{#2}%
    \footnote{\href{#3}{\useterm{#2}: \url{#3}}}%
    \expandafter\gdef\csname !x#1\endcsname{}%
  }{\useterm{#2}}}
\def\rhxterm{%
  address@hidden
  \def\useterm##1{##1}%
  \def\defineterm##1{##1}%
}
\makeatother
\IfFileExists{rhxterm.sty}{\usepackage{rhxterm}}{\rhxterm}
  

\xnamedef{Automake}{http://www.gnu.org/}
\xnamedef{Axiom}{http://www.axiom-developer.org/}
\xnamedef{Noweb}{http://www.eecs.harvard.edu/~nr/noweb}
\xnamedef{Perl}{http://www.perl.org}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\title{A Poor Man's NoTangle}
\author{Ralf Hemmecke}

\begin{document}
\maketitle

\begin{abstract}
  In order to drop the dependency of the configuration step of
  \xAxiom{} on \xNoweb{} we present a \xPerl{} program that basically
  behaves like a simple version of the \texttt{notangle} program from
  \xNoweb{}.
\end{abstract}

\tableofcontents

\section{Introduction}
\xAxiom{} is written in a literate programming style using \xNoweb{}.
That means that (nearly) all source files of \xAxiom{} are actually
\LaTeX{} files with additional code chunks that are of the form
\begin{verbatim}
@<<code chunk name@>>=
some code comes here
@@
\end{verbatim}
These special files are known to \xAxiom{} developers as
\defineterm{pamphlet} files and come with the extension
\texttt{.pamphlet}. The source file of this text is one example of a
\useterm{pamphlet} file.

Since the configuration files for \xAxiom{} should also be written as
\useterm{pamphlet} files, it would be nice to depend on as few
prerequisites as possible. \xNoweb{} is not installed by default on
every computer. We replace the dependency on \xNoweb{} by a dependency
on \xPerl{}, since \xAutomake{} from the GNU Autotools depends on
\xPerl{}, too, and writing the script is relatively easy.



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{What do we want?}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

The script should be called via
\begin{verbatim}
perl rhxtangle.pl file.ext.pamphlet > file.ext
\end{verbatim}
in order to tangle the code in the same way \texttt{notangle} does.

The output of \texttt{rhxtangle} and \texttt{notangle} should be
identical modulo the translation of tabs to spaces done by
\texttt{notangle} and modulo spaces at the end of a line.

Currently \texttt{rhxtangle} does not translate tabulators to spaces,
but it does remove trailing spaces and tabulators.
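
As a small illustration of the whitespace handling just described, the
following stand-alone \xPerl{} sketch removes trailing spaces and
tabulators from a line and leaves all other whitespace untouched. The
helper name \verb'strip_trailing' is hypothetical and is not part of
\texttt{rhxtangle}.
\begin{verbatim}
use strict;
use warnings;

# Hypothetical helper: drop trailing spaces and tabulators, keep the rest.
sub strip_trailing {
    my ($line) = @_;
    $line =~ s/[ \t]+$//;
    return $line;
}

# The leading tabulator survives, the trailing whitespace does not.
print strip_trailing("\tcc -c foo.c \t "), "|\n";
\end{verbatim}
The leading tabulator of a Makefile command line is therefore kept as
it is, which is what a Makefile needs.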

In order to keep our script simple, we impose some restrictions on the
file format.

\begin{enumerate}
\item We only accept one file on the command line and no options.
\item Code chunks \emph{must} be ended by an \verb'@' sign in the
  first column.
\item Inside code chunks an \verb'@' sign is forbidden in the first
  column.
\item Double square brackets may not appear inside double angle
  brackets.
\item A code chunk name may not contain \verb'@<<' or \verb'@>>', not
  even if they are escaped by an \verb'@' sign.
\item Double angle brackets, i.e., \verb'@<<', that should not be
  considered part of a code chunk use \emph{must} be escaped by
  \verb'@'.
\item No translation of initial whitespace takes place, so it is
  important that a Makefile pamphlet already contains tabulator
  characters at the corresponding places.
\end{enumerate}

The last item is a major difference between our implementation and
\texttt{notangle}.





%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{How to extract code chunks?}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

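The input file can basically be seen as a collection of code chunks
interleaved with documentation.

Our script works in two passes: the first pass reads all code chunks,
the second pass writes them out, starting from the root chunk.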

<<*>>=
<<global variables>>
<<read the code chunks from stdin>>
<<write code chunks to stdout>>
@





%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Reading the code chunks from standard input}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

We are going to store every chunk in a hash table that is indexed by
the chunk name.

<<global variables>>=
%chunks = ();
$chunkname = "";
@
%$

Our script skips everything that is not between a code chunk
definition of the form \verb'@<<chunk definition@>>=' and the following
\verb'@' in the first column.

<<read the code chunks from stdin>>=
while (<>) {
    chomp; # strip off trailing newline character
    if (/^@<<(.+)@>>=\s*$/) { #chunk definition
        $chunkname = $1;
        if (! defined($chunks{$chunkname})) {$chunks{$chunkname} = []}
        next;
    }
    if (/^@/) {$chunkname = ""; next} #chunk end
    if ($chunkname eq "") {next} #skip non-code-chunk lines
    push @{$chunks{$chunkname}}, $_; #append $_ at the end of the list
}
@
%$

After executing the above code, the hash variable \verb'%chunks' %$
contains all code chunk lines, where the lines of chunks with
identical names have already been concatenated in order of appearance.
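
The effect of this bookkeeping can be reproduced in isolation. The
following sketch is not part of \texttt{rhxtangle}; the chunk names
and lines are made up for illustration. It shows how \verb'push' on a
dereferenced hash entry creates the list on first use and appends to
it afterwards, so that repeated definitions of the same chunk end up
in one list.
\begin{verbatim}
use strict;
use warnings;

my %chunks;
# Two definitions of chunk "a" interleaved with one of chunk "b".
for my $pair (["a", "line 1"], ["b", "other"], ["a", "line 2"]) {
    my ($name, $text) = @$pair;
    push @{ $chunks{$name} }, $text;   # creates the list on first use
}
print join("|", @{ $chunks{"a"} }), "\n";   # line 1|line 2
\end{verbatim}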











%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Writing the code chunks to standard output}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Writing out the code is done recursively starting with the code chunk
\verb'@<<*@>>'. Since [[printCodeChunk]] does not add a final newline,
we do it afterwards explicitly and thus trigger a flushing of an
internal buffer.
 
<<write code chunks to stdout>>=
&printCodeChunk('*', '');
&printout("\n");
<<printCodeChunk>>
@

The function \verb'printCodeChunk' takes the name of the chunk and the
amount of whitespace that must be printed after each newline in a
multiline code chunk.

If \texttt{notangle} sees the use of a code chunk in a line, it
replaces that use with the text of the code chunk, removing a trailing
newline from the chunk text.

For single line chunk text the replacement is simple. For multiline
text, \texttt{notangle} adds spaces after a newline so that the chunk
is basically shifted as a whole.

If the first \verb'<' of the code chunk use starts in column $n$ and
the corresponding code chunk represents multiple lines, then after
each newline $n$ spaces are added.
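
This rule can be sketched in a few lines of \xPerl{}. We read
``column $n$'' as ``$n$ characters precede the chunk use''; that
reading is an assumption, chosen because it agrees with the example
that follows. The helper name \verb'shift_chunk' is hypothetical and
the code only models the rule, it is not how \texttt{notangle} is
implemented.
\begin{verbatim}
use strict;
use warnings;

# Hypothetical helper: $col is the number of characters in front of
# the chunk use; every line but the first is shifted by $col spaces.
sub shift_chunk {
    my ($col, @lines) = @_;
    return join("\n" . (" " x $col), @lines);
}

# A two-line chunk used after four leading spaces.
print "    ", shift_chunk(4, "TextC21", "TextC22"), "\n";
\end{verbatim}
Both output lines start with four spaces, so the chunk is shifted as a
whole.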

Let us take the following file, which we name \file{example.nw}.
<<example.nw>>=
@<<*@>>=
Text1
     @<<C1@>>@<<C1@>>Text2
      Text3
@@
@<<C1@>>=
TextC11
    @<<C2@>>@<<C2@>>
TextC12
@@
@<<C2@>>=
TextC21
TextC22
@@
@

Running
\begin{verbatim}
notangle example.nw > ex.1
\end{verbatim}
yields the following output.
\begin{verbatim*}
Text1
     TextC11
         TextC21
         TextC22TextC21
               TextC22
     TextC12TextC11
               TextC21
               TextC22TextC21
                     TextC22
           TextC12Text2
      Text3
\end{verbatim*}

For a Makefile one usually adds the command line switch \verb'-t8'.
The command
\begin{verbatim}
notangle -t8 example.nw > ex.2
\end{verbatim}
yields the following text, where we have replaced tabulator characters
by arrows in order to show them here. (Note that \emph{no} tabulator
character appears in the third line.)
\begin{verbatim*}
Text1
     TextC11
         TextC21
|------> TextC22TextC21
|------>       TextC22
     TextC12TextC11
|------>       TextC21
|------>       TextC22TextC21
|------>|------>     TextC22
|------>   TextC12Text2
      Text3
\end{verbatim*}

Our program behaves differently. Instead of adding a fixed number of
spaces, \texttt{rhxtangle} concatenates the initial whitespace that
appears in the input lines and only adds spaces where other text has
to be accounted for, for example to position the second use of
\verb'@<<C1@>>' in the file \file{example.nw}.

Executing
\begin{verbatim}
perl rhxtangle.pl example.nw > ex.3
\end{verbatim}
results in a file that is identical to \file{ex.1}.
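
The whitespace bookkeeping behind this behaviour can be tried out in
isolation. The sketch below is a stand-alone illustration with a
made-up input string. It turns the text in front of a chunk use into
an indentation string by keeping tabulators and replacing every other
character by a space, and then makes the tabulator visible for
printing.
\begin{verbatim}
use strict;
use warnings;

# Text preceding a chunk use: a tabulator, a word and a space.
my $prefix = "\tgcc ";
(my $indent = $prefix) =~ s/[^\t]/ /g;   # keep tabulators, blank the rest
$indent =~ s/\t/<TAB>/g;                 # make the tabulator visible
print "[$indent]\n";                     # [<TAB>    ]
\end{verbatim}
Because tabulators survive this step, a continuation line in a
Makefile rule starts with the same tabulator as the line that contains
the chunk use.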


In the function [[printCodeChunk]] we use an auxiliary function
[[printout]] which delays the actual output and removes the escape
character \verb'@' and trailing spaces.

<<printCodeChunk>>=
<<printout>>
@


The function [[printCodeChunk]] is called recursively for each use of
a code chunk inside a line.
<<printCodeChunk>>=
sub printCodeChunk {
    my($chunkname, $indentation) = @_;
    my($nextIndentation, $chars, $rest) = ($indentation, "", "");
    my($chunkNameUse) = ("");
    my(@lines, $line);
    if (! defined($chunks{$chunkname})) {
        print STDOUT "\nrhxtangle: Undefined chunkname @<<$chunkname@>>\n";
        die "rhxtangle: Undefined chunkname @<<$chunkname@>>\n";
    }
    @lines = @{$chunks{$chunkname}};
    if (scalar(@lines) == 0) {return}

    <<handle first of the @lines array and prepare for next>>

    while (scalar(@lines) > 0) { #at least one more line left
        &printout("$line\n$indentation"); #print leftover from last round
        <<handle first of the @lines array and prepare for next>>
    }
    &printout($line); #print leftover from last round
}
@ %def printCodeChunk
%$
Note that there is no newline printed at the end of [[printCodeChunk]].


The following code chunk treats exactly one input line.
It scans for code chunk uses and replaces them by the corresponding
text by recursively calling the function [[printCodeChunk]].

<<handle first of the @lines array and prepare for next>>=
$line = shift @lines;
$nextIndentation = $indentation;
while ($line =~ /^(.*?)(.)?@<<(address@hidden)@>>(.*)/) {
    my $escape = defined($2) ? $2 : ""; # saved now: a later s/// resets $2
    $chars = "$1$escape";
    $chunkNameUse = $3;
    $rest = $4;

    &printout($chars);
    <<replace non-tabulator characters in 'chars' by spaces>>
    $nextIndentation .= $chars;

    if ($escape eq "@") { # the @<< is escaped --> no chunk use
        &printout("@<<");
        $nextIndentation .= "  "; # for @<<
        $line = "$chunkNameUse@>>$rest";
        next;
    }
    $chars = "@<<$chunkNameUse@>>";
    <<replace non-tabulator characters in 'chars' by spaces>>
    &printCodeChunk($chunkNameUse, $nextIndentation);
    $nextIndentation .= $chars;
    $line = $rest;
}
@
%$

<<replace non-tabulator characters in 'chars' by spaces>>=
$chars =~ s/[^\t]/ /g;
@
%$

The [[printout]] function buffers the strings that it gets as input
until a newline character is detected. The newline character triggers
the actual output.

Before the line is actually written to standard output, trailing
whitespace is removed and escape sequences are resolved.
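
The buffering idea can be sketched independently of the rest of the
script. In the following illustration the names \verb'emit',
\verb'$buffer' and \verb'@emitted' are hypothetical, finished lines
are collected in a list instead of being printed, and the resolution
of escape sequences is left out. Note that the captured groups are
copied into variables before the next regular expression is applied,
since a successful substitution resets them.
\begin{verbatim}
use strict;
use warnings;

my $buffer = '';   # text collected since the last newline
my @emitted;       # finished lines, kept here instead of being printed

# Collect pieces and finish a line, without trailing whitespace,
# whenever a newline shows up.
sub emit {
    my ($str) = @_;
    while ($str =~ /\A(.*?)\n(.*)\z/s) {
        my ($head, $tail) = ($1, $2);   # save before the next regex
        my $line = $buffer . $head;
        $line =~ s/\s+\z//;
        push @emitted, $line;
        ($buffer, $str) = ('', $tail);
    }
    $buffer .= $str;
}

emit("foo  ");
emit("bar \t\nbaz");
emit("\n");
print join("|", @emitted), "\n";   # foo  bar|baz
\end{verbatim}
Whitespace inside a line is kept, only the whitespace directly in
front of the newline is dropped.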

<<global variables>>=
$printoutBuffer = '';
@

<<printout>>=
sub printout {
    my($str) = @_;
    while ($str =~ /(.*?)\n(.*)/s) { # /s: keep text after further newlines
        $printoutBuffer .= $1;
        $str = $2;
        <<flush printoutBuffer>>
        print "\n";
    }
    $printoutBuffer .= $str;
}
@ %def printout

<<flush printoutBuffer>>=
$printoutBuffer =~ s/\s*$//;
$printoutBuffer =~ s/@(@<<|@>>)/$1/g;
print $printoutBuffer;
$printoutBuffer = '';
@
%$





%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Tests}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Testing \texttt{rhxtangle} basically means comparing its output with
the output of \texttt{notangle}. There should be only spacing
differences for relevant files.
So we provide below a short script that can be extracted via
\begin{verbatim}
notangle -Rrhxtangletest.pl rhxtangle.pl.pamphlet > rhxtangletest.pl
\end{verbatim}
and run via
\begin{verbatim}
perl rhxtangletest.pl
\end{verbatim}

That program lists the relevant files and compares the output of
\texttt{rhxtangle} with that of \texttt{notangle}.

In fact, if the function [[tab2spc]] is applied in the function
[[printout]] before actually writing to stdout, the script
\texttt{rhxtangle} would be closer to \texttt{notangle} called without
the option \verb'-t8'.

We believe however that \texttt{rhxtangle} is good enough as a ``poor
man's replacement for \texttt{notangle}''.

<<rhxtangletest.pl>>=
<<tab2spc>>
@files=`find . -name '*.pamphlet'`;

for $f (@files) {
    chomp $f;
    $f =~ s/\.pamphlet$//;
    if ($f =~ /\.bib$/) {next}
    #print ":: $f\n";
    if ($f =~ /Makefile/) {$opt="-t8"} else {$opt=''}
    @no = `notangle $opt     $f.pamphlet`;
    @rh = `perl rhxtangle.pl $f.pamphlet`;
    if (scalar(@no) != scalar(@rh)) {
        print "Different number of lines. [$f]\n";
    }
    while (scalar(@no) > 0) {
        $n = shift @no; $n =~ s/\s*$//;
        $r = shift @rh; $r =~ s/\s*$//;
        if ($n ne $r) {
            $nn = &tab2spc($n); $n =~ s/[\t]/_/g;
            $rr = &tab2spc($r); $r =~ s/[\t]/_/g;
            if ($nn ne $rr) {
                $nn =~ s/[\t]/_/g;
                $rr =~ s/[\t]/_/g;
                print "[[[$f]]]\n";
                print "n $nn\n";
                print "r $rr\n";
            }
        }
    }
}
@

<<tab2spc>>=
sub tab2spc {
    my($s) = @_;
    my($p) = index($s, "\t");
    while($p != -1) {
        # $q = (8 - ($p % 8)); print "$q <-- $p\n";
        $sp = ''; for (1 .. (8 - ($p % 8))) {$sp .= " "}
        $s =~ s/\t/$sp/;
        $p = index($s, "\t");
    }
    $s;
}
@ %def tab2spc

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% We want the name of the section and the hypertarget command on the
% same page, but \printindex issues \clearpage. Thus we do it by hand
% here and redefine \clearpage.
{\let\rhxclearpage\clearpage%
\clearpage%
\renewcommand{\clearpage}{\def\clearpage{\rhxclearpage}}%
\hypertarget{sec:Index}{}%
\printindex
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}
