Re: multiline output variables.

From: Dan Manthey
Subject: Re: multiline output variables.
Date: Mon, 24 Jan 2005 17:00:59 -0500

Well, this took me longer to get to than I had hoped, but it came out
pretty well:

I've rewritten my solution from scratch in order to clean it up and
document it.  Find the diff from 2.59b attached.

Some things are worth noting:

I've made a new macro that is just the max number of sed commands that can
be safely used, and written things in terms of it.

I realized that not only needn't config.status do the job of breaking the
sed program up into fragments, it needn't do the job of escaping the
results either.  Now, at ./configure time, the exact sed program
fragments, fully escaped, are output into verbatim here documents (i.e.
ones whose terminator is quoted) in config.status.
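
To illustrate the difference between a verbatim here document and an
ordinary one, here is a minimal sketch.  The variable name `value` and the
sed fragment are invented for the example and are not taken from the patch.

```shell
value='expanded'

# Unquoted terminator: the shell expands $value and collapses "\\" to "\".
plain=$(cat <<EOF
s,@foo@,$value\\n,
EOF
)

# Quoted terminator: the body is copied verbatim, so a fragment that was
# already fully escaped at ./configure time passes through untouched.
verbatim=$(cat <<'EOF'
s,@foo@,$value\\n,
EOF
)

printf '%s\n' "$plain" "$verbatim"
```

Only the second form leaves the text alone, which is why no further
escaping is needed once the fragments are written that way.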

It was suggested that grep -c be used to make sure that no extra
delimiters were found in the sed program.  grep -c counts lines with
matches, not actual matches, so I wrote a wacky sed script to do the job.
Does somebody have a better portable solution to this?
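
The difference between counting lines and counting matches can be seen
with a small sketch.  It assumes awk is available, which the patch itself
does not assume; `count_occurrences` is a hypothetical helper name.

```shell
# Count every literal occurrence of $1 in the text read from stdin.
# Note: awk -v interprets backslash escapes in the value it is given.
count_occurrences() {
  awk -v d="$1" '
    {
      s = $0
      # Repeatedly find the delimiter and skip past it.
      while ((i = index(s, d)) > 0) {
        n++
        s = substr(s, i + length(d))
      }
    }
    END { print n + 0 }'
}

# Two of the three lines contain a comma; there are three commas in all.
printf 'a,b,c\nno match\nd,e\n' | grep -c ','
printf 'a,b,c\nno match\nd,e\n' | count_occurrences ','
```

The first command reports the number of matching lines and the second the
number of matches, which is the quantity actually wanted here.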

Rather than counting the delimiters to just _notice_ when an output
variable containing the delimiter would foul up the escaping mechanism, I
use it to instead modify the delimiter and redo the whole process.  It's
now guaranteed to always work, regardless of the contents of the output
variables.
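
The retry idea can be sketched roughly as below.  The starting delimiter
and the way it is lengthened are assumptions made for the example, not the
choices made in the patch, and `pick_delimiter` is a hypothetical name.

```shell
# Print a delimiter that does not occur anywhere in the text given as $1.
pick_delimiter() {
  delim=','
  while :; do
    case $1 in
      # The candidate occurs in the text: lengthen it and try again.
      *"$delim"*) delim="${delim}_" ;;
      *) break ;;
    esac
  done
  printf '%s\n' "$delim"
}

pick_delimiter 'no delimiter here'
pick_delimiter 'a,b and a,_c'
```

Each retry makes the candidate strictly longer, so the loop must end once
the candidate is longer than the text it is checked against.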

None of the escaping rigamarole is needed for _AC_SUBST_FILES, since
the values of such output variables don't end up inside of sed s///.  I
therefore don't escape them at all.  Note that this means that if an
AC_SUBST_FILE'd variable yielded a filename with a comma or backslash in
it, the sed script now does not have those characters escaped.  Is this a
problem?  Did the old behavior even yield valid sed code in those rare
cases where such a value resulted?

There is one sed program that is applied prior to those generated to deal
with output variables.  These deal with things like @address@hidden  I have left
this entirely unchanged.  Since these things probably should never be able
to have multiline values, I figure this is no loss.

There are two more important issues with this code, which I haven't
addressed in this patch:


If you AC_SUBST_FILE(foo) and AC_SUBST_FILE(bar), then an input file with
a line with "@bar@@foo@" can generate the contents of the two files in
either order, depending on order in which @foo@ and @bar@ are
interpolated.  I think the current behavior is to first output the file
for the variable first AC_SUBST_FILE'ed, which may well be a different
order than that in which the output variables appear in the input file.

This seems like a bug to me.  I tried to figure a way to interpolate the
variables in the order that they appear, but I think this is impossible
with portable sed code unless you're willing to insert some spurious
newlines around the instances of the output variables (clearly not

On the other hand, does anyone actually use AC_SUBST_FILE'd variables in
any way except to put them on a line by themselves?  Note that at least
some seds (perhaps all?) actually insert the file entirely before the line
with the output variable.  So "fish\nbait@file@shop" becomes
"fish\nfile\nbaitshop", not "fish\nbaitfile\nshop".  This seems
sufficiently wacky to me that I expect no one uses it this way.

If indeed everybody uses these on lines by themselves, could we require
that?  This would have the advantage (perhaps small) that the newline
following the output variable could be deleted.  This is the behavior I
would expect if I had an output variable set to /dev/null, and in any
other case, the file will provide its own terminal newline.

Recursive output variables:

If there are fewer than 48 output variables, they are all recursively
expanded.  That is, if shell variable foo is the string "@bar@", the
generated file ultimately holds the value of bar, not "@bar@".  This is
perhaps desirable and perhaps not.  It does of course make it possible to
form loops that cause generation of the output file to never complete.

However, if more output variables are defined, then more than one sed
program is needed to apply all the interpolations.  If the first program
contains the definition of @bar@ above, and the second one has @foo@, now
@foo@ is _not_ recursively interpolated.  Again, it's probably fine to not
recursively interpolate, but we now have two different behaviors,
depending not on choices about the variables, but on something far more
obscure, and not documented for the autoconf user.  (The 48-variable limit
is a detail of how _AC_OUTPUT_FILES is implemented.)
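
The two behaviors can be reproduced with a pair of toy sed invocations.
This sketch assumes that, in the single-program case, the command for
@foo@ comes before the one for @bar@; the function names are hypothetical.

```shell
# One program: @foo@ becomes "@bar@", which the next command then expands.
one_program() {
  sed -e 's,@foo@,@bar@,' -e 's,@bar@,value,'
}

# Two programs: @bar@ is handled by the first and @foo@ by the second, so
# the "@bar@" produced by the second is never looked at again.
two_programs() {
  sed -e 's,@bar@,value,' | sed -e 's,@foo@,@bar@,'
}

echo '@foo@' | one_program    # prints: value
echo '@foo@' | two_programs   # prints: @bar@
```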

It is possible, but irritating, to always recursively interpolate: the file
is generated from its inputs by applying all the sed programs.  This
result then has all the sed programs applied again.  If the result of the
second application is the same as the first, interpolation is complete;
otherwise, the second result replaces the first and the programs are
applied again and again until the results change no more.  This scheme
can also allow the contents of files included by AC_SUBST_FILE to have
output variables interpolated.
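
A rough sketch of that loop follows.  The two sed programs stand in for
the generated ones and are invented for the example, as is the name
`interpolate_fully`.  The sketch holds the text in shell variables, so
trailing newlines are not preserved, and a cyclic definition would make it
loop forever.

```shell
# Read a template on stdin and apply all programs until nothing changes.
interpolate_fully() {
  current=$(cat)
  while :; do
    next=$(printf '%s\n' "$current" |
      sed -e 's,@bar@,value,' |
      sed -e 's,@foo@,@bar@,')
    # A pass that changes nothing means interpolation is complete.
    [ "$next" = "$current" ] && break
    current=$next
  done
  printf '%s\n' "$current"
}

echo 'x=@foo@' | interpolate_fully   # prints: x=value
```

A single pass would have left "x=@bar@" behind; the second pass expands
it and the third confirms that nothing more changes.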

Precluding recursive interpolation seems more difficult, since this
requires either that sed be used to somehow only process the unprocessed
portion of each line (a moderate pain in the rear made vastly worse by my
just-added support of multiline output variables) or that the values of
output variables be escaped.

Quadrigraph processing might be an obvious means of the latter, but it is
ineffective.  Consider that changing "@foo@" to "@@&t@foo@&t@@" still
leaves "@foo@" as a substring, and "@f@&t@o@&t@o@" leaves "@f@" and "@o@"
as substrings.  It would seem that a new syntax would be needed, such as
"@foo@" -> "@@=f@@=o@@=o@@".  However, even with an effective escape
mechanism, those escapes would need to be applied to every @ character in
the output variable values.  Note also that such an escape could not start
or end with @.  Consider "@address@hidden@" with "@foo" -> "address@hidden" and
"@varnonvar@" -> something else.
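
The substring problem is easy to check mechanically.  The sketch below
takes @&t@ to be the empty quadrigraph and uses the escaped strings as I
understand them; `contains` is a hypothetical helper.

```shell
# Succeed if the literal string $2 occurs somewhere in $1.
contains() {
  case $1 in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Quadrigraph-style escapes still leave output variables behind.
contains '@@&t@foo@&t@@' '@foo@' && echo 'first form still has @foo@'
contains '@f@&t@o@&t@o@' '@f@' && echo 'second form still has @f@'
contains '@f@&t@o@&t@o@' '@o@' && echo 'second form still has @o@'

# The "@@=" style proposed above does not.
contains '@@=f@@=o@@=o@@' '@foo@' || echo 'new form has no @foo@'
```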

I don't have a solution to this, other than the current one of ignoring it
until somebody actually has a problem.  At the very least however, it
should be documented that the behavior can be very unpredictable.


Attachment: status.diff
Description: Text document