[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: m4sugar and m4 1.6, bison

From: Eric Blake
Subject: Re: m4sugar and m4 1.6, bison
Date: Fri, 11 Jul 2008 18:39:16 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20080421 Thunderbird/ Mnenhy/

Hash: SHA1

[adding m4-discuss]

According to Ralf Wildenhues on 7/11/2008 5:05 PM:
| I personally think it's a shame to add known-suboptimal interfaces
| like m4_prepend at all, but oh well.  (I think in a better world,
| a programming language would teach users to use good algorithms by
| only providing those interfaces...)

Maybe the compromise would be provide it in m4sugar.m4, but remove mention
of it in NEWS and autoconf.texi, leaving bison to use it as an
undocumented macro.  At any rate, updating bison to newer m4sugar requires
this interface, unless we rewrite bison to not need prepending.

On the other hand, my claim that only m4_append can beat quadratic is only
true in the current m4 implementation of all macro definitions being
stored as a contiguous block of memory.  Add another layer of indirection,
by storing macro definitions as a vector of substrings, and even
m4_prepend can be sped up by choosing a data structure for the vector that
allows insertion at the front of the vector in O(1) time of one more
substring - but the time and memory expense of another layer of
indirection in the general case would probably outweigh the speedup of the
m4_prepend usage.

|> In the future, we may want to move m4sugar (or even autom4te, but
|> probably not m4sh) into the M4 package rather than Autoconf,
| Hmm, I'm not too happy the way that Autoconf will probably have to
| depend upon really new M4 again.  I mean, the M4 interface could
| have been stable for more than a decade ...

It may be quite distant future.  And I imagine it would be more like
gnulib - the bulk of m4sugar is purely textual (the exception being
quadrigraphs, which are only supported by the perl wrapper called
autom4te).  So autoconf could still ship m4sugar, but set it up as yet
another file synced from an upstream location.  At any rate, it's not
anywhere close on my radar to attempt such a move.

|> m4_PACKAGE_* macros remain undocumented (autoconf users should use the
|> documented AC_AUTOCONF_VERSION instead of m4_PACKAGE_VERSION),
| Wishful thinking.  The currently best way to be backward compatible to
| 2.59 and upward compatible is to use m4_PACKAGE_VERSION.  Please don't
| turn your head away from that fact (GCC uses m4_PACKAGE_VERSION).
| AFAICS your patch doesn't break that though; good.

Good points.  Another reason that migrating m4sugar into m4 will take some
time (at least until autoconf 2.59 is no longer widely used in active

|> In the process of adding m4_prepend, I noticed a missed optimization
|> in m4
| Thanks for tackling that!

Don't count your chickens yet!  I haven't coded it; but have the vague
idea that with some form of reference counting, it should be possible to
pass around pointers to the former reference when using the defn builtin,
then use pointer comparison in the define builtin to see if the user is
attempting an append operation instead of a strict redefinition (and
prepends count as a strict redefinition).  M4 is already doing some
reference counting, as of 1.4.5 (so that f(undefine(`f')) works without
dereferencing freed memory), but I would have to hook that into the input
rescanning engine alongside $@ back-references.

|> In the patch below, I
|> used the term 'linearithmic' rather than 'loglinear' for the notion of
O(n log
|> n); I found both terms in wikipedia [2], and while I am personally more
|> familiar with 'loglinear', the wikipedia search for 'n log n' turned up
|> former [3].
| FWIW I too am more familiar with loglinear or "almost linear".

Truth be told, I'm not sure whether the complexity is O(n), O(n log n), or
something else also of a smaller magnitude than O(n^2).  If all you are
doing is appending one byte at a time, the naive (or m4_prepend)
implementation of O(n^2) (O(n) insertions, each with a cost of O(n) to
move the previous n elements into new storage), and with appends, the cost
is provably amortized O(n): with geometric growth, insertion is normally
O(1) [dump at the end], but worst case O(n) [realloc and copy the n
previous elements], with worst case occurring once every n inserts;
leaving an amortized O(n) actions with a cost of O(n)/n for a total cost
of O(n).  But in the case of m4_append, with n being the number of
invocations of the macro, we can insert more than 1 byte at a time, so we
no longer guarantee n iterations between the more expensive O(n)
insertion.  Thus, the amortized insertion cost is no longer O(n)/n, but
O(n)/(fraction of n), and the trick is determining whether that works out
to something like amortized O(log n) or whether it is still a constant
factor (and thus still an amortized O(1)).

|> + (m4_prepend_uniq, m4_prepend_uniq_w): Add new macros, for
|> + completeness.
| :-/
| I would have much rather liked a couple of interfaces that would let M4
| do:
| - add argument to a (comma-separated?) list, in loglinear time,

This should already be possible.  You can always uses m4_append with comma
as the separator (or even m4_dquote the elements to m4_append, then
m4_unquote the entire one-argument list into a sequence of arguments).
And if you already have a list of arguments, just append the next argument.

| - sort (and uniquify) all arguments given, in loglinear time.

VERY difficult to implement in m4 1.4.x.  Lexicographic comparison, and
thus sorting, is not a native m4 operation yet (although there has been
discussion on m4-discuss at adding it for at least m4 2.0).  The only way
I can think of doing it is by writing a loop to compare one-byte
substrings at a time, being careful to use m4_changequote so that one-byte
[ or ] in isolation don't cause grief.  And I don't know whether it would
be more efficient to have a table of 255 bytes to integers, or even a
table of 255^ comparisons of two one-byte arguments (although the latter
gets expensive in space).

| Requiring a new-enough M4 for fast implementations is ok if Autoconf
| otherwise provides slow ones.

Sure - and m4 2.0's faster implementations are still a way off.

| (Aside, if M4 had means to generate a hash from a list of arguments,
| which it has BTW, then an interface similar to m4_append_uniq could
| probably even be made loglinear.)

Ah - you're talking about determining uniqueness within a list of
arguments, rather than in a single string containing substrings separated
by an arbitrary separator.  Yes, if you are building a list of arguments,
determining uniqueness can be done with a foreach loop over each argument
using m4_if to check for equality.  But I don't know if that is more
efficient algorithmically than using m4_append_uniq to build a string
rather than a list of arguments, and it does not work for any separator
other than comma (as converting other separators to comma to create a list
of arguments is itself O(n) on the length of the string being converted).

|> --- a/doc/autoconf.texi
|> +++ b/doc/autoconf.texi
|> @@ -11055,6 +11055,12 @@ to grow strings without duplicating substrings.
|> Additionally,
|>  Also, @code{m4_append_uniq} warns if @var{separator} is not empty, but
|>  occurs within @var{string}, since that can lead to duplicates.
|> +Note that @code{m4_append} can scale @dfn{linearithmically} (ie., O(n
| Please use 'i.e.,', dunno if some texinfo markup is suitable for the
| complexity term.

I'm not quite sure what markup to use, but yes, I was wondering about that.

|> +log n) in complexity notation),
| Actually this statement is pretty vague (which might be on purpose?),
| in that it does not tell what n is.  The number of times m4_append was
| called?  The total length of all strings?

I was writing this in terms of the number of m4_append calls, and not the
length of the final string.

|> -m4_include([m4sugar/version.m4])
|> +# If m4sugar/version.m4 is present, then define version strings.  This
|> +# file is optional, provided by Autoconf but absent in Bison.
|> +m4_sinclude([m4sugar/version.m4])
| Does the Autoconf build fail if the file happens to be absent?

Hmm.  AC_PREREQ fails if version.m4 is absent, and we have tests of
AC_PREREQ.  I'm not sure if the build itself fails, or only the testsuite,
but the testsuite should already catch the inadvertent deletion of the
Makefile-generated version.m4.  Not that I specifically tested this,
though (besides, make is pretty good about regenerating version.m4 as needed).

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
Version: GnuPG v1.4.9 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org


reply via email to

[Prev in Thread] Current Thread [Next in Thread]