m4-discuss
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: next_char inlining


From: Eric Blake
Subject: Re: next_char inlining
Date: Fri, 29 Feb 2008 06:26:36 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080213 Thunderbird/2.0.0.12 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Bruno Haible on 2/29/2008 12:33 AM:
|> | +         if (curr_quote.len1 == 1 && curr_quote.len2 == 1 && !input_change
|> | +             && isp)
|>
|> Not quite right.  The condition here needs to be that quote_age is
|> non-zero (certain 1-byte quote combinations don't optimize well according
|> to the semantics required by M4, such as when they overlap with comment
|> delimiters; the calculation of quote_age already took this into account).
|
| I don't understand. In the current code, quote_age is taken is only taken
| into account by the INPUT_CHAIN part of next_char and next_char_1. If a test
| of quote_change is missing here in next_token, it is also missing in
next_char
| and next_char_1. - Or do you mean that a new boolean variable
| 'singlechar_quotes' could be introduced, which shall always be equivalent to
| curr_quote.len1 == 1 && curr_quote.len2 == 1, and which shall be updated by
| set_quotes?

The master branch already has such a variable - is_single_quotes.
However, I was more worried about when the user does changequote(",") -
there, the algorithm must prefer " as end quote rather than " as start of
a nested quote, and I'm not sure your code did that.  I'm also worried
about if the user does changequote(`,', ") or some other horrendous
action, where "," is now a quote start rather than argument separator.  So
yes, either quote_age should be sufficient (it is only non-zero when both
quote delimiters are 1 character; but even when both delimiters are length
1, quote_age might be zero, although those are the corner cases that won't
occur in normal user's code, so they aren't worth optimizing anyway), or a
new variable which is a superset of quote_age but still applicable to the
optimization is appropriate.

On the other hand, with an interface to read a buffer, and its pair to
consume an arbitrary amount of that buffer, from the input block
primitives, it is easier to match multi-byte delimiters in place - you
still search for the leading byte of any given delimiter using the
optimized buffer read/buffer consume functionality, then fall back to
next_char and MATCH only at candidate positions (rather than calling MATCH
at every position).

|
|> ~ I'm also considering adding a placeholder input block that always results
|> in CHAR_EOF, so that isp is guaranteed to be non-null and we have one less
|> branch in this loop.
|
| Tried this already (see attached patch). It does not provide a noticeable
| speedup: 41.0 sec instead of 41.1, that's less than 0.5%. You get more
| speedup (about 1%) by use of __builtin_expect (also attached).

__builtin_expect is gcc specific, while a dummy isp block is not.  I'll
probably make the change to a dummy block anyway.  But adding
__builtin_expect into m4 source code may have other benefits; I'll
consider it.

|> It seems a shame that there isn't really a function optimized for
|> searching for the first of two characters among a fixed-size memory block.
|
| Yes. We have strcspn but it is not well optimized for the case of only 1
or 2
| delimiter chars.

Also, strcspn won't allow you to search for NUL, and requires that the
haystack is NUL-terminated rather than fixed length memory.

|
|> /* Return the address of the first character of either C1 or C2, treated
|> as unsigned int, that occurs within the first N bytes of S; else return
|> NULL if neither character occurs.  */
|> void *memchr2 (void const *s, int c1, int c2, size_t n);
|
| Yes, I see where this function would come in handy.

OK, I'll tackle the gnulib module for this soon (I'm also in the middle of
working on improving the gnulib strtod module for C99 compliance).

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHyAgM84KuGfSFAYARAhJ0AJ48FJQ425zXG0kS06yln2Ru6+e+8ACgv0Tu
qnonNhDjEzeeJ0sq7ikKo0g=
=sRp8
-----END PGP SIGNATURE-----




reply via email to

[Prev in Thread] Current Thread [Next in Thread]