[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: next_char inlining (was: Re: M4 1.4.10b [beta] released)
From: |
Bruno Haible |
Subject: |
Re: next_char inlining (was: Re: M4 1.4.10b [beta] released) |
Date: |
Fri, 29 Feb 2008 08:33:27 +0100 |
User-agent: |
KMail/1.5.4 |
Eric Blake wrote:
> First off, you don't appear to have copyright assignment on file for M4
> yet, and this is not a trivial patch. Care to follow through with that?
Yes, I'll do this ASAP.
> Next, I'd like to finish merging the argv_ref into both branch-1_4 and
> master before applying your patch, while it's still fresh on my mind.
Sure, algorithmic changes should go in first.
> | + if (curr_quote.len1 == 1 && curr_quote.len2 == 1 && !input_change
> | + && isp)
>
> Not quite right. The condition here needs to be that quote_age is
> non-zero (certain 1-byte quote combinations don't optimize well according
> to the semantics required by M4, such as when they overlap with comment
> delimiters; the calculation of quote_age already took this into account).
I don't understand. In the current code, quote_age is taken is only taken
into account by the INPUT_CHAIN part of next_char and next_char_1. If a test
of quote_change is missing here in next_token, it is also missing in next_char
and next_char_1. - Or do you mean that a new boolean variable
'singlechar_quotes' could be introduced, which shall always be equivalent to
curr_quote.len1 == 1 && curr_quote.len2 == 1, and which shall be updated by
set_quotes?
> ~ I'm also considering adding a placeholder input block that always results
> in CHAR_EOF, so that isp is guaranteed to be non-null and we have one less
> branch in this loop.
Tried this already (see attached patch). It does not provide a noticeable
speedup: 41.0 sec instead of 41.1, that's less than 0.5%. You get more
speedup (about 1%) by use of __builtin_expect (also attached).
> Basically, rather than calling next_char() all the time, I envision:
>
> /* Return a pointer into the buffer available from the current input
> block, and set *LEN to the length of the result. If the next character to
> be parsed cannot be represented as an unsigned char (such as CHAR_EOF), or
> if the input block does not have read-ahead data buffered at the moment,
> return NULL, and the caller must fall back to using next_char(). */
> char *curr_buf (size_t *len);
>
> /* Discard LEN bytes from the current input block. LEN must be less than
> or equal to the previous length returned by a successful call to
> curr_buf(). */
> void consume (size_t len);
Very good. The entire idea of my patch was to
1. avoid a function call for every character,
2. treat consecutive characters which are all alike regarding syntax in a
single operation.
Inlining next_char and next_char_1 was a way to do it; these new primitives
are another way to do it, and a better one - since they don't require as much
code duplication.
Now I see where freadptr() and freadseek() come in handy...
> | + else if (chain->type == CHAIN_STR && chain->u.u_s.len > 0)
> | + {
> | + unsigned char curr_quote_1 =
> | + to_uchar (curr_quote.str1[0]);
>
> Unnecessary cast. char is assignable to unsigned char without issues.
Yes, but I prefer casts to be explicit rather than implicit.
> It seems a shame that there isn't really a function optimized for
> searching for the first of two characters among a fixed-size memory block.
Yes. We have strcspn but it is not well optimized for the case of only 1 or 2
delimiter chars.
> /* Return the address of the first character of either C1 or C2, treated
> as unsigned int, that occurs within the first N bytes of S; else return
> NULL if neither character occurs. */
> void *memchr2 (void const *s, int c1, int c2, size_t n);
Yes, I see where this function would come in handy.
Bruno
patch2
Description: Text Data
patch4
Description: Text Data
- M4 1.4.10b [beta] released, Eric Blake, 2008/02/25
- Message not available
- Re: M4 1.4.10b [beta] released, Eric Blake, 2008/02/26
- Re: M4 1.4.10b [beta] released, Bruno Haible, 2008/02/26
- Re: M4 1.4.10b [beta] released, Bruno Haible, 2008/02/28
- Re: M4 1.4.10b [beta] released, Eric Blake, 2008/02/28
- Re: next_char inlining (was: Re: M4 1.4.10b [beta] released),
Bruno Haible <=
- Re: next_char inlining, Eric Blake, 2008/02/29
- Re: next_char inlining, Eric Blake, 2008/02/29
- Re: next_char inlining (was: Re: M4 1.4.10b [beta] released), Bruno Haible, 2008/02/29