Re: bug#10953: Potential logical bug in readtokens.c
From: Paul Eggert
Subject: Re: bug#10953: Potential logical bug in readtokens.c
Date: Tue, 06 Mar 2012 21:33:02 -0800
User-agent: Mozilla/5.0 (X11; Linux i686; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2

On 03/06/2012 03:32 PM, Eric Blake wrote:
> Why not just strchr instead of building up an isdelim bitmap?
strchr would not be right, since '\0' is valid in data and
as a delimiter.
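
To illustrate the point about '\0', here is a minimal sketch comparing the two lookups. The helper names is_delim_strchr and is_delim_memchr are hypothetical and are not part of readtokens.c; only strchr and memchr themselves are standard. The sketch assumes the delimiter set is passed as a byte array plus an explicit length, as readtoken's arguments suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: delimiter test with strchr.  It cannot take a
   length, so it stops at the first '\0' in DELIM, and it always
   "finds" '\0' itself because strchr matches the terminator.  */
static int
is_delim_strchr (const char *delim, int c)
{
  assert (delim != NULL);
  return strchr (delim, c) != NULL;
}

/* Hypothetical helper: delimiter test with memchr.  The explicit
   length N_DELIM lets '\0' appear anywhere in DELIM, and '\0' is a
   delimiter only if it is actually listed.  */
static int
is_delim_memchr (const char *delim, size_t n_delim, int c)
{
  assert (delim != NULL || n_delim == 0);
  return n_delim != 0 && memchr (delim, c, n_delim) != NULL;
}
```

With the three-byte set { ' ', '\0', '\n' }, the strchr version never sees '\n' because the scan ends at the embedded '\0'. With the set " \t\n", the strchr version wrongly reports '\0' as a delimiter, while the memchr version with length 3 does not.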
No doubt you meant 'memchr'; but using 'memchr' would slow
down readtoken by about a factor of two. I got this result by
timing the following benchmark, with gcc-4.6.1.tar (uncompressed)
as standard input, on Fedora 15 x86-64 with GCC 4.6.2:
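
#include <stdio.h>
#include <readtokens.h>

struct tokenbuffer t;

int main (void)
{
  /* Read whitespace-delimited tokens from stdin until readtoken
     reports end of input by returning (size_t) -1.  */
  for (;;)
    {
      size_t s = readtoken (stdin, " \t\n", 3, &t);
      if (s == (size_t) -1)
        return 0;
    }
}
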
On this benchmark, the relative speeds (user+sys CPU time ratios,
bigger numbers are better) are:

  0.54  readtoken with memchr
  1.00  current readtoken (with non-thread-safe byte array)
  1.13  proposed readtoken (with thread-safe bitset)

So the proposed patch is a performance win even in non-thread-safe use.
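
For readers unfamiliar with the technique, the following is a rough sketch of a per-call delimiter bitset. It is not the code from the proposed patch: the names delim_set, delim_set_init and delim_set_has are hypothetical, and the sketch assumes only that the set is rebuilt from the delimiter bytes and kept in caller-owned storage (for example on the stack), so that no static state is shared between threads.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Bits per word, and words needed to hold one bit per byte value.  */
#define WORD_BITS (sizeof (size_t) * CHAR_BIT)
#define N_WORDS ((UCHAR_MAX + WORD_BITS) / WORD_BITS)

/* Hypothetical type: one bit for each possible unsigned char.  */
typedef struct { size_t w[N_WORDS]; } delim_set;

/* Clear S, then set the bit for each of the N_DELIM bytes in DELIM.
   '\0' is treated like any other byte.  */
static void
delim_set_init (delim_set *s, const char *delim, size_t n_delim)
{
  assert (s != NULL && (delim != NULL || n_delim == 0));
  memset (s, 0, sizeof *s);
  for (size_t i = 0; i < n_delim; i++)
    {
      unsigned char c = (unsigned char) delim[i];
      s->w[c / WORD_BITS] |= (size_t) 1 << (c % WORD_BITS);
    }
}

/* Return nonzero if byte C is in S.  This is one shift and one mask,
   with no scan over the delimiter list.  */
static int
delim_set_has (const delim_set *s, int c)
{
  unsigned char uc = (unsigned char) c;
  return (int) ((s->w[uc / WORD_BITS] >> (uc % WORD_BITS)) & 1);
}
```

A caller would declare a delim_set locally, initialize it once per call from the delimiter argument, and then test each input byte with delim_set_has. On a typical 64-bit host the set occupies four words, so clearing and rebuilding it is cheap compared with scanning the input.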
> And why
> are we calling getc() one character at a time, instead of using tricks
> like freadahead() to operate on a larger buffer?
>
> Also, is readtoken() intended to be a more powerful interface than
> strtok, in which case we _do_ want to be non-threadsafe, and to have a
> readtoken_r interface that is the underlying threadsafe variant that can
> benefit from caching?
I haven't thought about these issues, but surely they are
independent of the proposed patch.