coreutils

Re: Another patch, for discussion tho


From: Bruce Korb
Subject: Re: Another patch, for discussion tho
Date: Sat, 21 Apr 2012 10:14:31 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:11.0) Gecko/20120328 Thunderbird/11.0.1


So after futzing with timing a bit, I figured out the following:

1. These pre-computed tables _can_ outperform "strpbrk", but
   only if the skipped-over character count is in the range of
   roughly a dozen or two.  Beyond that, single-instruction
   testing and hand-crafted assembly code beat it out.  (These
   generated tables require a load, mask and test instead of
   just a load and test.)

2. The setup cost for a single-character strpbrk break-on string
   is *MUCH* larger than the setup cost for a two-or-more character
   string.  Likely, someone tried to optimize the single-character
   case, but the general setup is already efficient enough that
   the optimization turns out to be a pessimization.

3. It was never about efficiency of execution anyway.  It is quite
   unlikely that time-critical code is going to be scanning over
   strings.  If it must, then it should use strpbrk/strcspn.
   Maybe for really critical scanning code, variants of those
   could split the interface into setup_strpbrk and run_strpbrk.

   I suppose, in retrospect, I could do the same thing and
   achieve the same efficiency.  "SETUP_whatever_SCAN()"
   populates an array of bytes that merely need to be tested
   for "true" and "false" instead of masking.  Entirely doable,
   but not today.
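
To make the load/mask/test versus load/test distinction concrete,
here is a minimal sketch.  Everything in it is hypothetical: the
class bits, "class_tbl", "setup_scan" and "run_scan" are names made
up for illustration, not the generated tables or macros from the
patch, and the table is filled in at run time only to keep the
example self-contained.

```c
#include <stddef.h>

/* Hypothetical class bits: one shared table carries many classes. */
enum { CLS_SPACE = 1u << 0, CLS_DIGIT = 1u << 1 };

/* Stand-in for a pre-computed table (assumption: filled at run time here). */
static unsigned int class_tbl[256];

static void init_class_tbl(void)
{
    const char *p;

    for (p = " \t\n\v\f\r"; *p; p++)
        class_tbl[(unsigned char)*p] |= CLS_SPACE;
    for (p = "0123456789"; *p; p++)
        class_tbl[(unsigned char)*p] |= CLS_DIGIT;
}

/* Masked scan: each character costs a load, a mask and a test. */
static const char *skip_class(const char *s, unsigned int mask)
{
    while (*s && (class_tbl[(unsigned char)*s] & mask))
        s++;
    return s;
}

/* Setup step: flatten one mask into a byte-per-character table. */
static void setup_scan(unsigned char flat[256], unsigned int mask)
{
    int i;

    for (i = 0; i < 256; i++)
        flat[i] = (class_tbl[i] & mask) != 0;
    flat[0] = 0;                /* NUL always ends the scan */
}

/* Run step: each character costs only a load and a test. */
static const char *run_scan(const unsigned char flat[256], const char *s)
{
    while (flat[(unsigned char)*s])
        s++;
    return s;
}
```

The setup pass touches all 256 entries, so the flattened form only
pays off when the same character set is scanned more than once.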

This whole thing _is_ about efficiency -- but efficiency of
expression, and also flexibility.  (Change the characters
in a classification and the main code now accepts the new
character set without alteration.  E.g. add '$' to the set
of name characters for "C" and now you are VMS compatible.)
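
A rough illustration of the flexibility point, assuming a simple
byte-per-character table.  "C_NAME_CHARS", "set_name_chars" and
"name_len" are hypothetical names for this sketch only, not part of
the patch.

```c
#include <stddef.h>

/* Hypothetical definition of the "C name character" class. */
#define C_NAME_CHARS \
    "abcdefghijklmnopqrstuvwxyz" \
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
    "0123456789_"

static unsigned char name_char[256];

/* Rebuild the classification from a list of member characters. */
static void set_name_chars(const char *chars)
{
    int i;

    for (i = 0; i < 256; i++)
        name_char[i] = 0;
    for (; *chars; chars++)
        name_char[(unsigned char)*chars] = 1;
}

/* The "main code": it never mentions any particular character.
   It only counts class members and does not check that a name
   starts with a non-digit. */
static size_t name_len(const char *s)
{
    size_t n = 0;

    while (name_char[(unsigned char)s[n]])
        n++;
    return n;
}
```

With set_name_chars(C_NAME_CHARS), name_len("SYS$OUTPUT = 1") stops
at the '$'.  After set_name_chars(C_NAME_CHARS "$") the same call
covers all of "SYS$OUTPUT", and name_len itself is untouched.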

So where would the right place be for a beast like this?


