bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#22357: Algorithm switching and -F...


From: sur-behoffski
Subject: bug#22357: Algorithm switching and -F...
Date: Tue, 13 Dec 2016 13:58:30 +1030
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.5.1

Strictly speaking, -F describes how the program should interpret the
input pattern(s):  Basically, it turns off most, if not all, special
characters (e.g. '^', '$', '+', '?', '.' etc.) in the pattern.

The word-boundary flag (-w) is, to some extent, adding back in some
of the non-trivial cases that previously could be ignored when -F
was used.

Second-guessing the algorithm that's selected underneath is fraught
with hazards.  There's already all sorts of heuristics in place.

For example, the simple Tuned-Boyer-Moore search in kwset.c was
modified a few versions ago by Norihiro, to monitor the progress
being made by the T-B-M search, and, if progress was poor, the loop
tried memchr and/or memchr2 to see if it could skip bytes more
quickly.  If a candidate was found, the T-B-M search resumed, but
always with the heuristic looking over its shoulder.

There's also a heuristic at the next higher level, in dfa.c:  It
scans the pattern for "must-have" strings ("musts"), and then
simply chooses the longest such string.  The longest string might
have a speed advantage in the T-B-M skip search (more bytes
skipped per test), but, for some pathological data cases, the skip
size might always be small, whereas choosing one of the shorter
strings might lead to larger skips per test, and so run faster.

cheers,

sur-behoffski (Brenton Hoff)
Programmer, Grouse Software





reply via email to

[Prev in Thread] Current Thread [Next in Thread]