[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Scanner takes a long time to build.
From: |
Hans Aberg |
Subject: |
Re: Scanner takes a long time to build. |
Date: |
Tue, 30 Jul 2002 18:43:47 +0200 |
At 17:27 +0200 2002/07/30, Akim Demaille wrote:
>>>>>> "Vern" == Vern Paxson <address@hidden> writes:
>Vern> My main current project these days is Bro, a network intrusion
>Vern> detection system. It's written in C++. For its regular
>Vern> expression matching, I took flex's generator and turned it into
>Vern> a set of classes that can be called at run-time.
...
>Vern> At first, I ran it like flex does - build the entire DFA when
>Vern> Bro starts up (and I added an on-disk state cache to speed this
>Vern> up for subsequent runs). But this ran into problems when I
>Vern> wanted very large patterns. So I changed it to build the
>Vern> matcher incrementally (which actually was an easy change at this
>Vern> point). The real surprise was that this was *faster* than
>Vern> running from the precompiled DFA loaded out of the disk cache!
...
>Vern> My best guess is that this is due to better memory locality -
>Vern> and the increase was slight - but way better than suffering
>Vern> degradation, which is what I had expected.
>
>Yes, locality is certainly the main culprit, and disk access too.
>Have a look at valgrind: it is a precious tool to find cache misses
>and so forth. It helps understanding how bad huge tables are.
>
>It also reminds me of some GCC people saying that precompiled headers
>are not always a win, because disk access are so slow, that it is
>commonly faster to read the headers, parse them, take the proper
>actions, rather that loading a file that snapshots the result of the
>whole sequence!
This may be a factor in forwarding more dynamic approaches: For example,
like compiling into smaller tables for a non-deterministic
(finite/pushdown) automaton, and then add code to the parser that
translates it into a DFA as needed.
Incidentally, for Unicode, it may mean that the code is not necessarily
going to be slower, because the code-set is to large for use with
non-compacted tables anyway.
My guess is that this phenomenon has in part to do with the architecture of
todays computers, where the CPU has a large number of registers, running at
a speed several times higher than memory access: Under such circumstances,
it is going to be faster to load small segments of active code into the CPU
than doing several static array memory accesses.
Hans Aberg
Re: Scanner takes a long time to build., Scott A Crosby, 2002/07/27
Re: Scanner takes a long time to build., Vern Paxson, 2002/07/27
Re: Scanner takes a long time to build., Vern Paxson, 2002/07/30