Extracting subexpressions, and performance considerations

From: Simon Richter
Subject: Extracting subexpressions, and performance considerations
Date: Tue, 25 Jun 2019 12:36:59 +0200
User-agent: Mutt/1.5.21 (2010-09-15)


I'm trying to build a logfile parser. The humble beginning is:

%option noyywrap
SPACE           \x20
LPAREN          \x28
RPAREN          \x29
COLON           \x3a
LBRACKET        \x5b
RBRACKET        \x5d
PATH            [_A-Za-z0-9:\\.-]+
INTEGER         ([1-9][0-9]*|0)
SEVERITY        (note|warning|error)
MESSAGE         [^\n]*
MSVC_TAG        [A-Z]+[1-9][0-9]*
%%
\n*                     /* ignore */
.                       /* ignore */

This runs very slowly, at only a few hundred kB/s on an E5 at 2.1 GHz. The
slowdown seems related to the negated character class for MESSAGE:
restricting the character set there speeds things up considerably. Is that a
known limitation, or have I stumbled on a bug here?

I'd also like to split the line into its fields afterwards, but only if it
matched as a whole. The manual seems to suggest switching to a separate
exclusive start condition and calling yyless(0) to reparse. Is there a
better way to extract subexpressions?
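
One alternative to rescanning, sketched here purely as an assumption and not
something the flex manual prescribes, would be to let a single rule match the
whole line and then split yytext with plain C in the action. The sketch
assumes the line layout PATH(INTEGER): SEVERITY MSVC_TAG: MESSAGE, which is a
guess based on the definitions above; struct diag, split_diag, and the buffer
sizes are hypothetical names and values chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record type; the field sizes are arbitrary for this sketch. */
struct diag {
    char path[260];
    int  line;
    char severity[8];   /* "note", "warning" or "error" plus NUL */
    char tag[16];       /* e.g. "C4996" */
    char message[512];
};

/*
 * Split one line that the scanner has already matched as a whole
 * (for example, yytext inside a whole-line rule).
 * Assumed layout: PATH(INTEGER): SEVERITY MSVC_TAG: MESSAGE
 * Returns 1 on success, 0 if the text does not have that shape.
 */
int split_diag(const char *text, struct diag *out)
{
    int consumed = 0;   /* set by %n only if the whole prefix matched */
    int n = sscanf(text, "%259[^(](%d): %7[a-z] %15[A-Z0-9]: %n",
                   out->path, &out->line, out->severity, out->tag,
                   &consumed);
    if (n != 4 || consumed == 0)
        return 0;

    /* The message is the rest of the line, without the line terminator. */
    const char *msg = text + consumed;
    size_t len = strcspn(msg, "\r\n");
    if (len >= sizeof out->message)
        len = sizeof out->message - 1;   /* truncate overly long messages */
    memcpy(out->message, msg, len);
    out->message[len] = '\0';
    return 1;
}
```

An action could then call split_diag(yytext, &d) and skip the line when it
returns 0. This adds a second pass with sscanf over each matched line, so it
may or may not turn out faster than the exclusive-state approach.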

