[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
improvement
From: |
Nathan Moore |
Subject: |
improvement |
Date: |
Wed, 31 Aug 2005 21:45:44 -0400 |
User-agent: |
Mozilla Thunderbird 1.0.6 (X11/20050716) |
Hi,
I have recently been using flex (2.5.4) for a C type language. I looked
through the output and noticed that:
"+" |
"-" |
"!" { SINGLE_CHAR_OP_ACTION }
will produce 2 fall through cases in the big switch/case, while
"+"|"-"|"!" { SINGLE_CHAR_OP_ACTION }
will only produce one which is obviously better. It will be smaller
code without any losses in performance,
which can't help put improve performance of the generated lexer b/c it
will be more cache/swap friendly. (right?)
This would not be an issue, except that I don't think it is possible for
a compiler to optimize away the positions in
the jump table for the switch/case b/c the cases coming out of tables
confuse it and it would be very reluctant to
go changing values in the tables anyway.
While making flex recognize that the above examples are the same would
make the best performing lexer have
a more readable specification, it would also make
<X>foo |
bar |
<Y>"cookie monster" { WHATEVER }
come out better as well, and these cannot be transformed into a one
liner (can they?). But either way, it would
be more readable to keep them on separate lines.
These could also improve equivalence class shrinkage (if I understand it
right) b/c it would make it clear to
flex that in the "+"|"-"|"!" example, if those chars were not used
elsewhere, that they could be in an equivalence class.
This improvement would probably also propagate into other consolidating
actions.
(based on the current output, I'm assuming that FLEX is not the best at
determining that certain states are equivalent.)
Another improvement would be if FLEX could tell which rules are
responsible for another rule not being able to
accept any tokens. I really haven't applied my brain to this one that
hard (though I have applied my brain to trying
to figure out which one was blocking it) so I'm not sure how hard it
would be figure this out, but I think it should be
pretty easy.
I ended up having to chop my specification down to just a few entries
that I suspected might cause the problem and
remove/swap them until I found it. Of course these were the hardest to
read regexes in the whole specification, so
debugging them was a bitch and a half.
Another improvement would be to have more than one action possible for
each match.
I can't think up a good syntax for this, but with output like:
switch(matched_rule | flag) /* flag tells which of the multiple action
to do */
{
case 0: DO_SOMETHING; YY_BREAK;
case 0|FLAG: DO_SOMETHING_ELSE; YY_BREAK;
...
}
In the above, the FLAG might be one, so flag would either be 0 or 1, and
matched_rule
(not the right name for it, I know) would be << 1 what it would have
been without this
feature. Or you could use a high bit for it (1<<(ffs(FINAL_STATES)+1)).
This would save user code from having to do extra branches which could
be moved up into
the branch that flex does anyway. It would also allow for the same
action to be taken on
some matched rules no matter what flag was by just generating a fall
through or set an optional
prologue to the main action without that much of a performance hit
(needs goto and label in
action code which the C compiler would probably eat and turn into a fall
through).
Nathan
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- improvement,
Nathan Moore <=