[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Is Bison the right tool for swish-e?
From: |
Hans Aberg |
Subject: |
Re: Is Bison the right tool for swish-e? |
Date: |
Fri, 4 Jun 2004 12:28:56 +0200 |
At 12:58 -0700 2004/06/03, Bill Moseley wrote:
>I'm the maintainer of swish-e, a search engine for small collections of
>files. Currently the query parser is hand-written C and is in bad need
>of a rewrite. Unfortunately, I don't have a lot of free time, so I
>wanted to ask here if bison would work for us before starting out.
As somebody else replied, it looks as though Flex/Bison would work well for
your task at hand.
>I have no experience with bison or yacc, but I'm wondering if I should
>change that. I've read a few yacc/bison tutorials, but haven't written
>any code. It's all still a bit fuzzy.
One easy way to get started is to write say a simple calculator, and then
successively modify it. I'll attach the examples from the Aho-Seti-Ullman
book.
You can get more help here, and in the
Help-flex mailing list
address@hidden
http://mail.gnu.org/mailman/listinfo/help-flex
Make sure to cc all your replies to the respective list.
Also have a look at the newsgroup comp.compilers, and its FAQ, published
there monthly.
>My main concern is one irregularity in our query syntax. Swish-e indexes
>things by "metaname", which really means an index is made up of a number
>of fields or columns. So, searches are really limited to a given
>metaname:
>
> title=bison # limit searches to the "title" metaname
> description=(foo OR bar) # similar (case of the operator is not
>important)
You need to decide how spaces and newlines are significant. For example, is
title =
bison
the same as the thing above?
If # is the comment character, the lexer will zip it out.
>Swish-e has a default metaname where things get indexed if not specified
>like above. It's called "swishdefault". So a plain query like:
>
> foo
>
>is the same as
>
> swishdefault=foo
In addition to keynames like "title", etc, your Flex lexer (.l file) needs
to return a type NAME. It then needs to _copy_ (important -- a lot of
peopele forget this) the identified string into the datatype handed over to
the Bison parser. Your .y grammar may then look like say
%token TITLE
%token DESCRIPTION
%token ASSIGNMENT "="
%token SWISHDEFAULT
...
%%
...
metaname_index_list:
{...} /* If the empty list is OK */
| metaname_index {...}
;
metaname_index:
TITLE "=" NAME
| DESCRIPTION "=" expression
| swishdefault
...
;
swishdefault:
SWISHDEFAULT
| SWISHDEFAULT "=" NAME
;
>So these two queries are the same:
>
> title=foo description=(foo OR bar) OR (baz AND woo)
>
> title=foo AND description=(foo OR bar) OR \
> (swishdefault=baz AND swishdescription=woo)
These are simple variations of the calculator precedences, using logical
operators instead.
>We also want to be able to use the standard + and - operators:
>
> foo OR bar NOT baz
> foo OR bar -baz # same thing
You need to decide whether -baz and - baz are the same thing. If so, you
merely add grammar rules:
%token MINUS "-"
...
%%
...
not:
NOT
| "-"
>We also have phrases by using double-quotes and the boolean operators
>can be turned into search words by placing them in a phrase:
>
> foo or bar # search for docs with either word
> foo "or" bar # search for docs with all three words
> "foo or bar" # find the phrase of all three words
Here you need making the lexer to read a type STRING. The Flex manual has
an example how to recognize C strings. Then the .y grammar will merely have:
%token STRING
%token OR
%token AND
/* token precedence */
%left OR
%left AND
...
%%
expression:
word_list
| word_list OR word_list
| word_list AND word_list
| "(" word_list ")"
...
;
word_list
word
| word_list word
;
word:
NAME
| STRING
;
>Does this all seem like something bison can do? I assume so, but I just
>wanted to run it by experienced users before I start.
So it looks as though it would pass fairly easy. A thing that cannot be
determined except by experimentation, is if you get an ambiguous grammar.
LexCalc.l
Description: Text document
LexCalc.y
Description: Text document
Hans Aberg