help-bison
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Is Bison the right tool for swish-e?


From: Hans Aberg
Subject: Re: Is Bison the right tool for swish-e?
Date: Fri, 4 Jun 2004 12:28:56 +0200

At 12:58 -0700 2004/06/03, Bill Moseley wrote:
>I'm the maintainer of swish-e, a search engine for small collections of
>files.  Currently the query parser is hand-written C and is in bad need
>of a rewrite.  Unfortunately, I don't have a lot of free time, so I
>wanted to ask here if bison would work for us before starting out.

As somebody else replied, it looks as though Flex/Bison would work well for
your task at hand.

>I have no experience with bison or yacc, but I'm wondering if I should
>change that.  I've read a few yacc/bison tutorials, but haven't written
>any code.  It's all still a bit fuzzy.

One easy way to get started is to write say a simple calculator, and then
successively modify it. I'll attach the examples from the Aho-Seti-Ullman
book.

You can get more help here, and in the
  Help-flex mailing list
  address@hidden
  http://mail.gnu.org/mailman/listinfo/help-flex
Make sure to cc all your replies to the respective list.

Also have a look at the newsgroup comp.compilers, and its FAQ, published
there monthly.

>My main concern is one irregularity in our query syntax.  Swish-e indexes
>things by "metaname", which really means an index is made up of a number
>of fields or columns.  So, searches are really limited to a given
>metaname:
>
>    title=bison  # limit searches to the "title" metaname
>    description=(foo OR bar)  # similar (case of the operator is not
>important)

You need to decide how spaces and newlines are significant. For example, is
     title  =
       bison
the same as the thing above?

If # is the comment character, the lexer will zip it out.

>Swish-e has a default metaname where things get indexed if not specified
>like above.  It's called "swishdefault".  So a plain query like:
>
>    foo
>
>is the same as
>
>    swishdefault=foo

In addition to keynames like "title", etc, your Flex lexer (.l file) needs
to return a type NAME. It then needs to _copy_ (important -- a lot of
peopele forget this) the identified string into the datatype handed over to
the Bison parser. Your .y grammar may then look like say

%token TITLE
%token DESCRIPTION
%token ASSIGNMENT "="
%token SWISHDEFAULT
...
%%
...
metaname_index_list:
        {...} /* If the empty list is OK */
    | metaname_index {...}
;
metaname_index:
     TITLE "=" NAME
   | DESCRIPTION "=" expression
   | swishdefault
   ...
;
swishdefault:
     SWISHDEFAULT
   | SWISHDEFAULT "=" NAME
;

>So these two queries are the same:
>
>    title=foo description=(foo OR bar) OR (baz AND woo)
>
>    title=foo AND description=(foo OR bar) OR \
>        (swishdefault=baz AND swishdescription=woo)

These are simple variations of the calculator precedences, using logical
operators instead.

>We also want to be able to use the standard + and - operators:
>
>    foo OR bar NOT baz
>    foo OR bar -baz  # same thing

You need to decide whether -baz and - baz are the same thing. If so, you
merely add grammar rules:
%token MINUS "-"
...
%%
...
not:
    NOT
  | "-"

>We also have phrases by using double-quotes and the boolean operators
>can be turned into search words by placing them in a phrase:
>
>    foo or bar      # search for docs with either word
>    foo "or" bar    # search for docs with all three words
>    "foo or bar"    # find the phrase of all three words

Here you need making the lexer to read a type STRING. The Flex manual has
an example how to recognize C strings. Then the .y grammar will merely have:
%token STRING
%token OR
%token AND
  /* token precedence */
%left OR
%left AND
...
%%
expression:
    word_list
   | word_list OR word_list
   | word_list AND word_list
   | "(" word_list ")"
   ...
;
word_list
     word
   | word_list word
;
word:
     NAME
   | STRING
;

>Does this all seem like something bison can do?  I assume so, but I just
>wanted to run it by experienced users before I start.

So it looks as though it would pass fairly easy. A thing that cannot be
determined except by experimentation, is if you get an ambiguous grammar.

Attachment: LexCalc.l
Description: Text document

Attachment: LexCalc.y
Description: Text document

  Hans Aberg

reply via email to

[Prev in Thread] Current Thread [Next in Thread]