grammatica-users


Re: [Grammatica-users] Fuzzy tokenizer.


From: Matti Katila
Subject: Re: [Grammatica-users] Fuzzy tokenizer.
Date: Fri, 1 Jul 2005 11:45:30 +0300 (EEST)

On Wed, 29 Jun 2005, Per Cederberg wrote:

> Grammatica currently doesn't support productions that match
> a null input. Hence it cannot handle grammars that match an
> empty input. Normally this isn't such a big deal, but it
> might of course be annoying.

I'm wondering whether it would be useful to return a special EOF token when
the end of the input is reached. Then a start production could be written as:

S = atoms* EOF;
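
To make the idea concrete, here is a rough sketch of how an explicit EOF
token lets the start production above match an empty input. It is not
Grammatica code: SketchToken, SketchTokenizer and SketchStartParser are
made-up names, and splitting on whitespace is just a stand-in for real
token patterns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are Grammatica's real API.
final class SketchToken {
    static final int EOF = 0;   // assumed id for the end-of-input token
    static final int ATOM = 1;  // assumed id for an "atom" token

    final int id;
    final String image;

    SketchToken(int id, String image) {
        this.id = id;
        this.image = image;
    }
}

final class SketchTokenizer {
    private final String[] words;
    private int pos = 0;

    SketchTokenizer(String input) {
        String trimmed = input.trim();
        // An empty input yields no atoms at all, only the EOF token.
        this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Returns the next atom, or an EOF token (never null) once the input
    // is exhausted.
    SketchToken next() {
        if (pos >= words.length) {
            return new SketchToken(SketchToken.EOF, "");
        }
        return new SketchToken(SketchToken.ATOM, words[pos++]);
    }
}

final class SketchStartParser {
    // Parses "S = atoms* EOF;" and returns the images of the atoms matched.
    static List<String> parseStart(SketchTokenizer tokenizer) {
        List<String> atoms = new ArrayList<>();
        SketchToken token = tokenizer.next();
        while (token.id == SketchToken.ATOM) {
            atoms.add(token.image);
            token = tokenizer.next();
        }
        if (token.id != SketchToken.EOF) {
            throw new IllegalStateException("expected EOF");
        }
        return atoms;
    }
}
```

With this shape the production always consumes at least one token (EOF),
so an empty input no longer requires a production that matches nothing.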


> It would be great if you create a new version with the
> features you mention.

48 hours later, a first working version is at hand. Basically, I made big
modifications to RecursiveDescentParser and Tokenizer. Hmm, and
slightly modified Parser too. One new class was created as a
workaround for ignore tokens, which were a bit of a troublesome problem.

I won't give you the URI yet since the files need some cleaning, and
that's why I'm writing now. So, how should Grammatica be allowed to use
different implementations, e.g., by declaring 'TOKEN_CONTEXT_SENSITIVE =
"true"' in your grammar?

Currently, generated parsers extend RecursiveDescentParser. It would be
straightforward to change the generated parser to extend another Parser
implementation. The other option I can see is to write and extend a
DelegateParser which would delegate all method invocations to the delegated
parser, but that's a lot of work and smells like glue.
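
The two options could be sketched roughly like this. Everything here is
hypothetical: SketchParser and the classes around it only mimic the shape
of the discussion and are not the real Grammatica classes, and the boolean
flag merely stands in for a grammar declaration such as
TOKEN_CONTEXT_SENSITIVE.

```java
// Hypothetical sketch; these types are not part of Grammatica.
interface SketchParser {
    String parse(String input);
    void reset();
}

class LookAheadSketchParser implements SketchParser {
    public String parse(String input) { return "look-ahead:" + input; }
    public void reset() { }
}

class ContextSensitiveSketchParser implements SketchParser {
    public String parse(String input) { return "context-sensitive:" + input; }
    public void reset() { }
}

// Option 1: the generated parser extends the chosen implementation directly.
// The choice is fixed when the parser source is generated.
class GeneratedByExtension extends ContextSensitiveSketchParser {
}

// Option 2: the generated parser extends a delegate that forwards every
// method. Each new method in SketchParser needs another forwarding method
// here, which is the "glue" cost.
class DelegateSketchParser implements SketchParser {
    private final SketchParser delegate;

    DelegateSketchParser(SketchParser delegate) {
        this.delegate = delegate;
    }

    public String parse(String input) { return delegate.parse(input); }
    public void reset() { delegate.reset(); }
}

class GeneratedByDelegation extends DelegateSketchParser {
    // The flag is an assumed stand-in for a setting read from the grammar.
    GeneratedByDelegation(boolean contextSensitive) {
        super(contextSensitive
                ? new ContextSensitiveSketchParser()
                : new LookAheadSketchParser());
    }
}
```

Extension keeps the generated code trivial, while delegation allows the
implementation to be picked at run time at the price of the forwarding
methods.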

Second question: what would be the proper package for the new type of
parser? Perhaps it would make sense to do some refactoring, for example
like this:

net.percederberg.grammatica.parser:
   Parser.java
   ParseException.java
   ParserCreationException.java
   ParserLogException.java
   Analyzer.java
   Node.java
   Production.java
   ProductionPattern.java
   ProductionPatternAlternative.java
   ProductionPatternElement.java
   LookAheadSet.java
   Token.java
   TokenPattern.java

   create abstract Tokenizer

net.percederberg.grammatica.parser.ll:
  perhaps create abstract RecursiveDescentParser which only checks
  production rules.

   LookAheadRecursiveDescentParser.java
   LongestFindTokenizer.java

 new context sensitive classes:
   ContextSensitiveRecursiveDescentParser.java
   ExpectTokenizer.java
   TokenStack.java - ignore token problem workaround.

Hmm, actually I don't like this, since newcomers would find it hard to look
through all the classes in the parser.ll package. A better option from that
point of view might be:

net.percederberg.grammatica.parser.ll:
  create abstract RecursiveDescentParser which only checks
  production rules.

net.percederberg.grammatica.parser.ll.traditional:
   LookAheadRecursiveDescentParser.java
   LongestFindTokenizer.java

net.percederberg.grammatica.parser.ll.sensitive:
 new context sensitive classes:
   ContextSensitiveRecursiveDescentParser.java
   ExpectTokenizer.java
   TokenStack.java - ignore token problem workaround.


> Good luck with the project!

Thanks!

So far it has been a pleasure to read and modify your source code. It's
very well written and follows a nice programming style. What kind of
development tools are you using? And I didn't expect that almost everything
would be documented in the source =)


   -Matti



