
Re: [Grammatica-users] Fuzzy tokenizer.


From: Per Cederberg
Subject: Re: [Grammatica-users] Fuzzy tokenizer.
Date: Fri, 01 Jul 2005 15:02:30 +0200

On Fri, 2005-07-01 at 11:45 +0300, Matti Katila wrote:
> I'm wondering whether it would be useful to return a special EOF token
> when the end of input is reached. Then a start production could be
> written as:
> 
> S = atoms* EOF;

Well, that is an option. We'd have to watch out for namespace
clashes though, as the user might already have defined a token
named "EOF". I have basically dodged this whole issue, since one
can always work around it if it is sufficiently important (after
all, not many people parse empty input in the first place).
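
To make the trade-off concrete, here is a minimal Java sketch of a
tokenizer that hands out an end-of-input token exactly once. None of
this is Grammatica's actual API: EofTokenSketch, SimpleTokenizer and the
reserved name "$EOF" are hypothetical, and the "$" prefix is only one
assumed way to keep a built-in name apart from user-defined tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Grammatica's API.
class EofTokenSketch {

    // Reserved name; assumes user-defined token names cannot contain '$'.
    static final String EOF_NAME = "$EOF";

    // Splits the input on whitespace and appends one synthetic EOF token.
    static final class SimpleTokenizer {
        private final String[] words;
        private int pos = 0;
        private boolean eofReturned = false;

        SimpleTokenizer(String input) {
            String trimmed = input.trim();
            words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        // Returns the next word, then EOF_NAME once, then null.
        String next() {
            if (pos < words.length) {
                return words[pos++];
            }
            if (!eofReturned) {
                eofReturned = true;
                return EOF_NAME;
            }
            return null;
        }
    }

    // Collects every token, including the trailing EOF token.
    static List<String> tokenize(String input) {
        SimpleTokenizer tokenizer = new SimpleTokenizer(input);
        List<String> result = new ArrayList<>();
        for (String tok = tokenizer.next(); tok != null; tok = tokenizer.next()) {
            result.add(tok);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a EOF b")); // a user token "EOF" stays distinct
        System.out.println(tokenize(""));        // empty input still yields the EOF token
    }
}
```

With a reserved name like this, even empty input produces one token, so
a start production of the form "S = atoms* EOF" always has something to
match.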

> 48 hours later the first working version is at hand. Basically I made
> big modifications to RecursiveDescentParser and Tokenizer. Hmm, and
> slightly modified Parser too. One new class was created as a
> workaround for ignore tokens, which was a somewhat troublesome problem.

Cool!

> I'm not giving you the URI yet since the files need some cleaning, which
> is why I'm writing now. So, how should Grammatica be allowed to use
> different implementations, e.g., by declaring 'TOKEN_CONTEXT_SENSITIVE =
> "true"' in your grammar?

Actually, this is what the GRAMMARTYPE parameter should be used
for. So you just have to invent a good name for this type of
grammar, like "CONTEXTSENSITIVE" or some cryptic abbreviation:

GRAMMARTYPE = "CS"
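
As a rough illustration of how a code generator could act on that
header value, the Java sketch below maps a GRAMMARTYPE string to the
base class that the generated parser would extend. This is an
assumption about one possible design, not Grammatica's real generator
code: GrammarTypeSketch and its methods are hypothetical, "CS" is the
abbreviation suggested above, "LL" is assumed to be the existing type,
and ContextSensitiveRecursiveDescentParser is the class name proposed
elsewhere in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of dispatching on GRAMMARTYPE; not Grammatica's code.
class GrammarTypeSketch {

    // Grammar type -> name of the base class for the generated parser.
    private static final Map<String, String> BASE_CLASSES = new HashMap<>();

    static {
        BASE_CLASSES.put("LL", "RecursiveDescentParser");
        BASE_CLASSES.put("CS", "ContextSensitiveRecursiveDescentParser");
    }

    // Looks up the base class, rejecting unknown grammar types early.
    static String baseClassFor(String grammarType) {
        String base = BASE_CLASSES.get(grammarType);
        if (base == null) {
            throw new IllegalArgumentException(
                "unsupported GRAMMARTYPE: " + grammarType);
        }
        return base;
    }

    // Builds the class declaration line a generator might emit.
    static String classHeader(String parserName, String grammarType) {
        return "class " + parserName + " extends " + baseClassFor(grammarType);
    }

    public static void main(String[] args) {
        System.out.println(classHeader("ArithmeticParser", "LL"));
        System.out.println(classHeader("ArithmeticParser", "CS"));
    }
}
```

Keeping the choice in one lookup table means a new grammar type only
needs one new entry plus its parser class.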

> Currently generated parsers extend RecursiveDescentParser. It would be
> straightforward to change the generated parser to extend another Parser
> implementation. The other option I can see is to write and extend a
> DelegateParser which would delegate all method invocations to the
> delegated parser, but that's a lot of work and smells like glue.

Extending RecursiveDescentParser seems reasonable to me (without
having seen the code).

> Second question: what would be the proper package for the new type of
> parser? Perhaps it would make sense to do some refactoring, for example
> like this:
> ...
>    create abstract Tokenizer

I think Tokenizer would be better off as an interface.
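
A minimal Java sketch of what that could look like follows. It is only
an illustration under stated assumptions: the class names
LongestFindTokenizer and ExpectTokenizer are borrowed from the proposal
in this thread, but their behaviour here is a guess, and the next()
signature with its "expected" parameter is hypothetical rather than
Grammatica's real API. Tokens are plain string literals to keep the
example short.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these signatures are Grammatica's real API.
class TokenizerInterfaceSketch {

    // The parser would depend only on this interface.
    interface Tokenizer {
        // Returns the literal matched at offset, or null if none matches.
        // "expected" holds the literals the parser could accept next.
        String next(String input, int offset, Set<String> expected);
    }

    // Traditional behaviour: the longest of all known literals wins.
    static final class LongestFindTokenizer implements Tokenizer {
        private final List<String> literals;

        LongestFindTokenizer(List<String> literals) {
            this.literals = literals;
        }

        @Override
        public String next(String input, int offset, Set<String> expected) {
            return longestMatch(input, offset, literals);
        }
    }

    // Context-sensitive behaviour: only the expected literals are tried.
    static final class ExpectTokenizer implements Tokenizer {
        @Override
        public String next(String input, int offset, Set<String> expected) {
            return longestMatch(input, offset, expected);
        }
    }

    // Picks the longest candidate that occurs in input at offset.
    static String longestMatch(String input, int offset,
                               Collection<String> candidates) {
        String best = null;
        for (String candidate : candidates) {
            if (input.startsWith(candidate, offset)
                    && (best == null || candidate.length() > best.length())) {
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Tokenizer longest = new LongestFindTokenizer(Arrays.asList("a", "ab", "b"));
        Tokenizer expect = new ExpectTokenizer();
        Set<String> onlyA = Collections.singleton("a");
        System.out.println(longest.next("ab", 0, onlyA)); // prints "ab"
        System.out.println(expect.next("ab", 0, onlyA));  // prints "a"
    }
}
```

Because both classes implement the same interface, a parser can switch
between them without changing its own code, which an abstract base
class would only allow for tokenizers sharing that one superclass.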

> 
> net.percederberg.grammatica.parser.ll:
>   perhaps create abstract RecursiveDescentParser which only checks
>   production rules.
> 
>    LookAheadRecursiveDescentParser.java
>    LongestFindTokenizer.java
> 
>  new context sensitive classes:
>    ContextSensitiveRecursiveDescentParser.java
>    ExpectTokenizer.java
>    TokenStack.java - ignore token problem workaround.

Or just have them all in the same package for now. There
aren't that many classes after all.

> Hmm, actually I don't like this since newcomers would find it hard to
> look through all the classes in the parser.ll package, and that's why a
> better option from that point of view might be:
> 
> net.percederberg.grammatica.parser.ll:
>   create abstract RecursiveDescentParser which only checks
>   production rules.
> 
> net.percederberg.grammatica.parser.ll.traditional:
>    LookAheadRecursiveDescentParser.java
>    LongestFindTokenizer.java
> 
> net.percederberg.grammatica.parser.ll.sensitive:
>  new context sensitive classes:
>    ContextSensitiveRecursiveDescentParser.java
>    ExpectTokenizer.java
>    TokenStack.java - ignore token problem workaround.

Or we just move up some more shared stuff into the Parser
class. With additional protected methods that subclasses
can call if they are interested.
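
Roughly, that idea could look like the Java sketch below: a base class
owns the token bookkeeping and exposes it through protected helpers,
and each subclass only adds its own parsing strategy. Everything here
is hypothetical (SharedParserSketch, peek(), consume() and
AtomListParser are invented for illustration); the real Parser class
has its own, different members.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; the real Parser class looks different.
class SharedParserSketch {

    // Base class holding the logic shared by all parser variants.
    abstract static class Parser {
        private final Iterator<String> tokens;
        private String lookahead;

        Parser(List<String> tokens) {
            this.tokens = tokens.iterator();
            this.lookahead = this.tokens.hasNext() ? this.tokens.next() : null;
        }

        // Shared helper: the next token without consuming it, or null at the end.
        protected String peek() {
            return lookahead;
        }

        // Shared helper: consumes the next token, which must equal "expected".
        protected String consume(String expected) {
            if (!expected.equals(lookahead)) {
                throw new IllegalStateException(
                    "expected " + expected + " but found " + lookahead);
            }
            String token = lookahead;
            lookahead = tokens.hasNext() ? tokens.next() : null;
            return token;
        }

        abstract int parse();
    }

    // One subclass: parses "S = atom*" and returns the number of atoms.
    static final class AtomListParser extends Parser {
        AtomListParser(List<String> tokens) {
            super(tokens);
        }

        @Override
        int parse() {
            int count = 0;
            while ("atom".equals(peek())) {
                consume("atom");
                count++;
            }
            if (peek() != null) {
                throw new IllegalStateException("unexpected token: " + peek());
            }
            return count;
        }
    }

    public static void main(String[] args) {
        System.out.println(new AtomListParser(Arrays.asList("atom", "atom")).parse());
    }
}
```

A context-sensitive parser would then be just another subclass that
reuses the same helpers, so no extra packages are needed to share code.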

> So far it has been a pleasure to read and modify your source code. It's
> very well written and follows a nice programming style. What kind of
> development tools are you using? And I didn't expect that almost
> everything would be documented in the source =)

Thanks! I try my best to follow a consistent code style. That's
why I'm using Eclipse with a style check plugin to keep me
reminded of all the details.

/Per





