Re: AW: [Grammatica-users] Tokenizer problem

grammatica-users

From:	Per Cederberg
Subject:	Re: AW: [Grammatica-users] Tokenizer problem
Date:	Fri, 01 Jul 2005 17:43:27 +0200

Some types of validations are better performed outside of the
grammar, and I think this is one of those cases. So I'd solve
this with the following grammar:

  %tokens%
  SPACE   = " "
  STRING  = <<[A-Z]+>>

  %productions%
  Input = StationId " " Phenomens ;
  StationId = STRING ;
  Phenomens = STRING ;
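With a single STRING pattern covering every run of capital letters, the tokenizer never has to choose between overlapping token patterns. The sketch below is a toy tokenizer, not Grammatica's own code. It assumes the longest-match rule that produced the DZRA token in the quoted question, and it assumes that ties go to the pattern defined first. The class `LongestMatchDemo` and its `tokenize` method are hypothetical. The sketch runs the input through both token sets: the one above and the one from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy tokenizer for illustration; not Grammatica's implementation. */
final class LongestMatchDemo {

    /**
     * Splits the input into "NAME(image)" strings. At each position every
     * pattern is tried and the longest match wins; on a tie the pattern
     * defined first wins (an assumption of this sketch).
     */
    static List<String> tokenize(Map<String, Pattern> tokens, String input) {
        List<String> result = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> e : tokens.entrySet()) {
                Matcher m = e.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() only matches at the start of the region
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = e.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no token matches at " + pos);
            }
            result.add(bestName + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return result;
    }

    public static void main(String[] args) {
        // Token patterns from the suggested grammar
        Map<String, Pattern> suggested = new LinkedHashMap<>();
        suggested.put("SPACE", Pattern.compile(" "));
        suggested.put("STRING", Pattern.compile("[A-Z]+"));
        System.out.println(tokenize(suggested, "ABCD DZRASN"));
        // prints [STRING(ABCD), SPACE( ), STRING(DZRASN)]

        // Token patterns from the quoted question (three of the 22 codes)
        Map<String, Pattern> original = new LinkedHashMap<>();
        original.put("STATION_ID", Pattern.compile("[A-Z]{4}"));
        original.put("SPACE", Pattern.compile(" "));
        original.put("DZ", Pattern.compile("DZ"));
        original.put("RA", Pattern.compile("RA"));
        original.put("SN", Pattern.compile("SN"));
        System.out.println(tokenize(original, "ABCD DZRASN"));
        // prints [STATION_ID(ABCD), SPACE( ), STATION_ID(DZRA), SN(SN)]
    }
}
```

The second run reproduces the problem from the question. The first run shows that, with the suggested grammar, all the letters after the space reach the analyzer as one STRING token.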

Then I'd perform the various semantic validations in the
Analyzer:

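  import net.percederberg.grammatica.parser.Node;
  import net.percederberg.grammatica.parser.ParseException;
  import net.percederberg.grammatica.parser.Production;
  import net.percederberg.grammatica.parser.Token;

  public class MyAnalyser extends WhateverAnalyzer {

      // Called once the StationId production has been fully parsed
      protected Node exitStationId(Production node)
          throws ParseException {

          // StationId = STRING, so the only child is the STRING token
          String value = ((Token) node.getChildAt(0)).getImage();
          if (value.length() != 4) {
              throw new ParseException(ParseException.ANALYSIS_ERROR,
                                       "station id must be 4 chars",
                                       node.getStartLine(),
                                       node.getStartColumn());
          }
          node.addValue(value);
          return node;
      }

      // ...
  }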

It just makes more sense to move this type of domain
knowledge out of the grammar, as you can then add new
station IDs without having to change the grammar itself.
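The phenomena could be checked in the same way, for example from an `exitPhenomens` method. Here is a minimal sketch, assuming that every code is exactly two letters and that one to three codes are allowed, as in the INPUT production of the question. The class `PhenomenonCodes` and its `split` method are hypothetical. The code list is cut down to the three codes shown in the question (the real list has 22).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper that keeps the phenomenon codes outside the grammar. */
final class PhenomenonCodes {

    // Only the three codes from the question; the real list has 22 entries.
    private static final Set<String> KNOWN =
        Collections.unmodifiableSet(new HashSet<>(Arrays.asList("DZ", "RA", "SN")));

    /**
     * Splits a STRING token image such as "DZRASN" into two-letter codes.
     * Returns null unless the image consists of one to three known codes.
     */
    static List<String> split(String image) {
        if (image.isEmpty() || image.length() % 2 != 0 || image.length() > 6) {
            return null;
        }
        List<String> codes = new ArrayList<>();
        for (int i = 0; i < image.length(); i += 2) {
            String code = image.substring(i, i + 2);
            if (!KNOWN.contains(code)) {
                return null;
            }
            codes.add(code);
        }
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(split("DZRASN")); // [DZ, RA, SN]
        System.out.println(split("DZXX"));   // null
    }
}
```

In the analyzer, a null result would be reported with a `ParseException`, just like the station ID check. Extending the list then only means editing the `KNOWN` set.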

If all you wish to parse are simple strings like these, you
might also consider simply using a regular expression, since
Grammatica is better suited to more complex grammars:

  [A-Z]{4} [A-Z]{2}([A-Z]{2}([A-Z]{2})?)?
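In Java, that expression could be applied with `java.util.regex` roughly as follows. The class `StationLineCheck` and its `isValid` method are hypothetical names. Like the grammar above, the expression only checks the shape of the line (four letters, a space, then one to three two-letter codes). It does not check that each code is a known phenomenon.

```java
import java.util.regex.Pattern;

/** Hypothetical checker applying the regular expression above to a whole line. */
final class StationLineCheck {

    // Four-letter station id, one space, then one to three two-letter codes.
    private static final Pattern LINE =
        Pattern.compile("[A-Z]{4} [A-Z]{2}([A-Z]{2}([A-Z]{2})?)?");

    static boolean isValid(String line) {
        // matches() requires the entire string to match, not just a prefix
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("ABCD DZRASN")); // true
        System.out.println(isValid("ABCD DZR"));    // false
    }
}
```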

Cheers,

/Per

On fri, 2005-07-01 at 16:04 +0200, HECKHAUSEN Ralf wrote:
> Well, I now understand how the problem is caused. Please give me a hint how
> to solve the following:
>  
> %header%
> GRAMMARTYPE = "LL"
> %tokens%
> STATION_ID = <<[A-Z]{4}>>
> SPACE = " "
> DZ = "DZ"
> RA = "RA"
> SN = "SN"
> %productions%
> INPUT = STATION_ID SPACE PHENOMEN [PHENOMEN [PHENOMEN]];
> PHENOMEN = DZ | RA | SN; // real list has 22 items
>  
> ABCD DZRASN is not parsed correctly, because DZRA is returned as a
> STATION_ID token.
> Defining
> "PHENOMEN = DZ | RA | SN | STATION_ID;"
> is not a solution in this case, as it would allow invalid input.
>  
> Defining STATION_ID as LETTER LETTER LETTER LETTER would fail on stations
> containing one of the phenomena.
>  
> Cheers.
> Ralf
