[Grammatica-users] Want to Use the Parser as non-deterministic for Natural Language Processing

Hi there,

I am working on a personal natural language processing project, to play around with syntactic, semantic and morphological processing.

I’ve assembled an “interface” grammar, based on the clever “calculator example”, and extended it to handle floating-point numbers, number lists, phone numbers and math embedded in normal speech.

The next step is to understand spelled-out English number words and compile them into numbers, to allow interpretation of spoken math.
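
To make the goal concrete, here is a minimal sketch of turning spelled-out English numbers into integers. It is written in Python for brevity rather than C#, it does not use Grammatica, and the name words_to_number and the word tables are assumptions made up for this illustration. It covers only simple cardinal numbers.

```python
# Illustrative word tables (assumptions for this sketch, not part of any library).
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(text):
    """Convert a phrase such as 'two hundred and forty-five' to an int."""
    total = 0      # value of the groups already closed by a scale word
    current = 0    # value of the group being read (always below 1000)
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue                           # filler word, carries no value
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100    # a bare "hundred" means 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError("not a number word: " + word)
    return total + current
```

For example, words_to_number("three thousand twelve") gives 3012. Ordinals, fractions and decimals are left out of this sketch.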

For this, I think I need a variant tokenizer.

I also want to parse several “part-of-speech” segments of natural language, in order to test the grammar properly using EBNF and C# under the .NET framework.

There are many mutually exclusive parts when defining the different “tokens” as words, and it is neither possible nor practical to load the dictionary as EBNF. Natural grammar is also heavily dependent on context and on neighbouring tokens, so a word has no unique result. Instead, it may yield a set of possible types, and which one to pick depends on the other “connected” words and on the context.

To allow this, I guess I must make the tokenizer somewhat context-dependent and have it tokenize in several alternative ways using recursive pattern scanning, so that it can explore the combinations of word functions that best fit a production.
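
A rough sketch of the recursive exploration described above, in Python for brevity and without using Grammatica. The lexicon, the flat "productions" (plain tag sequences) and the name readings are all assumptions invented for this illustration. A real grammar would have nested productions, but the backtracking idea is the same.

```python
# Each word maps to the set of types it may have (illustrative data only).
LEXICON = {
    "the": {"DET"},
    "old": {"ADJ", "NOUN"},
    "man": {"NOUN", "VERB"},
    "boats": {"NOUN"},
    "fish": {"NOUN", "VERB"},
    "can": {"AUX", "NOUN", "VERB"},
}

# Flat tag sequences standing in for grammar productions (an assumption).
PRODUCTIONS = [
    ("DET", "NOUN", "VERB", "DET", "NOUN"),
    ("DET", "ADJ", "NOUN", "VERB"),
    ("NOUN", "AUX", "VERB"),
]


def readings(words, lexicon=LEXICON, productions=PRODUCTIONS):
    """Return every tag sequence for `words` that matches some production."""
    results = []

    def explore(i, chosen):
        # Prune early: give up unless some production of the right length
        # still starts with the tags chosen so far.
        prefix = tuple(chosen)
        if not any(len(p) == len(words) and p[:i] == prefix
                   for p in productions):
            return
        if i == len(words):
            results.append(prefix)
            return
        # Try each possible type of the next word, in a stable order.
        for tag in sorted(lexicon.get(words[i].lower(), ())):
            explore(i + 1, chosen + [tag])

    explore(0, [])
    return results
```

With this data, readings("the old man the boats".split()) keeps only the reading in which "old" is a noun and "man" is a verb, because the other combinations are pruned as soon as they stop matching a production.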

I think this can be done by adding a structural layer on top of the Token / Tokenizer classes that raises a callback or event, allowing external classes and methods to operate on the token and obtain its context data. Finally, there must be a trial-and-error or scoring step to select the most appropriate token, the one which fulfills the production(s).
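
The layering idea could look roughly like the following sketch. It is plain Python and does not touch Grammatica's Token or Tokenizer classes; the class ScoringTokenizer, its on_ambiguity registration method and the two scorer functions are hypothetical names invented here. The selection is greedy and left to right, which is a simplification of the trial-and-error step.

```python
class ScoringTokenizer:
    """Hypothetical wrapper: asks registered callbacks to score each
    candidate type of a word, given the type chosen for the previous word."""

    def __init__(self, lexicon):
        self.lexicon = lexicon   # word -> set of possible types
        self.scorers = []        # callbacks: (word, tag, previous_tag) -> float

    def on_ambiguity(self, scorer):
        """Register an external callback; returns it so it can decorate."""
        self.scorers.append(scorer)
        return scorer

    def tokenize(self, text):
        tokens = []
        previous = None
        for word in text.lower().split():
            candidates = sorted(self.lexicon.get(word, {"UNKNOWN"}))
            # Pick the candidate with the highest total score; ties go to
            # the alphabetically first tag because candidates are sorted.
            best = max(
                candidates,
                key=lambda tag: sum(s(word, tag, previous) for s in self.scorers),
            )
            tokens.append((word, best))
            previous = best
        return tokens


def noun_after_determiner(word, tag, previous):
    return 1.0 if previous == "DET" and tag == "NOUN" else 0.0


def verb_after_noun(word, tag, previous):
    return 1.0 if previous == "NOUN" and tag == "VERB" else 0.0


tokenizer = ScoringTokenizer({
    "the": {"DET"},
    "can": {"AUX", "NOUN", "VERB"},
    "rusts": {"NOUN", "VERB"},
})
tokenizer.on_ambiguity(noun_after_determiner)
tokenizer.on_ambiguity(verb_after_noun)
result = tokenizer.tokenize("the can rusts")
```

Here result pairs "can" with NOUN and "rusts" with VERB. A fuller version would keep several scored alternatives alive and let the parser reject those that fail a production, instead of committing to one type per word.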

I have already successfully coded several classes which check the functions of a word as a set of types, using affix reduction, dictionary lookup and intelligent de-stemming.
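
A minimal sketch of this kind of check, in Python for brevity rather than C#. The dictionary, the suffix rules and the name word_types are illustrative assumptions, not the actual classes. Spelling changes at the stem boundary (such as "happily" from "happy") are not handled here.

```python
# Tiny illustrative dictionary: stem -> set of types.
DICTIONARY = {
    "walk": {"NOUN", "VERB"},
    "quick": {"ADJ"},
}

# (suffix, type the stem must have, type of the suffixed word)
SUFFIX_RULES = [
    ("ing", "VERB", "VERB"),
    ("ed", "VERB", "VERB"),
    ("s", "VERB", "VERB"),
    ("s", "NOUN", "NOUN"),
    ("ly", "ADJ", "ADV"),
    ("ness", "ADJ", "NOUN"),
]


def word_types(word):
    """Return the set of types a word may have, or an empty set if unknown."""
    word = word.lower()
    types = set(DICTIONARY.get(word, ()))      # direct dictionary lookup
    for suffix, stem_type, result in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[:-len(suffix)]         # strip the affix
            if stem_type in DICTIONARY.get(stem, ()):
                types.add(result)
    return types
```

For instance, word_types("walks") yields both NOUN and VERB, while word_types("quickly") yields only ADV.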

Any suggestions or clues?

Thanks anyway for providing a good (and free) starting point for parsing jobs.

Andrés Hohendahl

From:	Andres Hohendahl
Subject:	[Grammatica-users] Want to Use the Parser as non-deterministic for Natural Language Processing
Date:	Fri, 8 Jul 2005 12:05:32 -0300