help-bison

Re: bison for nlp


From: r0ller
Subject: Re: bison for nlp
Date: Mon, 12 Nov 2018 11:43:48 +0100 (CET)

Hi Akim,

Sorry for the delay, I had to go through my own code to be able to answer your 
question about the tokens:) But to begin with your first observation, you're 
right: I should wrap that conditional ternary op for logging.

After going through the code, I concluded that the token numbering is currently 
only used to make sure that a constant (or unknown word) is mapped to the same 
symbol in each language, namely the one with token value 1. But that could be 
solved in a different way, e.g. each language could define its own symbol for 
constants in a newly introduced (customizing) db table. As you can guess, 
currently if the program bumps into a token with value 1, it assumes that it's 
an unknown word/morpheme/constant. It seemed ok 8 years ago, but now it has its 
price to turn back the wheels. However, I think I'll do it, even though 
individual numbering does not seem to cause any problem as there are only two 
numbers to avoid (0 and 256). The other place where it would be used is the 
symbol prediction (where I need to remap a token to a symbol) in case of an 
error, but that method is currently not called at all as it does not yet work 
well, so for now I just return the bison error message about the expected 
symbols. Mine would have the additional functionality that it would not only 
tell what's syntactically expected but would also return the subset of those 
symbols which are semantically expected.

Concerning the C++ bison wrapper, what I mentioned simply comes from an article 
I read somewhere in 2010, when I started the project; it made that statement 
and I never verified it. But now I'm pretty curious about bison's C++ features, 
so I'll definitely go through the documentation you sent and try to turn mine 
into a C++ parser:)

Best regards,
r0ller

-------- Original message --------
From: Akim Demaille <address@hidden>
Date: 9 November 2018 06:13:29
Subject: Re: bison for nlp
To: r0ller <address@hidden>
 
Hi!
> On 7 Nov 2018, at 10:09, r0ller <address@hidden> wrote:
>
> Hi Akim,
>
> The file hi_nongen.y is just left there as the last version that I wrote 
> manually:) If you check out any of the other hi.y files in the platform 
> specific directories (e.g. the one for the online demo is 
> https://github.com/r0ller/alice/blob/master/hi_js/hi.y but you can have a 
> look in hi_android or hi_desktop as well) you’ll see what they look like 
> nowadays.

You have tons of

logger::singleton()==NULL?(void)0:logger::singleton()->log(2,"vm is NULL!");

you could introduce logger::log, or whatever free function,
that does that for you instead of having to deal with that
in every call site.
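
A minimal sketch of such a free function, assuming only what the quoted line
shows: a logger class with a static singleton() that may return NULL and a
log(level, message) member. The class body, install(), entries and the name
log_message are hypothetical stand-ins, not the project's actual code.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the project's logger; only singleton() and
// log(level, message) are taken from the quoted call site.
class logger {
public:
    // Returns the installed logger, or nullptr if logging is disabled.
    static logger* singleton() { return slot(); }

    // Hypothetical helper to install or remove the active logger.
    static void install(logger* instance) { slot() = instance; }

    // Records the message; a real logger would write it somewhere.
    void log(int level, const std::string& message) {
        entries.push_back(std::to_string(level) + ": " + message);
    }

    std::vector<std::string> entries;

private:
    static logger*& slot() {
        static logger* current = nullptr;
        return current;
    }
};

// Free function that hides the NULL check from every call site.
inline void log_message(int level, const std::string& message) {
    if (logger* active = logger::singleton()) {
        active->log(level, message);
    }
}
```

With this in place, a call site shrinks to log_message(2, "vm is NULL!"); and
silently does nothing when no logger is installed.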

> Numbering tokens was introduced in the very beginning and I have questioned 
> quite a few times whether it's still needed. I didn’t try hard to get rid 
> of it mainly due to one reason: I want to have an error handling that tells 
> in case of an error which symbols could be accepted instead of the 
> erroneous one, just as bison itself does, but in a structured way (as bison 
> returns that info in an error message string).

Where are these numbers used?

> Though, I could not come up with any better idea when it comes to remapping a 
> token to a symbol. As far as I know bison uses internally the tokens and not 
> the symbols for the terminals and it's not possible to get back a symbol 
> belonging to a certain token. That's it roughly but I'd be glad to get rid of 
> it. However, if it's not possible and poses no problems then I can live with 
> it. By the way, are there any number ranges or specific numbers that are 
> reserved?

Some numbers are reserved, yes: 0 for eof and 256 for error (per POSIX). For 
error, Bison can accommodate if you use 256. EOF must be 0.
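
To illustrate how hand-assigned numbers could stay clear of these two values
while still allowing a lookup from token number back to symbol name, here is a
rough sketch. It is deliberately conservative in rejecting 256 as well, even
though Bison can cope with it. The names token_table, add and symbol_for, and
the symbol name used in the example, are hypothetical; this is not something
Bison generates or provides.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Numbers named in the reply above: 0 is EOF, 256 is the POSIX error token.
constexpr int kEofTokenNumber = 0;
constexpr int kErrorTokenNumber = 256;

inline bool is_reserved_token_number(int number) {
    return number == kEofTokenNumber || number == kErrorTokenNumber;
}

// Hypothetical table mapping user-assigned token numbers to symbol names.
class token_table {
public:
    // Registers a token; rejects reserved and already used numbers.
    void add(int number, const std::string& symbol) {
        if (is_reserved_token_number(number)) {
            throw std::invalid_argument("reserved token number");
        }
        if (!symbols_.emplace(number, symbol).second) {
            throw std::invalid_argument("token number already in use");
        }
    }

    // Returns the symbol registered for the number, or nullptr if unknown.
    const std::string* symbol_for(int number) const {
        auto found = symbols_.find(number);
        return found == symbols_.end() ? nullptr : &found->second;
    }

private:
    std::map<int, std::string> symbols_;
};
```

For example, after table.add(1, "t_Con") a call to table.symbol_for(1) yields
a pointer to "t_Con", while table.add(256, "anything") throws.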

> Not using the C++ features of bison has historical reasons: I started writing 
> the project in C and even back then I used yacc which I later replaced with 
> bison. When I started to shift the project to C++ I was glad that it still 
> worked with the generated C parser and since then I never had time to make 
> such an excursion but it'd be great. I also must admit that I wasn't really 
> aware of it. The only thing I read somewhere was that bison has a C++ wrapper 
> but I have never taken any steps in that direction.

I don’t know what you mean here: this is bison itself, there’s
no need for a wrapper, and the deterministic parser itself is
genuine C++, not C++ wrapping C. The GLR parser in C++ though _is_
a wrapper for the C GLR parser.

> Now I think I'll find some time for it -at least to check it out:) Could you 
> give me any links pointing to any tutorial or something like that? It’d be 
> very kind if you could help me in taking the first steps, thanks!

I would very much like to have your opinion on the open section of the
documentation about C++. It’s recent, and it probably needs polishing.
https://www.gnu.org/software/bison/manual/bison.html#A-Simple-C_002b_002b-Example
 

