help-bison

Flex and bison howto for C++ users


From: oleg smolsky
Subject: Flex and bison howto for C++ users
Date: Tue, 18 Feb 2003 09:57:13 +1300

Hello all,

a while ago, after googling and bugging people on this list, I figured out how 
to get bison and a C++ compiler to talk to each other. It turns out that there 
is no documentation that covers this process, and the question comes up repeatedly on 
this list. So, I have written a howto on it.

Please find attached howto.html that describes the following:
-- creating a small grammar
-- creating a simple scanner
-- declaring a common class with attributes that are C++ objects
-- getting bison to use the new LALR C++ skeleton
-- getting flex to use objects allocated by bison
-- compiling all this stuff with cygwin under Windows

All healthy constructive criticism is welcome :)

Best regards,

Oleg Smolsky
Software Design Authority
Allied Telesyn Research





Flex and bison howto for C++ users, v1.0

This howto collects the tweaks, fixes and hacks required to implement a bison/flex command parser in C++. It assumes basic knowledge of grammars, parsing and regular expressions.

Note that the approach described here concentrates on building a parser within a Windows environment; however, all the ideas and tools still apply to Unix/g++. One just needs to remove a few lines of code, such as #include <windows.h> :)

Written by Oleg Smolsky <address@hidden>, February 2003

The task

Let's imagine that we are building a debugger similar to NuMega SoftICE. It has multiple windows and a command-driven, textual interface. The most basic commands might be load <filename> and display <address>. So, let's implement the system in C++, with all scanning and parsing done by flex and bison respectively.

Required software

cygwin
Install the latest version of cygwin and make sure that the PATH environment variable contains cygwin's bin directory, e.g. c:\something\cygwin\bin. This way it is possible to call bash, flex and bison directly from a command prompt.
bison
Make sure that you install bison version 1.875 during the cygwin install.
flex
Install the default flex that comes with cygwin. My version is 2.5.4.
C++ compiler
You can use g++ or Microsoft VC++ (cl), with either projects or makefiles.

Core file definitions

Once all the required tools are set up, we can discuss the core files needed to solve our task:
parser.y
the specification for the parser. Process it with bison:
bash -c "bison -d -S lalr1.cc -o parser.cpp parser.y"
The way it works is this: your makefile or VC++ project calls bash with a command, and bash then executes that Unix command within the cygwin environment. This is important for tools such as bison, because bison needs to access its skeleton via a Unix path: /usr/share/bison/lalr1.cc
scanner.l
the specification of the scanner. Process it with flex:
bash -c "flex -oscanner.cpp scanner.l"

scanner.l

Let's consider the content of scanner.l. The first section specifies a block of code that goes near the very top of the generated .cpp file. It includes some standard C++ headers and declares a few macros. The rules section then returns the tokens declared in parser.y: the keywords T_DISPLAY and T_LOAD, the line terminator T_CRLF, and the value-carrying tokens NT_DECNUMBER, NT_HEXNUMBER and NT_STRING. (Despite their NT_ prefix, the last three are terminals too, as is everything the scanner returns.)
The tricky part here is that yylex() is called with a pointer to an instance of a user-defined class declared in parser.y; the YY_DECL macro gives yylex() that signature. This way each rule can store the token's value in more than one format (text and number) within this object.

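%{

#include <cstdlib>
#include <iostream>
#include <string>
#include "parser.hpp"

extern void yyerror(const char* s);

// Make yylex() take a pointer to the semantic value (class part, see
// parser.y), so each rule can fill in the token's text and number.
#define YY_DECL int yylex(yystype *p)

// MSVC: break into the debugger (inline asm); other compilers: use assert().
#ifdef _MSC_VER
#define ASSERT(condition)	if (!(condition)) _asm int 3;
#else
#include <cassert>
#define ASSERT(condition)	assert(condition)
#endif

// %option noyywrap (below) means no yywrap() has to be supplied or linked.
%}

%option noyywrap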

%%

[ \t]+ {
	/* skip blanks between words; otherwise flex's default rule echoes them */
}

"display" {
	return T_DISPLAY;
}

"load" {
	return T_LOAD;
}

"\r\n" {
	return T_CRLF;
}


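[0-9]+ {
	p->m_sVersion = yytext;

	/* base 16, like the rule below: digits-only input such as 1000 is
	   read as a hex address (0x1000); pass 10 if decimal is intended */
	char	*pcLast = NULL;
	p->m_dwVersion = strtoul(yytext, &pcLast, 16);

	ASSERT(pcLast != NULL && *pcLast == 0);

	return NT_DECNUMBER;
}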

[0-9a-fA-F]+ {
	p->m_sVersion = yytext;
                        
	char	*pcLast = NULL;
	p->m_dwVersion = strtoul(yytext, &pcLast, 16);

	ASSERT(pcLast != NULL && *pcLast == 0);

	return NT_HEXNUMBER; 
}

address@hidden&*()_+[\]{}?/.>,<'";:\\|]+ {
	p->m_sVersion = yytext;
	return NT_STRING;
}

%%
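To see what the rules above do without running flex, here is a hand-written stand-in with the same contract as YY_DECL: yylex() fills the object it is given and returns a token code. It is only a sketch. The token values (bison usually numbers named tokens from 258) and the ScanString() helper, a rough analogue of yy_scan_bytes(), are assumptions made for the example; in the real project the codes come from parser.hpp. It approximates two flex rules: the longest match wins, and among equal-length matches the earlier rule wins. That is why 1000 becomes NT_DECNUMBER, why cafe becomes NT_HEXNUMBER, and why the grammar's filename rule has to accept number tokens as well as NT_STRING.

```cpp
// Hand-written stand-in for scanner.l, for illustration only.
// Token codes and ScanString() are assumptions, not flex/bison output.
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

typedef unsigned long dword;

struct part                      // mirrors the howto's class part
{
    std::string m_sVersion;      // raw token text
    dword       m_dwVersion;     // numeric value of number tokens
};

enum { T_EOF = 0, T_DISPLAY = 258, T_LOAD, T_CRLF,
       NT_DECNUMBER, NT_HEXNUMBER, NT_STRING };

static std::string            g_input;   // text being scanned
static std::string::size_type g_pos = 0; // current position in g_input

// Hypothetical helper: start scanning a new command string.
void ScanString(const std::string &s)
{
    g_input = s;
    g_pos = 0;
}

// True if w is non-empty and made only of decimal (or hex) digits.
static bool AllDigits(const std::string &w, bool hex)
{
    for (std::string::size_type i = 0; i < w.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(w[i]);
        if (!(hex ? std::isxdigit(c) : std::isdigit(c)))
            return false;
    }
    return !w.empty();
}

static bool AtCrlf(std::string::size_type i)
{
    return g_input.compare(i, 2, "\r\n") == 0;
}

int yylex(part *p)
{
    // Skip blanks, as a "[ \t]+" rule would.
    while (g_pos < g_input.size() &&
           (g_input[g_pos] == ' ' || g_input[g_pos] == '\t'))
        ++g_pos;
    if (g_pos >= g_input.size())
        return T_EOF;
    if (AtCrlf(g_pos)) {
        g_pos += 2;
        return T_CRLF;
    }

    // Take the whole word up to a blank or CR LF (approximates longest match).
    std::string::size_type start = g_pos;
    while (g_pos < g_input.size() && g_input[g_pos] != ' ' &&
           g_input[g_pos] != '\t' && !AtCrlf(g_pos))
        ++g_pos;
    std::string word = g_input.substr(start, g_pos - start);
    p->m_sVersion = word;

    // Keywords are listed first in scanner.l, so they win equal-length ties.
    if (word == "display") return T_DISPLAY;
    if (word == "load")    return T_LOAD;

    // Digits-only words hit the [0-9]+ rule first; both rules convert base 16.
    bool dec = AllDigits(word, false);
    if (dec || AllDigits(word, true)) {
        char *pcLast = NULL;
        p->m_dwVersion = std::strtoul(word.c_str(), &pcLast, 16);
        assert(pcLast != NULL && *pcLast == 0);   // whole word consumed
        return dec ? NT_DECNUMBER : NT_HEXNUMBER;
    }
    return NT_STRING;
}
```

For example, scanning "display 1F00\r\n" yields T_DISPLAY, then NT_HEXNUMBER with m_dwVersion set to 0x1F00, then T_CRLF, while "load a.out" yields T_LOAD followed by NT_STRING.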

parser.y

The first section of the file includes the appropriate C++ headers and defines the central class used for scanning and parsing. A pointer to an instance of this class is passed into the scanning routine, so the members can be filled in according to the token type. Because the class holds a std::string, it is used directly as YYSTYPE instead of going through %union, since in C++98 a union cannot hold members with constructors.

The second section defines the grammar required to parse our sophisticated commands, together with the actions that call the appropriate engine routines.

%{
#include <cstdlib>
#include <string>
#include <vector>
#include <deque>

#include <stdarg.h>

#define WIN32_LEAN_AND_MEAN
#include <windows.h>

typedef unsigned long			dword;
typedef unsigned short			word;
typedef unsigned char			byte;

#include "EmEngine.h"

class part
{
public:
    std::string     m_sVersion;
    dword           m_dwVersion;
	
    int             last_line, last_column;
};

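// class part is the semantic value type shared by scanner and parser.
#define YYSTYPE     part
typedef part        yystype;
// part also serves as the location type (its last_line/last_column members).
typedef part        yyltype;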

typedef char        yysigned_char;
 
int yylex(yystype *p);

%}

%token          T_DISPLAY T_LOAD T_CRLF
                NT_DECNUMBER NT_HEXNUMBER NT_STRING

%start  command_list

%%
 
command_list:   
	/* empty */ 
	| command_list display_command
	| command_list load_command 
	| command_list error
	{
		// explicit handler is not required -- Parser::error_() is called automatically
	};


display_command: T_DISPLAY address T_CRLF
	{
		g_engine.Display($2.m_dwVersion);
	}
	;
	
load_command: T_LOAD filename T_CRLF
	{
		g_engine.Load($2.m_sVersion);
	}
	;
	
address:	NT_HEXNUMBER | NT_DECNUMBER;
filename:	NT_DECNUMBER | NT_HEXNUMBER | NT_STRING;

%%
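Since the grammar is tiny, it can also be traced by hand. The sketch below is a hand-written C++ equivalent of command_list, not what bison generates (bison builds LALR tables from the rules). Tokens arrive already scanned as (code, value) pairs, so in[i + 1].value plays the role of $2 in the actions. Engine is a hypothetical stand-in for g_engine from EmEngine.h, which the howto does not show, and the token codes are assumed. The error handling is a deliberate simplification: it reports a bad command once and skips to the next T_CRLF, whereas bison's own recovery around the error token is more involved.

```cpp
// Hand-written equivalent of the command_list grammar, for illustration only.
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned long dword;

struct part                           // the members the actions read
{
    std::string m_sVersion;
    dword       m_dwVersion;
};

// Assumed token codes; the real ones come from parser.hpp.
enum { T_DISPLAY = 258, T_LOAD, T_CRLF, NT_DECNUMBER, NT_HEXNUMBER, NT_STRING };

struct Token { int code; part value; };

// Builds a token; text and value only matter for NT_* tokens.
Token Tok(int code, const std::string &text = std::string(), dword value = 0)
{
    Token t;
    t.code = code;
    t.value.m_sVersion = text;
    t.value.m_dwVersion = value;
    return t;
}

// Hypothetical stand-in for g_engine: records the calls the actions make.
struct Engine
{
    std::vector<dword>       displayed;
    std::vector<std::string> loaded;
    int                      errors;

    Engine() : errors(0) {}
    void Display(dword address)            { displayed.push_back(address); }
    void Load(const std::string &filename) { loaded.push_back(filename); }
};

// command_list: /* empty */ | command_list display_command
//             | command_list load_command | command_list error
void ParseCommands(const std::vector<Token> &in, Engine &engine)
{
    std::size_t i = 0;
    while (i < in.size()) {
        if (i + 2 < in.size() && in[i + 2].code == T_CRLF) {
            int  keyword  = in[i].code;
            int  arg      = in[i + 1].code;
            // address: NT_HEXNUMBER | NT_DECNUMBER
            bool isNumber = arg == NT_HEXNUMBER || arg == NT_DECNUMBER;

            // display_command: T_DISPLAY address T_CRLF
            if (keyword == T_DISPLAY && isNumber) {
                engine.Display(in[i + 1].value.m_dwVersion);
                i += 3;
                continue;
            }
            // load_command: T_LOAD filename T_CRLF
            // filename: NT_DECNUMBER | NT_HEXNUMBER | NT_STRING
            if (keyword == T_LOAD && (isNumber || arg == NT_STRING)) {
                engine.Load(in[i + 1].value.m_sVersion);
                i += 3;
                continue;
            }
        }
        // Simplified recovery: count the error, then skip past the next T_CRLF.
        ++engine.errors;
        while (i < in.size() && in[i++].code != T_CRLF) {
        }
    }
}
```

Fed display 1F00, load a.out, display oops and load cafe (each followed by T_CRLF), this displays 0x1F00, loads a.out and cafe, and counts one error for the third command, because the address rule accepts only number tokens.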

Source files

Once the build is set up, the commands above produce the following files: parser.cpp, parser.hpp, location.hh, stack.hh and scanner.cpp. These files need to be compiled as part of your project. If you are using a VC++ project or makefile, make sure that the "precompiled headers" compiler option is switched off for these files.

Additional code

We now have a working scanner and parser, but some glue code is still missing: a parser instance, an error handler, and a way to feed each command to the scanner.

Here is a block of code that declares a parser instance and implements a simple error handler. error_() is called when a given string is not part of the specified language.

#include "precompiledpp.h"
#include "EmEngine.h"
#include "EmMonitor.h"
#include "parser.hpp"

yy::Parser	parser(true);

namespace yy
{
    void Parser::error_()
    {
        g_monitor.AddToOutput("Unrecognised command.");
    }

    void Parser::print_()
    {
    }
}

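// Flex-generated scanners call isatty(); this stub satisfies that reference in
// the Windows build. Drop it on Unix, where the C library provides isatty().
int isatty(int)
{
        return 0;
}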

Now, this block defines a function that feeds a new command to the scanner. Imagine that this routine is executed immediately after the user has typed a command.

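// Uses yy_buffer_state and the yy_* functions declared in scanner.h (see "Odds and ends").
void ParseMessage(const std::string &sCommand)
{
	// yy_scan_bytes() copies the command into a new flex buffer.
	yy_buffer_state  *pBuffer = yy_scan_bytes(sCommand.c_str(), static_cast<int>(sCommand.size()));

	// Mark the buffer as starting at the beginning of a line.
	pBuffer->yy_at_bol = 1;
	yy_switch_to_buffer(pBuffer);

	parser.parse();

	yy_delete_buffer(pBuffer);
}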
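One weakness of ParseMessage() is that yy_delete_buffer() runs only when parse() returns normally; if an action such as g_engine.Load() throws a C++ exception, the buffer leaks. A small RAII guard closes that gap. This is a sketch, and BufferGuard is a name made up here. It is written as a template only so it can be exercised without flex; with the declarations from scanner.h it would be instantiated as BufferGuard<yy_buffer_state> with yy_delete_buffer as the deleter.

```cpp
// BufferGuard (hypothetical name) owns a buffer and calls the supplied
// deleter, e.g. yy_delete_buffer, when it goes out of scope, including
// during stack unwinding caused by an exception.
template <class Buffer>
class BufferGuard
{
public:
    typedef void (*Deleter)(Buffer *);

    BufferGuard(Buffer *buffer, Deleter deleter)
        : m_buffer(buffer), m_deleter(deleter) {}

    ~BufferGuard()
    {
        if (m_buffer != 0)
            m_deleter(m_buffer);
    }

    Buffer *get() const { return m_buffer; }

private:
    // Not copyable: two guards must never delete the same buffer.
    BufferGuard(const BufferGuard &);
    BufferGuard &operator=(const BufferGuard &);

    Buffer  *m_buffer;
    Deleter  m_deleter;
};
```

Inside ParseMessage() the guard would replace the final yy_delete_buffer() call: construct BufferGuard<yy_buffer_state> guard(yy_scan_bytes(sCommand.c_str(), static_cast<int>(sCommand.size())), yy_delete_buffer); then set guard.get()->yy_at_bol, switch to the buffer and call parser.parse() as before.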

Odds and ends

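There is one more block of code needed to compile ParseMessage(). The hack above sets pBuffer->yy_at_bol directly, so it depends on the structure yy_buffer_state, copied from the flex-generated code, and on declarations of the yy_* buffer functions. Note that this structure might vary from one flex version to another, so the copy must match the flex you actually use.

Put this block of code into scanner.h and #include it in the file that defines ParseMessage():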

#ifndef SCANNER_H
#define SCANNER_H

#include <stdio.h>          /* FILE */

/* Copied from the flex-generated scanner; must match the flex version in use. */
typedef unsigned int yy_size_t;

struct yy_buffer_state
{
    FILE *yy_input_file;

    char *yy_ch_buf;        /* input buffer */
    char *yy_buf_pos;       /* current position in input buffer */

    yy_size_t yy_buf_size;

    int yy_n_chars;
    int yy_is_our_buffer;
    int yy_is_interactive;
    int yy_at_bol;
    int yy_fill_buffer;
    int yy_buffer_status;
};

yy_buffer_state  *yy_scan_bytes(const char *bytes, int len);

void             yy_switch_to_buffer(yy_buffer_state *new_buffer);
void             yy_delete_buffer(yy_buffer_state *buffer);

#endif
