help-bison
Regular Expression String Search With Bison


From: Ricardo Grant
Subject: Regular Expression String Search With Bison
Date: Tue, 12 Mar 2019 02:07:26 -0400

Hello,

I am struggling to understand Bison, and parsers in general, so I hope I can get some help. To understand them better, I decided to write a program that understands simple regular expressions and can show the correct substring matches. Here is the grammar for the language I would like to implement:

<regex>   ::= <term> '|' <regex>
          | <term>
<term>    ::= <term> <factor>
          | <factor>
<factor>  ::= <base> '*'
          | <base>
<base>    ::= '(' <regex> ')'
          | char
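
To make the grammar concrete, it can be checked by hand with a small recursive-descent recognizer in plain C. This is only a sketch and not Bison: the function names (parse_regex, parse_term, parse_factor, parse_base, regex_is_valid) are hypothetical, it assumes a char is any character other than '|', '*', '(' and ')', and it only answers whether a string is a syntactically valid regex. The left-recursive <term> rule is written as a loop, since recursive descent cannot recurse on the left.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static const char *p;            /* cursor into the regex being checked */

static bool parse_regex (void);

/* Assumption: a "char" is anything that is not an operator or the end. */
static bool
is_char (char c)
{
  return c != '\0' && strchr ("|*()", c) == NULL;
}

/* <base> ::= '(' <regex> ')' | char */
static bool
parse_base (void)
{
  if (*p == '(')
    {
      p++;
      if (!parse_regex () || *p != ')')
        return false;
      p++;
      return true;
    }
  if (is_char (*p))
    {
      p++;
      return true;
    }
  return false;
}

/* <factor> ::= <base> '*' | <base> */
static bool
parse_factor (void)
{
  if (!parse_base ())
    return false;
  if (*p == '*')
    p++;
  return true;
}

/* <term> ::= <term> <factor> | <factor>, i.e. one or more factors. */
static bool
parse_term (void)
{
  if (!parse_factor ())
    return false;
  while (*p == '(' || is_char (*p))
    if (!parse_factor ())
      return false;
  return true;
}

/* <regex> ::= <term> '|' <regex> | <term> */
static bool
parse_regex (void)
{
  if (!parse_term ())
    return false;
  if (*p == '|')
    {
      p++;
      return parse_regex ();
    }
  return true;
}

/* True if the whole string s is derivable from <regex>. */
bool
regex_is_valid (const char *s)
{
  assert (s != NULL);
  p = s;
  return parse_regex () && *p == '\0';
}
```

Under these assumptions, regex_is_valid ("(a|b)*c") accepts, while "a|" and "*a" are rejected because a <term> must start with a <base>.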

I have tried to put together a smaller piece of code to explain my issue, since words alone are not enough. Some of it may be nonsensical:

%{
  #include <stdio.h>
  #include <string.h>
  #include <stdbool.h>

  #define BUF_MAX 31
  int yylex (void);
  void yyerror (char const *);

  char regxp[BUF_MAX];
  char text[BUF_MAX];
  char input[BUF_MAX];

  int i = 0;
  struct regxp
  {
    bool star;
    char str[BUF_MAX];
  };
%}

%parse-param {char* regex}
%define api.value.type union
%token <char> CHAR
%type  <struct regxp> regex term factor base

%%
  input:
    input line
  | %empty
  ;

  line:
    regex '\n' { printf ("%s\n", $1.str); }
  | error '\n' { yyerrok; }
  | '\n'
  ;

  regex:
    regex term { strncat ($1.str, $2.str, BUF_MAX - strlen ($1.str) - 1); $$ = $1; }
  | term { $$ = $1; }
  ;

  term:
    term factor { strncat ($1.str, $2.str, BUF_MAX - strlen ($1.str) - 1); $$ = $1; }
  | factor { $$ = $1; }
  ;

  factor:
    base '*' { $$ = $1; $$.star = true; }
  | base { $$ = $1; sscanf (input, "%c", $1.str); }
  ;

  base:
    '(' regex ')' { $$ = $2; }
  | CHAR { input[++i] = $1; }
%%

int
main (int argc, char **argv)
{
  if (argc < 3)
    return 1;
  else
    strncpy (text, argv[2], BUF_MAX - 1);
  return yyparse (argv[1]);
}
}

What exactly happens to the semantic values, especially $$? My smallest element is a character, but I have no idea how to go from a CHAR token to a base nonterminal in code. I think there will have to be some way of passing information along into the string in struct regxp.
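
To make the question concrete, here is a rough sketch in plain C of the value flow the actions seem to need: each helper takes the values of the right-hand-side symbols ($1, $2) and returns what would be assigned to $$. The helper names (make_base, make_star, make_concat) are hypothetical and not part of Bison, and appending '*' to the string is just one assumed way to keep the star visible after concatenation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BUF_MAX 31

/* Same shape as the struct in the grammar file above. */
struct regxp
{
  bool star;
  char str[BUF_MAX];
};

/* "base: CHAR" -- build a fresh value for $$ from the token value $1. */
struct regxp
make_base (char c)
{
  struct regxp r = { false, "" };
  assert (c != '\0');
  r.str[0] = c;
  r.str[1] = '\0';
  return r;
}

/* "factor: base '*'" -- copy $1, mark it starred, and (assumption)
   append '*' so the text still shows the operator.  */
struct regxp
make_star (struct regxp base)
{
  base.star = true;
  strncat (base.str, "*", BUF_MAX - 1 - strlen (base.str));
  return base;
}

/* "term: term factor" -- append $2's text to $1's, never overflowing.
   The star flag only describes a single factor, so it is cleared.  */
struct regxp
make_concat (struct regxp left, struct regxp right)
{
  strncat (left.str, right.str, BUF_MAX - 1 - strlen (left.str));
  left.star = false;
  return left;
}
```

In a grammar file, an action such as { $$ = make_concat ($1, $2); } would then play the same role as the strncpy calls in the code above.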

In main, I just want to take a regular expression and a string to test it against, like ./prog 'ab*' ababb.

