[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Faster token table lookups

From: Hans Aberg
Subject: Re: Faster token table lookups
Date: Fri, 22 Dec 2000 11:21:31 +0100

At 11:42 +0100 0-12-20, Akim Demaille wrote:
>> It would merely be a convenience to those that want to make a
>> lot of name table lookups. I think there is a suggestion one can
>> implement it in the lexer function yylex: instead of returning a
>> macro number directly, it makes a table lookup, and returns that
>> number to Bison.
>Then why not, send a patch :)

I made that patch: The patched file "output.c" is attached. (Instructions
for putting it in, see below.)

I realized that it was unnecessary to sort the names, but merely an array
with the typecodes, and a comparison function using the unsorted lookup
"tags" table expanding to the lookup names. Then I applied a heapsort to
that, with max number of comparisons O(n log n), where n is the table size.
Writing it out then was simple; just a modification of the script Bison
uses to wrote out the unsorted table.

- It also writes out two functions that can be used to find the typecodes:
  int yytypecode(const char* s);
  int yytypecode_cache(const char* s, int* cache);
using fast binary search, at most 1 + log_2 n comparisons.

- These functions will translate to Yacc numbers if those are used; with
%raw, they will produce Bison's internal numbers.

- If the name-prefix is changed, the "yy" of these functions will change
accordingly (by two extra written out macros).

- If the name is not found, both functions return -1.

- One simply uses these functions as replacements for the token macros:
  %token identifier
is declared, one can in the lexer write
  return yytokencode("identifier");
And if
  %token "=>"
one can write
  return yytokencode("\"=>\"");
including the extra as the name on the table us exactly "=>" and not =>.
A character code '+' would be reached with
  return yytokencode("\'+\'");

The yytypecode_cache function can be used to cache the lookup number so it
only lookup the number once: If the "cache" argument is NULL or have a
dereferenced value < -1, a lookup is made just as in the case of
yytokencode(). If is != NULL and the dereferenced value < -1, the lookup
value will be written into it. And if it is != NULL and the dereferenced
value >= -1, it simply returns that value.

So one simply uses it as follows:
  int identifier_ = -2; /* Define a cache */
  return yytypecode_cache("identifier", &identifier_);

Then, the first time yytypecode_cache is called, it will make a table
lookup, but on subsequent calls, it will merely use the lookup number found

- In order to implement the patch, one should add between output_stos() and
output_rule_data() the following
  #define SWAP(x, y) ...
  static void adjust(int tc[], int m, register int n) ...
  static void output_name_typecode_table(void) ...
In output_rule_data(), one should add the
call right after
    /* add a NULL entry to list of tokens */
  fprintf (ftable, ", NULL\n};\n");
And in output_headers() one should add the lines
  fprintf (ftable, "#define yytypecode %stypecode\n", spec_name_prefix);
  fprintf (ftable, "#define yytypecode_cache %stypecode_cache\n",
in the "if (spec_name_prefix)" conditional.

Attachment: output.c
Description: Text document

  Hans Aberg

reply via email to

[Prev in Thread] Current Thread [Next in Thread]