Lexing rules for the single quote character
From: Lipping Joonas
Subject: Lexing rules for the single quote character
Date: Tue, 4 Mar 2014 08:56:10 +0000
I've been investigating the second issue listed at
http://wiki.octave.org/Projects#Interpreter , which states that "if (expr)
'this is a string' end" should be tokenized as IF expr STRING END. Currently,
the first single quote character is being translated to the token HERMITIAN,
that is, the "if" condition is not expr itself but its hermitian. After that,
"this", "is", "a" and "string" are understood as variable names, and the single
quote after that is taken to begin a single-quoted string. The dual nature of
the single quote leads to some ambiguities, like:
if (A - B) ' != C D' end
Are we comparing the hermitian of A - B to C and then evaluating the hermitian
of D in the body, or are we testing the truth value of A - B and evaluating the
string ' != C D' in the body? You don't run into this in Matlab so much, because
it does not allow spaces before the hermitian postfix: if there is a space
before the single quote, it is either a string delimiter or bad syntax. But in
Octave, the hermitian postfix tolerates whitespace. The line of code above also
illustrates that it is problematic to try to solve this by tokenizing "if
(expr) ' " (with a space) to IF expr SQ_STRING_START and "if (expr)' " (no
space) to IF expr HERMITIAN, as that would impose non-obvious rules on what the
"if" expression is allowed to look like. For instance,
if ((A + B) ' != B) C end
should intuitively be the same when the "readability" parens around the if
expression are removed,
if (A + B) ' != B C end
yet the new rule would make the latter statement erroneous. Some advantage
might be gained by looking forward and checking whether the single quote
characters add up later, but that is only a heuristic improvement. Making the
restriction apply to ALL parenthesized expressions could work a bit better,
though it has the downside of making otherwise optional parens significant (and
there is probably at least one person somewhere whose code it would break). In
this case, it might also be necessary to apply the same restriction to single
quotes that follow a hermitian, so that in
A'
A''
A'''''
A '''''''
(A)'
(A)'''''
all single quotes are hermitians but in
A' '
A ''''' '
(A) '
the rightmost single quote is a string delimiter. Otherwise,
if (A + B)' 'hello' end
will confuse the lexer.
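To make the look-ahead idea mentioned above concrete, here is a rough Python
sketch of "checking whether the single quotes add up". The function name
quotes_pair_up is made up for illustration and nothing like it exists in
Octave; it only counts quotes on the rest of the line.

```python
def quotes_pair_up(line, pos):
    """Guess whether the quote at line[pos] could open a string.

    Hypothetical helper: counts the single quotes from pos to the end of
    the line. An even count means the quotes can be paired off, so the
    quote at pos might be an opening string delimiter.
    """
    if line[pos] != "'":
        raise ValueError("pos must point at a single quote")
    return line[pos:].count("'") % 2 == 0


ambiguous = "if (A - B) ' != C D' end"
unpaired = "if (A + B) ' != B C end"

# Two quotes remain, so the string reading is possible...
print(quotes_pair_up(ambiguous, ambiguous.index("'")))
# ...while a lone quote can only be a hermitian.
print(quotes_pair_up(unpaired, unpaired.index("'")))
```

For the first line the count is even, yet the hermitian reading is just as
valid, which is why this can only ever be a heuristic.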
What do we want to do here? I have some code that causes every single quote
that follows a ")" or HERMITIAN token, with whitespace in between, to be
treated as a string delimiter. Preliminary inspection indicates that it
doesn't cause any new test failures, but I need to poke around a bit more to be
sure. I could write some tests to accompany it and submit a patch.
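For discussion, the rule can be sketched in a few lines of Python. This is not
the patch itself, just a toy classifier under simplifying assumptions: the
names classify_quotes and KEYWORDS are invented here, only a handful of
keywords are known, closing brackets "]" and "}" are treated like ")", and
matrix-bracket context, comments and double-quoted strings are ignored.

```python
KEYWORDS = {"if", "elseif", "else", "while", "for", "end"}  # illustrative subset


def classify_quotes(src):
    """Label each single quote in src as 'HERMITIAN' or 'STRING'."""
    labels = []
    prev = None   # kind of the previous token: operand, close, hermitian, other
    gap = False   # True if whitespace separates prev from the current char
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        if ch.isspace():
            gap = True
            i += 1
            continue
        if ch == "'":
            can_be_postfix = prev in ("operand", "close", "hermitian")
            # The proposed restriction: whitespace after ")" or a hermitian
            # means the quote cannot be a postfix operator.
            blocked = gap and prev in ("close", "hermitian")
            if can_be_postfix and not blocked:
                labels.append("HERMITIAN")
                prev = "hermitian"
                i += 1
            else:
                labels.append("STRING")          # opening delimiter
                i += 1
                while i < n:
                    if src[i] == "'":
                        if i + 1 < n and src[i + 1] == "'":
                            i += 2               # '' is an escaped quote
                            continue
                        labels.append("STRING")  # closing delimiter
                        i += 1
                        break
                    i += 1
                prev = "operand"                 # a string literal is an operand
        elif ch.isalnum() or ch == "_":
            j = i
            while j < n and (src[j].isalnum() or src[j] == "_"):
                j += 1
            prev = "other" if src[i:j] in KEYWORDS else "operand"
            i = j
        elif ch in ")]}":
            prev = "close"
            i += 1
        else:
            prev = "other"
            i += 1
        gap = False
    return labels


print(classify_quotes("if (A + B)' 'hello' end"))
print(classify_quotes("A '''''''"))
print(classify_quotes("(A) '"))
```

In this sketch a quote after an identifier is still a hermitian even with a
space in front of it, matching the A ''''''' case above; an unterminated string
simply yields a single STRING label.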