Re: lexer.ll: Warn about non-UTF-8 characters (issue 5505090)
From: dak
Date: Mon, 02 Jan 2012 09:11:24 +0000
On 2012/01/01 22:06:52, Keith wrote:
> On 2012/01/01 10:12:27, dak wrote:
> Sorry, I wasn't making much sense. As a reader I want to *recognize*
> what the switch/case is doing rather than trying to figure it out. Maybe:
> // Test if these bytes are a UTF-8 encoding of a Unicode character,
> // and warn if not. Trap overly-long UTF-8 encodings, but we don't
> // need to worry about finer details like some filters do.
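The following is a minimal C++ sketch of the kind of check that comment describes. It is not the code under review. It assumes the lexer passes a byte pointer and the number of bytes available, and the function name `utf8_sequence_length` is hypothetical.

The sketch dispatches on the lead byte with a switch/case and traps the overlong forms: C0/C1 leads, E0 followed by a byte below A0, and F0 followed by a byte below 90. It deliberately leaves the "finer details" unchecked.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not the actual lexer.ll code: returns the length
// (1..4) of the UTF-8 sequence starting at p, or 0 if the bytes are not a
// plausible UTF-8 encoding. Overlong encodings are trapped; surrogates and
// values above U+10FFFF are deliberately NOT checked.
std::size_t utf8_sequence_length(const unsigned char *p, std::size_t avail)
{
  if (avail == 0)
    return 0;
  unsigned char b0 = p[0];
  std::size_t len;
  switch (b0 >> 4) {    // dispatch on the high nibble of the lead byte
  case 0x0: case 0x1: case 0x2: case 0x3:
  case 0x4: case 0x5: case 0x6: case 0x7:
    return 1;           // plain ASCII
  case 0xC: case 0xD:
    if (b0 < 0xC2)      // C0/C1 could only encode U+0000..U+007F: overlong
      return 0;
    len = 2;
    break;
  case 0xE:
    len = 3;
    break;
  case 0xF:
    if (b0 > 0xF7)      // F8..FF never start a sequence of at most 4 bytes
      return 0;
    len = 4;
    break;
  default:              // 8..B: continuation byte where a lead byte belongs
    return 0;
  }
  if (avail < len)
    return 0;           // truncated sequence
  for (std::size_t i = 1; i < len; ++i)
    if ((p[i] & 0xC0) != 0x80)  // every trailing byte must be 10xxxxxx
      return 0;
  // Overlong 3- and 4-byte forms: the value would fit in fewer bytes.
  if (b0 == 0xE0 && p[1] < 0xA0)
    return 0;
  if (b0 == 0xF0 && p[1] < 0x90)
    return 0;
  return len;
}
```

A caller would advance by the returned length and emit a warning whenever it gets 0.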
What are the finer details? UTF-16 surrogates? I was just too lazy to
figure out the patterns for them.
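The surrogate patterns follow directly from the encoding.

- **Surrogates.** U+D800 is 1101 1000 0000 0000 in binary. Split as 1101 / 100000 / 000000 into the 1110xxxx 10xxxxxx 10xxxxxx template, it gives ED A0 80. U+DFFF likewise gives ED BF BF. So the surrogates are exactly the three-byte sequences with lead byte ED and second byte A0..BF.
- **Values above U+10FFFF.** The first value past the Unicode range, U+110000, encodes as F4 90 80 80. Any lead byte F5..F7 is larger still.

Below is a hypothetical helper (the name is illustrative only) that captures both patterns. It assumes the sequence has already been checked for length, continuation bytes and overlong forms.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: for a 3- or 4-byte sequence that already passed a
// structural and overlong check, report whether it encodes a UTF-16
// surrogate (U+D800..U+DFFF) or a value above U+10FFFF.
bool is_surrogate_or_beyond_max(const unsigned char *p, std::size_t len)
{
  if (len == 3)
    // U+D800 = ED A0 80 ... U+DFFF = ED BF BF
    return p[0] == 0xED && p[1] >= 0xA0;
  if (len == 4)
    // U+110000 = F4 90 80 80; leads F5..F7 encode even larger values
    return p[0] > 0xF4 || (p[0] == 0xF4 && p[1] >= 0x90);
  return false;  // 1- and 2-byte sequences can't hit either range
}
```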
> because your test is almost but not quite equivalent to the regex in the
> back of the Flex manual.
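The exact form of the regex in the Flex manual is not reproduced here. Regexes for well-formed UTF-8 generally spell out the byte ranges from the Unicode Standard's table of well-formed UTF-8 byte sequences; the same ranges appear as ABNF in RFC 3629, section 4.

A table-driven C++ sketch of those ranges follows. The names `Row`, `kRows` and `first_invalid_utf8` are hypothetical. Compared with an overlong-only test, the differences sit in the ED and F4 rows (surrogates and values above U+10FFFF) and in the rejection of F5..F7.

```cpp
#include <cassert>
#include <cstddef>

// One row per lead-byte range of the well-formed UTF-8 table: the allowed
// range of the second byte varies, all later trailing bytes are 80..BF.
struct Row {
  unsigned char lead_lo, lead_hi, second_lo, second_hi;
  std::size_t len;
};

static const Row kRows[] = {
  {0x00, 0x7F, 0x00, 0x00, 1},  // ASCII (second-byte range unused)
  {0xC2, 0xDF, 0x80, 0xBF, 2},
  {0xE0, 0xE0, 0xA0, 0xBF, 3},  // excludes overlong forms
  {0xE1, 0xEC, 0x80, 0xBF, 3},
  {0xED, 0xED, 0x80, 0x9F, 3},  // excludes surrogates
  {0xEE, 0xEF, 0x80, 0xBF, 3},
  {0xF0, 0xF0, 0x90, 0xBF, 4},  // excludes overlong forms
  {0xF1, 0xF3, 0x80, 0xBF, 4},
  {0xF4, 0xF4, 0x80, 0x8F, 4},  // excludes values above U+10FFFF
};

// Returns the offset of the first ill-formed sequence, or n if the
// whole buffer is well-formed UTF-8.
std::size_t first_invalid_utf8(const unsigned char *p, std::size_t n)
{
  std::size_t i = 0;
  while (i < n) {
    const Row *row = nullptr;
    for (const Row &r : kRows)
      if (p[i] >= r.lead_lo && p[i] <= r.lead_hi) { row = &r; break; }
    if (!row)
      return i;                 // 80..C1 or F5..FF can't start a sequence
    std::size_t len = row->len;
    if (n - i < len)
      return i;                 // truncated at end of buffer
    if (len > 1 && (p[i + 1] < row->second_lo || p[i + 1] > row->second_hi))
      return i;
    for (std::size_t k = 2; k < len; ++k)
      if ((p[i + k] & 0xC0) != 0x80)
        return i;
    i += len;
  }
  return n;
}
```

Keeping the ranges in a table makes it easy to compare them row by row against whatever regex the manual gives.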
I can't find a regex there. Got a link or section name? Is this a
released version?
http://codereview.appspot.com/5505090/