Re: [PATCH] ensure that the regexp [b-a] is diagnosed as invalid
From: Jim Meyering
Date: Wed, 03 Feb 2010 17:49:13 +0100

Eric Blake wrote:
> Jim Meyering <jim <at> meyering.net> writes:
>> It adds a test to gl_REGEX that ensures that re_compile_pattern
>> diagnoses [b-a] as invalid when using RE_SYNTAX_POSIX_EGREP.
>
> Where does POSIX state that this is invalid?

Thanks for looking.
I too verified (before embarking) that POSIX does not declare it invalid;
it merely leaves the behavior unspecified. However, since gnulib's regex
has rejected such ranges for a long time, and sed, awk, perl, etc. reject
them as well, I think it's the way to go.
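
For anyone who wants to see which way a given libc goes, here is a minimal
sketch using the portable POSIX regcomp interface rather than the GNU
re_compile_pattern/RE_SYNTAX_POSIX_EGREP pair that the gnulib test uses.
The helper name probe_pattern is hypothetical, and the exact error code
returned for a rejected range is an assumption that may differ between
implementations.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile PATTERN as a POSIX extended regexp and
   return regcomp's status: 0 on success, a REG_* code on failure.  */
int
probe_pattern (const char *pattern)
{
  regex_t re;
  int rc;

  assert (pattern != NULL);
  rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  if (rc == 0)
    regfree (&re);   /* Only a successfully compiled regex needs freeing.  */
  return rc;
}
```

Calling probe_pattern ("a[b-a]") should return nonzero (typically
REG_ERANGE) on an implementation that treats the reversed range as
invalid, and 0 on one that accepts it as matching nothing. POSIX permits
either result.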
Note also that glibc's code appears to try to implement the same
behavior (though conditional upon RE_NO_EMPTY_RANGES, which nearly
everyone uses), but somehow that code does not function properly:

  start_collseq = lookup_collation_sequence_value (start_elem);
  end_collseq = lookup_collation_sequence_value (end_elem);
  /* Check start/end collation sequence values. */
  if (BE (start_collseq == UINT_MAX || end_collseq == UINT_MAX, 0))
    return REG_ECOLLATE;
  if (BE ((syntax & RE_NO_EMPTY_RANGES) && start_collseq > end_collseq, 0))
    return REG_ERANGE;

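
To make the intended behavior of those two checks concrete, here is a
simplified, self-contained sketch. It is not glibc's code: the names
sketch_collseq, sketch_check_range and the RANGE_* codes are hypothetical,
and it assumes the collation sequence value of a byte is simply the byte
value, with UINT_MAX marking an element that has no collation value.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for REG_NOERROR, REG_ECOLLATE and REG_ERANGE.  */
enum range_status { RANGE_OK, RANGE_ECOLLATE, RANGE_ERANGE };

/* Assumption: a byte collates by its own value; anything outside
   0..UCHAR_MAX has no collation sequence value.  */
unsigned int
sketch_collseq (int elem)
{
  return (elem < 0 || elem > UCHAR_MAX) ? UINT_MAX : (unsigned int) elem;
}

/* Mirror the two checks in the excerpt: reject endpoints that have no
   collation value, then reject a reversed range when empty ranges are
   disallowed.  */
enum range_status
sketch_check_range (int start_elem, int end_elem, int no_empty_ranges)
{
  unsigned int start_collseq = sketch_collseq (start_elem);
  unsigned int end_collseq = sketch_collseq (end_elem);

  if (start_collseq == UINT_MAX || end_collseq == UINT_MAX)
    return RANGE_ECOLLATE;
  if (no_empty_ranges && start_collseq > end_collseq)
    return RANGE_ERANGE;
  return RANGE_OK;
}

/* Self-check of the sketch: [b-a] is rejected only when empty ranges
   are disallowed.  */
void
sketch_self_check (void)
{
  assert (sketch_check_range ('b', 'a', 1) == RANGE_ERANGE);
  assert (sketch_check_range ('b', 'a', 0) == RANGE_OK);
  assert (sketch_check_range ('a', 'b', 1) == RANGE_OK);
}
```

With the no_empty_ranges argument playing the role of the
RE_NO_EMPTY_RANGES syntax bit, [b-a] yields the range error and [a-b] is
accepted, which is the behavior the excerpt appears to aim for.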
I've just filed this glibc bug:
http://sourceware.org/bugzilla/show_bug.cgi?id=11244
> So far, I can only see that it is
> undefined, but have not found any hard requirements that it be a failure.
>
> http://www.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
>
> 9.3.5 RE Bracket Expression, step 7: "The starting range point and the ending
> range point shall be a collating element or collating symbol.... If the
> represented set of collating elements is empty, it is unspecified whether the
> expression matches nothing, or is treated as invalid."
>
> That said, forcing a hard failure is probably the best QoI implementation of
> undefined behavior.