bug-gnu-emacs

bug#34641: rx: (or ...) order unpredictable


From: Mattias Engdegård
Subject: bug#34641: rx: (or ...) order unpredictable
Date: Sun, 24 Feb 2019 19:40:33 +0100

The rx (or ...) construct sometimes reorders its subexpressions, which makes 
its semantics unpredictable. For example,

(rx (or "ab" "a") (or "a" "ab"))
=>
"\\(?:ab?\\)\\(?:ab?\\)"

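The user reasonably expects (or e1 e2) to translate to E1\|E2, where ei 
translates to Ei, or to something semantically equivalent. Since alternation in 
Emacs regexps is ordered (the first branch that leads to a match wins), "a\|ab" 
matches only "a" in "ab", whereas "ab?" matches "ab"; the rewrite above thus 
changes the meaning of the second (or ...). Without this control, rx is useless 
or even dangerous for many purposes.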
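The effect can be reproduced outside Emacs. This is a sketch in Python's `re` module, not Emacs itself. It assumes that Python's engine, like Emacs's `string-match`, tries alternatives from left to right and keeps the first branch that leads to a match. The helper `first_match` is hypothetical and exists only for illustration.

```python
import re

def first_match(pattern: str, text: str) -> str:
    """Hypothetical helper: text matched by PATTERN at the start of TEXT ("" if none)."""
    m = re.match(pattern, text)
    return m.group(0) if m else ""

# What a user writing (or "a" "ab") expects: an ordered alternation.
ordered = first_match(r"a|ab", "ab")    # the first branch, "a", wins
# The reordered form rx produced for the same (or ...).
reordered = first_match(r"ab?", "ab")   # the greedy ? also consumes "b"

print(ordered, reordered)  # a ab
```

The two patterns accept the same set of strings, but they do not select the same match. This difference is the one that matters once the (or ...) is embedded in a larger expression.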

The reason for the reordering is the use of regexp-opt behind the scenes. 
Whether rx is the place to do this kind of optimisation is a matter of opinion; 
mine is that it belongs in the regexp engine, where other, more aggressive 
optimisations (DFA construction, native-code generation, etc.) could be 
performed as well.

We could determine whether any of the strings is a prefix of another. If none 
is, calling regexp-opt should be safe. Alternatively, this check could be done 
in regexp-opt itself (activated by a flag). That would be my preferred 
short-term solution.
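This is a minimal sketch of the prefix check, again in Python. The name `regexp_opt_is_safe` is hypothetical. The reasoning is as follows. If no literal is a prefix of another, at most one of them can match at any given position, so reordering or merging the branches cannot change what is matched. After sorting, any string that is a proper prefix of another is immediately followed by a string that starts with it, so comparing neighbours is enough.

```python
from typing import Iterable

def regexp_opt_is_safe(strings: Iterable[str]) -> bool:
    """Hypothetical check: True if no string is a proper prefix of another.

    When this holds, at most one literal can match at a given position,
    so the order of the alternatives cannot affect the result.
    """
    # Duplicates never change the outcome of an ordered alternation.
    unique = sorted(set(strings))
    # In sorted order, a prefix is directly followed by a string that
    # begins with it, so only adjacent pairs need to be compared.
    return not any(b.startswith(a) for a, b in zip(unique, unique[1:]))

print(regexp_opt_is_safe(["ab", "a"]))            # False: "a" is a prefix of "ab"
print(regexp_opt_is_safe(["foo", "bar", "baz"]))  # True
print(regexp_opt_is_safe(["", "x"]))              # False: "" is a prefix of everything
```

Note that the empty string counts as a prefix of every other string. This is correct: an alternation such as "\|x" behaves differently from "x?".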

(Speaking of regexp-opt, it has another bug that does not affect rx: it returns 
the empty string if given an empty list of strings. The empty regexp matches 
everywhere; the correct return value is a regexp that never matches anything. 
Fix it, document it, or turn it into an error?)
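This Python sketch shows the difference between the two candidate return values. The function `alternation` is a hypothetical, order-preserving stand-in for regexp-opt, not its actual implementation. For an empty list it returns `(?!)`, an empty negative lookahead that Python's `re` can never satisfy, rather than the empty pattern, which matches the empty string at every position.

```python
import re
from typing import Sequence

def alternation(strings: Sequence[str]) -> str:
    """Hypothetical order-preserving stand-in for regexp-opt.

    An empty list yields a pattern that never matches, not "",
    because "" matches the empty string in every input.
    """
    if not strings:
        return r"(?!)"  # empty negative lookahead: always fails
    return "(?:" + "|".join(re.escape(s) for s in strings) + ")"

print(bool(re.search("", "anything")))               # True
print(bool(re.search(alternation([]), "anything")))  # False
print(alternation(["a", "ab"]))                      # (?:a|ab)
```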





