bug#37659: rx additions: anychar, unmatchable, unordered-or
From: Mattias Engdegård
Subject: bug#37659: rx additions: anychar, unmatchable, unordered-or
Date: Tue, 11 Feb 2020 13:57:27 +0100
On 22 Oct 2019, at 19:33, Paul Eggert <eggert@cs.ucla.edu> wrote:
> Moreover, if greed is the longstanding tradition for regexp-opt, shouldn't
> plain "or" be greedy, to be consistent with other operators?
On second thought, I've come to believe that Paul may have been right
after all. We might just as well let plain 'or' (alias '|') match as much as
possible when it is able to do so. In particular, we should guarantee that this
will happen when all arguments are strings, as used to be the case.
Initially I thought it was a bug that (or "a" "ab") was optimised into "ab?" on
the grounds that this made the behaviour unpredictable: when matching the
string "abc", (or "a" "ab") matched "ab", whereas (or "a" "ab" space) would
match "a". However, the current 'fixed' code isn't necessarily more useful.
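
The difference can be reproduced with plain regexp strings, independently of what any particular rx version generates. This is only a minimal sketch; `demo-match` is a hypothetical helper, not part of rx:

```elisp
;; Hypothetical helper: return the text matched by REGEXP in STRING, or nil.
(defun demo-match (regexp string)
  (and (string-match regexp string)
       (match-string 0 string)))

;; Alternatives are tried left to right, so the first one that succeeds wins.
(demo-match "a\\|ab" "abc")                 ; "a"

;; The optimised form consumes the optional "b" and so matches more.
(demo-match "ab?" "abc")                    ; "ab"

;; With a non-string alternative added, the ordered alternation remains.
(demo-match "a\\|ab\\|[[:space:]]" "abc")   ; "a"
```
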
Since the change was introduced in Emacs 27, which has not yet been released, I
suggest the attached patch for emacs-27. It reverts the use of regexp-opt with
KEEP-ORDER = t. What do you think? It would solve the problem without
introducing new constructs, and without running the risk of introducing subtle
errors in existing rx expressions.
(In fact, if we do not do this in Emacs 27, we'd have to add a NEWS entry to
warn users about the change.)
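
To illustrate what KEEP-ORDER changes, here is a small sketch, assuming an Emacs (27 or later) where regexp-opt accepts KEEP-ORDER as its third argument. `demo-opt-match` is a hypothetical helper, and the sketch calls regexp-opt directly rather than going through rx:

```elisp
;; Hypothetical helper: build a regexp from STRINGS with `regexp-opt',
;; passing KEEP-ORDER through, and return what it matches in TEXT.
(defun demo-opt-match (strings keep-order text)
  (let ((re (regexp-opt strings nil keep-order)))
    (and (string-match re text)
         (match-string 0 text))))

;; Default: the longest possible string is matched.
(demo-opt-match '("a" "ab") nil "abc")   ; "ab"

;; KEEP-ORDER non-nil: strings are tried in the order given.
(demo-opt-match '("a" "ab") t "abc")     ; "a"
```
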
A further improvement would be to ensure that nested all-string 'or' forms
would have the same property, and that expansion of user-defined forms would be
transparent. In other words, that
  (rx-let ((x (or "abc" "de")))
    (rx (or "a" x (or "ab" "def"))))
would be equivalent to
  (rx (or "abc" "ab" "a" "def" "de"))
I'll prepare a patch for this QoI improvement, but the attached patch should be
required no matter what.
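
As a rough sketch of the idea (not the forthcoming patch, and not how rx is implemented), nested all-string 'or' forms could be flattened into one list of strings before being handed to regexp-opt. The names `demo-or-strings` and `demo-longest-or-regexp` are hypothetical, and the user-defined form x is assumed to have been expanded already:

```elisp
;; Hypothetical helper: collect the strings of FORM, which must be a string
;; or an (or ...) form whose arguments are themselves such forms.
(defun demo-or-strings (form)
  (cond ((stringp form) (list form))
        ((eq (car-safe form) 'or)
         (apply #'append (mapcar #'demo-or-strings (cdr form))))
        (t (error "Not an all-string `or' form: %S" form))))

;; Hypothetical helper: longest-match regexp for a nested all-string `or'.
(defun demo-longest-or-regexp (form)
  (regexp-opt (demo-or-strings form)))

;; The example above, with x already replaced by its definition.
(demo-or-strings '(or "a" (or "abc" "de") (or "ab" "def")))
;; evaluates to ("a" "abc" "de" "ab" "def")
```

Because regexp-opt without KEEP-ORDER matches the longest string possible, the order in which the flattened strings are collected does not affect the result.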
0001-rx-Use-longest-match-for-all-string-or-forms-bug-376.patch