Re: rx.el sexp regexp syntax (WAS: Off Topic)
From: Pierre Neidhardt
Subject: Re: rx.el sexp regexp syntax (WAS: Off Topic)
Date: Fri, 25 May 2018 18:47:59 +0200
User-agent: mu4e 1.0; emacs 26.1
Alan Mackenzie <address@hidden> writes:
>> rx.el is one of the best concepts I've discovered in a long time.
>> It's another instance of "Don't come up with a new (mini)language when
>> Lisp can do better": it's easier to learn, more flexible, easier to
>> write, much easier to read and as a consequence much more maintainable.
>
> Much easier than what? Than the putative mini-language that doesn't get
> written?
I meant that, in my opinion, rx is easier to write than regexps. Why it
is not popular is the real question here.
>> I think it's high time we moved away from traditional regexps and
>> embraced the concept of rx.el. I'm thinking of implementing it for
>> Guile.
>
> There's nothing stopping anybody from using rx.el. However, people have
> mostly _not_ used it. The "I think it's high time ...." suggests in
> some way forcing people to use it. Before mandating something like
> this, I think we should find out why it's not already in common use.
Sorry if it sounded like I wanted to force anything on anyone; that
wasn't my intention. By "high time" I was only referring to how long
regexps have been around.
I thought the reason it's not already in common use had already been
discussed: it's barely referenced anywhere, it needs more advertising.
Correct me if this is wrong.
>> At the moment the rx.el implementation is built on top of Emacs regexps
>> which are implemented in C. I believe this does not use the power of
>> Lisp as much as it could.
>
> But would any alternative use the power of regexps?
Yes, rx.el is a drop-in replacement for regexps. What do you mean?
> Emacs has a (moderately large) cache of regexps, so that building the
> automatons is done very rarely. Possibly just once each for each
> session of Emacs.
That's the whole point: if possible (see below), remove the need for
regexp cache management.
>> In high-level languages, automatons are automatically cached to save the
>> cost of building them.
>
> Emacs Lisp does this too.
I did not exclude it :)
>> The rx.el library/concept could alleviate this issue altogether: because
>> we express the automaton directly in Lisp, the parsing step is not
>> needed and thus the building cost could be tremendously reduced.
>
>> So the rx.el building steps
>
>> rx expression -> regexp string -> C regexp automaton
>
>> could boil down to simply
>
>> rx automaton
>
> I don't see what you're trying to save, here. At some stage, the regexp
> source, in whatever form, needs to be converted to an automaton.
Yes, that's what I meant by "rx automaton". My suggestion (not
necessarily for Emacs Lisp) is to remove both the step that converts the
symbolic rx expression to a string and the step that converts that
string to the actual automaton.
> Are you suggesting here building an interpreter in Lisp directly to
> execute rx expressions?
Yes, but maybe in Guile or some other Lisp. I don't know if it's
feasible in Emacs Lisp.
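To make the idea concrete, here is a minimal sketch of a matcher that
walks the pattern tree directly, with no regexp string in between. It is
written in Python with nested tuples standing in for s-expressions, only
because that is short to write down. The form names ("seq", "or", "any",
...) are modelled on rx but are my own assumptions, as are `match_at`
and `search`; none of this is rx.el or Guile API. It is a naive
backtracking interpreter, so it says nothing about speed.

```python
DIGITS = "0123456789"


def match_at(form, text, pos):
    """Yield each end position where `form` matches `text` starting at `pos`."""
    if isinstance(form, str):
        # A plain string is a literal: no escaping, no parsing.
        if text.startswith(form, pos):
            yield pos + len(form)
        return
    head, *args = form
    if head == "seq":
        if not args:
            yield pos
            return
        rest = ("seq", *args[1:])
        for mid in match_at(args[0], text, pos):
            yield from match_at(rest, text, mid)
    elif head == "or":
        for alternative in args:
            yield from match_at(alternative, text, pos)
    elif head == "any":
        # One character taken from the set given as a string.
        if pos < len(text) and text[pos] in args[0]:
            yield pos + 1
    elif head == "zero-or-more":
        # Greedy: offer the longer matches first, then the empty one.
        for mid in match_at(args[0], text, pos):
            if mid > pos:  # skip empty iterations to avoid looping forever
                yield from match_at(form, text, mid)
        yield pos
    elif head == "one-or-more":
        yield from match_at(("seq", args[0], ("zero-or-more", args[0])), text, pos)
    elif head == "line-start":
        if pos == 0 or text[pos - 1] == "\n":
            yield pos
    elif head == "line-end":
        if pos == len(text) or text[pos] == "\n":
            yield pos
    else:
        raise ValueError(f"unknown form: {head!r}")


def search(form, text):
    """Return (start, end) of the leftmost match, or None if there is none."""
    for start in range(len(text) + 1):
        for end in match_at(form, text, start):
            return start, end
    return None


PATTERN = ("seq",
           ("line-start",),
           ("one-or-more", ("any", DIGITS)),
           ".",                      # a literal dot, written as is
           ("or", "txt", "el"),
           ("line-end",))

print(search(PATTERN, "42.el"))         # (0, 5)
print(search(PATTERN, "notes\n7.txt"))  # (6, 11)
print(search(PATTERN, "x42.el"))        # None
```

A serious implementation would presumably compile the tree to an
automaton once instead of backtracking over it on every call; the sketch
only shows that the string and parsing steps are not needed to get from
the symbolic form to a match.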
>> It would be interesting to compare the performance. This also means
>> that there would be no need for caching on behalf of the supporting
>> language.
>
> I will predict that an rx interpreter built in Lisp will be two orders
> of magnitude slower than the current regexp machine, where both the
> construction of an automaton, and the byte-code interpreter which runs
> it are written in C (and probably quite optimised C at that).
Obviously, and this is the prime reason why the author of rx.el
implemented it on top of the C regexp engine. My point was that with a
fast Lisp (or specifically designed C support), a Lisp automaton could
in principle be just as fast: the Lisp code would map directly onto the
equivalent C automaton.
Again, I have no clue if that's doable in Emacs Lisp.
> I can't get excited about rx syntax, which I'm sure would be just as
> tedious, and possibly more difficult to read than a standard regexp.
Have you used rx? The whole point of the library is to increase
readability, and it does a great job of it in my opinion.
> Analogously, as a musician, I read standard musical notation (with
> sets of five lines and dots) far more easily and fluently than I could
> any "simplified" system designed for beginners, which would be bloated
> by comparison.
rx.el is not meant to be "simplified for beginners". You could just as
well reverse the analogy and say that regexps are the "simplified
version for beginners"... The analogy does not map very well.
A better analogy would be the mapping between assembly and the
hexadecimal codes of CPU instructions: I don't think many people find
hexadecimal codes more explicit than assembly verbs and symbols
(although most assembly languages abuse abbreviations, the intention is
there).
> Regular expressions can be difficult. I don't believe this difficulty
> lies, in the main, in the compact notation used to express them. Rather
> it lies in the concepts and the semantics of the regexp elements, and
> being able to express a "mental automaton" in regexp semantics.
The semantics of rx and regexps do not differ. The difference is purely
syntactic.
Let's consider some points:
- rx can be written over multiple lines and indented. This is a great
  readability booster for groups, which can be _grouped_ together with
  line breaks and indentation.
- rx does not require escaping any character with backslashes. Escaping
  is a constant source of confusion when switching between BRE and ERE,
  between different interpreters, and when storing regexps in Lisp
  strings, where the backslashes must themselves be escaped.
- Symbols with non-trivial meanings in regexps (e.g. \<, :, ^, etc.)
  have a trivial _English_ counterpart in rx: (respectively
  "word-start", nothing, "line-start" _and_ "not").
- No more special-case symbols like "-" for ranges or "^" (negation when
  it is the first character in square brackets), and thus less cognitive
  burden.
- The "^" has a double meaning in regexps: "line-start" and "not".
The list goes on.
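As a rough illustration of these points, here is a small sketch of a
translator from a nested, indented pattern to a regexp string, i.e. the
"rx expression -> regexp string" step. It is written in Python with
tuples standing in for s-expressions and targets Python's `re` syntax,
not Emacs regexps. The form names and the `Raw`, `to_regex`,
`LINE_START`, etc. helpers are assumptions of mine, loosely modelled on
rx; this is not how rx.el itself is implemented.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Raw:
    """A ready-made regex fragment, standing in for an rx symbol."""
    text: str


# Hypothetical names modelled on rx symbols; values use Python `re` syntax.
LINE_START = Raw("^")
LINE_END = Raw("$")
DIGIT = Raw(r"\d")
WORD_START = Raw(r"\b(?=\w)")

REPEAT = {"zero-or-more": "*", "one-or-more": "+", "optional": "?"}


def to_regex(form):
    """Translate a nested-tuple pattern into a Python regex string."""
    if isinstance(form, Raw):
        return form.text
    if isinstance(form, str):
        # Literals are escaped here, once, so the caller never writes a backslash.
        return re.escape(form)
    head, *args = form
    body = "".join(to_regex(arg) for arg in args)
    if head == "seq":
        return body
    if head == "group":
        return "(" + body + ")"
    if head == "or":
        return "(?:" + "|".join(to_regex(arg) for arg in args) + ")"
    if head in REPEAT:
        return "(?:" + body + ")" + REPEAT[head]
    if head == "any":
        # Characters are listed literally: "-" and "^" have no special meaning.
        return "[" + body + "]"
    if head == "not-any":
        return "[^" + body + "]"
    raise ValueError(f"unknown form: {head!r}")


# The pattern is laid out over several lines, grouped by indentation.
PATTERN = ("seq",
           LINE_START,
           ("group",
            ("one-or-more", DIGIT)),
           ".",                      # literal dot, no backslash needed
           ("or", "txt", "el"),
           LINE_END)

REGEX = to_regex(PATTERN)
print(REGEX)                                # ^((?:\d)+)\.(?:txt|el)$
print(re.match(REGEX, "42.el").group(1))    # 42
```

The generated string is noisier than what one would write by hand, but
nobody has to read it: the indented form is the source, and "start of
line" and "none of these characters" are spelled with different names
instead of sharing "^".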
--
Pierre Neidhardt