Re: How to grok a complicated regex?

From: Marcin Borkowski
Subject: Re: How to grok a complicated regex?
Date: Sat, 14 Mar 2015 00:16:50 +0100

On 2015-03-13, at 23:46, Emanuel Berg wrote:

> Marcin Borkowski writes:
>> so I have this monstrosity [note: I know, there are
>> much worse ones, too!]:
>> "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'"
>> (it's in the org-latex--script-size function in
>> ox-latex.el, if you're curious).
>> I'm not asking “what does this match” – I can read
>> it myself. But it comes with a considerable effort.
> I dare say most people (even programmers) cannot read
> that so if you can that's great. As a math

Really?  It's not /that/ difficult.  You only need enough coffee (or
tea, in my case), time and motivation.  You don’t need a genius, or even
IQ higher than, say, 90 or so.  It's not really /difficult/.
Intimidating, yes.  Boring, possibly.  Laborious (and mechanical), yes.
But not /difficult/.

> professional you are of course aware of the discipline
> called automata theory that deals with such things.

Well, as an analyst working in metric fixed point theory, that's just
it.  I'm /aware/ of automata theory – (almost) nothing more. ;-)

> Perhaps relational algebra might help too, if the data
> in the sets are strings. But automata theory should be
> it even more.
> Also, remember you don't have to understand those
> expressions. Often they are set up incrementally. They
> only need to be correct. The computer understands them
> - the programmer only understands the purpose, and the
> latest edition. Kind of risky, perhaps not what a math
> person would find appealing, but I've constructed many
> that way so I know that method works.

That reminds me of the von Neumann quote: “In mathematics, you don’t
/understand/ things – you just /get used/ to them.”

>> Are you aware of any tools that might help to
>> understand such regexen?
> I have seen tools with which you can construct such
> expressions and they output figures, states,
> transitions, and so on. I wonder how advanced an
> expression they can deal with? But if you get the
> basics right, it should be just basic building blocks
> that stick together and from there on the sky is the
> limit.
> Instead the problem is, as I see it: will those
> figures, balls and arrows, tagged with preconditions,
> postconditions, everything you can think of, will that
> actually be *clearer*?

As we both point out, I’m not talking about changing the representation,
but about making the existing one (which I agree is not /that/ bad) more
comprehensible.  Font lock, grouping and unescaping backslashes would
definitely be helpful.
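
To make the "grouping" idea concrete, here is a rough sketch of the regex
quoted above, split into commented pieces.  The `my-` names are made up for
illustration and are not anything from ox-latex.el; the descriptions in the
docstrings are just my reading of each piece.

```elisp
;; Hypothetical names; only the final concatenated string comes from the thread.
(defconst my-math-open "\\(?:\\\\[([]\\|\\$+\\)?"
  "Optional opener: a backslash followed by ( or [, or a run of dollar signs.")

(defconst my-math-body "\\(.*?\\)"
  "Group 1: the shortest possible run of non-newline characters.")

(defconst my-math-close "\\(?:\\\\[])]\\|\\$+\\)?"
  "Optional closer: a backslash followed by ] or ), or a run of dollar signs.")

(defconst my-math-regexp
  (concat "\\`" my-math-open my-math-body my-math-close "\\'")
  "The three pieces above, anchored at both ends of the string.")

;; The regex exactly as quoted in the thread, kept for comparison.
(defconst my-original-regexp
  "\\`\\(?:\\\\[([]\\|\\$+\\)?\\(.*?\\)\\(?:\\\\[])]\\|\\$+\\)?\\'")
```

The concatenation is `string=` to the original, so nothing changes for the
regex engine; only the human reader gets smaller chunks.  As for unescaping,
`(princ my-original-regexp)` prints the string the way the engine sees it:
\`\(?:\\[([]\|\$+\)?\(.*?\)\(?:\\[])]\|\$+\)?\'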

OTOH, I can imagine that some kind of diagrams might be helpful for
someone.  The point is, in the end you have to read/write these regexen
in their normal form anyway, so why not train yourself to understand
their “default” representation instead of adding the burden of
translating between representations?

> If I were to do it (which I am not, thank God) my
> answer would be *no*. The only way I could do it would
> instead be the opposite. Train the brain with such
> expressions - exactly as they are - day in, day out,
> until they are second nature.
> Example: a C++ OO project with classes and everything.
> Silly inheritance and interfaces. Some people would
> consider those pretty darn difficult to understand.
> But to the seasoned C++ programmer (no exaggerating
> here, a few years of focused training is enough) those
> programs are clear. For those guys, giving up writing
> C++ code and instead using some other representation
> (be it graphical or not) would be to in one stroke
> cripple their skills.
> So no, I think that representation is the best there
> is. To translate it back and forth would not only be

I’m not sure whether it’s the best – but it’s a standard (more or less,
Emacs’ regexen are not really “standard” by today’s, well, standards –
but hardly anything about Emacs is “standard” or “typical”, so who

> very difficult to do - and even if possible, which of

I disagree.  I don’t think that such a translator would be a difficult
one to write.
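
For comparison, Emacs already ships a translator in the opposite direction:
the `rx' macro turns an s-expression into a regex string.  Below is a sketch
of how the regex quoted at the top might look in that notation.  The names
`my-math-rx' and `my-strip-math' are hypothetical, and I have not checked
that `rx' emits a character-for-character identical string, only that it
should match the same inputs.

```elisp
(require 'rx)

;; Hypothetical name; an rx rendering of the regex quoted in the thread.
(defconst my-math-rx
  (rx bos
      ;; optional opener: backslash + ( or [, or a run of dollar signs
      (opt (or (seq "\\" (any "([")) (+ "$")))
      ;; group 1: shortest possible run of non-newline characters
      (group (*? nonl))
      ;; optional closer: backslash + ] or ), or a run of dollar signs
      (opt (or (seq "\\" (any "])")) (+ "$")))
      eos))

(defun my-strip-math (string)
  "Return the part of STRING captured by group 1 of `my-math-rx'."
  (and (string-match my-math-rx string)
       (match-string 1 string)))
```

With this, (my-strip-math "$x$") returns "x", and a string without any
delimiters comes back unchanged.  Whether the s-expression form is actually
clearer than the string form is, of course, exactly the question under
discussion.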

If only I were a student again, with plenty of spare time, I might have
taken up the challenge and tried to write one in TeX, so that some TeX
macro, given an (Emacs) regex, would produce a nicely typeset diagram.

Wow, what a nice project for a bachelor’s thesis.  Wait a minute.
Ohboyohboyohboy.  I have to put this in my faculty’s database of
potential topics.  Poor students... ;-)

(BTW, I did once write a poor man’s parser in pure TeX; since there was
no regex engine written in TeX back then (now there is one!), I had to
craft a simple automaton myself.  Not extremely pleasant work...)

> course it is, because a representation is just a
> representation of I don't know how many possible - I
> don't see the end result being any more clear: on the
> contrary, most likely.
> What I would do - try to get it more readable by using
> classes, string classes (do they exist?), and even
> more advanced constructs if necessary - as in this
> simple example:
>     (defconst stop-char-default "\\([[:punct:]]\\|[[:space:]][[:alnum:]]\\)")
> How do you define those? Can you identify any which
> aren't there, but could/should be?
> Example: say there is a class called "delimiters"
> which contains [, (, {, <, >, }, ), and ]. Can you
> split that up, into "opening-delimiters" and closing
> ditto?
> Second, exactly as you mentioned - the font lock issue -
> work on that.
> You do know, of course, of
>     font-lock-regexp-grouping-construct
>     font-lock-regexp-grouping-backslash
> Are there more of those, that you can identify, and
> add?

There could be quite a few.  (As Alexis pointed out, the kind of tool I
was writing about seems to exist – if it’s not satisfactory, I could
think about extending it somehow.  Not very probable, though – I’m too
busy now.  If only someone would pay me for goofing around and playing
with Emacs hacks...)

Thanks for your input, and best regards!

Marcin Borkowski
Faculty of Mathematics and Computer Science
Adam Mickiewicz University
