Re: pygment regex question

From: Jean Abou Samra
Subject: Re: pygment regex question
Date: Fri, 25 Nov 2022 21:39:28 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.5.0

On 25/11/2022 at 20:28, Luca Fascione wrote:
> On Fri, 25 Nov 2022, 18:11 Jean Abou Samra wrote:

> > What makes you think Pygments can’t do this? You can do


> Nothing but my not remembering lookaheads/lookbehinds, which I would argue aren't very common constructs. In fact, aside from Perl, I'm not even sure what precedent they have (and no, Python doesn't count). Besides, this has nothing to do with Pygments: it's the regex matching engine doing its thing, and Pygments just gratefully receives the benefit.

Well, reusing a feature found in the underlying tools is not bad
design; it is good design that shares functionality instead of
reinventing the wheel.

(Sorry, I co-maintain Pygments, which is why I am a bit sensitive
to this "bad design" criticism.)
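For what it's worth, lookarounds come straight from Python's re module, which is what Pygments compiles its rules with, so any Pygments rule can use them. A minimal sketch with plain re; the pattern names are made up for illustration, and the [a-g] pitch class is a deliberate simplification that ignores names like "cis" and octave marks:

```python
import re

# Lookbehind (?<=...): digits count only if a pitch letter comes right before.
# The letter is checked but not consumed, so it is not part of the match.
# Python's re only allows fixed-width lookbehind, hence a single character.
DURATION_AFTER_PITCH = re.compile(r"(?<=[a-g])\d+")

# Lookahead (?=...): a pitch letter counts only if a digit follows it.
PITCH_BEFORE_DURATION = re.compile(r"[a-g](?=\d)")

text = "c4 d8 e 3/2"
print(DURATION_AFTER_PITCH.findall(text))   # ['4', '8']
print(PITCH_BEFORE_DURATION.findall(text))  # ['c', 'd']
```

The 3 and 2 of the fraction are left alone because no pitch letter directly precedes them.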

> > and things like that. You could also arrange so that the regex
> > parsing a pitch leaves you in a state of the lexer where something
> > special will happen for \d+.

> This does sound like Pygments code. Interesting, I wasn't aware you could mess with the state of the lexer to that depth.

Hrrm... It's not an advanced feature, it's really the basic way
Pygments lexers work. You have a set of states, the lexer has a
state stack, each state tries regex-based rules in turn and a rule
adds to or removes from the stack. This example would be done as

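from pygments.lexer import default
from pygments.token import Token
# (the tokens attribute of a RegexLexer subclass)
tokens = {
    "root": [(r"[a-z]+", Token.Pitch, "after_note")],
    "after_note": [
        (r"\d+", Token.Duration, "#pop"),
        default("#pop"),  # no duration follows: just leave the state
    ],
}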
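To make the state-stack mechanics concrete, here is a toy model in plain Python of how such state rules get applied. It is a simplified sketch, not Pygments' own code (RegexLexer also handles include, callbacks, bygroups and more); RULES, COMPILED and tokenize are invented names, and the empty-pattern rule plays the role that default("#pop") plays in a real lexer:

```python
import re

# Toy model of a Pygments-style state machine (not Pygments' own code).
# Each state maps to (regex, token type, action) rules. The action is None
# (stay in the state), a state name (push it) or "#pop" (pop the stack).
RULES = {
    "root": [
        (r"[a-z]+", "Pitch", "after_note"),
        (r"\s+", "Whitespace", None),
    ],
    "after_note": [
        (r"\d+", "Duration", "#pop"),
        # Empty pattern: always matches, consumes nothing, leaves the state.
        (r"", None, "#pop"),
    ],
}
COMPILED = {
    state: [(re.compile(p), tok, act) for p, tok, act in rules]
    for state, rules in RULES.items()
}


def tokenize(text):
    """Return (token_type, text) pairs, trying the top state's rules in order."""
    stack = ["root"]
    pos = 0
    tokens = []
    while pos < len(text):
        for regex, token_type, action in COMPILED[stack[-1]]:
            match = regex.match(text, pos)
            if match is None:
                continue
            if match.group():  # empty matches produce no token
                tokens.append((token_type, match.group()))
            pos = match.end()
            if action == "#pop":
                stack.pop()
            elif action is not None:
                stack.append(action)
            break
        else:
            # No rule matched: emit one character as an error and move on.
            tokens.append(("Error", text[pos]))
            pos += 1
    return tokens


print(tokenize("c4 d e8"))
```

For "c4 d e8", the bare "d" comes out as a Pitch with no Duration: the \d+ rule fails, the empty rule pops back to "root", and lexing continues from the space.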

In simple cases (if there is no complex stuff in the "after_note"
state), you can also get along with

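from pygments.lexer import bygroups
from pygments.token import Token
tokens = {
    # one regex; each capture group gets its own token type
    "root": [(r"([a-z]+)(\d*)", bygroups(Token.Pitch, Token.Duration))],
}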

which in hindsight may be closer to what you were thinking
of originally.
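bygroups hands each capture group of a single match to its own token type, so the split is decided entirely by the regex. A quick check with plain re (split_note is a made-up helper) shows why the pitch group should not be written as \w+: \w also matches digits, so a greedy (\w+) swallows the duration and (\d*) is left matching the empty string.

```python
import re


def split_note(pattern, text):
    """Return the two capture groups, i.e. what bygroups would receive."""
    match = re.match(pattern, text)
    return match.group(1), match.group(2)


# \w matches letters and digits, so the first group eats the "4".
print(split_note(r"(\w+)(\d*)", "c4"))      # ('c4', '')
# Restricting the pitch group to letters leaves the digits for group 2.
print(split_note(r"([a-z]+)(\d*)", "c4"))   # ('c', '4')
# A pitch without a duration still matches; group 2 is just empty.
print(split_note(r"([a-z]+)(\d*)", "cis"))  # ('cis', '')
```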

> > However, durations don’t always follow a pitch, as in
> >
> >     \tuplet 3/2 8. { … }
> >
> > which is the reason why we don’t want to do that.

> Does LilyPond's parser even know that's a duration? Isn't that just a bare string that \tuplet internally interprets as a duration?

\tuplet is defined (in ly/ as

tuplet =
#(define-music-function (ratio tuplet-span music)
   (fraction? (ly:duration? '()) ly:music?)
   ;; ... docstring and body omitted here ...

When the parser sees "8", it notes that this could
be either a number or a duration, so it tries the
different variants against the predicate ly:duration?
The function receives an argument of the right
type thanks to the predicate it declares for this
argument.
If you wanted to do that in Pygments, you would have
to know the signature of every LilyPond music function
and which predicates match numbers or durations,
not to mention the problem of user-written functions.
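To give an idea of what that would involve, here is a hypothetical sketch: a hand-maintained SIGNATURES table (invented, covering just two functions and listing only their numeric arguments) plus a scanner that labels each bare number by its position after the most recent command. Anything missing from the table, such as a user-written function, can only be labeled "unknown".

```python
import re

# Hypothetical, hand-maintained table: for each music function, the kind of
# each bare-number argument, in order. A highlighter would have to keep this
# in sync with every definition in ly/ and in users' own files.
SIGNATURES = {
    r"\tuplet": ["fraction", "duration"],
    r"\barNumberCheck": ["number"],
}

# Commands, fractions, numbers with optional dots, or any other word.
TOKEN = re.compile(r"\\[A-Za-z]+|\d+/\d+|\d+\.*|\S+")


def classify_numbers(source):
    """Label each bare numeric token using the last command's signature."""
    labels = []
    pending = []
    for match in TOKEN.finditer(source):
        word = match.group()
        if word.startswith("\\"):
            # Unknown (e.g. user-defined) functions give us nothing to go on.
            pending = list(SIGNATURES.get(word, []))
        elif word[0].isdigit():
            kind = pending.pop(0) if pending else "unknown"
            labels.append((word, kind))
        else:
            pending = []  # any other word ends the argument list
    return labels


print(classify_numbers(r"\tuplet 3/2 8. { c8 d e }"))
print(classify_numbers(r"\myFunction 4"))
```

Even this toy version ignores non-numeric arguments and optional-argument skipping, both of which a real implementation would have to model.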

> When implementing this kind of simplistic syntax highlighting (that is, highlighting not assisted by awareness of the language's semantics, like you'd have in Visual Studio or Qt Creator, say), there's always the problem of how much of the common libraries you reimplement by hand. I'm not sure how Frescobaldi does its thing, for example; a lot of it seems quite magic to me (or the result of a huge labour of love... I mean, that program is just brilliant).

> Anyway, whatever Frescobaldi does, I wonder if we could mimic it in Pygments...

What Frescobaldi does is here:

1500+ lines of code, obviously a lot of work and dedication.
Nevertheless, it has to make assumptions too. For example, if
you enter this in Frescobaldi:

\version "2.22.2"

{
  \barNumberCheck 1
  \tweak duration-log 2 c'1
}

... you will notice that the "1" after \barNumberCheck is highlighted
in the same color as the duration in "c'1", even though it is
a number, just like the "2" in "\tweak duration-log 2 ...".

On the reasons not to reuse Frescobaldi's code for syntax
highlighting in the documentation, see

