Re: Is it valid to use the zero-byte "^@" in regexps?


From: Thorsten Jolitz
Subject: Re: Is it valid to use the zero-byte "^@" in regexps?
Date: Wed, 18 Jun 2014 12:22:35 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)

Nicolas Richard <theonewiththeevillook@yahoo.fr> writes:

> Thorsten Jolitz <tjolitz@gmail.com> writes:
>> To rule out a fundamental problem - is it valid to have the zero-byte
>> (inserted with C-q C-@) appear in a regexp like this? 
>>
>> ,--------------------------------------------------------
>> | "^#\\+begin_src[[:space:]]+emacs-lisp[^^@]*\n#\\+end_src"
>> `--------------------------------------------------------
>
> I don't see why it wouldn't be valid, but I don't know. Whether it is
> desirable is another question: it would be better to search for the
> beginning, then search for the end with another regexp.

That's what I did initially, and it is of course much easier, but it took
twice (?) as long too ...
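
For reference, a minimal sketch of that two-step search. The function
name `my/next-elisp-src-block' is made up for illustration and is not
taken from the real sources; the two regexps are just the opening and
closing parts of the single regexp quoted above.

```emacs-lisp
;; Hypothetical helper (the name is an assumption, not the original code):
;; find the next emacs-lisp source block with two simple searches
;; instead of one regexp that has to span the whole block body.
(defun my/next-elisp-src-block ()
  "Return (BEG . END) of the next emacs-lisp src block after point, or nil."
  (when (re-search-forward
         "^#\\+begin_src[[:space:]]+emacs-lisp" nil t)
    (let ((beg (match-beginning 0)))
      ;; Second search: the first closing line after the opener.
      (when (re-search-forward "^#\\+end_src" nil t)
        (cons beg (match-end 0))))))
```

Because the second search stops at the first closing line, no negated
character class is needed to cross newlines.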

>> If so, this regexp should reliably match any 
>>
>> ,-----------------------
>> | #+begin_src emacs-lisp
>> |  [...]
>> | #+end_src
>> `-----------------------
>
> From the first occurrence of
> #+begin_src emacs-lisp
> ;; after point to the last occurence of
> #+end_src
> in the buffer. If there's more than one, they'll be part of the match
> too. e.g. if there's another block in the same document :
> #+begin_src sh
> echo whatever.
> #+end_src
> it'll be part of the match too. If you don't want that, make the star
> non-greedy by appending a question mark to it:
> (re-search-forward
> "^#\\+begin_src[[:space:]]+emacs-lisp[^^@]*?\n#\\+end_src")

Yes, thanks for the hint. In my real sources I do use the non-greedy *?
(otherwise it would not work), but I forgot about it when writing the
mail.
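
A small sketch to make the difference visible, assuming a throwaway
buffer with two blocks. The names `my/match-length', `my/greedy' and
`my/non-greedy' are invented for this example, and the NUL character is
written as the string escape \0 instead of a literal ^@.

```emacs-lisp
;; Hypothetical test helper: length of the first match of REGEXP
;; in a temp buffer holding an emacs-lisp block followed by a sh block.
(defun my/match-length (regexp)
  "Return the length of the first match of REGEXP in a two-block buffer."
  (with-temp-buffer
    (insert "#+begin_src emacs-lisp\n(+ 1 2)\n#+end_src\n"
            "#+begin_src sh\necho whatever\n#+end_src\n")
    (goto-char (point-min))
    (when (re-search-forward regexp nil t)
      (- (match-end 0) (match-beginning 0)))))

;; Greedy: [^\0]* runs on to the last "#+end_src" in the buffer.
(defconst my/greedy
  "^#\\+begin_src[[:space:]]+emacs-lisp[^\0]*\n#\\+end_src")

;; Non-greedy: [^\0]*? stops at the first "#+end_src" it can reach.
(defconst my/non-greedy
  "^#\\+begin_src[[:space:]]+emacs-lisp[^\0]*?\n#\\+end_src")
```

Evaluating (my/match-length my/greedy) and (my/match-length my/non-greedy)
should give a longer match for the greedy variant, since it swallows the
sh block as well.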

>> no matter what's inside the block, right?
>
> Except NUL characters of course.

You mean the zero-byte "^@"?

But Emacs can distinguish NUL characters from the @ character, can't
it? NUL chars are shown in a blue font, and message-mode complains when
I try to send them via email, but e.g. this mail has many @ chars that
are just normal text (just like my test file), and they are recognized as
such.
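
A quick way to check this from Lisp, as a sketch: the predicate name
`my/not-nul-p' is invented for the example, and \0 in the string is the
escape for the NUL character, not the two visible characters ^ and @.

```emacs-lisp
;; ?\0 is the NUL character (code 0); ?@ is the ordinary at sign (code 64).
(defun my/not-nul-p (char)
  "Return non-nil if CHAR is matched by the negated class [^\\0]."
  (and (string-match-p "[^\0]" (string char)) t))

(my/not-nul-p ?@)    ; the at sign is not NUL, so the class accepts it
(my/not-nul-p ?\n)   ; a negated class also accepts newlines
(my/not-nul-p ?\0)   ; only the NUL character itself is rejected
```

If the regexp in a source file really contains the single NUL character
(inserted with C-q C-@), it behaves like the \0 escape used here; if it
contained the two separate characters ^ and @, the class would mean
something different.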

Often, but not always, the source blocks that fail to match contain @
characters (but no NUL chars). The strange thing is that the matching
fails while these blocks are part of a really big test file. When I
isolate them, copy them to a temp buffer and try to match them there,
it just works.

That makes testing/bisecting a bit difficult: whenever I find the
problem and isolate it, it's gone ...

Hence my question: is this technique of negating the zero-byte in a
regexp supposed to work, or is it problematic from the start?

-- 
cheers,
Thorsten