Re: Regex parenthesis handling bug
From: Neil Jerram
Subject: Re: Regex parenthesis handling bug
Date: 22 Oct 2001 23:32:23 +0100
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7

>>>>> "Gary" == Gary Houston <address@hidden> writes:
>> From: Neil Jerram <address@hidden> Date: 21 Oct 2001
>> 11:18:50 +0100
>>
>> According to the libc manual, all libc regex settings default
>> to the Emacs behaviour, so unquoted parens should match literal
>> parens in the match string, while quoted parens indicate
>> grouping.
>>
>> In Guile, it doesn't work like this...
>>
>> guile> (string-match "\\(x\\)" "x")
>> $3 = #f
Gary> The libguile functions use POSIX interfaces with
Gary> REG_EXTENDED defined, which makes unquoted parens into match
Gary> delimiters. Adding the REG_BASIC flag should reverse it. I
Gary> can't find anything about Emacs behaviour in the glibc
Gary> (2.2.2) manual, [...]

Sorry, I think I was misremembering /usr/include/regex.h, which says:

    /* The following bits are used to determine the regexp syntax we
       recognize.  The set/not-set meanings are chosen so that Emacs syntax
       remains the value 0.  The bits are given in alphabetical order, and
       the definitions shifted by one from the previous bit; thus, when we
       add or remove a bit, only one other definition need change.  */
    typedef unsigned long int reg_syntax_t;

But it turns out that the setting of re_syntax_options has no effect
on POSIX regcomp; it applies only to the GNU-specific
re_compile_pattern function. If I write a new `make-emacs-regexp'
primitive using re_compile_pattern rather than regcomp, it gives the
desired behaviour.
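
For anyone following along, here is a rough sketch of what the re_compile_pattern route might look like in plain C. It is not the `make-emacs-regexp' primitive itself; the helper name emacs_syntax_search is made up for illustration, and the code assumes glibc, where _GNU_SOURCE exposes re_set_syntax, re_compile_pattern, re_search and RE_SYNTAX_EMACS in <regex.h>.

```c
#define _GNU_SOURCE   /* assumption: glibc; exposes the GNU regex API */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper (not part of Guile or glibc): search `text` for
   `pattern` using Emacs regex syntax.  Returns the offset of the first
   match, or -1 if there is no match or the pattern does not compile. */
static int emacs_syntax_search(const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    const char *err;
    regoff_t pos;

    assert(pattern != NULL && text != NULL);

    /* Start from a zeroed buffer so the library allocates its own storage
       and uses no fastmap or translate table. */
    memset(&buf, 0, sizeof buf);

    /* RE_SYNTAX_EMACS is 0.  This sets the global re_syntax_options, which
       re_compile_pattern consults (regcomp does not). */
    re_set_syntax(RE_SYNTAX_EMACS);

    err = re_compile_pattern(pattern, strlen(pattern), &buf);
    if (err != NULL)
        return -1;

    /* Try every start position from 0 up to the end of the text. */
    pos = re_search(&buf, text, (regoff_t) strlen(text), 0,
                    (regoff_t) strlen(text), NULL);
    regfree(&buf);

    return pos < 0 ? -1 : (int) pos;
}
```

With this helper, "\\(x\\)" finds "x" at offset 0 and "ba+c" finds a match in "abaaac" at offset 1, while an unquoted "(x)" only matches a literal "(x)" in the text. Note that re_set_syntax changes process-wide state, so a real primitive would probably want to save and restore the previous value.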

That still leaves a problem for Elisp regex support on non-glibc
systems, but I guess that can be solved by including source code from
Emacs where necessary.

(Note that the available flags for `make-regexp' don't give me what I
want in general:

    (string-match-basic "\\(x\\)" "x") => #("x" (0 . 1) (0 . 1))

is good (i.e. Emacs-compatible), but

    (string-match-basic "ba+c" "abaaac") => #f

is not.)

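
The same trade-off can be seen at the POSIX level, outside Guile. The following is only a sketch: posix_matches is a hypothetical helper name, and it calls regcomp directly with either no syntax flag (basic syntax) or REG_EXTENDED. Neither mode gives both Emacs behaviours at once.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `pattern` matches somewhere in `text`,
   0 if it does not, and -1 if the pattern fails to compile.  Pass 0 in
   `cflags` for POSIX basic syntax, or REG_EXTENDED for extended syntax. */
static int posix_matches(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && text != NULL);

    /* REG_NOSUB: we only need a yes/no answer, not submatch offsets. */
    if (regcomp(&re, pattern, cflags | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

In basic syntax, "\\(x\\)" matches "x" (quoted parens group) but "ba+c" does not match "abaaac" (the + is an ordinary character). With REG_EXTENDED it is the other way round: "ba+c" matches, but "\\(x\\)" now looks for literal parens and fails against "x".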
Thanks,
Neil