Re: [Changeset] Re: Aw: Re: regexp: matching expressions b4 and after ..
From: Jaroslav Hajek
Subject: Re: [Changeset] Re: Aw: Re: regexp: matching expressions b4 and after ....
Date: Wed, 10 Sep 2008 13:07:39 +0200
On Tue, Sep 9, 2008 at 6:37 PM, John W. Eaton <address@hidden> wrote:
> On 9-Sep-2008, David Bateman wrote:
>
> | Ben Abbott wrote:
> | > On Tuesday, September 09, 2008, at 09:41AM, "David Bateman" <address@hidden> wrote:
> | >
> | >> Grrrr, it's more annoying than I thought. PCRE CAN do arbitrary length
> | >> lookahead, but not arbitrary length lookbehind. Thus "(?=[a-z]*)" is OK
> | >> but "(?<=[a-z]*)" isn't. I'd hoped to replace this with
> | >> "(?<=[a-z]{0,MAXLENGTH})", but a variable (even if bounded) length
> | >> lookbehind isn't OK either. What I'd have to do is replace it with
> | >>
> | >> ((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{MAXLENGTH}))
> | >>
> | >> which uses the alternation operator and MAXLENGTH+1 copies of the
> | >> lookbehind expression to get the effect. This seems to be a ridiculous
> | >> amount of extra crap in the pattern space to get this functionality. Is
> | >> it worth supporting arbitrary length lookbehind expressions like
> | >> "(?<=[a-z]*)" if this is what is needed to get it to work with PCRE? Or
> | >> is it worth supporting them with a limited maximum length and printing
> | >> a warning? If so, what value should the limit be?
> | >>
> | >> Frankly, I wonder how MathWorks got this to work, as they appear to be
> | >> using the Boost regex library, which also doesn't support arbitrary
> | >> length lookbehind expressions....
> | >>
> | >> D.
> | >>
> | >
> | > David,
> | >
> | > Have you tried the example in Matlab?
> | >
> | > Using 2007b, it does *not* work for me. My 2008a/b is busy running some
> | > simulations, so I can't try it there until later.
> | >
> | >
> | >>> g='x^(-1)+y(-1)+z(-1)=0';
> | >>> regexprep(g,'(?<=[a-z]*)\(\-[1-9]*\)','\_minus1')
> | >>>
> | > ans =
> | > x^_minus1+y_minus1+z_minus1=0
> | >
> | > If I understand correctly the result should be
> | >
> | > ans =
> | > x^(-1)+y_minus1+z_minus1=0
> | >
> | > Correct?
> | >
> | > Ben
> | >
> | >
> | >
> | >
> |
> | The message
> |
> | http://groups.google.com/group/comp.soft-sys.matlab/browse_thread/thread/babf37252132fd99/250b037e60b345ff?lnk=gst&q=lookbehind#250b037e60b345ff
> |
> | seems to imply that MathWorks has its own regexp engine and that
> | lookbehind is inefficient. I therefore don't consider it much of an
> | issue to duplicate the lookbehind pattern in the pattern space, and so I
> | propose the attached changeset, which replaces "(?<=[a-z]*)" with
> | "((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{10}))" before calling PCRE on
> | it. It also issues a warning about the maximum string length when the
> | lookbehind length limit might be an issue. So the limitation is that "+"
> | then represents 1 to 10 characters and "*" 0 to 10 characters in a
> | lookbehind expression. This limitation doesn't apply to lookaheads, etc.
>
> Is the bug report
>
> http://bugs.exim.org/show_bug.cgi?id=547
>
> the same problem? Note the comment
>
> I can't see an efficient way of doing this with the current
> implementation. Note that Perl is even more restrictive - all
> alternatives in the lookbehind have to be the same length in Perl.
>
> I guess it might be worth asking whether there is a way to get this
> feature, even if it is not efficient.
>
> Meanwhile, I've applied your changeset.
>
I transplanted it to 3.0.x. The patch to regexp.cc applied cleanly
only after I replaced std::string::npos with NPOS (8211 was never
applied to 3.0.x).
thx
--
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
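The workaround discussed in this thread, replacing a variable-length lookbehind with an alternation of fixed-width lookbehinds, can be sketched in Python, whose `re` module has the same fixed-width lookbehind restriction as PCRE. This is a minimal illustration under stated assumptions, not the code in Octave's regexp.cc: `expand_lookbehind` and `lookbehind_is_rejected` are hypothetical helpers, the lookbehind is assumed to be a single atom repeated with `*` or `+`, and the default limit of 10 mirrors the changeset described above. It also shows that, because `[a-z]*` can match zero characters, a `*` lookbehind succeeds at every position, so the 2007b output Ben quoted is what the pattern literally requests; the result he expected corresponds to the `+` form.

```python
import re


# Hypothetical helper (not Octave's actual regexp.cc code): emulate a
# bounded variable-length lookbehind such as (?<=ATOM{min,max}) with an
# alternation of fixed-width lookbehinds, which PCRE and Python's re accept.
def expand_lookbehind(atom: str, min_len: int = 0, max_len: int = 10) -> str:
    """Return (?:(?<=ATOM{min})|(?<=ATOM{min+1})|...|(?<=ATOM{max}))."""
    alternatives = "|".join(
        f"(?<={atom}{{{n}}})" for n in range(min_len, max_len + 1)
    )
    return f"(?:{alternatives})"


def lookbehind_is_rejected(pattern: str) -> bool:
    """Return True if Python's re refuses to compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return True
    return False


if __name__ == "__main__":
    g = "x^(-1)+y(-1)+z(-1)=0"
    target = r"\(-[1-9]*\)"

    # A variable-width lookbehind is refused outright by the engine.
    print(lookbehind_is_rejected(r"(?<=[a-z]*)" + target))  # True

    # "*" semantics (0 to 10 letters): the zero-width alternative always
    # succeeds, so every "(-1)" is replaced, matching the 2007b output.
    star = expand_lookbehind("[a-z]", min_len=0)
    print(re.sub(star + target, "_minus1", g))
    # x^_minus1+y_minus1+z_minus1=0

    # "+" semantics (1 to 10 letters): "x^(-1)" is now left alone because
    # the character before "(" is "^", not a letter.
    plus = expand_lookbehind("[a-z]", min_len=1)
    print(re.sub(plus + target, "_minus1", g))
    # x^(-1)+y_minus1+z_minus1=0
```

Every alternative has a fixed width, so the engine accepts the pattern; the price is a pattern that grows linearly with the limit, which is the trade-off weighed in the thread.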