help-octave

Re: [Changeset] Re: Aw: Re: regexp: matching expressions b4 and after ..


From: Jaroslav Hajek
Subject: Re: [Changeset] Re: Aw: Re: regexp: matching expressions b4 and after ....
Date: Wed, 10 Sep 2008 13:07:39 +0200

On Tue, Sep 9, 2008 at 6:37 PM, John W. Eaton <address@hidden> wrote:
> On  9-Sep-2008, David Bateman wrote:
>
> | Ben Abbott wrote:
> | > On Tuesday, September 09, 2008, at 09:41AM, "David Bateman" <address@hidden> wrote:
> | >
> | >> Grrrr, it's more annoying than I thought. PCRE CAN do arbitrary length
> | >> lookahead, but not arbitrary length lookbehind. Thus "(?=[a-z]*)" is ok
> | >> but "(?<=[a-z]*)" isn't. I'd hoped to replace this with
> | >> "(?<=[a-z]{0,MAXLENGTH})" but a variable length, even a bounded one, is
> | >> not ok either. What I'd have to do is replace it with
> | >>
> | >> ((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{MAXLENGTH}))
> | >>
> | >> which uses the alternation operator and MAXLENGTH+1 copies of the
> | >> lookbehind expression to get the effect. This seems to be a ridiculous
> | >> amount of extra crap in the pattern space to get this functionality. Is
> | >> it worth supporting arbitrary length lookbehind expressions like
> | >> "(?<=[a-z]*)" if this is what is needed to get it to work with PCRE? Is
> | >> it worth supporting it with a limited maximum length and printing a
> | >> warning? If so, what value should the limit be?
> | >>
> | >> Frankly I wonder how mathworks got this to work as they appear to be
> | >> using the Boost regex library which also doesn't support arbitrary
> | >> length lookbehind expressions....
> | >>
> | >> D.
> | >>
> | >
> | > David,
> | >
> | > Have you tried the example in Matlab?
> | >
> | > Using 2007b, it does *not* work for me. My 2008a/b is busy running some
> | > simulations, so I can't try it there until later.
> | >
> | >
> | >>> g='x^(-1)+y(-1)+z(-1)=0';
> | >>> regexprep(g,'(?<=[a-z]*)\(\-[1-9]*\)','\_minus1')
> | >>>
> | > ans =
> | > x^_minus1+y_minus1+z_minus1=0
> | >
> | > If I understand correctly the result should be
> | >
> | > ans =
> | > x^(-1)+y_minus1+z_minus1=0
> | >
> | > Correct?
> | >
> | > Ben
> | >
> | >
> | >
> | >
> |
> | The message
> |
> | http://groups.google.com/group/comp.soft-sys.matlab/browse_thread/thread/babf37252132fd99/250b037e60b345ff?lnk=gst&q=lookbehind#250b037e60b345ff
> |
> | seems to imply that MathWorks has its own regexp engine and that
> | lookbehind is inefficient. I therefore don't consider it that much of an
> | issue to duplicate the lookbehind pattern in the pattern space and so
> | propose the attached changeset that replaces "(?<=[a-z]*)" with
> | "((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{10}))" before calling PCRE on
> | it. It also issues a warning about the maximum lookbehind length when
> | the pattern might be affected by it. So the limitation is that "+" then
> | represents 1 to 10 characters and "*" 0 to 10 characters in a lookbehind
> | expression. This limitation doesn't apply to lookaheads, etc.
>
> Is the bug report
>
>  http://bugs.exim.org/show_bug.cgi?id=547
>
> the same problem?  Note the comment
>
>  I can't see an efficient way of doing this with the current
>  implementation.  Note that Perl is even more restrictive - all
>  alternatives in the lookbehind have to be the same length in Perl.
>
> I guess it might be worth asking whether there is a way to get this
> feature, even if it is not efficient.
>
> Meanwhile, I've applied your changeset.
>

I transplanted it to 3.0.x. The patch to regexp.cc applied cleanly
only after I replaced std::string::npos with NPOS (8211 was never
applied to 3.0.x).
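
For anyone following the thread, the rewrite described in the quoted changeset message can be sketched roughly as below. This is only an illustration, not the code in regexp.cc: it uses Python's re module, which (like PCRE) rejects variable-length lookbehind, and it writes the lookbehind with the standard "(?<=...)" syntax. The helper name expand_lookbehind and the constant MAX_LOOKBEHIND are made up for this example, and the non-capturing group "(?:...)" is my own choice so that group numbering is not disturbed.

```python
import re

MAX_LOOKBEHIND = 10  # the cap mentioned in the thread; hypothetical constant name


def expand_lookbehind(atom, min_len, max_len=MAX_LOOKBEHIND):
    """Return an alternation of fixed-length lookbehinds for `atom`.

    min_len=0 stands in for "*", min_len=1 for "+" (both capped at max_len).
    """
    parts = ["(?<=%s{%d})" % (atom, n) for n in range(min_len, max_len + 1)]
    return "(?:" + "|".join(parts) + ")"


g = "x^(-1)+y(-1)+z(-1)=0"
tail = r"\(-[1-9]*\)"

# The variable-length form is rejected outright by Python's re module.
try:
    re.compile(r"(?<=[a-z]*)" + tail)
    variable_ok = True
except re.error:
    variable_ok = False

star = expand_lookbehind("[a-z]", 0)   # stands in for (?<=[a-z]*)
plus = expand_lookbehind("[a-z]", 1)   # stands in for (?<=[a-z]+)

star_result = re.sub(star + tail, "_minus1", g)
plus_result = re.sub(plus + tail, "_minus1", g)

print(variable_ok)
print(star_result)
print(plus_result)
```

Note that with "*" the zero-length alternative always succeeds, so the lookbehind constrains nothing and the "(-1)" after "x^" is replaced as well; only the "+" form leaves "x^(-1)" alone.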

thx


-- 
RNDr. Jaroslav Hajek
computing expert
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

