From: | David Bateman |
Subject: | Re: Aw: Re: regexp: matching expressions b4 and after .... |
Date: | Tue, 09 Sep 2008 15:41:54 +0200 |
User-agent: | Thunderbird 2.0.0.16 (X11/20080725) |
David Bateman wrote:
Ok, forget it.. I figured it out.. The issue is that Matlab uses a different syntax for named tokens than PCRE, so we are obliged to look for named tokens like "(?<name>...)" and replace them with the PCRE-compatible "(?P<name>...)". The test in Octave to do this was trapping "(?<=...)" and "(?<!...)" as a syntax error for a Matlab named token. The other lookaround operators, "(?=...)" and "(?!...)", seem to work pretty much as expected.

One issue is that PCRE does not accept arbitrary length lookaround expressions, and so "(?<=[a-z]*)" is not legal with PCRE. Lookarounds with a bounded length are acceptable, though, so you can instead write "(?<=[a-z]{10})", for example.

I have a changeset to address this, but wonder if I should look for lookaround operators with "*" or "+" and replace them with "{MAX_LENGTH}" and "{1,MAX_LENGTH}" respectively, with a warning about this limitation. Should I do this before submitting the changeset?
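To make the named-token rewrite described above concrete, here is a minimal sketch in Python (not the actual Octave changeset, which is C++). The function name `to_pcre_named_groups` is hypothetical. The idea is only that requiring an identifier followed by ">" keeps "(?<=" and "(?<!" from being mistaken for a named token.

```python
import re

# Matches the opening of a Matlab-style named token, "(?<name>".
# Because an identifier and a closing ">" are required, the lookbehind
# openers "(?<=" and "(?<!" can never match this expression.
_NAMED_TOKEN = re.compile(r"\(\?<([A-Za-z][A-Za-z0-9_]*)>")


def to_pcre_named_groups(pattern):
    """Rewrite "(?<name>" as "(?P<name>", leaving lookbehinds untouched.

    Hypothetical helper for illustration only. It does not try to
    recognise an escaped parenthesis such as "\\(?<name>", which a real
    implementation would have to skip.
    """
    return _NAMED_TOKEN.sub(r"(?P<\1>", pattern)


if __name__ == "__main__":
    print(to_pcre_named_groups(r"(?<year>\d+)-(?<month>\d+)"))
    print(to_pcre_named_groups(r"(?<=foo)bar"))
    print(to_pcre_named_groups(r"(?<!foo)bar"))
```

Testing for a full identifier, rather than just for "(?<", is what avoids the false syntax error on lookbehind operators.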
Grrrr, it's more annoying than I thought. PCRE CAN do arbitrary length lookahead, but not arbitrary length lookbehind. Thus "(?=[a-z]*)" is OK but "(?<=[a-z]*)" isn't. I'd hoped to replace this with "(?<=[a-z]{0,MAXLENGTH})", but a variable, bounded length is not OK either. What I'd have to do is replace it with

((?<=[a-z]{0})|(?<=[a-z]{1})|...|(?<=[a-z]{MAXLENGTH}))

which uses the alternation operator and MAXLENGTH+1 copies of the lookbehind expression to get the effect. This seems to be a ridiculous amount of extra crap in the pattern space to get this functionality. Is it worth supporting arbitrary length lookbehind expressions like "(?<=[a-z]*)" if this is what is needed to get it to work with PCRE? Is it worth supporting it with a limit on the maximum length, and printing a warning? If so, what value should the limit be?
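The expansion above can be sketched as follows, again in Python rather than in Octave's C++. Python's `re` module is used only as a stand-in because it has a similar fixed-width restriction on lookbehind; PCRE may differ in details. The names `expand_lookbehind`, `min_length` and `max_length` are hypothetical, and the sketch assumes the lookbehind body is a single atom (such as a character class) followed by "*" or "+".

```python
import re


def expand_lookbehind(atom, min_length, max_length):
    """Emulate "(?<=atom*)" or "(?<=atom+)" with fixed-width lookbehinds.

    Builds one lookbehind per repetition count and joins them with the
    alternation operator, so every alternative has a fixed width.
    Use min_length=0 for "*" and min_length=1 for "+".
    """
    alternatives = [
        "(?<=%s{%d})" % (atom, n) for n in range(min_length, max_length + 1)
    ]
    return "(?:" + "|".join(alternatives) + ")"


if __name__ == "__main__":
    # The variable-width form is rejected by Python's re module.
    try:
        re.compile(r"(?<=[a-z]+)\d")
    except re.error as err:
        print("rejected:", err)

    # The expanded form compiles; its size grows linearly with max_length.
    expanded = expand_lookbehind("[a-z]", 1, 3)
    print(expanded)
    print(re.search(expanded + r"\d", "ab1"))
```

Since the pattern grows with every extra repetition count, any real limit would be a trade-off between pattern size and how long a lookbehind users can write.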
Frankly, I wonder how MathWorks got this to work, as they appear to be using the Boost regex library, which also doesn't support arbitrary length lookbehind expressions....
D.

--
David Bateman                                address@hidden
Motorola Labs - Paris                        +33 1 69 35 48 04 (Ph)
Parc Les Algorithmes, Commune de St Aubin    +33 6 72 01 06 33 (Mob)
91193 Gif-Sur-Yvette FRANCE                  +33 1 69 35 77 01 (Fax)

The information contained in this communication has been classified as:
[x] General Business Information
[ ] Motorola Internal Use Only
[ ] Motorola Confidential Proprietary