From: Philip Nienhuis
Subject: Re: regexp question
Date: Tue, 06 Dec 2011 21:00:13 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Sergei Steshenko wrote:
> ----- Original Message -----
>> From: Philip Nienhuis <address@hidden>
>> To: William Krekeler <address@hidden>
>> Cc: "address@hidden" <address@hidden>; address@hidden
>> Sent: Tuesday, December 6, 2011 7:52 PM
>> Subject: Re: regexp question
>>
>> Sergei, William,
>>
>> 2 answers in one post:
>>
>> Sergei Steshenko wrote:
>>> I guess you need 'aa' surrounded by not 'a'. Octave uses PCRE; I am
>>> not familiar with nuances of Octave PCRE usage; in Perl I would write
>>> the regular expression this way:
>>>
>>>   [^a]aa[^a]
>>>
>>> and if/when it matches, it returns pointer to the character preceding
>>> the 'aa' substring, i.e. in case of 'baab' it should return pointer to
>>> the first 'b'.
>>
>> Thanks, Sergei. I already tried this and found it'll work, but
>> unfortunately not in a more complicated situation:
>>
>> octave:35> tststr3 = 'aa aaaaa baa'    ## Patterns at start & end
>> tststr3 = aa aaaaa baa
>> octave:36> regexp (tststr3, "[^a]aa[^a]")
>> ans = [](1x0)    ## Hey...
>>
>> ... but
>>
>> octave:41> tststr4 = ' aa aaaaa baa '  ## Note spaces at start and end
>> tststr4 = aa aaaaa baa
>> octave:42> regexp (tststr4, "[^a]aa[^a]")
>> ans =
>>     1   11
>>
>> ... so it doesn't catch the pattern at start and end of line.
>
> [snip]
>
> I still suggest the Perl regular expressions tutorials/documentation I
> gave links to.
>
> Straightforwardly the regular expression can be extended to (in Perl
> syntax):
>
>   (^|[^a])(aa)([^a]|$)
>   # $1     $2  $3
>
> Not inside a character class, '^' means line beginning and '$' means
> line end.
>
> In Perl terms the 'aa' part you are interested in is in $2.

Thank you, Sergei. How do I get $2?

octave-3.5.0+:1> tststr3 = 'aa aaaaa baa'    # No spaces at ends
tststr3 = aa aaaaa baa
octave-3.5.0+:2> regexp (tststr3, "(^|[^a])(aa)([^a]|$)")
ans =
    1   10
octave-3.5.0+:3> tststr3(1)
ans = a
octave-3.5.0+:4> tststr3(10)
ans = b

... so there's some extra interpretation involved to get the proper position. (Little wonder, as line beginnings and ends have no length.)

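A possible way around that extra interpretation, not tried in this thread: PCRE also has zero-width lookaround assertions, which test the neighbouring characters without making them part of the match. The sketch below assumes that the PCRE library behind regexp() accepts the (?<!...) and (?!...) syntax; the variable name idx is only illustrative.

```octave
tststr3 = 'aa aaaaa baa';    # no spaces at the ends

# (?<!a) : the match must not be preceded by 'a'
# (?!a)  : the match must not be followed by 'a'
# Neither assertion consumes a character, so the start index that
# regexp() returns is the position of the 'aa' itself.
idx = regexp (tststr3, '(?<!a)aa(?!a)')
```

For this test string the isolated 'aa' substrings start at positions 1 and 11, so no offset correction should be needed. This does nothing about the execution time of regexp(), though.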
Anyway, I think a regexp() solution is doomed here, as its execution time is currently excessive (see my previous post). A while ago Rik wrote that regexprep() would be on the order of 20 times slower than strrep(). The script in my previous post confirms this relative slowness of regexp() vs. compiled script functions.

In conclusion, I think I'll try to cook up something with strfind().

Philip
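
A minimal sketch of what such a strfind() approach might look like (an illustration only, not the code that was eventually used). It assumes that strfind() reports overlapping hits, which is its default behaviour in Octave, and it pads the string with a space on both sides so that a hit at either end still has a non-'a' neighbour. The names padded, keep and idx are hypothetical.

```octave
str = 'aa aaaaa baa';

# All start positions of 'aa', including overlapping ones
# inside longer runs of 'a'.
idx = strfind (str, 'aa');

# Pad with a character that is not 'a'; str(k) is now padded(k+1).
padded = [' ' str ' '];

# Left neighbour of a hit at idx is padded(idx); the character
# right after the two-character hit is padded(idx + 3).
keep = (padded(idx) ~= 'a') & (padded(idx + 3) ~= 'a');
idx = idx(keep)
```

For the test string above this leaves positions 1 and 11. Any padding character other than 'a' would do; the space is an arbitrary choice.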