bug-apl

Re: [Bug-apl] Suggestion for Quad-RE


From: Juergen Sauermann
Subject: Re: [Bug-apl] Suggestion for Quad-RE
Date: Wed, 11 Oct 2017 15:15:45 +0200
User-agent: Mozilla/5.0 (X11; Linux i686; rv:52.0) Gecko/20100101 Thunderbird/52.3.0

Hi Elias,

I understand your case but I am afraid that 1↓ is the wrong approach in general. It happens to work in
special cases (read: 1 level of sub-expressions) but not in general.

If I understand libpcre correctly (and I probably don't), then a general regular expression RE is a tree whose
structure is determined by the nesting of the parentheses in RE, and the result of a match follows that tree structure.
If the tree has only a single level (a root and one level of children of the root), then you can (mis-)interpret
the children as an array, and dropping the root (by means of 1↓) happens (by chance) to return the children.
But it remains incorrect, because the functions to access tree-like nested values should be ⌷, ⊃, and ⊂ and not
↑, ↓, or [].
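
The difference can be sketched with a toy model. This is Python rather than APL, and the shape of the match result is purely an assumption made for illustration (a node is a list holding the matched text followed by its children); it is not what ⎕RE or libpcre actually returns. The helper name `subexpressions` is likewise hypothetical.

```python
# Assumed (hypothetical) result shape: [matched_text, child, child, ...],
# where a child is either a plain string (leaf) or another node.

# One level of parentheses, e.g. a pattern like (ab)-(cd)
one_level = ["ab-cd", "ab", "cd"]

# Two levels, e.g. a pattern like ((a)(b))-(cd)
two_level = ["ab-cd", ["ab", "a", "b"], "cd"]


def subexpressions(node):
    """Collect the text of every sub-expression below the root, depth first."""
    out = []
    for child in node[1:]:
        if isinstance(child, str):
            out.append(child)              # leaf: the text itself
        else:
            out.append(child[0])           # inner node: its own text ...
            out.extend(subexpressions(child))  # ... then its children
    return out


# Dropping the root (the analogue of 1↓) agrees with the tree walk
# only when there is a single level of children.
drop_one = one_level[1:]
drop_two = two_level[1:]
walk_two = subexpressions(two_level)
```

With one level, `drop_one` equals `subexpressions(one_level)`. With two levels, `drop_two` still contains a nested node, while `walk_two` is the flat list of sub-expression texts.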

Best Regards,
/// Jürgen


On 10/11/2017 07:14 AM, Elias Mårtenson wrote:
I think you have a point. It would be very useful to be able to have ⎕RE filter the results for you.

In experimenting with your specific case, I came across another use-case that might warrant another flag: One that does not return the full match, but only the parenthesised subexpressions (this used to be the default in my initial draft version). Now I have to use 1↓ to remove this.

Here is my somewhat realistic test case that takes the log file, and extracts the date and the name of the service that was started or stopped:

      file ← ⎕FIO[49] "/some/file/name"
      x ← "^([a-zA-Z]{3} [0-9]+ [0-9]{2}:[0-9]{2}:[0-9]{2}).*: (Started|Stopped) (.*)$" ⎕RE file
      ⍴ x
┏→━━━━┓
┃69339┃
┗━━━━━┛
      result ← ⊃ 1↓¨ ({⍬≢⍵}¨x) / x
      ⍴ result
┏→━━━━━┓
┃7269 3┃
┗━━━━━━┛

This is a lot more complicated than it needs to be. The two new flags mentioned would completely remove the last line and replace it with a simple pair of ⎕RE["XY"] flags.
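
For comparison, here is a rough sketch of the same extraction in Python's standard `re` module, where skipping non-matching lines and omitting the full match are each one step. The sample log lines and the helper name `extract_events` are made up for illustration; only the pattern is taken from the ⎕RE call above.

```python
import re

# Same pattern as in the ⎕RE call above, compiled once.
PATTERN = re.compile(
    r"^([a-zA-Z]{3} [0-9]+ [0-9]{2}:[0-9]{2}:[0-9]{2}).*: (Started|Stopped) (.*)$"
)


def extract_events(lines):
    """Return one (date, action, service) tuple per matching line."""
    rows = []
    for line in lines:
        m = PATTERN.search(line)
        if m is not None:            # skip non-matching lines (the ⍬≢⍵ filter)
            rows.append(m.groups())  # groups() leaves out the full match (the 1↓)
    return rows


# Hypothetical sample lines, not taken from a real log file.
sample = [
    "Oct 11 07:14:02 host systemd[1]: Started Session 42 of user root.",
    "Oct 11 07:14:03 host kernel: some unrelated message",
    "Oct 11 07:15:10 host systemd[1]: Stopped User Manager for UID 1000.",
]
events = extract_events(sample)
```

Here `events` holds two rows of three fields each, which corresponds to the N-by-3 shape of `result` above.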

Regards,
Elias

On 11 October 2017 at 11:12, Christian Robert <address@hidden> wrote:
Sometimes we only want to know whether it matches or not.

I suggest a new flag ['m']  (as in "match") that will return ...

  for a string:  either 0 or 1 as a scalar, for "not matching" or "matching"
  for an array of strings: a vector of 0/1, one element per string, with the same meaning as above.


Let's say:

      z←⎕fio[49] '/var/log/messages'  ⍝ beware that this file is not readable by default unless you are "root" on Linux
                                      ⍝ or you chmod a+r /var/log/messages (as root)

which may return 50,000 lines or even 2 million, averaging say ~120 characters each.


I would hope to be able to use a flag such as ['m']:

     'Started|Stopped' ⎕RE['m'] z

which would return an array of 0/1 telling which lines match the pattern, so I can
retain only the matching ones for further fine tuning (via the dyadic operator "/").

It would be a LOT faster than letting ⎕RE return the whole pcre2 result into GNU APL's memory,
creating a lot of integer arrays for no real purpose (as seen from the application).
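
The intended behaviour can be sketched in Python's standard `re` module. This only illustrates the idea of a match-only result followed by a compress step; the ['m'] flag itself is a proposal, and the helper names `match_mask` and `compress` and the sample lines are hypothetical.

```python
import re


def match_mask(pattern, lines):
    """Return 1 for each line where the pattern matches somewhere, else 0."""
    rx = re.compile(pattern)
    return [1 if rx.search(line) else 0 for line in lines]


def compress(mask, items):
    """Keep the items whose mask entry is nonzero (like APL's dyadic mask/items)."""
    return [item for keep, item in zip(mask, items) if keep]


# Hypothetical sample lines, not taken from a real log file.
lines = [
    "Oct 11 07:14:02 host systemd[1]: Started Session 42 of user root.",
    "Oct 11 07:14:03 host kernel: some unrelated message",
    "Oct 11 07:15:10 host systemd[1]: Stopped User Manager for UID 1000.",
]
mask = match_mask("Started|Stopped", lines)
kept = compress(mask, lines)
```

Only the small 0/1 vector is built for the whole file; the detailed match data would then be requested just for the lines in `kept`.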

comments welcome,

my usual 2 cents,
Xtian.



