bug-apl

Re: [Bug-apl] Regex support


From: Elias Mårtenson
Subject: Re: [Bug-apl] Regex support
Date: Mon, 2 Oct 2017 16:47:28 +0800

In playing around with this, I realise that the "B" mode is quite useful. So much so, in fact, that I'm wondering if it's warranted to have a dedicated quad-function for this specific behaviour.

Here's an example of extracting runs of four lowercase letters:

      {⍵ ⊂⍨ "[a-z]{4}" ⎕RE['B'] ⍵} 'abcdef45abchello9'
┏→━━━━━━━━━━━━━━━━━━━┓
┃"abcd" "abch" "ello"┃
┗∊━━━━━━━━━━━━━━━━━━━┛
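The split above works because ⊂ with a numeric left argument is APL2-style partitioned enclose. As I understand its rules, a new piece starts wherever the mask value increases and positions with 0 are dropped, so the distinct match numbers keep adjacent matches ("abch" and "ello") apart. Below is a C++ sketch of those rules (the name `partition` is hypothetical), fed with the bitmap that "[a-z]{4}" should give for this input, worked out by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of APL2-style partitioned enclose, mask ⊂ text, for a character vector:
// a new piece starts wherever mask[i] is greater than the previous mask value
// (the value before the first element counts as 0), and positions where
// mask[i] is 0 are dropped.
std::vector<std::string> partition(const std::vector<int>& mask, const std::string& text) {
    assert(mask.size() == text.size());  // APL would signal LENGTH ERROR
    std::vector<std::string> pieces;
    int previous = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (mask[i] == 0) {          // excluded position; also ends the current piece
            previous = 0;
            continue;
        }
        if (mask[i] > previous) {    // value increased: start a new piece
            pieces.emplace_back();
        }
        pieces.back().push_back(text[i]);
        previous = mask[i];
    }
    return pieces;
}
```

With the mask 1 1 1 1 0 0 0 0 2 2 2 2 3 3 3 3 0 over 'abcdef45abchello9' this yields "abcd" "abch" "ello". A plain 0/1 mask would have merged the last two matches into one piece.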

Regards,
Elias

On 2 October 2017 at 16:27, Elias Mårtenson <address@hidden> wrote:
Some progress:

The behaviour I described earlier still works, but it can now also work on N-dimensional arrays of strings, compiling the regex only once and then applying it to every cell.
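The compile-once idea can be sketched in C++ as follows. This is an illustration only. It uses std::regex (ECMAScript grammar) rather than PCRE, assumes the N-dimensional array has already been flattened to its cells in ravel order with the shape kept elsewhere, and the function name `match_all_cells` is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Sketch: compile the pattern once, then reuse the compiled std::regex for
// every string cell of a (flattened) array. For each cell, returns the list
// of matched substrings in left-to-right order.
std::vector<std::vector<std::string>> match_all_cells(const std::string& pattern,
                                                      const std::vector<std::string>& cells) {
    const std::regex re(pattern);  // compiled exactly once
    std::vector<std::vector<std::string>> results;
    results.reserve(cells.size());
    for (const auto& cell : cells) {
        std::vector<std::string> matches;
        for (std::sregex_iterator it(cell.begin(), cell.end(), re), end; it != end; ++it) {
            matches.push_back(it->str());
        }
        results.push_back(std::move(matches));
    }
    return results;
}
```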

In addition to this, I have now also added a flag "B" (meaning "bitmap") that creates a bitmap of all matches and can be used in conjunction with ⊂ to split strings by regex.

Here's an example:

      " +" ⎕RE["B"] "this is   a     test"
┏→━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃0 0 0 0 1 0 0 2 2 2 0 3 3 3 3 3 0 0 0 0┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
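The "B" result can be described as: every character covered by the k-th match gets the value k, and everything else gets 0. A C++ sketch of that rule, assuming std::regex in place of the PCRE engine and using the hypothetical name `regex_bitmap`. Skipping zero-length matches is my own choice here, not something stated in the thread.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Sketch of the "B" (bitmap) result: characters covered by the k-th match
// (1-based) get the value k; characters outside any match get 0.
// Zero-length matches are skipped so they do not consume a number.
std::vector<int> regex_bitmap(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::vector<int> bitmap(text.size(), 0);
    int k = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        if (it->length() == 0) continue;
        ++k;
        const std::size_t start = static_cast<std::size_t>(it->position());
        const std::size_t len = static_cast<std::size_t>(it->length());
        for (std::size_t i = start; i < start + len; ++i) bitmap[i] = k;
    }
    return bitmap;
}
```

For " +" over "this is   a     test" this reproduces the vector above. It also distinguishes the two cases raised below: " " over four spaces gives 1 2 3 4, while " +" gives 1 1 1 1.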

This matches any sequence of spaces, and we can easily use ⊂ to split the string:

      {⍵ ⊂⍨ 0=" +" ⎕RE["B"] ⍵} "this is   a     test"
┏→━━━━━━━━━━━━━━━━━━━━━┓
┃"this" "is" "a" "test"┃
┗∊━━━━━━━━━━━━━━━━━━━━━┛
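For comparison, C++'s std::sregex_token_iterator can yield the pieces between matches directly when given submatch index -1, which is roughly what the 0= mask plus ⊂ idiom computes. This is a sketch with the hypothetical name `regex_split`. Empty pieces, such as one produced by a leading separator, are dropped to mirror how ⊂ discards zero-masked positions.

```cpp
#include <regex>
#include <string>
#include <vector>

// Sketch: split text on every match of a separator pattern.
// Submatch index -1 asks std::sregex_token_iterator for the text between
// matches. Empty pieces are dropped.
std::vector<std::string> regex_split(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::vector<std::string> parts;
    for (std::sregex_token_iterator it(text.begin(), text.end(), re, -1), end; it != end; ++it) {
        if (it->length() > 0) parts.push_back(it->str());
    }
    return parts;
}
```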

However, I'm not sure whether the values returned from the function are ideal. The idea behind the increasing numbers is to make it possible to differentiate between the result of:

      " " ⎕RE["B"] "    "
┏→━━━━━━┓
┃1 2 3 4┃
┗━━━━━━━┛

vs:

      " +" ⎕RE["B"] "    "
┏→━━━━━━┓
┃1 1 1 1┃
┗━━━━━━━┛

Should it be left like this, or should it be done in some other way?

Regards,
Elias

On 25 September 2017 at 20:10, Juergen Sauermann <address@hidden> wrote:
Hi Elias,

making a quad function an operator is simple if the function argument(s) is/are primitive functions
and a little more complicated if not.

First of all, you have to implement (read: overload) some of the eval_XXX() functions that have function
arguments. For monadic operators these eval_XXX() functions are:

   virtual Token eval_ALB(Value_P A, Token & LO, Value_P B)
   virtual Token eval_ALXB(Value_P A, Token & LO, Value_P X, Value_P B)
   virtual Token eval_LB(Token & LO, Value_P B)
   virtual Token eval_LXB(Token & LO, Value_P X, Value_P B)

where L resp. LO stands for the left function argument. For dyadic operators they are:

   virtual Token eval_ALRB(Value_P A, Token & LO, Token & RO, Value_P B)
   virtual Token eval_ALRXB(Value_P A, Token & LO, Token & RO, Value_P X, Value_P B)
   virtual Token eval_LRB(Token & LO, Token & RO, Value_P B)
   virtual Token eval_LRXB(Token & LO, Token & RO, Value_P X, Value_P B)

where L resp. LO and R resp. RO stand for the left and right function argument(s), A and B
are the value arguments, and X the axis.

Not all of them need to be implemented, only those whose signatures are supported by the operator
(mainly in terms of allowing an axis argument X or a left value argument A).
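The "override only what you support" pattern can be illustrated with a self-contained C++ mock. The types `Value` and `Fun` below are hypothetical stand-ins, not GNU APL's real `Token` and `Value_P`, and the throwing defaults only approximate how unsupported syntax is rejected in the real interpreter.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins, NOT GNU APL's real Token / Value_P types: just
// enough structure to show which eval_XXX() overloads an operator overrides.
struct Value { std::string text; };  // a value argument (A, X or B)
struct Fun   { std::string name; };  // a function argument (LO)

// Base class: one virtual per monadic-operator signature. The defaults
// throw, standing in for "this operator does not accept that syntax".
struct MonadicOperator {
    virtual ~MonadicOperator() = default;
    virtual Value eval_ALB(const Value&, const Fun&, const Value&) {
        throw std::logic_error("A LO B not supported");
    }
    virtual Value eval_ALXB(const Value&, const Fun&, const Value&, const Value&) {
        throw std::logic_error("A LO[X] B not supported");
    }
    virtual Value eval_LB(const Fun&, const Value&) {
        throw std::logic_error("LO B not supported");
    }
    virtual Value eval_LXB(const Fun&, const Value&, const Value&) {
        throw std::logic_error("LO[X] B not supported");
    }
};

// A toy operator that accepts (LO B) and (A LO B) but no axis, so only
// eval_LB and eval_ALB are overridden; the axis forms keep the defaults.
struct ToyOperator : MonadicOperator {
    Value eval_LB(const Fun& LO, const Value& B) override {
        return {LO.name + " " + B.text};
    }
    Value eval_ALB(const Value& A, const Fun& LO, const Value& B) override {
        return {A.text + " " + LO.name + " " + B.text};
    }
};
```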

If an operator supports defined functions (as opposed to primitive functions), then it will typically
implement the operator itself as a macro, which means that the implementation is written in APL
rather than in C++ (similar to "magic functions" in NARS). This is needed because primitive functions
are atomic (they either succeed or fail, but cannot be continued after a failure), while defined functions
(and operators) can continue at the point of interruption after the values that caused the fault have
been fixed.

Some of the built-in operators in GNU APL have both a primitive implementation (used when the
function arguments are primitive) and a macro-based implementation (used when they are not). This is
for performance reasons: the ability to take defined functions as arguments should not slow down the
cases where the function arguments are primitive.

The Macro definitions are contained in Macro.def

Please note that in GNU APL functions cannot return functions, which may or may not be a problem
in your case, depending on whether the function argument(s) of the ⎕-operator is/are primitive or not.
In standard APL you cannot assign a function to a name. The usual workaround is to return a string and ⍎ it.

My gut feeling is that if you need function arguments for implementing regular expressions, then
something has gone in the wrong direction somewhere else.

Best Regards,
/// Jürgen



On 09/25/2017 05:18 AM, Elias Mårtenson wrote:
Dyalog's implementation is much more expressive than what I had proposed.

There are technical reasons why we have no hope of replicating their functionality (in particular, GNU APL does not have support for namespaces).

Their function takes arguments and returns a function, which is a matcher function that can be reused, which is useful since you'd only compile the regexp once. Jürgen, how can I make a quad-function behave like below? It seems to be similar in behaviour to ⍤ and ⍣.

      ('.at' ⎕R '\u0') 'The cat sat on the mat'
The CAT SAT on the MAT
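The "reusable matcher" idea maps naturally onto an object that compiles its pattern once in the constructor and can then be applied to many strings. The sketch below uses std::regex instead of PCRE, the class name `UpperMatcher` is hypothetical, and it hard-codes the effect that, as I understand Dyalog's transformation codes, '\u0' has in the example above: uppercasing the whole match.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

// Sketch of a reusable matcher: the pattern is compiled once in the
// constructor; operator() copies non-matching text unchanged and
// replaces each match with its uppercased form.
class UpperMatcher {
public:
    explicit UpperMatcher(const std::string& pattern) : re_(pattern) {}

    std::string operator()(const std::string& text) const {
        std::string out;
        auto last = text.cbegin();
        for (std::sregex_iterator it(text.begin(), text.end(), re_), end; it != end; ++it) {
            out.append(last, (*it)[0].first);  // text before this match
            std::string m = it->str();
            std::transform(m.begin(), m.end(), m.begin(),
                           [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
            out += m;
            last = (*it)[0].second;
        }
        out.append(last, text.cend());  // text after the last match
        return out;
    }

private:
    std::regex re_;
};
```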

It can also accept a function, in which case the function is called for each match, to return a replacement string. Can you explain how to make a quad-function an operator?

      ('\w+' ⎕R {⌽⍵.Match}) 'The cat sat on the mat'
ehT tac tas no eht tam

As you can see, they leverage namespaces in order to pass a lot of different fields to the replace-function. If we want to do something similar, ⍵ would probably have to be the match string, and we'll have to live without the remaining fields.
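The restricted form proposed here, where the callback receives only the match string, is easy to sketch in C++ with a `std::function` callback. This is an assumption-laden illustration: std::regex rather than PCRE, and the name `regex_replace_fn` is hypothetical.

```cpp
#include <functional>
#include <regex>
#include <string>

// Sketch of the callback form: for each match, fn receives only the matched
// text and returns its replacement; text between matches is copied as-is.
std::string regex_replace_fn(const std::string& pattern, const std::string& text,
                             const std::function<std::string(const std::string&)>& fn) {
    const std::regex re(pattern);
    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        out.append(last, (*it)[0].first);
        out += fn(it->str());
        last = (*it)[0].second;
    }
    out.append(last, text.cend());
    return out;
}
```

Passing a function that reverses its argument reproduces the {⌽⍵.Match} example, giving "ehT tac tas no eht tam".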

Regards,
Elias


On 23 September 2017 at 00:08, Juergen Sauermann <address@hidden> wrote:

    Hi,

    I have not looked into Dyalog's implementation myself, but if they
    have it, then we should aim at being as compatible as makes sense.
    No problem if some of their capabilities are not supported (please
    avoid going over the top in the GNU APL implementation).

    Unfortunately ⎕R is already taken in GNU APL (inherited from IBM
    APL2), so some other name(s) are needed.

    Before implementing too much in advance, it would be good to present
    the intended syntax and semantics on bug-apl and solicit opinions.

    /// Jürgen


    On 09/22/2017 04:59 PM, Elias Mårtenson wrote:
    I did not know this. I took a look at Dyalog's API and it's not
    possible to implement it fully, as it relies on their object
    oriented features. However, the basic functionality wouldn't be
    hard to replicate, if that is something that is desired.

    Jürgen, what is your opinion on this?

    On 22 September 2017 at 20:21, Jay Foad <address@hidden> wrote:

        FYI Dyalog has operators ⎕S (search) and ⎕R (replace) which
        are implemented with PCRE:

        ('[Aa]..'⎕S'&')'Dyalog APL'
        ┌───┬───┐
        │alo│APL│
        └───┴───┘
        ('red' 'green'⎕R'green' 'blue')'red orange yellow green blue'
        green orange yellow blue blue
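The two Dyalog examples can be approximated in C++ to show the intended semantics. `search_all` mimics ⎕S with '&' (return each whole match). `replace_many` mimics multi-pattern ⎕R: at each position the patterns are tried in order, the first that matches wins, and replaced text is not rescanned, which is why "red" becomes "green" but not "blue". Both names are hypothetical, std::regex stands in for PCRE, and the alternation trick assumes the patterns contain no capture groups of their own and the replacements are literal strings.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Sketch of ⎕S with '&': return every whole match, left to right.
std::vector<std::string> search_all(const std::string& pattern, const std::string& text) {
    const std::regex re(pattern);
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}

// Sketch of multi-pattern ⎕R: the patterns are joined into one ordered
// alternation (p1)|(p2)|..., so the first pattern that matches at a
// position wins; the capture group that matched picks the replacement.
std::string replace_many(const std::vector<std::string>& patterns,
                         const std::vector<std::string>& replacements,
                         const std::string& text) {
    assert(patterns.size() == replacements.size());
    std::string alternation;
    for (std::size_t i = 0; i < patterns.size(); ++i) {
        if (i > 0) alternation += '|';
        alternation += '(' + patterns[i] + ')';
    }
    const std::regex re(alternation);
    std::string out;
    auto last = text.cbegin();
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        out.append(last, (*it)[0].first);
        for (std::size_t g = 1; g <= patterns.size(); ++g) {
            if ((*it)[g].matched) {
                out += replacements[g - 1];
                break;
            }
        }
        last = (*it)[0].second;
    }
    out.append(last, text.cend());
    return out;
}
```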

        http://help.dyalog.com/16.0/Content/Language/System%20Functions/r.htm

        Jay.
