poke-devel

Re: [WIP][PATCH 2/2] pkl,pvm: add support for regular expression


From: Mohammad-Reza Nabipoor
Subject: Re: [WIP][PATCH 2/2] pkl,pvm: add support for regular expression
Date: Tue, 18 Apr 2023 01:57:41 +0200

Hello Jose.

Thanks for poke 3.1 :)


On Mon, Feb 20, 2023 at 02:43:47PM +0100, Jose E. Marchesi wrote:
> 
> > On Fri, Feb 17, 2023 at 12:19:51PM +0100, Jose E. Marchesi wrote:
> >> 
> >> >
> >> > What about having a new compile-time type for matched entities?
> >> > It would be useful in regular expression matching for both strings
> >> > and arrays of characters.
> >> >
> >> > Something like this:
> >> >
> >> > ```poke
> >> > var m1 = "Hello pokers!" ~ /[hH]ello/,
> >> >     m2 = [0x00UB, 0x11UB, 0x22UB] ~ /\x11\x22/;
> >> >
> >> > if (m)
> >> >   {
> >> >     printf "matched at index %v and offset %v\n",
> >> >            m.index_begin, m.offset_begin;
> >> >     assert ("Hello pokers!"[m.index_begin:m.index_end] == "Hello");
> >> >   }
> >> > else
> >> >   {
> >> >     assert (m.index_begin ?! E_elem);
> >> >     assert (m.offset_begin ?! E_elem);
> >> >   }
> >> > ```
> >> >
> >> > We can use other fields to give access to sub-groups.
> >> >
> >> > We can take an approach similar to the `Exception` struct, but for
> >> > `Matched`.  The compiler can cast it to boolean when necessary.
> >> 
> >> The idea is interesting.  But I don't like the part of changing the
> >> semantics of `if' like this: it is not orthogonal.
> >> 
> >> Note that the syntactic construction that uses Exception only works with
> >> exceptions:
> >> 
> >>   try STMT; catch if EXCEPTION { ... }
> >> 
> >> If we could come up with a syntactic construction for regular expression
> >> matching, then it would be better IMO.
> >> 
> >> 
> >
> >
> > What about this syntax:
> >
> > ```poke
> > var matched_p = "Hello pokers!" ~? /[hH]ello/,
> >     matchinfo = "Hello pokers!" ~ /[hH]ello/;
> >
> > assert (matched_p isa int<32>);
> > assert (matchinfo isa Matched);
> >
> > if (matchinfo.matched_p) { ... }
> > ```
> 
> Hmm... that has the disadvantage of having to match twice.
> 
> It seems to me, we could make use of the exceptions by having ~ return a
> Match struct and raising an E_nomatch exception when there is no match.
> 
> Then we can use the normal operators ?! and try-until and try-catch to
> check for when there is no match.
> 


As we discussed some time ago, to keep the matching functionality consistent we
can use closures (see the `_pkl_regexp_matcher' function in the patch below).
`_pkl_regexp_matcher' returns a closure that returns true/false for an input
string.  The closure also takes two optional parameters: one to specify the
start index and another, a callback, to report the sub-matches to the user.
I re-implemented `pk_regexp_match' and `pk_regexp_gmatch' using
`_pkl_regexp_matcher'.

If you like the approach, I can add a ~ operator to the language that expects
a string on the LHS and a function with the `_Pkl_Matcher' signature on the
RHS.
I can also add a `/.../' construct that evaluates to a `_Pkl_Matcher' for the
specified regexp.


```poke
var matched_p = "Hi" ~ /[hH]/;

assert (matched_p);
assert (/[hH]/ ("Hi"));
assert (/[Bb]/ ("HiBye", 2));
```
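
The closure shape itself can be sketched in plain Poke without `~' or `/.../'.
This is only a minimal sketch under stated assumptions: `make_matcher' and
`My_Matcher' are hypothetical names that are not part of the patch, and it
assumes `pk_regexp_match' from std.pk returns non-zero on a match.

```poke
/* Hypothetical names: My_Matcher and make_matcher are for illustration
   only and are not part of the patch.  */
type My_Matcher = (string, int<32>?)int<32>;

fun make_matcher = (string regex) My_Matcher:
{
  /* The returned lambda captures `regex' from the enclosing scope.  */
  return lambda (string str, int<32> start = 0) int<32>:
    {
      return pk_regexp_match (regex, str, start);
    };
}

var hello_p = make_matcher ("[hH]ello");

/* Assumes pk_regexp_match yields non-zero when the regexp matches.  */
if (hello_p ("Hello pokers!"))
  print "matched\n";
```

The matcher is bound to one regexp and can then be applied to any number of
strings, which is the same calling convention `/.../' would provide.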


WDYT?

- Maybe ~ is not necessary and /.../ is enough?
- I'm not sure reporting sub-matches using the `_Pkl_Regexp_Match' type is the
  right way.  What about having the callback accept an `int<32>[2]` instead?
  Then we'd call `clbk` several times, each time with a pair of indices.


Regards,
Mohammad-Reza


```
diff --git a/libpoke/pkl-rt.pk b/libpoke/pkl-rt.pk
index 896aeb44..13d6460b 100644
--- a/libpoke/pkl-rt.pk
+++ b/libpoke/pkl-rt.pk
@@ -1819,6 +1819,63 @@ fun _pkl_re_gmatch = (string regex, string str,
   return result;
 }
 
+type _Pkl_MatcherCallback = (any)void;
+type _Pkl_Matcher = (string, int<32>?, _Pkl_MatcherCallback?)int<32>;
+
+fun _pkl_regexp_matcher = (string regex) _Pkl_Matcher:
+{
+  return lambda (string str,
+                 int<32> start = 0,
+                 _Pkl_MatcherCallback clbk = lambda (any v) void: {}) int<32>:
+    {
+      var result = _Pkl_Regexp_Match {};
+
+      /* HACK This is equivalent to `push null'.
+         Until we get a more powerful assembler, we have to use this
+         trick.  */
+      var opq = asm any: ("push 7");
+
+      /* Unfortunately we have to compile the regexp in every invocation.
+         The reason is to not leak the opaque value (we have to invoke `refree'
+         instruction to free the resources explicitly).  */
+      {
+        var err = asm any: ("push 7");
+
+        asm ("recomp" : opq, err : regex);
+        if (asm int<32>: ("nn; nip" : err))
+          raise Exception {code = EC_inval,
+                           name = "invalid regular expression: " + err as string,
+                           exit_status = 1};
+      }
+
+      asm ("remtch; nip" : result.count : opq, str, start);
+      {
+        var subnum = 0UL;
+
+        asm ("resubnum; nip" : subnum : opq);
+        result.submatches = int<32>[2][subnum] ();
+        for (var i = 0UL; i != subnum; ++i)
+          {
+            asm ("resubref; rot; drop"
+                 : result.submatches[i][0], result.submatches[i][1]
+                 : opq, i);
+          }
+      }
+      asm ("refree" :: opq);
+
+      if (result.count == -2)
+        raise Exception {code = EC_inval,
+                         name = "regular expression match function internal error",
+                         exit_status = 1};
+
+      var found_p = result.count != -1;
+
+      if (found_p)
+        clbk (result);
+      return found_p;
+    };
+}
+
 /**** Set the default load path ****/
 
 immutable var load_path = "";
diff --git a/libpoke/std.pk b/libpoke/std.pk
index bcc0d1cc..01c30073 100644
--- a/libpoke/std.pk
+++ b/libpoke/std.pk
@@ -866,7 +866,7 @@ fun pk_vercmp = (any _a, any _b) int<32>:
 
 fun pk_regexp_match = (string regex, string str, int<32> start = 0) int<32>:
 {
-  return _pkl_re_match (regex, str, start);
+  return _pkl_regexp_matcher (regex) (str, start);
 }
 
 type Pk_Regexp_Match =
@@ -879,7 +879,15 @@ type Pk_Regexp_Match =
 fun pk_regexp_gmatch = (string regex, string str,
                         int<32> start = 0) Pk_Regexp_Match:
 {
-  var result = _pkl_re_gmatch (regex, str, start);
+  var result = Pk_Regexp_Match {};
 
-  return Pk_Regexp_Match {count=result.count, submatches=result.submatches};
+  _pkl_regexp_matcher (regex) (str, start, lambda (any v) void:
+    {
+      var m = v as _Pkl_Regexp_Match;
+
+      result.count = m.count;
+      result.submatches = m.submatches;
+    });
+
+  return result;
 }
```



