Re: [WIP][PATCH 2/2] pkl,pvm: add support for regular expression
From: Mohammad-Reza Nabipoor
Subject: Re: [WIP][PATCH 2/2] pkl,pvm: add support for regular expression
Date: Tue, 18 Apr 2023 01:57:41 +0200
Hello Jose.
Thanks for poke 3.1 :)
On Mon, Feb 20, 2023 at 02:43:47PM +0100, Jose E. Marchesi wrote:
>
> > On Fri, Feb 17, 2023 at 12:19:51PM +0100, Jose E. Marchesi wrote:
> >>
> >> >
> >> > What about having a new compile-time type for matched entities.
> >> > Both useful in regular expression matching for strings and array of
> >> > characters.
> >> >
> >> > Something like this:
> >> >
> >> > ```poke
> >> > var m1 = "Hello pokers!" ~ /[hH]ello/,
> >> > m2 = [0x00UB, 0x11UB, 0x22UB] ~ /\x11\x22/;
> >> >
> >> > if (m)
> >> > {
> >> > printf "matched at index %v and offset %v\n", m.index_begin,
> >> > m.offset_begin;
> >> > assert ("Hello pokers!"[m.index_begin:m.index_end] == "Hello");
> >> > }
> >> > else
> >> > {
> >> > assert (m.index_begin ?! E_elem);
> >> > assert (m.offset_begin ?! E_elem);
> >> > }
> >> > ```
> >> >
> >> > We can use other fields to give access to sub-groups.
> >> >
> >> > We can take an approach similar to `Exception` struct. But for
> >> > `Matched`.
> >> > Compiler can cast it to boolean when necessary.
> >>
> >> The idea is interesting. But I don't like the part of changing the
> >> semantics of `if' like this: it is not orthogonal.
> >>
> >> Note that the syntactic construction that uses Exception only works with
> >> exceptions:
> >>
> >> try STMT; catch if EXCEPTION { ... }
> >>
> >> If we could come with a syntactic construction for regular expression
> >> matching, then it would be better IMO.
> >>
> >>
> >
> >
> > What about this syntax:
> >
> > ```poke
> > var matched_p = "Hello pokers!" ~? /[hH]ello/,
> > matchinfo = "Hello pokers!" ~ /[hH]ello/;
> >
> > assert (matched_p isa int<32>);
> > assert (matchinfo isa Matched);
> >
> > if (matchinfo.matched_p) { ... }
> > ```
>
> Hmm... that has the disadvantage of having to match twice.
>
> It seems to me, we could make use of the exceptions by having ~ return a
> Match struct and raising an E_nomatch exception when there is no match.
>
> Then we can use the normal operators ?! and try-until and try-catch to
> check for when there is no match.
>
As we discussed some time ago, to keep the matching functionality consistent we
can use closures (see the `_pkl_regexp_matcher' function in the patch below).
`_pkl_regexp_matcher' returns a closure that returns true/false for an input
string. The closure also takes two optional parameters: the start index, and a
callback through which the sub-matches are reported to the user.
I re-implemented `pk_regexp_match' and `pk_regexp_gmatch' on top of
`_pkl_regexp_matcher'.
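To make the shape of the closure concrete, here is a rough model of it in Python rather than Poke. It is only a sketch: the names `regexp_matcher` and `regexp_gmatch` are hypothetical stand-ins for `_pkl_regexp_matcher' and `pk_regexp_gmatch', the match record is reduced to a list of (begin, end) index pairs, and I assume search semantics (first match at or after `start`). Unlike the patch, which recompiles the regexp on every invocation, the model compiles once when the closure is created.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for _Pkl_Regexp_Match: the span of the whole
# match followed by the span of each sub-group.
Spans = List[Tuple[int, int]]


def regexp_matcher(regex: str) -> Callable[..., bool]:
    """Return a closure that reports whether REGEX matches a string."""
    try:
        compiled = re.compile(regex)
    except re.error as exc:
        raise ValueError(f"invalid regular expression: {exc}") from exc

    def matcher(s: str, start: int = 0,
                clbk: Optional[Callable[[Spans], None]] = None) -> bool:
        m = compiled.search(s, start)
        if m is None:
            return False
        if clbk is not None:
            # Report all spans in a single callback invocation.
            clbk([m.span(i) for i in range(len(m.groups()) + 1)])
        return True

    return matcher


def regexp_gmatch(regex: str, s: str, start: int = 0) -> Spans:
    """Collect the spans via the callback; empty list if nothing matched."""
    result: Spans = []
    regexp_matcher(regex)(s, start, result.extend)
    return result
```

With this model, `regexp_matcher("[Bb]")("HiBye", 2)` is true, and the boolean-only entry point simply ignores the callback, which is how both existing functions can share one matching routine.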
If you like the approach, I can add the ~ operator to the language; it would
expect a string on the LHS and a function with the `_Pkl_Matcher' signature on
the RHS.
I can also add the `/.../' construct, which would evaluate to a `_Pkl_Matcher'
for the specified regexp.
```poke
var matched_p = "Hi" ~ /[hH]/;
assert (matched_p);
assert (/[hH]/ ("Hi"));
assert (/[Bb]/ ("HiBye", 2));
```
WDYT?
- Maybe ~ is not necessary and /.../ is enough?
- I'm not sure reporting sub-matches using the `_Pkl_Regexp_Match' type is the
right way. What about having the callback accept an `int<32>[2]` (a pair of
indices) instead? Then we'd call `clbk` several times, each time with one pair
of indices.
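The second alternative, sketched in Python as a model only: the callback receives one (begin, end) pair per invocation instead of a single record. The name `regexp_matcher_pairs` is hypothetical, and reporting the whole match as the first pair is an assumption on my side, not something the patch decides.

```python
import re
from typing import Callable, Tuple

Pair = Tuple[int, int]


def regexp_matcher_pairs(regex: str) -> Callable[..., bool]:
    """Model of a matcher whose callback is called once per index pair."""
    compiled = re.compile(regex)

    def matcher(s: str, start: int = 0,
                clbk: Callable[[Pair], None] = lambda pair: None) -> bool:
        m = compiled.search(s, start)
        if m is None:
            return False
        # Assumption: pair 0 is the whole match, then one pair per sub-group.
        for i in range(len(m.groups()) + 1):
            clbk(m.span(i))
        return True

    return matcher
```

The callback type becomes simpler (no struct and no cast from `any`), at the cost of the caller having to accumulate the pairs if it wants them all.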
Regards,
Mohammad-Reza
```
diff --git a/libpoke/pkl-rt.pk b/libpoke/pkl-rt.pk
index 896aeb44..13d6460b 100644
--- a/libpoke/pkl-rt.pk
+++ b/libpoke/pkl-rt.pk
@@ -1819,6 +1819,63 @@ fun _pkl_re_gmatch = (string regex, string str,
return result;
}
+type _Pkl_MatcherCallback = (any)void;
+type _Pkl_Matcher = (string, int<32>?, _Pkl_MatcherCallback?)int<32>;
+
+fun _pkl_regexp_matcher = (string regex) _Pkl_Matcher:
+{
+ return lambda (string str,
+ int<32> start = 0,
+ _Pkl_MatcherCallback clbk = lambda (any v) void: {}) int<32>:
+ {
+ var result = _Pkl_Regexp_Match {};
+
+ /* HACK This is equivalent to `push null'.
+ Until we get a more powerful assembler, we have to use this
+ trick. */
+ var opq = asm any: ("push 7");
+
+ /* Unfortunately we have to compile the regexp in every invocation.
+ The reason is to not leak the opaque value (we have to invoke `refree'
+ instruction to free the resources explicitly). */
+ {
+ var err = asm any: ("push 7");
+
+ asm ("recomp" : opq, err : regex);
+ if (asm int<32>: ("nn; nip" : err))
+ raise Exception {code = EC_inval,
+ name = "invalid regular expression: " + err as string,
+ exit_status = 1};
+ }
+
+ asm ("remtch; nip" : result.count : opq, str, start);
+ {
+ var subnum = 0UL;
+
+ asm ("resubnum; nip" : subnum : opq);
+ result.submatches = int<32>[2][subnum] ();
+ for (var i = 0UL; i != subnum; ++i)
+ {
+ asm ("resubref; rot; drop"
+ : result.submatches[i][0], result.submatches[i][1]
+ : opq, i);
+ }
+ }
+ asm ("refree" :: opq);
+
+ if (result.count == -2)
+ raise Exception {code = EC_inval,
+ name = "regular expression match function internal error",
+ exit_status = 1};
+
+ var found_p = result.count != -1;
+
+ if (found_p)
+ clbk (result);
+ return found_p;
+ };
+}
+
/**** Set the default load path ****/
immutable var load_path = "";
diff --git a/libpoke/std.pk b/libpoke/std.pk
index bcc0d1cc..01c30073 100644
--- a/libpoke/std.pk
+++ b/libpoke/std.pk
@@ -866,7 +866,7 @@ fun pk_vercmp = (any _a, any _b) int<32>:
fun pk_regexp_match = (string regex, string str, int<32> start = 0) int<32>:
{
- return _pkl_re_match (regex, str, start);
+ return _pkl_regexp_matcher (regex) (str, start);
}
type Pk_Regexp_Match =
@@ -879,7 +879,15 @@ type Pk_Regexp_Match =
fun pk_regexp_gmatch = (string regex, string str,
int<32> start = 0) Pk_Regexp_Match:
{
- var result = _pkl_re_gmatch (regex, str, start);
+ var result = Pk_Regexp_Match {};
- return Pk_Regexp_Match {count=result.count, submatches=result.submatches};
+ _pkl_regexp_matcher (regex) (str, start, lambda (any v) void:
+ {
+ var m = v as _Pkl_Regexp_Match;
+
+ result.count = m.count;
+ result.submatches = m.submatches;
+ });
+
+ return result;
}
```