[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC] Array types bounded by predicate
From: |
Jose E. Marchesi |
Subject: |
Re: [RFC] Array types bounded by predicate |
Date: |
Mon, 16 Sep 2024 21:29:19 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) |
Forgot to mention that the predicates can of course also raise
E_constraint and other exceptions. This is a way to support data
integrity in array elements without having to use an "intermediate"
struct type.
Salud!
> Hello people!
>
> First of all, I will abuse your patience with a little recap. Currently
> the Poke language supports three kind of array types:
>
> 1. "Unbounded" array types, like
>
> int<32>[]
>
> When mapped, values get added to the array until either EOF happens
> or a constraint expression fails. Suppose for example:
>
> (poke) type Printable_Char = struct { uint<8> c : c >= 32 && c <= 126; }
> (poke) Printable_Char[] @ 0#B
>
> The resulting array will contain all printable chars from the
> beginning of the IO space to either the end or to where a
> non-printable char is found. The first non-printable char (the
> Printable_Char whose mapping results in E_constraint) is not included
> in the array.
>
> Constructing an unbounded array results in an empty array:
>
> (poke) Printable_Char[]()
> []
>
> 2. Array types bounded by number of elements, like
>
> int<32>[10]
>
> Where the index can be any Poke expression that evaluates to an
> integral value.
>
> When mapped, exactly that number of values are read from the IO space
> to conform the elements of the new array. If an array of the
> specified numer of elements cannot be mapped, i.e. if the bound is
> not satisfied, then an exception gets raised.
>
> Constructing an array bounded by number of elements results in an
> array with that number of elements:
>
> (poke) int<32>[5]()
> [0,0,0,0,0]
> (poke) int<32>[5](100)
> [100,100,100,100,100]
>
> 3. Array types bounded by size, like
>
> int<32>[16#B]
>
> Where the index can be any Poke expression that evaluates to an
> offset value, which is the size of the whole array, not of each
> element.
>
> When mapped, an exact number values are read from the IO space to
> conform the elements of the new array, so the total size of the array
> equals the size bound. If no exact number of values can be mapped to
> satisfy the bound an exception is raised.
>
> Constructing an array bounded by size results in an array with the
> number of elements that satisfy the bound:
>
> (big:poke) int<32>[16#B]()
> [0,0,0,0]
>
> This works well.
>
> However, consider the following two real-life situations:
>
>
> a) The ASN1 BER encoding specify variable byte sequences, which are
> basically streams of bytes that end with two consecutive zero bytes.
>
> This is how the asn1-ber.pk pickle implements these sequences:
>
> type BER_Variable_Contents =
> struct
> {
> type Datum =
> union
> {
> uint<8>[2] pair : pair[1] != 0UB;
> uint<8> single : single != 0UB;
> };
>
> Datum[] data;
> uint<8>[2] end : end == [0UB,0UB];
>
> method get_bytes = uint<8>[]:
> {
> var bytes = uint<8>[]();
> for (d in data)
> try bytes += d.pair;
> catch if E_elem { bytes += [d.single]; }
> return bytes;
> }
> };
>
> As you can see, the trick is to use an unbounded array of `Data'
> values, each of which is either a pair of bytes the second of which
> is not zero, or a single non-zero byte. The finalizing two zero
> bytes follow. This approach has several problems: it is difficult to
> understand at first sight, it is bulky, inefficient and it requires
> an additional method (get_bytes) to construct the desired array from
> the rather more convoluted underlying data structure.
>
> b) The JoJo Diff format also uses variable sequences of bytes, but this
> time each sequence is finalized by one of several possible escapse
> sequences of two bytes, which are themselves _not_ part of the
> resulting sequence.
>
> This is how the jojodiff.pk pickle implements these sequences:
>
> type Jojo_Datum =
> union
> {
> uint<8>[2] escaped_escape : escaped_escape == [JOJO_ESC, JOJO_ESC];
> uint<8>[2] pair : pair[0] == JOJO_ESC
> && !(pair[1] in [JOJO_MOD, JOJO_INS, JOJO_DEL, JOJO_EQL,
> JOJO_BKT]);
> uint<8> value : value != JOJO_ESC;
> };
>
>
> type Jojo_Bytes =
> struct
> {
> Jojo_Datum[] data : data'length > 0;
>
> method get_count = Jojo_Offset:
> {
> var len = 0UL#B;
>
> for (d in data)
> len += !(d.pair ?! E_elem) ? 2#B : 1#B;
> return len;
> }
> };
>
> This uses a similar technique than the BER case. Again, this is
> cumbersome, not trivial to understand, and again it builds convoluted
> data structures that have to be turned into the expected simple form
> of a sequence of elements by a method.
>
> There are many more similar such cases. In order to better support
> these, I am proposing to a fourth kind of array type:
>
> 4. Array types bounded by predicate, like
>
> int<32>[lambda (int[] a, int e) int: { return e != 0; }]
>
> Where the index is any expression that evaluates to a closure with
> prototype:
>
> (ELEMTYPE[],ELEMTYPE)int<32>
>
> The predicate gets called for each value that is to be appended to
> the array when mapping or constructing an array value. The first
> argument is the so-far mapped/constructed array. The first time the
> predicate gets called this array is empty. The second argument is
> the value that is a candidate to be added as an element.
>
> If the predicate returns a positive number the candidate value gets
> added to the array and the construction/mapping process continues
> with a new candidate value.
>
> If the predicate returns 0 the candidate value gets added to the
> array and the construction/mapping process is stopped.
>
> If the predicate returns -N the candidate value gets added to the
> array, N elements get trimmed from the right of the array, and the
> construction/mapping process is stopped.
>
> The previous two data structures can now be expressed in a much better
> way. A BER variable byte sequence becomes:
>
> fun ber_is_byte_array = (uint<8>[] arr, uint<8> elem) int<64>:
> {
> if (arr'length > 0 && arr[arr'length-1] == 0UB && elem == 0UB)
> /* The trailer zero bytes are part of the array. */
> return 0;
> else
> /* Continue mapping/constructing. */
> return 1;
> }
>
> type BER_Variable_Contents = uint<8>[ber_is_byte_array];
>
> Whereas the Jojo Diff variable sequence of bytes becomes something like:
>
> fun jojo_is_bytes = (uint<8>[] arr, uint<8> elem) int<64>:
> {
> if (arr'length > 1
> && [arr[arr'length - 1],elem] in [[JOJO_ESC, JOJO_INS],
> [JOJO_ESC, JOJO_MOD],
> ...])
> {
> /* The escape sequence is _not_ part of the
> resulting array. */
> return -2;
> }
> else
> /* Continue the mapping/construction. */
> return 1;
> }
>
> type Jojo_Bytes = uint<8>[jojo_is_bytes];
>
> An interesting aspect of this is that nothing prevents the predicate
> functions to alter the array mapped/constructed so far, do all sort of
> stuff based on global variables, do their own exploratory mapping, and
> other crazy stuff... guaranteed fun and sheer power 8-)
>
> No additional syntax is required, and I think it fits well with the
> already used notion of "the bounding varies depending on the kind of
> expression one specifies in the [XXX] part of the array type specifier".
>
> Comments, thoughts?