poke-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC] Array types bounded by predicate


From: Jose E. Marchesi
Subject: Re: [RFC] Array types bounded by predicate
Date: Mon, 16 Sep 2024 21:29:19 +0200
User-agent: Gnus/5.13 (Gnus v5.13)

Forgot to mention that the predicates can of course also raise
E_constraint and other exceptions.  This is a way to support data
integrity in array elements without having to use an "intermediate"
struct type.

Salud!

> Hello people!
>
> First of all, I will abuse your patience with a little recap.  Currently
> the Poke language supports three kind of array types:
>
> 1. "Unbounded" array types, like
>
>     int<32>[]
>
>    When mapped, values get added to the array until either EOF happens
>    or a constraint expression fails.  Suppose for example:
>
>    (poke) type Printable_Char = struct { uint<8> c : c >= 32 && c <= 126; }
>    (poke) Printable_Char[] @ 0#B
>
>    The resulting array will contain all printable chars from the
>    beginning of the IO space to either the end or to where a
>    non-printable char is found.  The first non-printable char (the
>    Printable_Char whose mapping results in E_constraint) is not included
>    in the array.
>
>    Constructing an unbounded array results in an empty array:
>
>    (poke) Printable_Char[]()
>    []
>
> 2. Array types bounded by number of elements, like
>
>      int<32>[10]
>
>    Where the index can be any Poke expression that evaluates to an
>    integral value.
>
>    When mapped, exactly that number of values are read from the IO space
>    to conform the elements of the new array.  If an array of the
>    specified numer of elements cannot be mapped, i.e. if the bound is
>    not satisfied, then an exception gets raised.
>
>    Constructing an array bounded by number of elements results in an
>    array with that number of elements:
>
>    (poke) int<32>[5]()
>    [0,0,0,0,0]
>    (poke) int<32>[5](100)
>    [100,100,100,100,100]
>
> 3. Array types bounded by size, like
>
>      int<32>[16#B]
>
>    Where the index can be any Poke expression that evaluates to an
>    offset value, which is the size of the whole array, not of each
>    element.
>
>    When mapped, an exact number values are read from the IO space to
>    conform the elements of the new array, so the total size of the array
>    equals the size bound.  If no exact number of values can be mapped to
>    satisfy the bound an exception is raised.
>
>    Constructing an array bounded by size results in an array with the
>    number of elements that satisfy the bound:
>
>    (big:poke) int<32>[16#B]()
>    [0,0,0,0]
>
> This works well.
>
> However, consider the following two real-life situations:
>
>
> a) The ASN1 BER encoding specify variable byte sequences, which are
>    basically streams of bytes that end with two consecutive zero bytes.
>
>    This is how the asn1-ber.pk pickle implements these sequences:
>
>    type BER_Variable_Contents =
>      struct
>      {
>        type Datum =
>          union
>          {
>            uint<8>[2] pair : pair[1] != 0UB;
>            uint<8> single : single != 0UB;
>          };
>
>        Datum[] data;
>        uint<8>[2] end : end == [0UB,0UB];
>
>        method get_bytes = uint<8>[]:
>        {
>          var bytes = uint<8>[]();
>          for (d in data)
>            try bytes += d.pair;
>            catch if E_elem { bytes += [d.single]; }
>          return bytes;
>        }
>      };
>
>    As you can see, the trick is to use an unbounded array of `Data'
>    values, each of which is either a pair of bytes the second of which
>    is not zero, or a single non-zero byte.  The finalizing two zero
>    bytes follow.  This approach has several problems: it is difficult to
>    understand at first sight, it is bulky, inefficient and it requires
>    an additional method (get_bytes) to construct the desired array from
>    the rather more convoluted underlying data structure.
>
> b) The JoJo Diff format also uses variable sequences of bytes, but this
>    time each sequence is finalized by one of several possible escapse
>    sequences of two bytes, which are themselves _not_ part of the
>    resulting sequence.
>
>    This is how the jojodiff.pk pickle implements these sequences:
>
>      type Jojo_Datum =
>        union
>        {
>          uint<8>[2] escaped_escape : escaped_escape == [JOJO_ESC, JOJO_ESC];
>          uint<8>[2] pair : pair[0] == JOJO_ESC
>              && !(pair[1] in [JOJO_MOD, JOJO_INS, JOJO_DEL, JOJO_EQL, 
> JOJO_BKT]);
>          uint<8> value : value != JOJO_ESC;
>        };
>
>
>     type Jojo_Bytes =
>       struct
>       {
>         Jojo_Datum[] data : data'length > 0;
>
>         method get_count = Jojo_Offset:
>         {
>           var len = 0UL#B;
>
>           for (d in data)
>             len += !(d.pair ?! E_elem) ? 2#B : 1#B;
>           return len;
>         }
>       };
>
>    This uses a similar technique than the BER case.  Again, this is
>    cumbersome, not trivial to understand, and again it builds convoluted
>    data structures that have to be turned into the expected simple form
>    of a sequence of elements by a method.
>
> There are many more similar such cases.  In order to better support
> these, I am proposing to a fourth kind of array type:
>
> 4. Array types bounded by predicate, like
>
>      int<32>[lambda (int[] a, int e) int: { return e != 0; }]
>
>    Where the index is any expression that evaluates to a closure with
>    prototype:
>
>      (ELEMTYPE[],ELEMTYPE)int<32>
>
>    The predicate gets called for each value that is to be appended to
>    the array when mapping or constructing an array value.   The first
>    argument is the so-far mapped/constructed array.  The first time the
>    predicate gets called this array is empty.  The second argument is
>    the value that is a candidate to be added as an element.
>
>    If the predicate returns a positive number the candidate value gets
>    added to the array and the construction/mapping process continues
>    with a new candidate value.
>
>    If the predicate returns 0 the candidate value gets added to the
>    array and the construction/mapping process is stopped.
>
>    If the predicate returns -N the candidate value gets added to the
>    array, N elements get trimmed from the right of the array, and the
>    construction/mapping process is stopped.
>
> The previous two data structures can now be expressed in a much better
> way.  A BER variable byte sequence becomes:
>
>   fun ber_is_byte_array = (uint<8>[] arr, uint<8> elem) int<64>:
>   {
>     if (arr'length > 0 && arr[arr'length-1] == 0UB && elem == 0UB)
>       /* The trailer zero bytes are part of the array.  */
>       return 0;
>     else
>       /* Continue mapping/constructing.  */
>       return 1;
>   }
>
>   type BER_Variable_Contents = uint<8>[ber_is_byte_array];
>
> Whereas the Jojo Diff variable sequence of bytes becomes something like:
>
>   fun jojo_is_bytes = (uint<8>[] arr, uint<8> elem) int<64>:
>   {
>     if (arr'length > 1
>         && [arr[arr'length - 1],elem] in [[JOJO_ESC, JOJO_INS],
>                                           [JOJO_ESC, JOJO_MOD],
>                                           ...])
>       {
>         /* The escape sequence is _not_ part of the
>            resulting array.  */
>         return -2;
>       }
>     else
>      /* Continue the mapping/construction.  */
>      return 1;
>   }
>
>   type Jojo_Bytes = uint<8>[jojo_is_bytes];
>
> An interesting aspect of this is that nothing prevents the predicate
> functions to alter the array mapped/constructed so far, do all sort of
> stuff based on global variables, do their own exploratory mapping, and
> other crazy stuff... guaranteed fun and sheer power 8-)
>
> No additional syntax is required, and I think it fits well with the
> already used notion of "the bounding varies depending on the kind of
> expression one specifies in the [XXX] part of the array type specifier".
>
> Comments, thoughts?



reply via email to

[Prev in Thread] Current Thread [Next in Thread]