poke-devel

Re: Documentation


From: Jose E. Marchesi
Subject: Re: Documentation
Date: Tue, 15 Sep 2020 11:38:01 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (gnu/linux)

>>The manual is way far from being finished.  I think publishing it
>>could confuse people at this point.
>
> Publishing of an unfinished **good thing** is not bad!
> And it's already in an acceptable state. It's very helpful.

I think you are too optimistic regarding the current state of the
manual...  or maybe I'm being too pessimistic!  I really wish someone would
help us with that.

>>Went thru it and have a few comments, but it would help if you could
>>send learn-poke-in-y-minutes.pk as either inline or as an attachment so
>>we can discuss it here.  (The gitlab.com link you provided won't work
>>with a Javascript-disabled browser, and wget'ting it gives a useless
>>.html document.)
>
> Sorry for the inconvenience. Everyday I wish for a JavaScript-less world (and
> in general a web-less world!) :D

Amen to that! :)

> /* Copyright (C) 2020, Mohammad-Reza Nabipoor */
> /* SPDX-License-Identifier: GFDL-1.3-or-later */

If I were you I would use GPLv3+ for that file, as it is basically
commented code.

> /* GNU poke is an interactive editor for binary data. But it's not just an
>  * editor, it provides a full-fledged procedural, interactive programming
>  * language designed to describe data structures and to operate on them.
>  * The programming language called Poke (with upper-case P).

_is_ called Poke.

>  *
>  * When the user has a description of binary data, he/she can *map* it on
>  * the actual data and start poking the data! The user can inspect and modify
>  * data.
>  */
>
> /* First start with nomenclature:
>  *
>  *   - poke      The editor program (also called GNU poke)
>  *   - Poke      Domain-specific programming language that is used by `poke`
>  *   - pickle    A Poke source file. The extension of filename is `.pk`

A pickle is not just any Poke source file: it is a Poke source file that
implements some well-defined domain, like a file format (like ELF) or some
functional domain (like time.pk).  You may want to reflect that in the
summary.

>  */
>
> /* Let's talk about the Poke! */
>
> /* Variables
>  *
>  * We can define variables in Poke using `defvar` keyword:
>  *
>  *   defvar NAME_OF_VARIABLE = VALUE
>  */
>
> defvar an_integer = 10;
> defvar a_string = "hello, poke users!";
>
> /* Values
>  *
>  * Poke programming language has the following types of value:
>  *
>  *   - Integer
>  *   - String
>  *   - Array
>  *   - Offset
>  *   - Struct
>  *   - Union
>  *   - Function
>  */
>
>
> /* Integer values */
> defvar decimal = 10;
> defvar hexadecimal = 0xff;
> defvar binary = 0b1100;
> defvar octal = 0o777;
>
> defvar si8  = 1B;     /* byte (8-bit)  */
> defvar si16 = 2H;     /* short (16-bit) */
> defvar si32 = 3;      /* int  (32-bit) */
> defvar si64 = 4L;     /* long (64-bit) */
>
> defvar ui8  = 4UB;    /* unsigned byte (8-bit)  */
> defvar ui16 = 5UH;    /* unsigned short (16-bit) */
> defvar ui32 = 6U;     /* unsigned int  (32-bit) */
> defvar ui64 = 7UL;    /* unsigned long (64-bit) */
>
>
> /* String values (null-terminated) */
> defvar foobar_string = "foo\nbar";
> defvar empty_string = "";
>
>
> /* Array values */
> defvar arr1 = [1, 2, 3];
> defvar arr2 = [[1, 2], [3, 4]];
>
> defvar elem10 = arr1[0];    /* Arrays are indexed using the usual notation */
> defvar elem12 = arr1[2];    /* This is the last element of `arr1`: 3 */
>
> /* If you try to access elements beyond the bounds, you'll get an
>  * `E_out_of_bound_exception` exception.
>  */
> /* defvar elem1x = arr1[3]; */
> /* defvar elem1y = arr1[-1]; */
>
> /* Array trimming: Extraction of a subset of the array */
> defvar arr3   = arr1[0:1];  /* arr3 == [1, 2] */

Probably it is worth mentioning that the provided indexes in a trim are
both inclusive.
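
For instance, assuming inclusive bounds as they are right now (just a
sketch, I haven't run it; `arr4' and `arr5' are made-up names):

```poke
defvar arr4 = arr1[1:2];    /* arr4 == [2, 3] */
defvar arr5 = arr1[0:0];    /* arr5 == [1], a one-element array */
```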

> /* Offset values
>  *
>  * Poke does not use integers to specify offsets in binary data, it has a
>  * primitive type for that: offset!
>  *
>  * Offsets have two parts:
>  *  - magnitude (an integer)
>  *  - unit      (b (bit), B (byte), etc.)
>  *
>  * Offsets are also useful for specifying the size.
>  */
>
> /* Offsets with named units */
> defvar off_8_bits     = 8#b;
> defvar off_23_bytes   = 23#B;
> defvar off_2000_bits  = 2#Kb;
> defvar off_2000_bytes = 2#KB;
> defvar off_3_nibbles  = 3#N;    /* 3 nibbles (each nibble is 4 bits) */
>
> defvar off_1_byte = #B;   /* You can omit magnitude if it's 1 */
>
> /* Offsets with numeric units */
> defvar off_8_8 = 8#8;    /* magnitude: 8, unit: 8 bits */
> defvar off_2_3 = 2#3;    /* magnitude: 2, unit: 3 bits */
>
> /* Offset arithmetic
>  *
>  * OFF +- OFF -> OFF
>  * OFF *  INT -> OFF
>  * OFF /  OFF -> INT
>  * OFF %  OFF -> OFF
>  */
> defvar off_1_plus_2   = 1#B + 2#B;    /* 3#B  */
> defvar off_1_minus_2  = 1#B - 2#B;    /* -1#B */
> defvar off_8_times_10 = 8#B * 10;     /* 80#B */
> defvar off_10_times_8 = 10  * 8#B;    /* 80#B */
> defvar off_7_div_1    = 7#B / 1#B;    /* 7    */  /* This is an integer */
> defvar off_7_mod_3    = 7#B % 3#B;    /* 1#B  */

Ceiling division (and modulus) are also available for offsets.
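
Something like this could be added, assuming `/^' is the ceil-division
operator and that it yields an integer for offsets just like `/' does
(untested sketch; the variable name is made up):

```poke
defvar off_7_cdiv_2 = 7#B /^ 2#B;    /* 4, i.e. 7/2 rounded up */
```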

> /* The following units are pre-defined in poke:
>  *
>  *   b, N, B, Kb, KB, Mb, MB, Gb, GB, Kib, KiB, Mib, MiB, Gib, GiB
>  */

Introduce `defunit' here?
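
A possible example, assuming the value given to `defunit' is a number
of bits, like in the numeric units above (`Word' is a made-up unit
name, and I haven't run this):

```poke
defunit Word = 32;
defvar off_2_words = 2#Word;    /* magnitude: 2, unit: 32 bits */
```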

>
>
> /* Types
>  *
>  * Before talking about `struct` values, it'd be nice to first talk about
>  * types in Poke.
>  */
>
> /* Integer types
>  *
>  * Most general-purpose programming languages provide a small set of integer
>  * types. Poke, on the contrary, provides a rich set of integer types
>  * featuring different widths, in both signed and unsigned variants.
>  *
>  * `int<N>` is a signed integer with `N`-bit width. `N` can be an integer
>  * literal in the range `[1, 64]`.
>  *
>  * `uint<N>` is the unsigned variant.
>  *
>  * Examples:
>  *
>  *    uint<1>
>  *    uint<7>
>  *    int<64>
>  */
>
> /* String type
>  *
>  * There is one string type in Poke: `string`
>  * Strings in Poke are null-terminated.
>  */
>
> /* Array types
>  *
>  * There are three kinds of array types:
>  *
>  *   - Unbounded: arrays that have no explicit boundaries, like `int<32>[]`
>  *   - Bounded by number of elements, like `int<64>[10]`
>  *   - Bounded by size, like `uint<32>[8#B]`
>  */
>
> /* Offset types
>  *
>  * Offset types are denoted as `offset<BASE_TYPE,UNIT>`, where BASE_TYPE is
>  * an integer type and UNIT the specification of a unit.
>  *
>  * Examples:
>  *
>  *   offset<int<32>,B>
>  *   offset<uint<12>,Kb>
>  */
>
> /* Struct types
>  *
>  * Structs are the main abstraction that Poke provides to structure data. A
>  * collection of heterogeneous values.
>  *
>  * And there's no padding or alignment between the fields of structs.

WYPIWYG (What You Poke Is What You Get) ;)

>  *
>  * Examples:
>  *
>  *   struct {
>  *     uint<32> i32;
>  *     uint<64> i64;
>  *   }
>  *
>  *   struct {
>  *     uint<16> flags;
>  *     uint<8>[32] data;
>  *   }
>  *
>  *   struct {
>  *     int<32> code;
>  *     string msg;
>  *     int<32> exit_status;
>  *   }
>  */
>
>
> /* User-declared types
>  *
>  * There's a mechanism to declare new types:
>  *
>  *   deftype NAME = TYPE;
>  *
>  * where NAME is the name of the new type, and TYPE is either a type specifier
>  * or the name of some other type.
>  *
>  * The supported type specifiers are integral types, string type, array types,
>  * struct types, function types, and `any` (The `any` type is used to
>  * implement polymorphism).
>  */
>
> deftype Bit   = uint<1>;
> deftype Int   = int<32>;
> deftype Ulong = uint<64>;
>
> deftype String = string;    /* Just to show that this is possible! */
>
> deftype Buffer  = uint<8>[];        /* Unbounded array of type uint<8> */
> deftype Triple  = int<32>[3];       /* Bounded array of 3 elements */
> deftype Buf1024 = uint<8>[1024#B];  /* Bounded array with size of 1024 bytes */
>
> deftype EmptyStruct = struct {};
> deftype BufferStruct = struct
>   {
>     Buffer buffer;
>   };
> deftype Pair_32_64 =
>   struct
>   {
>     uint<32> i32;
>     uint<64> i64;
>   };
> deftype Packet34 =
>   struct
>   {
>     uint<16> flags;
>     uint<8>[32] data;
>   };
> deftype Error =
>   struct
>   {
>     int<32> code;
>     string msg;
>     int<32> exit_status;
>   };
>
>
> /* Now back to the values */
>
>
> /* Struct values */
>
> defvar empty_struct = EmptyStruct {};
>
> deftype Packet =
>   struct
>   {
>     uint<16> flags;
>     uint<8>[8] data;
>   };
>
> defvar packet_1 =
>   Packet
>   {
>     flags = 0xff00,
>     data = [0UB, 1UB, 2UB, 3UB, 4UB, 5UB, 6UB, 7UB],
>   };
>
> defvar packet_2 =
>   Packet
>   {
>     flags = 1,
>
>     /* The following line is invalid, because the type of the numbers is
>      * `int<32>`, not `uint<8>`.
>      */
>     /* data = [0, 1, 2, 3, 4, 5, 6, 7], */
>
>     /* The user cannot specify fewer than 8 elements, because the `data`
>      * field is a fixed-size array. So the following line is a compilation
>      * error:
>      */
>     /* data = [0UB, 1UB, ], */
>   };
>
> defvar packet_3 =
>   Packet
>   {
>     /* flags = 0, */    /* Fields can be omitted */
>
>     /* The fifth element (counting from zero) is initialized to `128UB`;
>      * and all uninitialized values before that will be initialized to
>      * `128UB`, too.
>      */
>     data = [1UB, .[5] = 128UB, 2UB, 3UB],
>   };
> /* packet_3 ==
>  *   Packet{flags=0UH,data=[1UB,128UB,128UB,128UB,128UB,128UB,2UB,3UB]}
>  */
>
> deftype Header =
>   struct
>   {
>     uint<8>[2] magic;
>     offset<uint<32>,B> file_size;
>     uint<16>;    /* Reserved */
>     uint<16>;    /* Reserved */
>     offset<uint<32>,B> data_offset;
>   };
>
> deftype Payload =
>   struct
>   {
>     uint<8> magic;
>     uint<32> data_length;
>
>     /* Size of array depends on the `data_length` field */
>     uint<8>[data_length] data;
>   };
>
> /* An interesting feature of Poke is that types also can be used as units for
>  * offsets. The only restriction is that the type should have known size at
>  * compile-time.
>  */
> defvar off_23_packets = 23#Packet;    /* magnitude: 23, unit: Packet */
>
> /* Note that this is invalid and gives a compilation error:
>  *
>  *   defvar off_buffer = 1#Buffer;
>  *
>  * because `Buffer` is an unbounded array and the size is unknown at
>  * compile-time.
>  */
>
> /* Offset arithmetic with types as unit of offsets
>  */
> defvar packet_size     = 1#Packet / 1#B;    /* 10 */
> defvar two_packet_size = 2 #Packet/#B;      /* 20 */
>
>
> /* Struct Field Constraints
>  *
>  * It is common for the values of struct fields to be constrained to
>  * satisfy some conditions.  Obvious examples are magic numbers, and
>  * specification-derived constraints.
>  */
> deftype HeaderWithMagic =
>   struct
>   {
>     uint<8> magic : magic == 100UB;
>     uint<8> version : version <= 3;
>     offset<uint<32>,B> data_length;
>     uint<8>[data_length] data;
>   };
> /* The constraint expression should evaluate to an integer value; that value
>  * is interpreted as a boolean
>  */
>
> /* The following variable definition will raise an exception:
>  *   unhandled constraint violation exception
>  */
> /* defvar hdrmagic = HeaderWithMagic {}; */
>
> /* This will work because all field constraints are satisfied */
> defvar hdrmagic =
>   HeaderWithMagic
>   {
>     magic = 100UB,
>   };
>
> /* There is another way to specify the constraints: field initializers  */
>
> /* Struct Field Initializers
>  *
>  * Field initializer has two roles:
>  *   - Introduce constraint of the form: `field == initializer_expression`
>  *   - Initialize the field with initializer expression
>  */
> deftype HeaderWithInit =
>   struct
>   {
>     uint<8> magic = 100UB;
>     uint<8> version = 3;
>
>     offset<uint<32>,B> data_length;
>     uint<8>[data_length] data;
>   };
>
> /* With field initializers, this is possible: */
> defvar hdrauto = HeaderWithInit {};
> /* hdrauto.magic == 100UB && hdrauto.version == 3UB */
>
> /* The only limitation is that we cannot specify a constraint for initialized
>  * fields.
>  */
>
>
> /* Functions
>  *
>  * Functions are lexically scoped.
>  */
> defun func1 = (uint<32> arg0, uint<64> arg1) uint<32>:
>   {
>     return arg0 | arg1 .>> 32;    /* `.>>` is bitwise shift right operator */
>   }
>
> defvar three = func1 (1, 2**33);   /* three == 3 (and `**` is power operator) */
>
> defun awesome = (string name) void:
>   {
>     printf ("%s is awesome!\n", name);
>   }
> awesome ("Poke");    /* Will print "Poke is awesome!" on terminal */
>
> defvar N = 10;
> defun Nsquare = int<32>:    /* No input parameter */
>   {
>     /* The `N` variable is captured inside the `Nsquare` function */
>     return N * N;
>   }
>
> defvar Nsq = Nsquare;     /* Nsq == 100 */
>
> N = 20;
> defvar Nsq2 = Nsquare;    /* Nsq2 == 400 */
>
>
> /* Functions with optional arguments
>  *
>  * Note that the value of initialization gets captured in the closure.
>  */
>
> defvar ten = 10;
> defun double32 = (int<32> n = ten) uint<64>:
>   {
>     n = n * 2;
>     return n;
>   }
>
> defvar twenty = double32 ();         /* twenty == 20UL */
> defvar another_twenty = double32;    /* It's OK to omit the `()` */
> defvar thirty = double32 (15);       /* thirty == 30UL */

To show `ten' being lexically closed in `double32', it would be fun to
add something like this:

ten = 11;
double32;
22

:)

> /* Function with no output (a procedure!) */
> defun packet_toggle_flag = (Packet p) void:
>   {
>     p.flags = p.flags ^ 1;
>   }
>
> packet_toggle_flag (packet_1);    /* packet_1.flags == 0xff01 */
>
>
> /* Struct Methods
>  */
> deftype Point =
>   struct
>   {
>     int<32> x;
>     int<32> y;
>
>     method norm_squared = int<32>:
>       {
>         return x*x + y*y;
>       }
>   };
>
> defvar point = Point{ x = 10, y = -1 };
> defvar point_nsq = point.norm_squared;    /* point_nsq == 101 */
>
>
> /* Unions
>  *
>  * Sometimes the structure of binary format can be different depending on some
>  * earlier fields. To describe these kinds of formats, Poke provides `union`s.
>  *
>  * The first field of `union` for which its constraints are satisfied will be
>  * selected.
>  */
> deftype PacketU =
>   struct
>   {
>     uint<8> size;
>
>     union
>     {
>       struct
>       {
>         uint<8> type;
>         uint<8>[size] data;
>       } : size < 32;
>
>       struct
>       {
>         uint<16> type;
>         uint<8>[size - 1] data;
>       } : size < 128;
>
>       struct
>       {
>         uint<16> type;
>         uint<8> flags;
>         uint<8>[size - 3] data;
>       };
>     };
>   };
>
>
> defvar packet_u_1 =
>   PacketU
>   {
>     size = 10,
>   };
> defvar packet_u_2 =
>   PacketU
>   {
>     size = 64,
>   };
> defvar packet_u_3 =
>   PacketU
>   {
>     size = 128,
>   };
>
>
> /* Casts
>  */
> defvar num_u32 = 1;
> defvar num_u64 = num_u32 as uint<64>;
>
>
> /* Attributes
>  *
>  * Each value has a set of attributes.
>  */
>
> /* `size` attribute */
>
> defvar sizeof_num_u32 = num_u32'size;    /* sizeof_num_u32 == 4#B */
> defvar sizeof_num_u64 = num_u64'size;    /* sizeof_num_u64 == 8#B */
>
> defvar sbuf = BufferStruct{};
> defvar sizeof_sbuf = sbuf'size;          /* sizeof_sbuf == 0#B */
> defvar sizeof_packet_1 = packet_1'size;  /* sizeof_packet_1 == 10#B */
>
> /* `length` attribute */
>
> defvar nelem_arr1 = arr1'length;         /* nelem_arr1 == 3 */
> defvar nelem_arrx = [1, 2, 3, 4, 5, 6]'length;    /* nelem_arrx == 6 */
>
> /* For structs it's the number of fields */
> defvar nfields_packet_1 = packet_1'length;      /* nfields_packet_1 == 2 */
>
>
> /* Conditionals
>  *
>  *   - if-else
>  *   - conditional expression
>  */
>
> if (num_u32 & 1) { /* This branch will be evaluated */
>   num_u32 = num_u32 | 2;    /* 1 | 2 == 3 */
>   num_u64 = num_u64 | 4;    /* 1 | 4 == 5 */
> } else {
>   num_u32 = num_u32 | 8;    /* 1 | 8 == 9 */
>   num_u64 = num_u64 | 16;   /* 1 | 16 == 17 */
> }
>
> defvar a_true_value = num_u32 == 3 && num_u64 == 5;
> defvar a_false_value = num_u32 == 9 || num_u64 == 17;
>
> defvar hundred = a_true_value ? 100 : 200;
> defvar thousand = a_false_value ? 200 : 1000;
>
>
> /* Loops
>  *
>  *   - while
>  *   - for-in
>  */
>
> defvar i = 0;
> while (1)
> {
>   i = i + 1;
>   if (i == 10)
>     break;
> }
> /* i == 10 */
>
> print "\nList of maintainers:\n";
> for (i in ["egeyar", "jmd", "positron", "darnir", "dan.cermak", "bruno",
>   "ccaione", "eblake", "tim.ruehsen", "sdi1600195", "aaptel"])
>   {
>     printf "  %v\n", i;
>   }
>
> defvar digits = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0];
> for (i in "0123456789")
>   {
>     digits[i - '0'] = i - '0';
>   }
> /* digits == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] */
>
> defvar digitsEven = [8, 6, 4, 2, 0];
> for (i in "0123456789" where i % 2 == 0)
>   {
>     digitsEven[(i - '0') / 2] = i - '0';
>   }
> /* digitsEven == [0, 2, 4, 6, 8] */
>
>
> /* std.pk - Standard definition for poke
>  *
>  * The following types are defined as Standard Integral Types:
>  *   - bit
>  *   - nibble
>  *   - uint8, byte, char, int8
>  *   - uint16, ushort, int16, short
>  *   - uint32, uint, int32, int
>  *   - uint64, ulong, int64, long
>  *
>  * Standard Offset Types:
>  *   deftype off64 = offset<int64,b>;
>  *   deftype uoff64 = offset<uint64,b>;
>  *
>  * Conversion Functions:
>  *   - catos  Character array to string
>  *   - stoca  String to character array
>  *   - atoi   String to integer
>  *
>  * String Functions:
>  *   - strchr  Index of first occurrence of the character in string
>  *   - ltrim   Left trim
>  *   - rtrim   Right trim
>  *
>  * Sorting Functions:
>  *   - qsort
>  *
>  * CRC Functions:
>  *   - crc32
>  *
>  * Date and Time Functions:
>  *   - ptime   Print human-readable datetime string given seconds since epoch
>  *
>  * Date and Time Types:
>  *   - POSIX_Time32
>  *   - POSIX_Time64

Today I moved ptime and the POSIX_Time{32,64} definitions to a pickle
pickles/time.pk.

std.pk should only be used to define "standard" language constructions,
such as the standard basic types.  The "standard library" of Poke is in
reality std.pk + pickles/*.pk.

... I guess :)

>  *
>  * Misc:
>  *   defvar NULL = 0#B;
>  */
>
>
> /* Now we can talk about the most important concept in Poke: mapping! */
>
>
> /* Mapping
>  *
>  * The purpose of poke is to edit "IO spaces", which are the files or devices,
>  * or memory areas being edited.  This is achieved by **mapping** values.
>  */
>
> /* Using `open` function one can open an IO space; Poke supports the following
>  * IO spaces:
>  *
>  *   - Auto-growing memory buffer
>  *   - Address-space of a process

Not yet.  Someone should write a ptrace-based IOD :)

>  *   - File
>  *   - Block device served by an NBD server
>  *
>  * It has the following prototype:
>  *
>  *   defun open = (string HANDLER, uint<64> flags = 0) int<32>
>  */
>
> /* open an auto-growing memory buffer */
> defvar memio = open("*Arbitrary Name*");
>
> /* open a file */
> defvar zeroio = open("/dev/zero");
>
> /* close the IO space */
> close(zeroio);
>
> /* To access an IO space we can map a value to some area using this syntax:
>  *
>  *     TYPE @ OFFSET
>  * or,
>  *     TYPE @ IOS : OFFSET
>  */
> defvar ui32num = uint<32> @ 0#B;
> defvar i32num = int<32> @ 4#B;
>
> /* If we modify the `ui32num` the first 4 bytes in IO space will change. */
> ui32num = 0xaabbccdd;

Not really.  `ui32num' is a simple value (integer, offset, string) and
therefore it is not mapped:

ui32num'mapped -> 0

In order to perform the operation above you would need to do something
like:

uint<32> @ 0#B = 0xaabbccdd;
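
Composite values (arrays and structs) do get mapped, so a sketch like
this should write through to the IO space (reusing the `Packet' type
from above; `pkt' is just an illustrative name, and I haven't run this):

```poke
defvar pkt = Packet @ 0#B;    /* pkt'mapped -> 1 */
pkt.flags = 0xff00;           /* Updates the first two bytes in the IO space */
```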

> /* Endianness
>  *
>  * Big-endian is the default endianness. This can be verified by the
>  * following expression:
>  *
>  *   get_endian == ENDIAN_BIG
>  *
>  * This can be changed using `set_endian` function.
>  */
> set_endian(ENDIAN_LITTLE);    /* get_endian == ENDIAN_LITTLE */
>
>
> /* WIP ... */
>
>
> /* Based on
>  * https://kernel-recipes.org/en/2019/talks/gnu-poke-an-extensible-editor-for-structured-binary-data/
>  * GNU poke reference documentation (Texinfo file)
>  */

Very very nice work... once it is more complete we should include it not
only on the website, but also in the manual!

Thank you.


