Re: Dynamic mutidimensional arrays

Hi Michael

On Wed, 5 Apr 2023 at 16:08, Michael Riedl wrote:

thanks for the detailed answer - but it still does not become clear to me.

How would I (without and "surrounding") do a simple

val := A[i,k] * B[k,j];

In classic Modula-2 (both PIM and ISO) there is only one way to allocate a block of memory whose size is not known at compile time and then use array subscript notation on that block of memory, and this one way is to type cast the memory block to an array type of the specific size that the memory block was allocated with at runtime.

This is less messy than the address arithmetic I mentioned earlier, and it also significantly reduces the opportunity for error, but it is nevertheless cumbersome. I did this in an implementation of interned dynamic strings for our bootstrap compiler.

https://github.com/m2sf/m2bsk/blob/master/src/lib/String.def

https://github.com/m2sf/m2bsk/blob/master/src/lib/imp/String.pim.mod

https://github.com/m2sf/m2bsk/blob/master/src/lib/imp/String.iso.mod

The implementation of the dynamic string type is based on a variant record with a length field and the variant payload. In this particular case we need strings with a maximum length of 4096 characters because that is the maximum length of a string literal supported by the compiler.

Now, one might implement this with one variant for each possible length but that would mean a record with 4096 variants. Not really desirable. So I did some frequency analysis of literal lengths in Modula-2 source code and structured the variants around the most common cases, combining other less common cases. The most frequent literal lengths are in the range of one to 80 characters, so each of these got its own variant in the record. The longer literals are less and less frequent, so there are nine more variants to cover them, whereby the number of cases covered increases with increasing length. As a result, the record type has 89 variants, which is quite a lot still, but manageable.

Back to the dynamic string library, when you allocate a string at runtime, you determine its length, and then allocate a block of memory sufficient to hold the length field and the string. For any work done on the string other than obtaining its length, you typecast the variant field that holds the string onto a character array type that matches the exact length (for lengths up to 80) or the maximum length of the variant (for lengths above 80).

Once the data has been typecast you can use array subscript notation because to the compiler this is now a static array.

Now, if you were to use this technique for vectors and matrices, you may not need as many variants as the range of sizes you need to cover is likely far lower. The fewer variants you need, the less clutter you get within the record declaration and the less static array type declarations you need for casting.

Of course this is not really ideal. There is a lot of boilerplate involved that wouldn't be necessary if the language had a facility to declare records with an array field of indeterminate type, as C does, and as we have added in our Modula-2 revision. But unfortunately with classic M2, there is no other way, unless you use arrays whose allocation size is determined at compile time and then add boundary metadata.

hth

regards

benjamin

From:	Benjamin Kowarsch
Subject:	Re: Dynamic mutidimensional arrays
Date:	Wed, 5 Apr 2023 16:39:39 +0900