gm2
[Top][All Lists]

## Re: Dynamic mutidimensional arrays

 From: Benjamin Kowarsch Subject: Re: Dynamic mutidimensional arrays Date: Thu, 6 Apr 2023 02:53:32 +0900

Hi Gaius

On Wed, 5 Apr 2023 at 23:13, Gaius Mulley wrote:

an interesting discussion, I hear the wise warnings about painting
myself into a corner etc (especially when ISO generics, OO and M2R10 are
implemented).

Here is an exploratory idea, would the following code suit/have merit?
This time the open array dynamic variables are only allowed in local
procedure variables.

If you want to add some form of support for dynamic vectors and matrices, you can always implement record types permitting an indeterminate array field and then permit that to be either one- or two-dimensional ...

TYPE Vector = POINTER TO RECORD
+ data : ARRAY length OF REAL
END; (* Vector *)

which is structurally equivalent to

TYPE Vector POINTER TO RECORD
length : CARDINAL; (* alternatively LONGCARD *)
data : ARRAY OF REAL
END; (* Vector *)

In order to allocate an instance, either the size needs to be calculated and passed to ALLOCATE manually and the length field also initialised manually, or you can implement pervasive procedure NEW in such a way that it requires a second argument for the length when the first argument is of such a type. In the latter case, NEW then takes care of the size calculation and initialisation of the length field.

NEW(vector, 100);

The array field can then be accessed via array subscript notation:

vector^.data[index]

Where the array subscript notation can be used depends on whether or not the type declaration is permitted only within an implementation module or also within a definition module. In our revision we decided to permit this only in implementation modules, but you may of course decide differently.

In general, I would recommend not to let the syntax of how the data is read or written influence the design. This is a mistake that too many language implementors make. One should think of the design separately from the notation. The design is the structure and behavior of a data object, while the notation is a symbol that links to the object. The symbol is said to be bound to the object. The two are separate entities. In some languages this binding is static, in some it is dynamic (to varying degrees). Generally speaking, expressive power and genericity comes from dynamic binding, and in particular from late binding which is to say binding symbols to objects at runtime.

Now Michael (and I might add, many a mathematician) isn't going to be happy with the notation vector^.data[index], but that is not a good reason to go for a lesser design. The notation can always be adjusted by other means. The desired notation should not dictate the design.

Let's say you implemented this and only permit the declaration syntax in implementation modules. This would force the types to always be opaque. So, now you have no more direct access to the record fields outside the implementation module and users of the library have to use procedures and functions, thus the native notation would then be

Vector.Store(vector, index, value); (* and *)
Vector.value(vector, index)

Again, Michael and most mathematicians aren't going to be happy with that, but any desired notation can be overlayed. The notation is nothing but syntax sugar. Think of Lisp. When McCarthy set out to design and develop Lisp, he said "let's not worry about the syntax for now, we can think about some syntax later on, for now we are going to use an intermediary form". But then the intermediary form stuck and became the syntax. Yet, with Lisp's powerful macro system, you can overlay any kind of syntax on top you like.

And you don't need to add any macro system to create syntax sugar overlays. You can just add some compiler hint.

PROCEDURE [STORE] Store ( v : Vector; index : CARDINAL; value : REAL );

could be a syntax for such a hint. This is the syntax we use, but you can of course think of alternatives.

And with that, the compiler can then replace any occurrence of

vector[index] := value

with a procedure call

Vector.Store(vector, index, value)

and that procedure can of course be inlined so that the overhead of a procedure call is avoided.

You can do the same with the accessor function like so:

PROCEDURE [VALUE] value ( v : Vector; index : CARDINAL ) : REAL;

and let the compiler thereby replace any R-value occurrence of

vector[index]

with a function call

value(vector, index)

again this function can be inlined to avoid the cost of a function call.

With that, you don't even need to do anything fancy for operators to use +, -, * and / on the values as long as the value type is a built-in numeric type that supports these operators, because

v1[i] + v2[j]

is replaced with

value(v1, i) + value(v2, j)

which is already supported in the language.

The approach can also be extended to 2-dimensional array fields, like so:

TYPE Matrix = POINTER TO RECORD
+ data : ARRAY rows, columns OF REAL
END; (* Matrix *)

which would again be structurally equivalent to

TYPE Matrix = POINTER TO RECORD
rows, columns : CARDINAL;
data : ARRAY OF ARRAY OF REAL
END; (* Matrix *)

In this case you would need NEW() to take two additional arguments, one for rows and one for columns.

All else works like in the one-dimensional version, except that you need two indices, one for row and one for column.

The benefit will be that you have a general facility that can also be used to implement dynamic strings and other collection types.

By contrast, if you add a facility specifically for matrices, and the next guy comes along and asks for built-in support for dynamic strings, you will have to do yet another specific implementation for that, and yet another for yet another request, with feature creep building up.

Plus, if you eventually want to support M2R10, you'll need to implement indeterminate types anyway, so why do extra work?

hope this helps
regards
benjamin