gm2

Re: Portability Considerations


From: Benjamin Kowarsch
Subject: Re: Portability Considerations
Date: Sun, 17 Mar 2024 16:47:59 +0900



On Sun, 17 Mar 2024 at 04:37, Alice Osako wrote:
Well, ISO is as much a legacy standard as PIM.

Is there a more recent promulgated standard, then?

There is Wirth's successor language to Modula-2, called Oberon and there is a revised Modula-2 by Rick Sutcliffe and myself.

Oberon is very much a stripped down Modula-2 with extensible records replacing variant records, which supports both the static and dynamic dispatch paradigms of OOP.  Unfortunately, it is overly simplistic, which led to balkanisation with a large number of dialects. But there are quite a few compilers available supporting one or another of these dialects. However, to my knowledge there is no GCC support.

Our Modula-2 revision is a modern language derived from PIM4 with Oberon-style extensible records replacing variant records, unnecessary and outdated features removed and modern features added to increase expressive power without increasing the footprint of the core language. To avoid balkanisation it provides means to bind library functions to built-in syntax and thereby elevate user defined types to first class status. It also defines interfacing to C, the JVM and CLR. Unfortunately, our compiler is still work in progress as this is an unfunded pet project on which we can only work sporadically. Gaius has pledged support in GM2, which is likely to appear in a "one feature at a time" manner.

Perhaps it is noteworthy that we have an agreement with Springer for a fifth edition of Wirth's "Programming in Modula-2" based on our core language definition and thus this is poised to become PIM5. Springer wanted to keep the same structure as in the previous editions with two parts, (1) a tutorial part and (2) a language report part, where Rick was to write/edit the former, and I the latter. The language report is finished, but the tutorial part is not, as Rick had a tragedy in his family that required his full attention and I didn't feel like bothering him about it since, after all, I haven't finished the compiler yet either. But we'll get there eventually.

I'd like to point out that it is possible to write portable code in Modula-2 in the manner I advocated on this list before and again mentioned in my comments on your recent posts. If you do so, the code will also be fairly straightforward to migrate to M2R10/PIM5.
 
One supported by the GCC implementation?

As mentioned, Oberon isn't and M2R10/PIM5 isn't yet.

Pretty much all other modifications to the core language made things worse.

I am not so well versed in the language at this time to judge how the ISO changes made things worse. Can you give me some examples of this?

Due to having spent an entire decade revising the language, I have looked at this in-depth and I could write an entire book on what is wrong with ISO M2. Rick was on the ISO standards committee from its early days until the very end and he was the designer and editor of the ISO I/O library and the generics extension. I had briefly participated in the committee myself. We both share the view that ISO M2 turned out to be that very same thing all the participants had hoped to avoid, pretty much repeating the folly of the Algol committee that led to Algol-68. In its defense, ISO M2 is not quite as bad as Algol-68, yet it had the same effect: it pretty much killed the language.

I had already mentioned the FOR loop semantics, which is a less evil example of how ISO went wrong. The intentions were good in that the working group didn't want to leave the semantics undefined. But the way in which this was then done is not much better than leaving it undefined in the first place. The loop variable is still syntactically accessible outside the loop body even though its value is semantically undefined. In our revision we solved this by making the loop body the scope of the loop variable: it does not exist outside the loop body. For this, the loop variable is not declared in a VAR section but right inside the loop header.

(* foo is not defined yet *)
FOR foo IN bar DO (* <= foo is defined in the loop header *)
  (* foo is in scope here *)
END; (* FOR *)
(* foo is no longer in scope here *)

Far more evil is the BITSET type and bitwise operations.

Suppose you need to implement a hash function that operates on strings. For this you iterate over all characters in the string and apply various operations where the operands are each character and the eventual result is accumulated in a temporary value. The operations used for hashing are typically addition and subtraction ignoring over- or underflow, shifts, rotations, logical NOT, AND and OR. None of these operations are permitted on type CHAR, and addition and subtraction aren't permitted on BITSET either. So we need to import from SYSTEM to use CAST and this will clutter our hash function as we have to cast forwards and backwards between CARDINAL, BITSET and CHAR, depending on what operation is used in the composite expression that calculates the cumulative hash value. The code quickly becomes unreadable and difficult to maintain, significantly increasing the opportunity for error.
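To make this concrete, here is a sketch of what such a hash function ends up looking like in ISO Modula-2. The hash formula itself is illustrative only, not a recommended hash:

(* sketch only: illustrates the CAST clutter, not a recommended hash *)
IMPORT SYSTEM;

PROCEDURE Hash ( s : ARRAY OF CHAR ) : CARDINAL;
VAR
  hash, index : CARDINAL;
BEGIN
  hash := 0;
  index := 0;
  WHILE (index <= HIGH(s)) AND (s[index] # CHR(0)) DO
    (* XOR via BITSET symmetric difference requires casting both
       operands to BITSET and the result back to CARDINAL; the
       multiplication may overflow and trap, ignored in this sketch *)
    hash := SYSTEM.CAST(CARDINAL,
      SYSTEM.CAST(BITSET, hash * 31) /
      SYSTEM.CAST(BITSET, ORD(s[index])));
    index := index + 1
  END; (* WHILE *)
  RETURN hash
END Hash;

Every single step of the accumulation needs to be wrapped in casts, which is exactly the clutter described above.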

Why are shifts and rotations only permitted on BITSET when they are in module SYSTEM anyway? We have already crossed the line into potentially unsafe territory. Nothing is gained by restricting these operations to BITSET. It does not add any safety. We already left safety behind. So why not permit shifts and rotations and other bit manipulations at least on machine types LOC and WORD? At least we could cut down on the number of cast operations then. But if these operations are imported from SYSTEM anyway, they might as well be permitted on any type. There is no safety to be gained by restricting them to any particular type. The only outcome is an increase in the number of cast operations and thus clutter.

Then there are a number of issues with features that are academic only, but totally useless in practice.

For example, in PIM which was designed in 1978, there are lexical synonyms ~ for NOT and & for AND. This was already inconsistent because there is no such synonym for OR because | is already used as a separator in case label lists. The use of synonyms is bad design to begin with. Everything should only have a single syntax form. If the designer believes that it is of such importance to save the programmer two extra key strokes, then they should only have single character symbols and then consistently for all logical operations, for example ~ for NOT, & for AND and | for OR, in which case the designer should look for an alternative case label separator, such as a double semicolon ;; or whatever else. If the issue isn't considered important enough to change the case label separator, then the single character synonyms should be dropped entirely. It should be either all single character symbols and then single character symbols only, or all reserved word symbols and then reserved word symbols only. Consistency is far more important than saving some lazy arse programmer one or two key strokes.

And yet, as late as 1988, the ISO working group felt it was necessary to inflate the practice by introducing ! as a synonym for | and @ as a synonym for ^ as if more than 20 years after the introduction of the ASCII character set there was a need to accommodate dinosaur hardware with 5 or 6 bit character sets that might not include | and ^. Entirely academic. Totally useless in practice. Not only is it totally useless but it also makes it far more difficult to later assign these symbols for other far more practical uses. For example, Modula-2 like most Pascal family languages does not have single-line comments. Modern Fortran uses ! as a single line comment prefix which is a very good choice as it allows the insertion of function header specifications that stand out as documentation blocks when they all start with ! at the very left. This would have been a much better use for the ! character, but ISO reserves that for 1950s hardware with five or six bit character sets. This might have been understandable if ISO M2 had been defined in the early 1960s, but certainly not in 1988.

Similarly, ISO leaves the bitwidth of the smallest addressable unit type LOC implementation defined. This too would have been understandable in the 1960s, but not in 1988. In 1988 it was 100% foreseeable that all silicon would forever be based on units whose size is a multiple of eight. Yet again, a ridiculous decision due to being stuck in a 1950s/1960s mindset. And if that mindset had not existed, if it had been accepted that the future belongs to multiples of eight, then the name of the smallest addressable unit type would have been OCTET, not LOC, thus making it self-explanatory and leading to better readability of the code, to say nothing of the implementation and portability issues that come with an implementation defined unit.

Plenty of chances were missed to remove outdated features and add features for more modern requirements in their place. In Oberon, Wirth followed the approach "How can I reduce the feature set to the absolute minimum that I can get away with". In our revision, we followed the approach "How can we keep the size of the language about the same but increase its expressive power and utility to the absolute maximum doable with that given footprint". ISO M2 followed the opposite approach allowing feature creep.

Built-in types COMPLEX and BCD were correctly rejected early on by the working group. As a mathematician, Rick had advocated COMPLEX, while p1 Modula-2 implementor and maintainer Albert Wiedemann and I had tabled a proposal for BCD. The working group explained to us that if we got BCD, then Rick would have to get COMPLEX and eventually somebody else would want to get even more built-in types, that a line had to be drawn somewhere. For the sake of keeping the language lean, we then withdrew our proposal much to Rick's disappointment. However, Rick stuck around for long enough to sneak COMPLEX back in later when most members had lost interest and resistance had faded. Albert stuck around to the end as well, but I didn't, so I don't know why he didn't push for the inclusion of BCD at that point. It is part of his ISO M2 compiler as an extension though.

In hindsight, Rick and I realised that this was a bad thing. In our revision we provide a feature called syntax binding which allows user defined types to be used like built-in types except for the need to import the library that implements them. With this general feature it is possible to keep the language lean but have library defined COMPLEX and BCD types that look just as if they were built-in.
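As a rough illustration of the idea only (the bracketed binding notation below is a simplification for this email, not the exact M2R10 syntax):

(* sketch: a library defined BCD type whose operators are bound
   to procedures; the notation is simplified for illustration *)
DEFINITION MODULE BCD;

TYPE BCD; (* opaque, library defined *)

PROCEDURE [+] add ( a, b : BCD ) : BCD;
(* bound to the + operator for operands of type BCD *)

PROCEDURE [*] mul ( a, b : BCD ) : BCD;
(* bound to the * operator for operands of type BCD *)

END BCD.

With such bindings in place, client code simply writes c := a + b; on BCD operands and the compiler dispatches to the bound library procedure, so the type reads as if it were built in.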

Then there is the ISO way of doing COROUTINEs. It is incompatible with PIM, but without any real gain. Neither is it more user friendly, nor is it more powerful. The way COROUTINEs are done in both PIM and ISO is crude, almost assembler like. The Lua language is an example of how to do COROUTINEs in a user friendly and powerful way. It is the subject of a seminal paper by Roberto Ierusalimschy (the primary designer, implementor and maintainer of Lua) and his co-author whose name escapes me right now. Again, a chance missed by ISO to improve things, but instead making it worse. Thanks to Roberto we didn't have to come up with an entirely new approach for coroutines in our revision; we adapted his approach to Modula-2.
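To give a flavour of what a Lua style coroutine facility could look like in Modula-2, here is a hypothetical API in the spirit of Lua's coroutine.create/resume/yield. The module and procedure names are invented for this email and do not reflect the actual M2R10 design:

(* hypothetical sketch of a Lua style coroutine API for Modula-2 *)
DEFINITION MODULE Coroutines;

TYPE Coroutine; (* opaque *)

PROCEDURE Create ( body : PROC ) : Coroutine;
(* creates a suspended coroutine whose body is procedure body *)

PROCEDURE Resume ( co : Coroutine );
(* transfers control to co until it yields or its body returns *)

PROCEDURE Yield;
(* suspends the running coroutine, returning to its resumer *)

END Coroutines.

The point is that creation, resumption and yielding become ordinary library calls on an opaque type, instead of the transfer-of-control primitives with explicitly allocated workspaces that PIM and ISO require.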

There are many other issues with ISO M2 which are only apparent when getting into fair detail and I will therefore refrain from any discussion of those here.



Again, I am not familiar enough with the library to judge.

By far the biggest problem, however, is the ISO library, and in particular the I/O library. There, the committee dynamics show even more as everybody had some pet issue with the proposed library APIs and those were then tweaked until everybody would agree. Not to improve the API or functionality, but simply to get approval. The worst that committee design has to offer.

Rick was the designer and editor of the ISO I/O library and he has taught ISO M2 at his university for 20+ years. He says the ISO I/O library is unteachable to undergraduate students. It is overly complex and has circular dependencies. You cannot introduce a basic concept and then build upon it. Students need to understand the whole thing before you can teach any of its components. And not surprisingly, this makes it cumbersome to use.

Why shouldn't I/O be as simple as this:

IMPORT BCD;
IMPORT PervasiveIO;

VAR a, b, c : BCD;

READ "Enter a: ", a, "\nEnter b: ", b;
c := a + b;
WRITE "\nSum: ", #("5;2", c), "\n";

or with specified input and output streams

READ @infile: "Enter a: ", a, "\nEnter b: ", b;
c := a + b;
WRITE @outfile: #("5;2", a), " + ", #("5;2", b), " = ", #("5;2", c), "\n";

instead of importing from different layers of the API, wondering which layer to use, having to write tons of boilerplate code, then having to call several WriteThis(), WriteThat(), WriteSomethingElse() functions, each type requiring memorisation of another set of IO functions. What a holy mess.
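For contrast, the same interaction written against the standard ISO library looks roughly like this (quoting the module and procedure names from memory, and using CARDINAL since a BCD type would in addition need its own conversion calls):

(* ISO style, for contrast; each type drags in its own IO module *)
FROM STextIO IMPORT WriteString, WriteLn, SkipLine;
FROM SWholeIO IMPORT ReadCard, WriteCard;

VAR a, b, c : CARDINAL;

BEGIN
  WriteString("Enter a: "); ReadCard(a); SkipLine;
  WriteString("Enter b: "); ReadCard(b); SkipLine;
  c := a + b;
  WriteString("Sum: "); WriteCard(c, 0); WriteLn
END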
 
Even so, it doesn't cover much of the functionality needed in our day and age.

That is inevitable in a language which hasn't been updated in 30 years, to be sure.

But that is only an explanation, not justification.

 
In any event, writing portably usually leads to cleaner code and fewer bugs, regardless of the language used. This is so because writing portable code typically requires the use of abstraction layers and designing the API from a functional point of view.

I agree, but only to a point. While it is certainly true that a better designed API results in better client code, writing a library for portability generally leads to significantly less clear internal code, as it invariably means the use of special cases, and often means applying conditional compilations and/or separate versions of the library for separate circumstances.

You are thinking of a C style macro based style of portable coding where all the different scenarios are bundled into a single implementation. That is diametrically opposite to the philosophy of a modular language like Modula-2.

To write portable code in a modular way, you first design a platform agnostic API. Where it is possible to implement that API without using dialect, implementation or target specific features and syntax, you do that. And where this is not possible, you write separate platform specific implementations. Each of these will be lean and clean, readable and maintainable. Plus, if you need to migrate the library to another dialect, compiler or target, it is possible to do that with minimal effort: all the platform agnostic code remains in place, and you only need to write those platform specific implementations, which will seamlessly fit into the whole architecture since they conform to the same API.
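A minimal sketch of this approach, with illustrative names invented for this email:

(* one platform agnostic API, implemented once per target *)
DEFINITION MODULE FileInfo; (* platform agnostic definition *)

PROCEDURE Exists ( filename : ARRAY OF CHAR ) : BOOLEAN;
(* returns TRUE if a file of the given name exists *)

END FileInfo.

The corresponding implementation module is then written once per target, for example one on top of POSIX and another on top of the Win32 API. Each implementation stays lean, and every one of them conforms to the same definition module, so clients never change.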
 
While better use of abstraction layers internally can mitigate this, it doesn't avoid it - at best it can isolate the non-portable sections more carefully. While this is worthwhile

More importantly, you can only write a program to be cleanly portable to a system/compiler/dialect you know exists.

Not necessarily.

Between PIM and ISO, it is mostly possible to write dialect agnostic code for most if not all use cases except for casting. This does require a collection of libraries to replace certain built-in functions, like DIV and MOD when used with signed types, and low-level functions for bit manipulation that are entirely based on basic math instead of the bit manipulation facilities provided by the dialect or by compiler extensions. However, once you have these libraries in place it is all smooth sailing from there. You can also reuse the code in the GitHub repo I posted links to before.
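For example, a left shift can be expressed in plain, dialect agnostic arithmetic. This is a sketch only; a production version would mask or guard against overflow first:

(* dialect agnostic left shift using only basic arithmetic *)
PROCEDURE ShiftLeft ( value, bits : CARDINAL ) : CARDINAL;
VAR
  i : CARDINAL;
BEGIN
  FOR i := 1 TO bits DO
    value := value * 2 (* one shift position per doubling *)
  END; (* FOR *)
  RETURN value
END ShiftLeft;

Since this uses nothing beyond FOR, multiplication and CARDINAL, it compiles unchanged under both PIM and ISO compilers.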

The alternatives are clear - either support one specific environment, and make it clear that you are doing so, or else support as many as you can, again making it clear that it might not be portable to non-supported environments. The latter is certainly preferable, but is it really practical?

Ideally, you write one set of platform agnostic implementations which may sacrifice efficiency for portability, and only write platform specific implementations for cases where either platform agnosticity is not possible, or where the efficiency gain of a platform specific implementation justifies the extra effort.

There is also the aspect of reputation. You put your code into a public repository with some kind of open source license and eventually somebody may use it with a different compiler and/or different dialect.

True, but as I said earlier, it is impossible to predict all of the possible environments someone might try to use a program in. One can try to support as many environments as possible, but not everything.

If you design for portability in the manner I have outlined, people who use your code and port it to a different platform (be it a different dialect, compiler or target) will take advantage of this design. They will follow your platform agnostic API. That way, they will complete your library by adding the missing platform specific implementations that you didn't have the time or motivation to provide. And you'll get the kudos for having designed the API in a forward looking way.

However, given that this was intended as a support library for a particular use case of my own, it leads me to ask whether it should be public at all, or whether it would be better to make the repo private. I don't want to do that if my code could be useful for others, but at the same time I have no expectation of maintaining this indefinitely.

If you write the code in the manner I have outlined, it will be much easier for somebody else to take over from you when the time comes and continue to maintain it. The more specific it is, and the less intuitive, the lower the chances of that. 
 
I have no particular commitment to Modula-2 as a language - I am primarily a Lisper, and was undertaking this mainly as an interesting stretch of my skills in a language I had been curious about since the 1980s - and don't see this as something which very many people even within the Modula-2 community would have much interest in. Am I wrong about that?

I think a Unicode library would find great interest. Also a JSON parser should be of interest.

Given the lack of UNICODE support in the language itself (especially the lack of string literal support), a UNICODE library is of only limited use. I am writing this for a specific purpose, and I am not certain if it would be of general applicability.

It wouldn't be of general applicability in using Unicode within PIM/ISO Modula-2 source text itself, for that the dialects would need to support Unicode string literals. But it would certainly be of general applicability in using the library for developing Unicode supporting applications in PIM/ISO Modula-2.

I was posting on this mailing list mainly to get support in using gm2, not for general language support, though as you know I certainly have needed that as well. I wasn't even aware of the discrepancies between the different standards when I first posted here. My knowledge of Modula-2 is still fairly limited; the only reason I made the repos public was because that is the default on GitHub, and I kept them public to facilitate getting outside help.

I didn't mean to discourage you. I meant to encourage you ;-) 

As far as I was aware when I started this project, gm2 was the only Modula-2 compiler in active support, and I was frankly surprised that even that was the case when I heard about it - I hadn't done anything with the language before in part because every other Modula-2 compiler I'd ever heard of was an expensive commercial product that hadn't been updated since the 1990s. I now know that this isn't the case, but I hadn't known that when I began this trek. However, far from simplifying matters, this just makes it more complicated.

p1 Modula-2 (supports ISO) is also in active development and support. And then there is the ACK Modula-2 compiler (supports PIM), which is part of a compiler kit (like GCC) formerly developed at the Vrije Universiteit Amsterdam. It was abandoned for a number of years but has found new maintainers and is now actively maintained and supported again.
 

The sum of all this is leading me to question whether to proceed at all.

I would encourage you to proceed, but also consider my advice on portability.

As far as the Unicode library is concerned, I am quite happy to help out a bit although I do not have a lot of time, so this might be more in the form of API review and advice, the odd library contribution of stuff that I have already written, lying around some place else etc.

regards
benjamin 
