Re: Inquiring about the status of the proposed UNICODE library

I cannot tell you anything about libraries, but I can put the question of Unicode support in ISO Modula-2 into perspective:

The ISO standard predates Unicode. When the ISO M2 standard was being worked on, there didn't even exist anything that would hint at a future Unicode standard. It was simply not on the radar. It was still considered perfectly sufficient to switch between the many different ASCII extension sets, aka code pages.

If it had not been for an unexpected comment from the Japanese standards body JIS on a draft of the ISO Modula-2 standard, complaining about the lack of multi-byte character support, as it was called back then, there would have been absolutely no multi-byte character support in ISO Modula-2 at all.

And the reaction within the working group to this comment was "Oh well, supposedly if you are Chinese or Japanese, then you need multi-byte characters, but we don't really need that, or do we?!".

Unfortunately, JIS did not send any delegate to the working group meetings nor was there anyone at JIS who actively participated in the draft work. The comment came totally out of the blue. Nobody ever expected any interest in Modula-2 from Asia and therefore nobody expected any comments from there. It was like "Oh, they do read our stuff".

Consequently, there was zero expertise within the working group on how to do multi-byte character support. Nor was there really any interest in it. It was more of an inconvenient chore, the kind you know you probably have to do but try not to.

As a result, multi-byte character support in ISO Modula-2 was bolted on afterwards by making the representation of type CHAR implementation-defined.

This means that if you want to build an ISO M2-compliant compiler that supports multi-byte characters this way -- whether it is Unicode or something else -- then you have to change type CHAR to be multi-byte ALL THE TIME. In other words, you have to trade ASCII support for multi-byte support. It is one or the other, not both.

And even after the ratification of the ISO M2 base standard, when Unicode was under development and it was already apparent that there would be a global standard, the working group did not revisit their incredibly stupid decision to permit type CHAR to be switchable between ASCII and multi-byte. Instead, they wasted their time on totally useless extensions for generics and OOP which only a single compiler ever implemented.

And this is representative of the relationship between ISO M2 and Unicode.

What should have been done is the introduction of a separate built-in type UNICHAR in addition to type CHAR.

We have done so in our M2 revision project, but since this is an unfunded private undertaking during occasional spare time spread out over many years, we don't have a working compiler yet. Of course there is no reason why a type UNICHAR and a supporting function UCHR() couldn't be added to GNU Modula-2 as an extension.

https://github.com/m2sf/m2bsk/wiki/Language-Specification-(9)-:-Predefined-Identifiers#unichar

https://github.com/m2sf/m2bsk/wiki/Language-Specification-(9)-:-Predefined-Identifiers#uchr

This way no user-visible library would be required. The usage would be the same as for type CHAR.
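To make the idea of a separate character type next to CHAR a bit more concrete, here is a rough sketch. It is written in C rather than Modula-2 only so that it can be compiled today. The names unichar, uchr, unichar_from_char and utf8_encode are hypothetical and are not the API of GNU Modula-2 or of the linked specification. Mapping invalid code points to U+FFFD is likewise just an assumption of this sketch.

```c
#include <assert.h>   /* static_assert */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a built-in UNICHAR: holds one Unicode code point.
   The ordinary char type stays untouched, so ASCII code keeps working. */
typedef uint32_t unichar;
static_assert(sizeof(unichar) >= 4, "unichar must hold values up to 0x10FFFF");

/* True if cp is a Unicode scalar value: in range and not a surrogate. */
bool is_scalar_value(uint32_t cp) {
    return cp <= 0x10FFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu);
}

/* Hypothetical analogue of UCHR(): code point to unichar.
   Assumption of this sketch: invalid input maps to U+FFFD. */
unichar uchr(uint32_t cp) {
    return is_scalar_value(cp) ? (unichar)cp : (unichar)0xFFFDu;
}

/* Widening a char to unichar; lossless for 7-bit ASCII values. */
unichar unichar_from_char(char c) {
    return (unichar)(unsigned char)c;
}

/* Encode one unichar as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 if u is not a scalar value. */
size_t utf8_encode(unichar u, unsigned char out[4]) {
    if (!is_scalar_value(u)) return 0;
    if (u < 0x80u) {
        out[0] = (unsigned char)u;
        return 1;
    }
    if (u < 0x800u) {
        out[0] = (unsigned char)(0xC0u | (u >> 6));
        out[1] = (unsigned char)(0x80u | (u & 0x3Fu));
        return 2;
    }
    if (u < 0x10000u) {
        out[0] = (unsigned char)(0xE0u | (u >> 12));
        out[1] = (unsigned char)(0x80u | ((u >> 6) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | (u & 0x3Fu));
        return 3;
    }
    out[0] = (unsigned char)(0xF0u | (u >> 18));
    out[1] = (unsigned char)(0x80u | ((u >> 12) & 0x3Fu));
    out[2] = (unsigned char)(0x80u | ((u >> 6) & 0x3Fu));
    out[3] = (unsigned char)(0x80u | (u & 0x3Fu));
    return 4;
}
```

The point of the sketch is only that both types coexist: existing char-based code is unaffected, and conversion to an external encoding such as UTF-8 happens at the I/O boundary. With a built-in type, the compiler would provide the equivalent of unichar and uchr itself.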

In the hope this puts things into perspective

regards

benjamin

On Wed, 6 Mar 2024 at 21:48, Alice Osako <alicetrillianosako@gmail.com> wrote:

As background, I am writing a minimal JSON parser and generator library for Modula-2 as part of a larger project (a Language Server Protocol implementation for Modula-2, which is itself in service of developing yet another unrelated project in Modula-2), and while I would be satisfied in the short run with a parser which is limited to ASCII, I would strongly prefer to properly support the JSON standard as given in RFC 8259.

To this end, I was wondering if there was a UNICODE library available for GNU Modula-2 at this time, so I checked the archives of this mailing list and came across a post by Chris Lilley from October 2010. While the posted message did include the module and definition files for the library, the message itself made it clear that the library was still in early development at that time.

I was wondering if there had been any subsequent development on this library, whether it was released at any point, and what its availability was. Conversely, if there were any other UTF-8 capable Modula-2 libraries I would be interested in hearing about them.

I have gone through the official documentation for GNU Modula-2, as well as checked the GNU software repositories, and found nothing that fit my needs, though it is possible I overlooked something.

I will confess that I am not tremendously fluent in Modula-2 yet, nor am I conversant with the UNICODE standards. I do not see trying to implement a UNICODE library myself as a viable option, at least not without a great deal of study on the problem.

Thank you for your time.

From:	Benjamin Kowarsch
Subject:	Re: Inquiring about the status of the proposed UNICODE library
Date:	Thu, 7 Mar 2024 04:41:15 +0900