Portable UCS-4/UTF-8 encoder/decoder library (was: Hoisting other libraries...)

Similarly, if you are planning to have a single large repo, you might
want to roll my work on UNICODE character support into the library,
though that may present a more significant licensing conflict given that
it was based in part on Chris Lilley's code (which is under GPL3 - the
whole reason I was using GPL3 was to avoid conflicts with that).

It seems to me that something as important as Unicode support should not be so restrictively licensed. Since it will likely be the only Unicode library for classic Modula-2, it would be better if it could be released under the LGPL. Otherwise, there will likely be folks who can't use it. So we need to ask ourselves what our priority is: Is our primary goal to spread the GPL, or is it to provide Unicode for PIM and ISO Modula-2?

I have now written a portable UCS-4/UTF-8 encoder/decoder from scratch and uploaded it to the M2BSK repo (for now). This way we won't have any license conflicts.

https://github.com/m2sf/m2bsk/tree/master/src/lib/unicode

The library also needs the definition of type Octet from here:

https://github.com/m2sf/m2bsk/blob/master/src/lib/Octet.def
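For readers unfamiliar with the transformation, here is a minimal sketch of the encoding direction, turning one UCS-4 code point into one to four octets. It is written in Python purely for illustration and is not the code in the repo. The function name encode_utf8 is hypothetical, and the range checks follow RFC 3629, which may or may not match what the library does.

```python
def encode_utf8(code_point):
    """Encode a single UCS-4 code point as a UTF-8 octet sequence."""
    # Assumption: restrict to the RFC 3629 range and reject surrogates.
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")

    if code_point < 0x80:
        # 7 bits: a single octet, identical to ASCII
        return bytes([code_point])
    if code_point < 0x800:
        # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```

The largest code point needs 21 bits, so any unsigned or signed type that is 32 bits wide can hold every value the encoder accepts.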

The two modules Unichar0.cardinal32.def and Unichar0.longint32.def each define an alias type UnicharBasetT, for CARDINAL and LONGINT respectively. At least one of those types will be 32 bits wide on any platform. To build the library, rename the file that uses the 32-bit type to Unichar0.def.

I haven't tested the code yet, and error handling for corrupted UTF-8 input is still missing.
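To give an idea of what handling corrupted input involves, here is a rough sketch of a strict decoder, again in Python and again not the code in the repo. The name decode_utf8 and the choice to raise an error (rather than, say, substitute a replacement character) are assumptions made for illustration. The checks shown are the ones RFC 3629 calls for: invalid lead octets, truncated sequences, bad continuation octets, overlong forms, surrogates, and values above U+10FFFF.

```python
def decode_utf8(octets):
    """Decode UTF-8 octets into a list of UCS-4 code points, strictly."""
    code_points = []
    i = 0
    n = len(octets)
    while i < n:
        lead = octets[i]
        # Classify the lead octet: sequence length, payload bits, and the
        # smallest value that legitimately needs that length.
        if lead < 0x80:
            length, value, minimum = 1, lead, 0
        elif 0xC2 <= lead <= 0xDF:
            length, value, minimum = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead <= 0xEF:
            length, value, minimum = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead <= 0xF4:
            length, value, minimum = 4, lead & 0x07, 0x10000
        else:
            # Stray continuation octet, 0xC0/0xC1, or 0xF5 and above
            raise ValueError("invalid lead octet at offset %d" % i)

        if i + length > n:
            raise ValueError("truncated sequence at offset %d" % i)

        for k in range(1, length):
            octet = octets[i + k]
            if (octet & 0xC0) != 0x80:
                raise ValueError("bad continuation octet at offset %d" % (i + k))
            value = (value << 6) | (octet & 0x3F)

        if value < minimum:
            raise ValueError("overlong encoding at offset %d" % i)
        if 0xD800 <= value <= 0xDFFF:
            raise ValueError("encoded surrogate at offset %d" % i)
        if value > 0x10FFFF:
            raise ValueError("code point out of range at offset %d" % i)

        code_points.append(value)
        i += length
    return code_points
```

Whether to reject bad input outright or to skip and resynchronise is a design choice; the sketch rejects because that is the simplest behaviour to test.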

regards

benjamin

From:	Benjamin Kowarsch
Subject:	Portable UCS-4/UTF-8 encoder/decoder library (was: Hoisting other libraries...)
Date:	Wed, 27 Mar 2024 02:08:49 +0900