I was working on my Unicode project, trying out various BITSET
and BitWordOps operations, when I got an 'internal compiler error',
something I had not encountered before.
The procedure that seems to have triggered it is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
PROCEDURE Utf8ToUnichar(utf8: UTF8Buffer; VAR ch: UNICHAR);
(*
   Utf8ToUnichar - Convert a buffer of UTF-8 characters
   to the internal UCS-4 format.
   Any candidate character which does not match these cases should be
   replaced with the REPLACEMENT CHAR.
*)
VAR
   edgeBit: SHORTCARD;
   subChar: ARRAY [1..3] OF BITSET; (* holds the sub-components of the character *)
   i      : CARDINAL;
   octet  : BITSET;
BEGIN
   (* clear the output *)
   ch := 0;
   octet := 0;
   subChar[0] := 0;
   FOR i := 1 TO 3 DO
      octet := utf8[i];
      subChar[i] := octet - {6..7};
   END;
   (* Which is the last clear bit in the first byte? *)
   edgeBit := GetEdgeBit(utf8[0]);
   ch := utf8[0] - {7 .. edgeBit};
   CASE edgeBit OF
      7:
         (* A single-byte ASCII char, just use as-is *) |
      5:
         (* use two bytes for the value *)
         ch := WordOr(ch, WordShl(subChar[1], 6)); |
      4:
         (* use three bytes for the value *)
         ch := WordOr(ch, WordShl(subChar[1], 6));
         ch := WordOr(ch, WordShl(subChar[2], 12)); |
      3:
         (* use four bytes for the value *)
         ch := WordOr(ch, WordShl(subChar[1], 6));
         ch := WordOr(ch, WordShl(subChar[2], 12));
         ch := WordOr(ch, WordShl(subChar[3], 18));
   ELSE
      (* should never happen, return the REPLACEMENT CHAR *)
      ch := Replacement;
   END;
END Utf8ToUnichar;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
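The CASE in the procedure keys on the position of the highest clear bit of the lead byte: 7 for ASCII, 5, 4 and 3 for two-, three- and four-byte sequences, and 6 for a stray continuation byte. GetEdgeBit is not shown, so the module below is a minimal sketch of one way to do that step using only standard BITSET operations. ToBits, FromBits and EdgeBit are hypothetical names, the bit-by-bit conversion is a portable stand-in for a compiler-specific type transfer, and InOut is assumed to be the classic PIM library module. It also shows two points relevant to the procedure above: a BITSET value is written as a set ({} rather than 0), and the length-marker bits to clear run from the edge bit up to 7, whereas {7 .. edgeBit} gives its bounds high-to-low and so does not denote those bits.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MODULE BitsetDemo;

(* Sketch only: ToBits, FromBits and EdgeBit are hypothetical helpers,
   not part of any library. Assumes the PIM InOut module. *)
FROM InOut IMPORT WriteCard, WriteLn;

PROCEDURE ToBits(c: CHAR): BITSET;
(* Builds the BITSET whose members are the 1 bits of ORD(c). *)
VAR
   s   : BITSET;
   b, n: CARDINAL;
BEGIN
   s := {};                        (* the empty set, not 0 *)
   n := ORD(c);
   FOR b := 0 TO 7 DO
      IF n MOD 2 = 1 THEN INCL(s, b) END;
      n := n DIV 2
   END;
   RETURN s
END ToBits;

PROCEDURE FromBits(s: BITSET): CARDINAL;
(* Inverse of ToBits for the low eight bits. *)
VAR
   b, n, weight: CARDINAL;
BEGIN
   n := 0;
   weight := 1;
   FOR b := 0 TO 7 DO
      IF b IN s THEN n := n + weight END;
      weight := weight * 2
   END;
   RETURN n
END FromBits;

PROCEDURE EdgeBit(s: BITSET): CARDINAL;
(* Position of the highest clear bit among bits 7..0,
   or 8 if all eight bits are set. *)
VAR
   bit: CARDINAL;
BEGIN
   bit := 7;
   WHILE bit IN s DO
      IF bit = 0 THEN RETURN 8 END;
      DEC(bit)
   END;
   RETURN bit
END EdgeBit;

VAR
   lead   : BITSET;
   edge, b: CARDINAL;
BEGIN
   lead := ToBits(CHR(0C3H));      (* 11000011: lead byte of a 2-byte sequence *)
   edge := EdgeBit(lead);
   WriteCard(edge, 0); WriteLn;    (* 5 *)
   (* clear the length-marker bits edge..7, keeping the payload bits *)
   FOR b := edge TO 7 DO EXCL(lead, b) END;
   WriteCard(FromBits(lead), 0); WriteLn   (* 3 *)
END BitsetDemo.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~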
Although the compiler should not have crashed, I am certain that the
code itself is deeply flawed. I am simply too inexperienced with
Modula-2 bitwise operations to see what I have done wrong here, and
unfortunately I lack a proper language reference to work from
(recommendations would be welcome).
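A minimal sketch of the whole conversion, assuming UNICHAR is a CARDINAL of at least 32 bits and UTF8Buffer is ARRAY [0..3] OF CHAR (neither declaration appears above, so both are assumptions, as is the value of Replacement). It swaps the BITSET/BitWordOps masking for plain DIV, MOD and multiplication by powers of two, which behave the same under PIM and ISO compilers. In UTF-8 the lead byte carries the most significant bits, so each continuation byte is appended below the value decoded so far (ch * 64 + payload), rather than being shifted further left than the lead byte.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MODULE Utf8Demo;

(* Sketch only. Assumes the PIM InOut module and a CARDINAL of
   at least 32 bits (needed for four-byte sequences). *)
FROM InOut IMPORT WriteCard, WriteLn;

CONST
   Replacement = 0FFFDH;              (* assumption: U+FFFD REPLACEMENT CHARACTER *)

TYPE
   UTF8Buffer = ARRAY [0..3] OF CHAR; (* assumption: up to four octets *)

PROCEDURE Utf8ToUnichar(utf8: UTF8Buffer; VAR ch: CARDINAL);
VAR
   lead, len, i, octet: CARDINAL;
BEGIN
   lead := ORD(utf8[0]);
   IF lead < 80H THEN                 (* 0xxxxxxx: ASCII, use as-is *)
      ch := lead; RETURN
   ELSIF lead < 0C0H THEN             (* 10xxxxxx: stray continuation byte *)
      ch := Replacement; RETURN
   ELSIF lead < 0E0H THEN             (* 110xxxxx: keep the low 5 bits *)
      ch := lead MOD 32; len := 2
   ELSIF lead < 0F0H THEN             (* 1110xxxx: keep the low 4 bits *)
      ch := lead MOD 16; len := 3
   ELSIF lead < 0F8H THEN             (* 11110xxx: keep the low 3 bits *)
      ch := lead MOD 8; len := 4
   ELSE
      ch := Replacement; RETURN
   END;
   FOR i := 1 TO len - 1 DO
      octet := ORD(utf8[i]);
      IF (octet < 80H) OR (octet >= 0C0H) THEN   (* not of the form 10xxxxxx *)
         ch := Replacement; RETURN
      END;
      (* shift the value so far left by 6 bits, then add the low 6 bits *)
      ch := ch * 64 + octet MOD 64
   END
END Utf8ToUnichar;

VAR
   buf: UTF8Buffer;
   ch : CARDINAL;
BEGIN
   (* U+00E9 (e acute) is encoded as C3 A9 *)
   buf[0] := CHR(0C3H); buf[1] := CHR(0A9H); buf[2] := CHR(0); buf[3] := CHR(0);
   Utf8ToUnichar(buf, ch);
   WriteCard(ch, 0); WriteLn;         (* 233 *)
   (* U+20AC (euro sign) is encoded as E2 82 AC *)
   buf[0] := CHR(0E2H); buf[1] := CHR(82H); buf[2] := CHR(0ACH);
   Utf8ToUnichar(buf, ch);
   WriteCard(ch, 0); WriteLn          (* 8364 *)
END Utf8Demo.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This sketch does not reject overlong encodings, surrogate code points, or values above 10FFFFH; a stricter decoder would need those checks as well.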