Re: What is the equivalent type in GM2?
From: Gaius Mulley
Subject: Re: What is the equivalent type in GM2?
Date: Mon, 30 Mar 2020 14:32:23 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
Benjamin Kowarsch <address@hidden> writes:
> On Sat, 28 Mar 2020 at 17:10, Hưng Hưng <address@hidden>
> wrote:
>
> The C version doesn't impose any max length.
>
> Strictly speaking, the C language considers char* to be a pointer to a
> single character, thus a maximum length of one.
>
> However, the C compiler doesn't care about type safety and allows you
> to read and write past the end of that character string of length one.
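Here is a minimal C sketch of that point. The helper name c_string_length is hypothetical, and it behaves like the standard strlen. Because the parameter type const char * records no capacity, the loop reads s[1], s[2], and so on until it reaches a terminating NUL. It simply trusts that the caller put one there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts characters up to the terminating NUL.
   Nothing in the type `const char *` records a capacity, so the loop
   relies entirely on the caller having stored a '\0' somewhere ahead. */
size_t c_string_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0') {   /* reads s[1], s[2], ... beyond the "single char" */
        n++;
    }
    return n;
}
```

If the terminator is missing, nothing in the language stops the loop from reading beyond the end of the buffer. That is the unchecked access described above.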
>
> This is the single most important reason why the major operating
> systems today are so vulnerable to cyber attacks. The vast majority of
> security vulnerabilities are based on buffer overflow exploits in C.
>
> By contrast, Modula-2 was specifically designed as a type safe
> language. For this reason, the compiler does not permit you to read
> and write past the declared capacity limit of a type.
>
> This is not a bug, but a feature. And a very important feature that we
> want to keep.
>
> So I have to guess and pick a value for max length that I find
> suitable? It's like we impose an imaginary, artificial limitation
> on our binding for no reason at all.
>
>
> The use of static character arrays goes back to the 1960s. Nowadays we
> tend to use dynamic collection types where the capacity is determined
> during allocation at runtime. But this isn't built into the language.
> Instead, it has to be supplied in form of libraries.
>
> You may want to consider writing yourself a dynamic string library and
> then use that.
>
> Or perhaps you can find one that meets your requirements and use that.
>
> The indeterminate record types Gaius and I were talking about in the
> other thread are specifically designed to allow easy implementation of
> dynamic collection types but with type safety.
>
> TYPE DynString = POINTER TO RECORD
>   length : LONGCARD;
>   + string : ARRAY OF CHAR
> END;
>
> then ...
>
> VAR str : DynString;
>
> NEW str CAPACITY 1000;
>
> after which
>
> CAPACITY(str) will return 1000
>
> and LENGTH(str) will return 0.
>
> Alternatively, with initialisation string ...
>
> NEW str := "The quick brown fox jumps over the lazy dog.";
>
> after which
>
> CAPACITY(str) and LENGTH(str) will both return 44.
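C99 offers a loose analogue of the proposed indeterminate record: the flexible array member, whose storage is sized when the object is allocated. The sketch below only illustrates that analogy. All names are hypothetical, and unlike the proposed Modula-2 feature, C checks no subscripts against the stored capacity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C analogue of DynString: the capacity is fixed at
   allocation time and stored next to the characters. */
typedef struct {
    size_t capacity;   /* characters the buffer can hold, excluding NUL */
    size_t length;     /* characters currently in use */
    char   chars[];    /* flexible array member, sized by malloc below */
} DynString;

/* Rough counterpart of NEW str CAPACITY n: empty string, room for n chars. */
DynString *dyn_new_capacity(size_t n)
{
    DynString *s = malloc(sizeof *s + n + 1);   /* +1 for the C terminator */
    if (s == NULL) {
        return NULL;
    }
    s->capacity = n;
    s->length = 0;
    s->chars[0] = '\0';
    return s;
}

/* Rough counterpart of NEW str := "...": capacity equals initial length. */
DynString *dyn_new_from(const char *init)
{
    assert(init != NULL);
    size_t n = strlen(init);
    DynString *s = dyn_new_capacity(n);
    if (s == NULL) {
        return NULL;
    }
    memcpy(s->chars, init, n + 1);   /* copies the NUL as well */
    s->length = n;
    return s;
}
```

The header and the characters share one allocation, so each string is released with a single free().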
>
> However, that's not available yet. So, in the meantime, you will have
> to either use dangerous pointer arithmetic in your dynamic type
> implementation, or if you want to keep type safety you will need to be
> creative.
>
> I have implemented a dynamic string library for interned strings in
> one of my projects, which is available on GitHub.
>
> PIM version
> https://github.com/m2sf/m2pp/blob/master/src/String.pim.def
> https://github.com/m2sf/m2pp/blob/master/src/imp/String.pim.mod
>
> ISO version
> https://github.com/m2sf/m2pp/blob/master/src/String.iso.def
> https://github.com/m2sf/m2pp/blob/master/src/imp/String.iso.mod
>
> This uses a Passepartout, which is French for a key that matches
> multiple locks.
>
> TYPE Passepartout = POINTER TO StrBlank.Largest;
>
> TYPE StringDescriptor = RECORD
>   length : CARDINAL;
>   intern : Passepartout
> END;
>
> where StrBlank.Largest is defined in
>
> https://github.com/m2sf/m2pp/blob/master/src/StrBlank.def
> https://github.com/m2sf/m2pp/blob/master/src/imp/StrBlank.mod
>
> which contains a number of length specific character array types.
>
> Type Largest is the largest character array type available.
>
> When a new dynamic string is allocated, the library determines the
> character array type that most closely matches the required capacity
> and allocates a new dynamic string of that type. That string is then
> linked to the intern field using a CAST, since the formal type of
> field intern is the largest character array type. This requires a
> cast, but the benefit is that we can still use array subscript
> notation to address individual characters in the string instead of
> having to use pointer arithmetic. The casting between type Largest
> and the actually allocated character array type is confined to this
> one library and happens in only two or three places. Outside the
> library, the strings are accessible only via the library's API.
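A rough C rendering of the passepartout technique may help here. The size classes (16 to 256) and all identifiers are assumptions for illustration, not the actual contents of StrBlank. The field is typed as a pointer to the largest array type, and the allocation is only as big as the closest class. Characters are still reached by subscript.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LARGEST 256                  /* hypothetical largest class */
typedef char Largest[LARGEST];       /* plays the role of StrBlank.Largest */
typedef Largest *Passepartout;       /* POINTER TO Largest */

typedef struct {
    size_t       length;
    Passepartout intern;
} StringDescriptor;

/* Hypothetical size classes, smallest first. */
static const size_t classes[] = { 16, 32, 64, 128, LARGEST };

/* Smallest class holding n characters plus a NUL; 0 if none is big enough. */
size_t closest_class(size_t n)
{
    for (size_t i = 0; i < sizeof classes / sizeof classes[0]; i++) {
        if (n + 1 <= classes[i]) {
            return classes[i];
        }
    }
    return 0;
}

/* Allocates only the chosen class, then views it through the largest type.
   Only indices below the allocated size may ever be touched. */
int descriptor_init(StringDescriptor *d, const char *text)
{
    assert(d != NULL && text != NULL);
    size_t n = strlen(text);
    size_t size = closest_class(n);
    if (size == 0) {
        return 0;                              /* too long for any class */
    }
    d->intern = (Passepartout) malloc(size);   /* the unchecked cast */
    if (d->intern == NULL) {
        return 0;
    }
    for (size_t i = 0; i <= n; i++) {
        (*d->intern)[i] = text[i];             /* subscripts, no pointer arithmetic */
    }
    d->length = n;
    return 1;
}
```

As with the CAST in the library, the compiler cannot verify that indices stay below the allocated size. For that reason, such casts are best confined to a single module.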
>
> This is a reasonable compromise between readability, convenience and
> type safety. Besides, using pointer arithmetic would be less readable,
> less convenient and less safe. So it is the best you can do with
> classical Modula-2 at this time.
>
> On Sat, 28 Mar 2020 at 14:47, Benjamin Kowarsch
> <address@hidden> wrote:
>
>
> A pointer to char in C is not equivalent to a pointer to CHAR
> in Modula-2.
>
>
> In C, a string may be either a char array or a pointer to a
> single char, where the lack of type safety is then EXPLOITED to
> ignore the fact that the pointer type points to a single char,
> not a character string, with DEVASTATING CONSEQUENCES!!!
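You can see the difference between the two C forms with sizeof. An array's size is part of its type, while a char * carries only an address. A minimal sketch (the variable names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* The array type records its capacity: char[6] (5 letters + NUL). */
static char greeting[] = "hello";

/* A char * records only an address; any capacity information is gone. */
static char *cursor = greeting;

/* Checked at compile time: the array's size is part of its type. */
static_assert(sizeof greeting == 6, "capacity is part of the array type");

/* sizeof on the pointer yields the pointer's own size, not the string's. */
size_t pointer_size(void)
{
    return sizeof cursor;
}
```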
>
>
> By contrast, in Modula-2 a string is a character array with a
> maximum capacity associated with the type, and type safety is
> enforced; thus a pointer to a single character is always
> interpreted correctly as having a payload of only one single
> character.
>
>
> Thus, the closest equivalent of
>
>
> char* str;
>
>
> in Modula-2 would be
>
>
> POINTER TO ARRAY [0..MaxStrLen] OF CHAR;
>
>
> where MaxStrLen must be a compile time constant, that is, it
> cannot be changed dynamically at runtime.
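C can express the same kind of type: a pointer to an array of fixed size. In this minimal sketch, MAX_STR_LEN and the type names are illustrative assumptions. The bound is part of the pointee type and can be recovered with sizeof, although C still performs no subscript checks against it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STR_LEN 80                    /* must be a compile-time constant */
typedef char StrArray[MAX_STR_LEN + 1];   /* like ARRAY [0..MaxStrLen] OF CHAR */
typedef StrArray *StrPtr;                 /* like POINTER TO ARRAY [0..MaxStrLen] OF CHAR */

static StrArray buffer;

/* The pointee type still knows its capacity: sizeof *p == MAX_STR_LEN + 1. */
size_t capacity_of(StrPtr p)
{
    assert(p != NULL);
    return sizeof *p;
}
```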
>
>
> And if you have a static character array string in Modula-2,
> like
>
>
> VAR str : ARRAY [0..80] OF CHAR;
>
>
> then you can't just pass str to a char* parameter of a C
> function. Instead you need to pass a pointer to it.
>
>
> TYPE Str80 = ARRAY [0..80] OF CHAR;
> VAR str : Str80;
>
>
> TYPE Str80Ptr = POINTER TO Str80;
> VAR strPtr : Str80Ptr;
>
>
> then
>
>
> str := "the quick brown fox jumps over the lazy dog.";
> strPtr := VAL(Str80Ptr, ADR(str));
>
>
> then
>
>
> passToC(strPtr);
>
>
> assuming
>
>
> void passToC(const char* s);
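We can sketch what happens at the C boundary in C itself. The body of passToC below is hypothetical, since only its prototype is given above. An array's address and the address of its first element coincide, so dereferencing the pointer to the array yields exactly the char * the function expects.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef char Str80[81];          /* same layout as ARRAY [0..80] OF CHAR */
typedef Str80 *Str80Ptr;

/* Hypothetical body for the C function above: it sees only a char *. */
void passToC(const char *s)
{
    assert(s != NULL);
    printf("received %zu chars: %s\n", strlen(s), s);
}

/* The Modula-2 steps, written in C: take the array's address, then
   hand the C function a pointer to its first character. */
void call_from_array(void)
{
    static Str80 str = "the quick brown fox jumps over the lazy dog.";
    Str80Ptr strPtr = &str;
    passToC(*strPtr);            /* *strPtr decays to &str[0] */
}
```

One point to check in a real binding: a Modula-2 string that exactly fills its array carries no terminating NUL. A C function that relies on strlen would then read past the end, so it helps to keep the content shorter than the array.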
>
>
> GM2 may already map an argument of a character array type to
> char* when you use the DEFINITION MODULE FOR "C" syntax to
> declare C functions. Even if it does, it likely won't do the
> same for char** and char***.
>
>
> Thus, if the C function parameters are char** then you need
>
>
> POINTER TO POINTER TO ARRAY [0..MaxStrLen] OF CHAR;
>
>
> Likewise for char*** you need
>
>
> POINTER TO POINTER TO POINTER TO ARRAY [0..MaxStrLen] OF CHAR;
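Each extra * in C adds one level of indirection. A char * is typically one string, a char ** an array of strings, and a char *** the address of a char ** variable that the callee may change. Here is a minimal sketch using a hypothetical function shaped like an argc/argv-consuming initialiser:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: drops the first argument by changing the CALLER's
   argv. Changing a char ** variable in the caller requires its address,
   hence the char *** parameter. */
void drop_first_arg(int *argc, char ***argv)
{
    assert(argc != NULL && argv != NULL);
    if (*argc > 0) {
        (*argc)--;
        (*argv)++;     /* caller's argv now starts one entry later */
    }
}
```

Only the caller's variables argc and argv change, and the strings themselves are untouched. That is why the third level of indirection is needed.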
>
>
> As I have mentioned before, the best way to interface to C
> APIs is to use a layered approach where the lowest level
> interfaces directly with the C API and a user level provides a
> wrapped Modula-2 representation that is independent of the C
> API. In the lower level library you can then convert and cast
> types as needed to pass between C and Modula-2.
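The conversion such a lowest-level layer might perform can be sketched in C. This version copies a C string into a fixed-capacity record shaped like ARRAY [0..80] OF CHAR and never writes past its end. The record type Name and the function name_from_c are hypothetical. Truncating and setting a flag is just one possible policy; a real binding might report an error instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Fixed-capacity representation, like ARRAY [0..80] OF CHAR. */
typedef struct {
    char text[81];
    bool truncated;   /* set when the C string did not fit */
} Name;

/* Low-level conversion: the only place that reads the raw char *. */
Name name_from_c(const char *raw)
{
    Name n = { {0}, false };
    assert(raw != NULL);
    size_t len = strlen(raw);
    if (len > sizeof n.text - 1) {     /* 80 usable chars + NUL */
        len = sizeof n.text - 1;
        n.truncated = true;
    }
    memcpy(n.text, raw, len);
    n.text[len] = '\0';
    return n;
}
```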
>
>
> On Sat, 28 Mar 2020 at 03:27, Hưng Hưng
> <address@hidden> wrote:
>
> Let me add some more information. If I use the pointer
> trick, e.g. PChar, PPChar, PPPChar, then I can't pass an
> M2 string into a C function that requires a C string; if I
> try to do so, the compiler complains because it expects
> a char to have length 1 only. M2 and C process strings in
> very different ways, and as I see it, the C pointer-to-char
> trick does not work in M2.
>
>
> There is a procedure in module DynamicStrings that converts
> between an M2 string and a C string, but again, how do I
> translate these data types correctly? If we go with the pointer
> trick, we then have to figure out how to represent
> PPChar and PPPChar, as the procedure in DynamicStrings only
> helps us this far. It's circular reasoning. I feel like my
> head is going to explode.
>
> On Sat, 28 Mar 2020 at 01:15, Hưng Hưng
> <address@hidden> wrote:
>
> Some C functions return or take a C string as a parameter,
> which is an array of char or a pointer to unsigned char.
>
>
> Another function returns or takes an array of C strings
> as a parameter, which is a 2D array of char or a pointer
> to pointer to unsigned char.
>
>
> Another function returns or takes an array of arrays of C
> strings as a parameter, which is a 3D array of char or a
> pointer to pointer to pointer to unsigned char.
>
>
> It's too complex. C code tends to abuse pointers too
> much.
>
>
> e.g.:
>
>
> void IupResetAttribute(Ihandle* ih, const char* name);
>
>
> int IupGetAllAttributes(Ihandle* ih, char** names, int n);
>
>
> int IupOpen (int *argc, char ***argv);
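The char ** parameter of IupGetAllAttributes follows a common C pattern. The caller provides an array of n pointer slots, and the callee fills them in. The sketch below shows that pattern with a hypothetical function and data. It is not IUP's implementation, and IUP's exact rules for this function should be checked in its documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical data standing in for a library's internal table. */
static char *known_names[] = { "TITLE", "SIZE", "ACTIVE" };

/* Same shape as int f(..., char **names, int n): the caller owns an array
   of n pointer slots; the callee stores up to n string pointers in it and
   returns how many it wrote. The strings stay owned by the callee. */
int get_names(char **names, int n)
{
    assert(names != NULL && n >= 0);
    int total = (int)(sizeof known_names / sizeof known_names[0]);
    int count = n < total ? n : total;
    for (int i = 0; i < count; i++) {
        names[i] = known_names[i];
    }
    return count;
}
```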
Hi,
it might also be worth examining DynamicStrings in gm2: there is a
'string' procedure which will convert a String into a C string.
So you can do:
FROM DynamicStrings IMPORT String, InitString, KillString, string ;
FROM SYSTEM IMPORT ADDRESS ;

VAR
   s : String ;
   cs: ADDRESS ;
BEGIN
   s := InitString ('hello world') ;  (* convert "hello world" into a dynamic string type.  *)
   cs := string (s) ;                 (* cs is a C compatible string attached to s.  *)
   s := KillString (s) ;              (* deconstruct s and cs.  *)
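On the C side, the comments above indicate that cs belongs to s and goes away with KillString. A C function that needs the text after that point should therefore keep its own copy. A minimal sketch, using the hypothetical helper name retain_copy:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C-side helper: the C string obtained via `string` is
   attached to s and released by KillString, so a C function that needs
   the text later should keep its own copy. Caller frees the result. */
char *retain_copy(const char *cs)
{
    assert(cs != NULL);
    size_t n = strlen(cs) + 1;      /* include the terminating NUL */
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, cs, n);
    }
    return copy;
}
```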