Re: Translating Modula-2 identifiers to C


From: Benjamin Kowarsch
Subject: Re: Translating Modula-2 identifiers to C
Date: Thu, 11 May 2023 03:04:59 +0900

Hi Gaius

On Thu, 11 May 2023 at 01:05, Gaius Mulley <gaiusmod2@gmail.com> wrote:

The issue of name mangling above is an interesting idea.

I think the default use should be simple so that the gdb user experience
will see procedure and identifier names as the same as the source code.
(Aiming to make it easy for first year undergrads).

That's very easy to do. You do the development and debugging with the
compiler switch turned off, so that all names in gdb appear exactly the
way they do now. Once you are ready for deployment of the library to be
used from C, then you rebuild it with the compiler switch turned on.

This way there is absolutely no need to let gdb representation issues impact
design decisions.
 

The compiler should also allow more complex name mangling for advanced
use.

Currently
=========

In gm2 there are named paths which are prefixed to the gcc generated
symbol name.  So for example libraries may be different and by default
have named paths, so the m2pim libraries, m2iso libraries have path
names associated with the default locations.

For example the m2pim libraries might be installed at:

    $HOME/opt/lib/gcc/x86_64-pc-linux-gnu/13.0.1/m2/m2pim/StrIO.def

and the driver gm2 sets up the named paths so that a call to
StrIO.WriteString appears as a call to an external function
m2pim_StrIO_WriteString.

This allows ISO and PIM libraries to coexist even if they have the same
module name and a different interface (ISO Storage and PIM Storage, or ISO
SYSTEM and PIM SYSTEM, for example).

I very much doubt that you would want to write any library intended for
use within C based projects and make use of either PIM or ISO libraries.

The mere dependency on those libraries will be an impediment for using
the library within C. So, I would think you'd rather use C's stdlib and hook
ALLOCATE into malloc(), and DEALLOCATE into free().
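
As a rough illustration of that hook, here is a minimal C-side sketch. The
names m2_allocate and m2_deallocate are hypothetical, and the parameter shape
merely mirrors ALLOCATE(VAR a: ADDRESS; size: CARDINAL); how a given compiler
actually binds ALLOCATE and DEALLOCATE is not specified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical shim with the shape of ALLOCATE(VAR a: ADDRESS; size: CARDINAL).
   The VAR parameter becomes a pointer to the caller's pointer. */
void m2_allocate(void **addr, unsigned int size)
{
    assert(addr != NULL);
    *addr = malloc(size);   /* NULL plays the role of NIL on failure */
}

/* Hypothetical shim with the shape of DEALLOCATE(VAR a: ADDRESS; size: CARDINAL). */
void m2_deallocate(void **addr, unsigned int size)
{
    assert(addr != NULL);
    (void)size;             /* free() does not need the size */
    free(*addr);
    *addr = NULL;           /* choice made in this sketch: reset the pointer */
}
```

With shims like these the library depends only on C's stdlib, not on the
PIM or ISO Storage modules.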

HOWEVER, you could still prefix the module prefix with a framework prefix,
if so desired. The heavy lifting of my library is done in the lower level module
that converts any string from mixed case to snake_case and stores it in a
dictionary. The user level library composes the qualified identifiers by
affixing module prefixes, type suffixes and local suffixes, and also converts
to uppercase for macro identifiers. It mostly consists of calling the
lower level library and then stitching the returned strings together.

It would thus be very little effort to add another optional parameter for a
project prefix to be prepended before the module prefix for fully qualified
names.
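
To make the idea concrete, here is a minimal sketch in C of a mixed-case to
snake_case conversion plus the stitching of an optional project prefix, a
module prefix and an identifier. It is not the library discussed here: the
function names, the word boundary rules and the single lowline used as a
separator are all assumptions made for illustration, and there is no
dictionary caching.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Convert camelCase or TitleCase to snake_case (hypothetical helper).
   Returns 0 on success, -1 if the output buffer is too small. */
int to_snake_case(const char *ident, char *out, size_t size)
{
    assert(ident != NULL && out != NULL);
    if (size == 0) return -1;
    size_t n = 0;
    for (size_t i = 0; ident[i] != '\0'; i++) {
        unsigned char c = (unsigned char)ident[i];
        unsigned char prev = i > 0 ? (unsigned char)ident[i - 1] : 0;
        unsigned char next = (unsigned char)ident[i + 1];
        /* A new word starts at a capital that follows a lowercase letter or
           digit ("fooBar"), or at the last capital of an acronym that is
           followed by a lowercase letter ("IOError" -> "io_error"). */
        int boundary = isupper(c) && i > 0 &&
            (islower(prev) || isdigit(prev) ||
             (isupper(prev) && islower(next)));
        if (boundary) {
            if (n + 1 >= size) return -1;
            out[n++] = '_';
        }
        if (n + 1 >= size) return -1;
        out[n++] = (char)tolower(c);
    }
    out[n] = '\0';
    return 0;
}

/* Stitch [project_]module_ident together; project may be NULL (hypothetical). */
int qualified_name(const char *project, const char *module,
                   const char *ident, char *out, size_t size)
{
    const char *parts[3] = { project, module, ident };
    size_t n = 0;
    assert(out != NULL);
    if (size == 0) return -1;
    out[0] = '\0';
    for (int i = 0; i < 3; i++) {
        if (parts[i] == NULL) continue;      /* optional part omitted */
        if (n > 0) {
            if (n + 1 >= size) return -1;
            out[n++] = '_';                  /* separator between parts */
        }
        if (to_snake_case(parts[i], out + n, size - n) != 0) return -1;
        n += strlen(out + n);
    }
    return 0;
}
```

Under these assumptions, qualified_name(NULL, "StrIO", "WriteString", ...)
yields str_io_write_string, and passing "m2pim" as the project part yields
m2pim_str_io_write_string.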

gm2 allows _ in any identifier and so it is possible to choose
identifiers which will clash with the name mangling schema above.

Unless you limit lowline use to non-leading, non-trailing and non-consecutive
occurrences. With that minor restriction, it won't clash. I designed it that way
because my translator/compiler permits non-leading, non-trailing and
non-consecutive lowlines in identifiers (when enabled by compiler switch).

You can always have two lowline-identifier modes, one with the restriction
above, and one without. And when the restriction is turned off, then the
compiler switch for snake-case/macro-case identifiers will be turned off.

Very simple.
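
The restriction itself is cheap to check. A minimal sketch in C, with a
hypothetical function name, of a predicate that accepts an identifier only
if its lowlines are non-leading, non-trailing and non-consecutive:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if ident is non-empty and every lowline in it is
   non-leading, non-trailing and non-consecutive (hypothetical helper). */
bool lowlines_are_restricted(const char *ident)
{
    assert(ident != NULL);
    if (ident[0] == '\0' || ident[0] == '_') return false;   /* empty or leading */
    size_t i;
    for (i = 1; ident[i] != '\0'; i++) {
        if (ident[i] == '_' && ident[i - 1] == '_') return false;   /* consecutive */
    }
    return ident[i - 1] != '_';                               /* trailing */
}
```

A compiler could run such a check when the restricted lowline mode is
enabled and reject or warn about identifiers that fail it.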


Proposed change
===============

I wonder if the following algorithm would resolve the above issue:

In order of priority:

   0.  DEFINITION FOR "C".  Turns off default name mangling for the entire
       module.

First, DEFINITION FOR is neither PIM, nor ISO syntax, nor is it in line with
what PIM compilers in the market back in the day used for foreign definition
modules. Back in the day of PIM, the most common I have seen was
FOREIGN DEFINITION MODULE. Some compilers I have seen used
pragmas for the same purpose.

The pragma route is preferable because a compiler that does not support
this can still accept the code and ignore the pragma. The code is semantically
still valid and a Modula-2 implementation could be written to match it.

Pragmas are non-semantic directives, just the right tool for the purpose.

So, I suggest you consider changing this to the <*FFI="C"*> pragma that
we use in our M2R10 specification. This isn't anything new; as I said,
some old compilers I have seen used pragmas, too.

Second, the purpose of a foreign definition module is to provide a
Modula-2 interface for a foreign library, that is to say, there is no
implementation module then.

This won't work when you want to write a Modula-2 library for use
within C. You will have to supply a Modula-2 definition and implementation
module for that purpose. The compiler then needs to translate the
definition module into a matching C header file, and the implementation
module into an object file that can be used from C together with the
generated header file.

Thus, we have two different scenarios:

(1) using a C library from within Modula-2
(2) providing a Modula-2 library for use within C

I am talking about scenario #2. Your DEFINITION FOR "C" syntax
is for scenario #1 and probably shouldn't be shoehorned onto #2
because there are significant differences in how the two scenarios
need to be handled by the compiler. It is better to separate them.

Against this background, I'd suggest a different pragma for your
use case of having an entirely flat namespace:

DEFINITION MODULE FooLib <*FLATNAMESPACE*>;

However, whilst there may be some cases where this may be
useful, you probably don't want to write any larger piece of code
using this mode.

You would then have to write ALLCAPS for all your constants and for
all your enumeration types and enumerated values, for example.

And you are giving up one of the major advantages of Modula-2, its module
namespaces, which can easily be auto-translated to C by automatic module
prefixing.

Plus, you need to be aware of any potential name clashes with C, 
for example you couldn't declare a variable switch, since that is
a reserved word in C. There are good reasons why we have
automated the translation of programming languages.
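
An automated translator can catch such clashes mechanically. A minimal sketch
in C, with a hypothetical function name, that tests a candidate output name
against the C99 keywords (the three keywords starting with a lowline, _Bool,
_Complex and _Imaginary, are left out on the assumption that leading
lowlines are not permitted anyway):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* C99 keywords that do not start with a lowline. */
static const char *const c_keywords[] = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if",
    "inline", "int", "long", "register", "restrict", "return", "short",
    "signed", "sizeof", "static", "struct", "switch", "typedef", "union",
    "unsigned", "void", "volatile", "while"
};

/* Returns true if name is spelled exactly like a C keyword.
   C is case sensitive, so "Switch" does not clash but "switch" does. */
bool clashes_with_c_keyword(const char *name)
{
    assert(name != NULL);
    size_t count = sizeof c_keywords / sizeof c_keywords[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(name, c_keywords[i]) == 0) return true;
    }
    return false;
}
```

With automatic module prefixing a variable named switch in module FooLib
would not reach C under its bare name, so a check like this matters mostly
in a flat namespace mode.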

When you write the code you want to focus on the task at hand,
not have your mind divided by name translation issues.

   1.  <* gcc-name: foo_bar *>   The attribute will override the symbol
       name as given to the GCC backend.

Having a pragma to supply a custom name for the occasional identifier
is certainly useful and I have already put that on my to do list for my
compiler/translator:

PROCEDURE FooBar <*CNAME="foobar"*> ( baz : Bam );

where CNAME stands for custom name, not C name.

However, if you want to write an entire library this way, again, that
would be very cumbersome. Automation is your friend. And then you
supply custom names only in the odd cases, that's much less hassle.

 
   2.  <* gcc-mangle: (format specifiers to determine style of mangling)
       *>

I would recommend not doing this in your code, but by compiler switch.

You may want to change the output format later for a different purpose
and assuming that you may at some point have support for multiple
styles, then it would be a major hassle to go into every file and change
all those pragmas.

Basically, this should be considered akin to a different target architecture.
You don't tell the compiler in the code what architecture it should generate
code for. You do that on the command line or in the make file, but not in
your code.

And the same reason why you do it that way also applies here.

 
   3.  Any symbol containing a leading or trailing or consecutive
       occurrences of lowline chars attracts a warning message.

I would make the feature mutually exclusive with unrestricted use
of lowline in identifiers. 

   4.  Non exported identifiers appear as symbols with no mangling.

That is only sensible if you can say with certainty that GM2 will never
ever generate C source code from Modula-2 input.

Because if you don't transform the private identifiers in the same way
then your generated code will have a mish mash of different styles. And
if you ever generate C output, this will become visible and it will get the
code rejected by any self-respecting open source project out there.

   5.  The default namedpath__modulename__procedurename schema
       is applied.

As mentioned above, it would be a minor effort to add a framework prefix
to my library, even though it is rather unlikely you'd want to use PIM/ISO
library dependencies in any C based project, so you'd likely avoid using
them when you develop a library for this purpose.
 

The detail is in [2] above.  [1] and [2] can occur on a scope or per
identifier declaration.  Mangling specifiers were used in p2c iirc.
But I had thought that some of the format ideas taken from
https://github.com/gcc-mirror/gcc/blob/master/gcc/m2/gm2-compiler/M2MetaError.def
might be useful to drive/implement the format specifier code.  This would
allow users to specify the mangling schema on a per module or per
identifier basis if required.

If you go from camel- and title-case in Modula-2 to a snake-case/macro-case style
in C, then you have a lossy conversion where there is opportunity for name clashes.

For this reason, you can't treat all symbols with the same transformation. Instead,
you need slightly different transformations for each kind of identifier. Ideally you
have one transformation each for (1) constants, (2) types, (3) variables, (4) functions
and (5) procedures. Then you need to have a transformation for each of these with
the exception of variables for nested functions/procedures, and in PIM/ISO dialects
also one of each for nested modules.
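
A small sketch in C of why the kind of identifier has to enter the
transformation. It only distinguishes constants (macro-case) and types (a
suffix) from the rest; the "_t" suffix, the enum and the function name are
assumptions for illustration, and a real scheme would also tell functions,
procedures and nested scopes apart.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    KIND_CONSTANT, KIND_TYPE, KIND_VARIABLE, KIND_FUNCTION, KIND_PROCEDURE
} ident_kind;

/* Decorate an already snake_cased name according to its kind (hypothetical).
   Constants become MACRO_CASE, types get a "_t" suffix, the other kinds
   are passed through unchanged in this sketch.
   Returns 0 on success, -1 if the output buffer is too small. */
int decorate(ident_kind kind, const char *snake, char *out, size_t size)
{
    assert(snake != NULL && out != NULL);
    const char *suffix = (kind == KIND_TYPE) ? "_t" : "";
    size_t len = strlen(snake);
    size_t slen = strlen(suffix);
    if (len + slen + 1 > size) return -1;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)snake[i];
        out[i] = (char)(kind == KIND_CONSTANT ? toupper(c) : c);
    }
    memcpy(out + len, suffix, slen + 1);   /* copies the terminating NUL too */
    return 0;
}
```

For example, a Modula-2 type Stack and a variable stack both lower to
"stack" under a plain case conversion; with the kind taken into account they
come out as stack_t and stack and no longer collide.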

So, you would need to write quite a lot of boilerplate if you were to come up with
some notation that specifies transformations. Again, cumbersome to use.

As I understand it a LGPL library can't be used as part of GCC and there
are two legal prerequisites:

   1.  the licence should be GPLv3 for the compiler or GPLv3 with GCC runtime
       exemptions for a runtime library.
   2.  copyright has to be signed over to the FSF.

(but I stand to be corrected :-)
 
Those are certainly not legal prerequisites for inclusion into GPL projects. The
LGPL was specifically designed by Eben Moglen (FSF counsel) to allow inclusion.

They may be GCC policy though. That I wouldn't know.

However, the library will likely need some modifications if you want to incorporate
it into GM2. For example, in the user level module I pass the identifiers as
interned string objects created by my interned strings library. There is no need
for that, but since I intern all lexemes in my translator/compiler, it is just the
most sensible thing to do. But if you want to incorporate it, I will modify that
so the procedures take a const char* instead. Easy to do.

Anyway, the point is, that whatever adjustments I would be making for
incorporation into GM2, I could relicense under whatever license and terms
you need. That's not an issue.

There would need to be some mentioning that the code was derived from
my original library and relicensed for GM2 so that the FSF cannot then
knock on my door and tell me that my original library is a knock off. ;-)

regards
benjamin
