[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[DotGNU]Yet another volunteer
From: |
Marco Manfredini |
Subject: |
[DotGNU]Yet another volunteer |
Date: |
Mon, 06 Aug 2001 04:40:56 +0200 |
User-agent: |
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.3) Gecko/20010801 |
Hello, my name is Marco Manfredini and some of you probably remember my
from #dotgnu as marquise.
I'm reading the discussions regarding the dostgnu-IL support with great
interest and would like to make a couple of annotations from here.
I was looking on the gcc->bytecode problem, after I read the call for
volunteers and checked out this document:
http://cobolforgcc.sourceforge.net/cobol_toc.html
which has a intriguing chapter 14 that deals with the undocumented
internals of gcc, mainly the AST->RTL part.
Was I read was disillusioning. Not only that the author clearly shows
the coding problems that are raisen by the current coding practice in
gcc, he also made clear that neither RTL or the "tree" ADT are powerful
enough to preserve the semantics (i.e. signatures, expression logic) of
the source code in a reasonable manner.
MS-IL/Metadata works AFAIK on the premise, that you can reconstruct the
signatures of the "exported" entities (functions, classes). This is
required for different reasons. One is, that you can #import an
assembly, for reuse it in you code, another is for bytecode-validation
(i.e. security). I can only suppose that the type-information problem
can be solved by interpretation of debugging information (given the
"RTL-wants-registers" problem can be solved), which looks like a kludge.
Compiling the gcc languages to IL will raise more issues. For example,
it will probably turn out that the ulgy "managed" stuff (including the
__gc et.al. keyworkd) has to be supported with gcc, because data under
the control of the runtime cannot be handled so frankly as C/C++ allows it.
But what I want to say is, that we should not stick on the byte-code
idea for the dotgnu-IL because of these reasons:
* Sure, the dotgnu-IL would support signatures of the external visible
entities, but a byte-code representation would loose the expression
semantics. This becomes clear if we consider a JIT which tries to
generate optimal code for a platform. The usual approach is to
reconstruct the expression tree from the byte code, because the
expression tree gives a better overview about what happens and allows
global optimization and inlining. So why should we remove information
that has to be reconstructed later?
* A byte-code representation is too low-level which makes it less
expressive. It deals with "values", but no more with
"types","functions","scopes". This can make things very hard, for
example predicting which variables are active and alive for a given
byte-code line, and which associated cleanup code they have. This
information would be neccesary to implement dtor-cleanup, and had to be
included in the generated code as "Meta-Information". On the other hand,
information about the currently active values and their types could be
also used to implement an efficient garbage collector scheme, but once
we defined the IL-Format we are more or less stuck to the amount of
information we put into it.
* I dont't say that byte-code cannot do what I want, but I am sure that
interesting applications (instantiation of generic types and function,
global code optimization and inlining, aspect-oriented programming) are
made unneccesarily hard.
I agree that we should invent before they patent and want to suggest
that we should follow, what I can recognize from the previous posts on
this list: The idea to use a semantical rich tree structure as a IL,
which preserves as many informations about the code as possible (without
being to language specific) and that we should give it a stupid name.
I'd propose "melody", since this project offers a lot obvious puns on
the background of C#.
I am currently developing ideas on how melody could look like, and try
to gather information about similar ideas (i.e. anything that's like
"compressed binary AST" as IL). Less for inspritation, but for the our
invention claims.
Currently my melody boils down to the idea, that we'd have constructor
expressions which build the semantitic entities we wish to represent. To
give you an example. This C-function:
bool even(unsigned num)
{
unsigned k=num%2;
return k==0;
}
would look in a human-readable version of melody as
001 function $even($num: uint32):boolean
002 {
003 let($k:uint32 ; modulo<uint32>($num, 2);)
004 {
005 (equal<uint32>($k,0))
006 }
007 }
The text-version looks like paraphrase of the original C function, which
doesn't mean that I cheat, but illustrates how much I would like to
preserve.
Just a short description on happes here:
First, the operations which do "stuff" look like this:
op<type>(parameters).
You see for example modulo<uint32>($num,2). This means "calculate modulo
with two parameter which are uint32". The compiler found, that $num is
unsigned and compiles the right function. This is obvious if you
remember that CPU usually has instructions like add.l addu.l addu.w,
i.e. the right op-code for the right operand.
Btw. If we can maintain a readable textual representation for melody,
then it would also make a valuable debugging tool, since you can see the
hidden operations (conversions, temporaries etc.) that you compiler has
produced to compile you innocent looking code.
line 001 starts the constructor for a function which is named $even (to
distiguish the name from the "keywords"). The num parameter has become
an uint32, the C-translator decided that "unsigned int" should be a
32-bit value. (Btw. In RTL or gcc-tree the analogous expression wouldn't
mention anymore, that the parameter is an /unsigned/ value. It only
contains the information that it is a 32-Bit value, but we need this
information to extract the signature.).
Now line 3 is interesting, because it shows how locals are introduced.
I'd propose to use the let-syntax which you might be familiar with, if
you are in functional programming. If not: My idea is, that every
statement represents a value. For example the above "function"
represents a ..eh..function. The let statement again, is the value of
its expression with a certain name binding. In Scheme I can say:
(let ((x 0)(y 0)) (+ x y)
You can read this as: the value of this statement is x+y where x is 0
and y is 0.
The melody-let statement here says: the value is $k=0 where $k is $num
mod 2.
So the let contains a lot of information: It says, that $k is only
available during the inner expression (which tells us about its block
scope), it says that $k is initialized with '$num mod 2' (which is
different from assigment, if you are familiar with C++ you will
understand what I mean). Note that let has a third parameter (left
blank), which is /the optional cleanup expression/ which has to be
evaluated when the let-body is evaluated. This can be used by a compiler
to represent its destructor semantics! (Now, do you see the fine
difference between initialization and assigment? If the initialization
expression generates an exception, then we are /not/ calling the cleanup
expression, since there is nothing to cleanup!).
Well, this is just an example I made up to entertain you, I admit that
there a still lot of things to consider (data aligment, how do I define
classes & operations on them, avoiding machine dependencies etc.), but I
hope you get the picture. But here's another teaser for you. Imagine
what you can, if melody would allow you to define functors (that is
functions in the melody language). For example:
functor even(T)
{
function ($num:T)
{
let($k:T ; modulo<T>($num, 2);)
{
(equal<T>($k,0))
}
}
}
melodies value-types are functions, structs, expressions. It can operate
on a type like on a number. So a "functor" implements essectially
genericity, since he f.e. compute a new function by replacing the
type-parameter!! This means, you can essentially /compile templates into
melody code/ and instantiate these generics on demand! And I can inline
them as well, because I'm producing a new expression tree, not stupid
bytecode.
Well, I hope I'm not completely on the wrong track and that the idea is
useful for the project. Comment please!
--
Marco
bytecodes? we don't need no stinkin' bytecodes.
P.S.
It should be possible to build a frontend to gcc which understands melody.
- [DotGNU]Yet another volunteer,
Marco Manfredini <=
- [DotGNU]Melody, Norbert Bollow, 2001/08/06
- Re: [DotGNU]Melody, Marco Manfredini, 2001/08/06
- Re: [DotGNU]Melody, Norbert Bollow, 2001/08/06
- Re: [DotGNU]Melody, Scott Lanham, 2001/08/06
- Re: [DotGNU]Melody, Marco Manfredini, 2001/08/07
- Re: [DotGNU]Melody, Norbert Bollow, 2001/08/07
- Re: [DotGNU]Melody, Marco Manfredini, 2001/08/07
- Re: [DotGNU]Melody, Scott Lanham, 2001/08/07
- Re: [DotGNU]Melody, Marco Manfredini, 2001/08/07
- Re: [DotGNU]Melody, Scott Lanham, 2001/08/07