[DotGNU]How CVM Works

dotgnu-general

From:	Rhys Weatherley
Subject:	[DotGNU]How CVM Works
Date:	Sat, 01 Dec 2001 13:48:08 +1000
There have been some inquiries as to how the new engine
core for Portable.NET works, so here is an overview for
interested parties.  It also provides a useful roadmap for
those who want to hack on the code - here's how to get
started.

------

The new engine core executes IL applications as follows:

1. The application binary is loaded into memory, and any
references to external libraries are resolved.  If necessary,
this may cause extra images to be loaded into memory.

2. An "ILExecThread" object is created to represent the
flow of control within the application.  Right now, there is
only one such object in the system, but eventually there
will be one per executing thread (real threads will be
tackled when more of the engine is working properly).

3. A call is made to "ILExecThreadCallMethod" (call.c),
which causes a method call to happen on the "Main"
method within the context of the newly created thread.

4. The function "_ILConvertMethod" (convert.c) is called
to determine how to handle the method.  There are three
possibilities: IL, internalcall, and PInvoke.  IL is the
interesting case.

5. "_ILConvertMethod" invokes the bytecode verifier
"_ILVerify" (verify.c), which chomps through the bytecode
to do two things: check that the instruction stream is
correct (and optionally, secure), and call the CVM coder.

6. Once the verifier is happy that an instruction is valid,
it passes all of the information about it to the CVM coder.
This information includes the name of the instruction,
and the types of its arguments.  For example, the verifier
might call the following for a "neg" instruction:

ILCoderUnary(coder, IL_OP_NEG, ILEngineType_F);

This indicates that the verifier has inferred that the
type of the value on the stack is "float".

7. Given this information, the CVM coder (cvmc.c) can
generate the correct set of instructions to perform the
operation.  It doesn't need to know anything about type
inferencing, security, etc.  It assumes that the verifier
has done the heavy lifting.

The bytecode stream that is output by the CVM coder
uses a simplified set of instructions, which more closely
matches the underlying machine's characteristics while
remaining portable across platforms.  By the end of
the conversion process, the polymorphic nature of IL
instructions is completely removed, leaving only the
simplest possible operations.

8. Once verification is complete, the CVM coder has
created a CVM version of the method in the CVM
translation cache.  "ILExecThreadCallMethod" now
sets up a stack frame and calls the CVM interpreter
(cvm.c).

9. The CVM interpreter executes the code, much like
a JVM bytecode interpreter does.  Because the
instructions are very simple, execution proceeds
much more quickly than it would if the interpreter had
to execute the polymorphic IL instructions directly, as
Mono does.

10. When the interpreter encounters a call to a method
that doesn't yet exist in the CVM translation cache,
it invokes "_ILConvertMethod" and the process repeats.
Over time, the most commonly used methods end up
in the cache and no further translation is necessary.

11. If "_ILConvertMethod" encounters an internalcall
or PInvoke method, which refers to external native code,
it will generate a stub using the coder.  This stub marshals
its arguments into the correct native form, invokes the
native function, and then marshals the return values
back onto the CVM operand stack.
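
To make steps 6 to 10 concrete, here is a toy sketch in C.
Every name in it (ToyMethod, toy_convert, toy_call and so
on) is made up for illustration and is not taken from the
Portable.NET sources.  It assumes a "method" that pushes
one constant, negates it, and returns it.  The conversion
step picks a typed opcode from the inferred type, the
result is kept with the method as a stand-in for the
translation cache, and the interpreter loop never looks
at types again.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; this is not the real engine API. */
typedef enum { TY_I4, TY_F } ToyType;   /* types a verifier might infer */
typedef enum { OP_IPUSH, OP_FPUSH, OP_INEG, OP_FNEG, OP_RET } ToyOp;

typedef union { int i; double f; } ToyValue;
typedef struct { ToyOp op; ToyValue arg; } ToyInsn;

/* A toy method: push one constant, negate it, return it. */
typedef struct {
    ToyType  type;         /* type of the constant, as inferred */
    ToyValue constant;
    ToyInsn  code[3];      /* stand-in for a translation cache entry */
    int      converted;    /* non-zero once code[] has been filled in */
    int      conversions;  /* how many times conversion ran (for the demo) */
} ToyMethod;

/* "Coder": turn the polymorphic neg into a typed opcode. */
static ToyOp toy_coder_neg(ToyType type)
{
    return (type == TY_F) ? OP_FNEG : OP_INEG;
}

/* "Verifier" driving the coder; runs once per method. */
static void toy_convert(ToyMethod *m)
{
    m->code[0].op  = (m->type == TY_F) ? OP_FPUSH : OP_IPUSH;
    m->code[0].arg = m->constant;
    m->code[1].op  = toy_coder_neg(m->type);
    m->code[2].op  = OP_RET;
    m->converted   = 1;
    m->conversions++;
}

/* "Interpreter": converts on first call, then just executes. */
static ToyValue toy_call(ToyMethod *m)
{
    ToyValue stack[4];
    size_t sp = 0;
    const ToyInsn *pc;

    if (!m->converted)
        toy_convert(m);

    for (pc = m->code; ; ++pc) {
        switch (pc->op) {
        case OP_IPUSH:
        case OP_FPUSH:
            assert(sp < 4);
            stack[sp++] = pc->arg;
            break;
        case OP_INEG:
            stack[sp - 1].i = -stack[sp - 1].i;
            break;
        case OP_FNEG:
            stack[sp - 1].f = -stack[sp - 1].f;
            break;
        case OP_RET:
            return stack[sp - 1];
        }
    }
}
```

Calling toy_call() twice on the same ToyMethod runs
toy_convert() only on the first call.  The second call
goes straight to the interpreter loop, which is the
effect the translation cache is meant to have.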

------

Astute readers will notice that this process is almost
identical to how a JIT works.  It was designed that way
on purpose.  It is possible to implement a new coder
that outputs code for a different instruction set, such
as x86 machine code.  That would give you a fairly
simple JIT.
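
As a rough illustration of why swapping coders is cheap,
the sketch below puts the coder behind a function pointer.
The names (DemoCoder, demo_verify_neg, and the two
back-ends) are invented for this example and do not match
the real coder interface.  The back-ends only append
textual mnemonics to a buffer instead of emitting real
bytecode or machine code.  "neg eax" and "fchs" are used
as plausible x86 spellings of integer and floating-point
negation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* All names below are hypothetical; this is not the real coder API. */
typedef enum { DEMO_I4, DEMO_F } DemoType;

typedef struct DemoCoder DemoCoder;
struct DemoCoder {
    /* Called by the verifier once it has inferred the operand type. */
    void (*unary_neg)(DemoCoder *coder, DemoType type);
    char out[64];   /* emitted "code", as text for the demo */
};

/* Append one mnemonic plus a ';' separator to the output buffer. */
static void demo_emit(DemoCoder *coder, const char *text)
{
    size_t used = strlen(coder->out);
    assert(used + strlen(text) + 2 <= sizeof(coder->out));
    snprintf(coder->out + used, sizeof(coder->out) - used, "%s;", text);
}

/* Back-end 1: typed bytecode for an interpreter. */
static void demo_bytecode_neg(DemoCoder *coder, DemoType type)
{
    demo_emit(coder, type == DEMO_F ? "fneg" : "ineg");
}

/* Back-end 2: x86-flavoured mnemonics, as a JIT coder might emit. */
static void demo_x86_neg(DemoCoder *coder, DemoType type)
{
    demo_emit(coder, type == DEMO_F ? "fchs" : "neg eax");
}

/* Stand-in for the verifier: it knows the type, not the back-end. */
static void demo_verify_neg(DemoCoder *coder, DemoType inferred)
{
    coder->unary_neg(coder, inferred);
}
```

The point of the design is that demo_verify_neg() is the
same for both back-ends.  Only the function stored in the
DemoCoder changes.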

It is even possible that a coder could be written as a
front-end for a burg-style JIT, like Mono uses.  That way,
you'd get very efficient machine code with a small amount
of effort.

The biggest advantage of the approach that I've taken is
that it isn't necessary to write all of the type inferencing
code when you write a new coder.  That has already been
done for you in the verifier.  This removes a big source
of potential bugs.

It will be a while before I get around to writing a JIT
coder, but if someone else wants to give it a go, then be
my guest.

------

Cheers,

Rhys.