
Re: [Gomp-discuss] Plan ... comments wanted!


From: Lars Segerlund
Subject: Re: [Gomp-discuss] Plan ... comments wanted!
Date: Wed, 29 Jan 2003 14:11:28 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.2.1) Gecko/20021226 Debian/1.2.1-9


 I think we're starting to get down to the specifics of the problem :-)

Also, we're getting to the interesting small things, like collecting examples, test cases, and the tasks we have to accomplish.

I will see if I can find some papers on massively parallel compilers; www.tera.com used to have them online, but I don't know if that is still true.

 / regards, Lars.

Steven Bosscher wrote:
On Wed 2003-01-29 at 09:25, Lars Segerlund wrote:

 [snip]

  What I strive for is some general point as follows:

   1. Start with the backend and code generation.

How would you test that work then?  Hmm, I think the first point you
brought up deserves a bit more attention...


Now, in order to test this, I was thinking of either modifying something simple like treelang to emit the intermediate form we need, or specifically building a small set of tools to emit test cases. This is in order not to complicate things too much before we have a 'proof of concept' and something to work on. It would be sufficient to start with some test cases taken right from the OpenMP standard.


2. Make it independent of the threading model at the level of the front end and middle end. (The last term nicely stolen from Steven :-) thanks.)

3. Reduce the front-end-to-back-end interface to a set of common primitives (perhaps not possible, but desirable).


Isn't it the whole idea to *make* it possible?  The interface should be
able to represent all of OpenMP, and it should be up to the front end to
fit stuff to this interface. I don't see why it would not be possible.


I was thinking of reducing the OpenMP primitives to a simpler set of primitives, if possible. I am not sure whether this would actually simplify or complicate things; hence 'if possible', in the sense of it being a good thing to do.


4. As far as possible, simplify the work that has to be done in the front end, even if this means adding an additional 'layer' to GCC, and thoroughly specify the constraints the front end has to satisfy, in order to make this easy for the front end writers to integrate.

5. Keep the threading model and implementation (which library, and so on) confined to as late a stage in the code generation phase as possible.

Certainly we don't want the compiler to have to deal with different
thread library interfaces at all?


Actually, we might want the compiler to emit the calls to the thread libraries (or native system calls on some systems); however, which of these libraries or other dependencies is used MUST be hidden from the compiler. Thus the compiler must handle the data dependencies and such, and leave the threading model to a later stage.

Is there a way to make the library sort of "pure", in the sense that
there are functions that only have side-effects that we know about in
the OpenMP layer at compile time?  If so, maybe that's something we
should tell the compiler about, too.


This is what OpenMP aims to accomplish: you specify some of these dependencies in the directives, and the compiler actually has to emit code tailored to compensate for these effects.


<BRAINSTORM>
Lars' List, 1st point:

   1. Start with the backend and code generation.


Say, we have:

Front-end --> Middle end (language independent) --> Backend

Assuming we're still talking about implementing this in the
GCC tree-optimizer framework, this translates to:

Front-end --> [ (GENERIC --> GIMPLE) --> RTL ] --> Assembly

That is, the middle-end is GENERIC, GIMPLE, RTL.  The intermediate
representation GENERIC is the first language-independent layer we
have.  GIMPLE is just lowered GENERIC, i.e. all of GIMPLE is part
of GENERIC.

Now, the approach I was thinking of is:

1. Pick the layer (or layers) where we want to implement OpenMP
   in. This is obviously crucial to get right the first time
   because it determines how much work is involved for this
   project, and how much of the existing code needs modification.

2. Define how to extend the intermediate representation in that
   layer and all layers before it to allow OpenMP information to
   be represented.  The smaller the extensions, the better. (tree
   annotations?).
   Not as crucial to get right the first time, but it should
   not change too much.

OpenMP is actually BLOCK structured (as far as I can tell), so I think it might be possible to work with annotations, perhaps just emitting additional 'blocks'.


3. Implement stubs for all of OpenMP.  Just some functions that
   understand the OpenMP information, but don't do anything with
   it.  Easy to change if we do 1. and 2. right.  We can replace
   them one by one with more useful code as we're progressing.

Now here is an important part: these stubs might be quite valuable, since they also give us an option to emit and collect debugging information; profiling info would be valuable as well. An OpenMP implementation without profiling would be useless.

4. Make sure everything between the OpenMP layer and the front
   end understands and propagates the OpenMP information.  Again,
   this is easy if we get 1. and 2. right.

 [ snip ]

That leaves us with GENERIC and GIMPLE.


I think we have to work with both :-( ... I will look into this over the weekend.

A second thing to consider for the OpenMP layer, is what kind of
information we need to make good parallel loops (flow graph? data
dependence analysis? Function inlining?).  I'm not very familiar with
OpenMP yet, so I don't exactly know what assumptions the compiler is
allowed to make in the presence of OpenMP directives.  But I'd expect
that at least some data dependency information would have to be
available before you can safely parallelize the loop without changing
its semantics.


Data dependencies are the biggie! I think we have to collect some examples of the desired behavior, perhaps source-to-output code examples. Also, if someone with access to an OpenMP-capable compiler would run some tests for us, we could use them as a 'template' for now.

If I have got everything right, each parallelized block should be optimized as a separate called function; any further optimization of the code is not our concern, as long as we can make the compiler meet the data dependencies.

The final consideration I can think of is code quality.  If we go too
high-level, can we still generate good parallel loops?  Would we inhibit
some basic optimizations (e.g. can a constant be propagated over a call
to the OpenMP library)?  If we go too low, would that make the
implementation unnecessarily complicated?


Now here is something I don't quite agree on: I don't think we need a dedicated library for OpenMP. I think the code should be generated to make use of an existing threading library; thus ordinary 'thread safe' restrictions apply, and there is a great amount of existing knowledge about thread programming.

It's been said before, we could look at Open64 to see how they did
this.  It would be nice if somebody who knows that compiler could
explain that a bit (Pop? :-P).

For Intel, there's the article that was mentioned a while ago, or
http://www.intel.com/technology/itj/2002/volume06issue01/art04_fortrancompiler/p03_overview.htm
</BRAINSTORM>

Greetz
Steven




_______________________________________________
Gomp-discuss mailing list
address@hidden
http://mail.nongnu.org/mailman/listinfo/gomp-discuss





