
Re: [certi-dev] Re: CERTI-Devel Digest, Vol 28, Issue 5


From: Christian Stenzel
Subject: Re: [certi-dev] Re: CERTI-Devel Digest, Vol 28, Issue 5
Date: Fri, 25 Apr 2008 14:02:11 +0200
User-agent: IceDove 1.5.0.14pre (X11/20080305)

Hello,

a longer post follows now, because this discussion is very interesting for me. I will try to
formulate some of my own views on the subject.

@Erk
OK, I see.
For such computations MPI seems more appropriate.
Are you porting the code from MPI, or do you "usually" have
such high-volume data exchange?

If I may be even more curious: could you tell me at which frequency
you send the matrix?

Here are some more details:
The principal concept is introduced and discussed in this paper:
http://www.mb.hs-wismar.de/~stenzel/publications/sne_16_2_paper_p51_p56_hla_military_ship_design.pdf

In short:
We've worked and are still working in the area of ship design processes. One aim is to have something like
a virtual ship in an early design phase.

Furthermore, we have a model to simulate more or less realistic seaways.
There is a federate computing a sea height matrix as fast as possible.
This matrix is transferred to a federate visualizing the seaway and a vessel in that seaway. So the time constraints are clear: at a minimum we have to be as fast as real time.

For the animation we need about 20 frames per second (>= 25 would be better). This means we have to compute and transfer one sea height matrix within 1/20 s. The computation time
depends mainly on the matrix dimensions.
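
To give a rough feeling for the data rates (the matrix dimension here is only an assumed example, not our real one): a 1000 x 1000 sea height matrix of 8-byte doubles is 8 MB per update, and at 20 updates per second that already means 160 MB/s of sustained throughput, before any marshalling overhead.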

Obviously this has nothing to do with classical distributed discrete event simulation. Here, HLA
is used only as a "data-exchange" middleware.

Anyhow, there are different reasons why we use HLA here:
- the NATO STANAG for Virtual Ships calls for the use of HLA
- in principle, HLA should be capable of such communication (nothing in the standard says that
 HLA cannot do it)
- adding more federates with different time advancing strategies (TAR, NER)
 should be easy to achieve

I think we may collaborate on this if you can create
a small "test case" like a HugeUAV_federate which may
be launched easily, like:

rtig
HugeUAV_federate -size 100000
HugeUAV_federate
etc...

The HugeUAV_federate will try to exchange a value of size -size;
the first federate creates the federation, the others automatically become
"subscribers" if they get "FederationAlreadyExists".

Using this we may investigate the problem easily, and
with a dtest script added we will add this to our "regression test cases" box.


Yes, I will do that.
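
To make the proposed test case a bit more concrete, here is a rough, untested sketch of such a HugeUAV_federate against the HLA 1.3 C++ API as provided by CERTI. The FED file name (HugeUAV.fed), the object class ("UAV") and the attribute name ("payload") are assumptions for illustration only, and almost all error handling is omitted:

  #include <RTI.hh>
  #include <NullFederateAmbassador.hh>
  #include <cstdlib>
  #include <cstring>

  // Callbacks (reflectAttributeValues etc.) would be implemented here
  // to count/measure the received updates.
  class HugeUAVFedAmb : public NullFederateAmbassador {};

  int main(int argc, char **argv) {
      long size = (argc > 2 && !strcmp(argv[1], "-size")) ? atol(argv[2]) : 0;
      RTI::RTIambassador rtiamb;
      HugeUAVFedAmb fedamb;
      bool creator = true;
      try {
          rtiamb.createFederationExecution("HugeUAV", "HugeUAV.fed");
      } catch (RTI::FederationExecutionAlreadyExists&) {
          creator = false;  // federation already exists: become a subscriber
      }
      rtiamb.joinFederationExecution("HugeUAV_federate", "HugeUAV", &fedamb);

      RTI::ObjectClassHandle uav = rtiamb.getObjectClassHandle("UAV");
      RTI::AttributeHandle payload = rtiamb.getAttributeHandle("payload", uav);
      RTI::AttributeHandleSet *attrs = RTI::AttributeHandleSetFactory::create(1);
      attrs->add(payload);

      if (creator && size > 0) {
          rtiamb.publishObjectClass(uav, *attrs);
          RTI::ObjectHandle obj = rtiamb.registerObjectInstance(uav);
          char *buffer = new char[size];  // the huge value to exchange
          memset(buffer, 0, size);
          RTI::AttributeHandleValuePairSet *ahvps =
              RTI::AttributeSetFactory::create(1);
          ahvps->add(payload, buffer, size);
          rtiamb.updateAttributeValues(obj, *ahvps, "huge update");
      } else {
          rtiamb.subscribeObjectClassAttributes(uav, *attrs);
          for (;;) rtiamb.tick();  // process incoming reflections
      }
      return 0;
  }

The create-or-join logic in the try/catch is exactly the "first federate creates, others subscribe" behaviour described above.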

@Pierre
Hello Eric,
my dear neighbour from the other side of the corridor,

These are the new possibilities of digital communication :).
Discussing problems publicly on a mailing list instead of walking
to the other end of the corridor ;).

But this discussion is very interesting, so I will try to formulate
my position:


I think that the discussion is more open between MPI and HLA.
I directed an internship on the subject of scientific computation with HLA
(parallel resolution of a linear system, Kathrin Quince, 2006).
We also had the problem of a huge matrix transfer; our solution was
to transmit the matrix in blocks. It was not very efficient but, after that,
the subsequent computations were correct.
Are there written results of that internship?


Why do scientific computation with HLA?
To avoid a gateway (overhead) between various execution environments
(the burden of mastering and deploying many tools).

Do we have a lot of applications which integrate scientific computation
into a distributed event-based simulation?
I am thinking of the simulation of avionics systems, which could require
more elaborate models of the plane and the environment physics.
Christian, could you add more examples here?
The general term "scientific computation" covers a wide field. Wikipedia says:
Computational science (or scientific computing) is the field of study concerned with constructing mathematical models and numerical solution techniques and using computers to analyze and solve scientific, social scientific and engineering problems. In practical use, it is typically the application of computer simulation and other forms of computation to problems in various scientific disciplines.

My opinion is that many event-based simulations also belong to the field of scientific computation.
Event-based simulations are often a very high abstraction
of real systems. In some parts of the engineering community
the modeling of dynamic systems via differential equations or partial differential
equations is more common.

There are two common ways to combine continuous and discrete models into so-called hybrid models.
One way is to detect events in a continuous simulation; a good
standard example is "the bouncing ball". Each time the ball hits the
ground an event occurs and the differential equation describing
the ball trajectory is changed.
The other way is to integrate ODE or PDE solvers into discrete event simulations. A special event causes the computation of e.g. some trajectories. This approach
is more general than the first one.

Again the example of the bouncing ball: if I'm interested in the trajectory, I can model the behaviour through a differential equation or, more simply, through a straight line without any influence of the acceleration. Then I have to compute the position of the ball for each time step and check whether it hits the ground. After that I can invert the direction. This is a hybrid continuous simulation
with event detection.
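
As a tiny illustration of this first approach, here is a plain time-stepped sketch (not CERTI-specific; step size and restitution coefficient are arbitrary choices):

  #include <cstdio>

  // Hybrid continuous simulation with event detection: integrate the
  // ball trajectory with a fixed time step and invert the velocity
  // whenever the ground is hit.
  int main() {
      const double g = 9.81;     // gravity [m/s^2]
      const double dt = 1e-3;    // integration step [s]
      const double e = 0.8;      // restitution coefficient (arbitrary)
      double h = 10.0, v = 0.0;  // height [m], velocity [m/s]
      for (double t = 0.0; t < 10.0; t += dt) {
          v -= g * dt;           // explicit Euler step
          h += v * dt;
          if (h <= 0.0 && v < 0.0) {  // event detection: ground hit
              h = 0.0;
              v = -e * v;        // the model switch triggered by the event
              printf("ground hit at t = %.3f s\n", t);
          }
      }
      return 0;
  }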

But if I'm only interested in the collisions with the ground, I can advance my simulation time exactly to the time at which the collision takes place. This corresponds to the processing in event-oriented simulations. Additionally, if I want to know more about the deformation of the ball when it hits the ground, the event "ground hit" can be used to initiate the computation of a continuous "damage model". This corresponds to hybrid simulation
based on event simulation.

I suppose both simulations can be assigned to the field of "scientific computation".

Before HLA, DIS and ALSP, the only reason to compute such problems in parallel was to become faster than the sequential solution. This was the only motivation for parallel computing. Research in this area shows that a problem has to fulfill some preconditions to achieve something like a speedup.

Besides Amdahl's law, the ratio between communication effort and processing effort is very important. This ratio is called granularity. The granularity is always application- and implementation-specific. E.g. parameter studies like a Monte Carlo study have a good ratio: the communication effort is low and the processing effort high. In contrast, population simulations (predator-prey models) need a lot of communication. Typically such problems
do not reach any speedup; often the speedup is below 1.
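
To make this concrete (with a made-up example): Amdahl's law gives a speedup of S(N) = 1 / ((1 - p) + p/N) for a parallel fraction p on N processors, so a program that is 90% parallelizable (p = 0.9) can never exceed a speedup of 10 no matter how many processors are added, and a growing communication effort effectively increases the serial fraction further.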

All problems with a small granularity value (little communication relative to computation) are appropriate for distributed computation (in PDES terms: LPs with a big lookahead). Probably we will get a speedup. This means that such an application can run on top of nearly any middleware (see today's grid applications). When the communication effort increases, the efficiency of the underlying communication
infrastructure becomes more and more important (as Eric already mentioned).

PDES can also be regarded as, or better, is an application of parallel computing (and scientific computing). The motivation here is also to become faster than the sequential solution. Main application areas of PDES are e.g. network or circuit simulations. Typically the whole sequential DES model is partitioned into different LPs. In contrast to a parameter study, the distribution takes place on the model layer instead of on the experiment layer.

I think PDES, and therefore also scientific computing, can be done with HLA; that is why HLA supports TM services.
Here the synchronization is done through the middleware.

But historically, HLA being the successor of DIS and ALSP, its main application area lies in the field of so-called Distributed Virtual Training Environments. Interoperability of existing simulations, reusability of existing simulation code and scalability are the main focus of this middleware. TM is used here as a means to ensure
causal ordering.

My opinion is that HLA can be used for parallel processing and in particular for PDES. The communication
efficiency depends on the RTI used, and therefore on implementation details.

Are the HLA services appropriate for writing parallel programs?
A first answer: we can write such programs with HLA.

Are the data management services appropriate?
We can express point-to-point communication (a single publisher
and a single subscriber).
We can express one-to-many communication (data distribution).
We can express many-to-one communication (a reduction operation,
but without the power of a binary-tree communication scheme).
The DDM services can be used to indicate the receiver of some data.
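
For comparison: with N senders, such a flat many-to-one reduction delivers N messages to a single federate, whereas a binary-tree scheme needs only about log2(N) communication rounds; this is the kind of optimization MPI implementations typically apply to their collective operations.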

Are the time management services appropriate?
A general parallel application requires a complex synchronization
of tasks. These services can be useful.
Even a fork-join mechanism can easily (but not intuitively,
for a Fortran programmer) be written.

How can we explain the superiority of MPI for the data transfers?
a) Lower overhead of the MPI layer (latency)?
b) Execution of MPI applications on efficient architectures
(processors and networks)?
c) A lot of data transfer optimizations?
(even in the case of MPI over TCP?)

What are the MPI optimizations that cannot be included
in an RTI implementation?
In the case of CERTI, we could study a direct connection
between RTIAs for some objects (with a new transport attribute).
When you analyse existing approaches to parallel computing you will
mainly find two paradigms for realizing parallel applications.

On the one hand you can use message-passing systems, on the other hand
shared-memory (SHM) based systems. Historically, message-passing approaches are
applied on distributed-memory architectures (e.g. Beowulf clusters), whereas
SHM is used on closely coupled memory architectures. Today, the use of a specific
programming scheme for a specific architecture is no longer required
(e.g. virtual SHM on top of message-passing systems is conceivable).

All message-passing approaches have in common that the communication between
the sender and the receiver is explicit. That means I have to specify the
receiver of my message; e.g. PVM does it by TIDs. The synchronization
is implicit, e.g. through blocking receive operations.
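
A minimal MPI sketch of this (standard MPI C API; the value sent is arbitrary): the sender must name the receiving rank explicitly, while the synchronization happens implicitly inside the blocking calls.

  #include <mpi.h>
  #include <cstdio>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      double x = 42.0;
      if (rank == 0) {
          // explicit communication: the receiver (rank 1) must be named
          MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          // implicit synchronization: returns only when the data has arrived
          MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          printf("rank 1 received %g from rank 0\n", x);
      }
      MPI_Finalize();
      return 0;
  }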

On the other side, SHM approaches have primitives for explicit
synchronization (mutexes, semaphores), while the communication is
implicit (changing the value of a shared variable).

In my opinion, HLA does not really fit into either of these categories. The communication scheme in HLA is implicit, mainly because of the declaration management services: a sender or publisher of information does not know who will receive that information, and a subscriber does not know who produced it. From the point of view of an HLA
application using TM services, the synchronization is
also implicit, because the sync primitives are provided through the RTI.

I would not say that we have a new programming paradigm for parallel applications,
but HLA is a little bit special :).

Yes, we do.
But my point is _efficiency_: the current HLA services (at least HLA 1.3
and IEEE 1516, the ones I am _currently aware of_) do not offer the
efficient data exchange services usually needed by scientific computation:

- periodic exchange
- efficient broadcast in a group/sub-group
- barrier
- reduction
I fully agree with Eric: MPI is a very fast "message-exchanging"
middleware.
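
For comparison, the collective operations from Eric's list are one-liners in MPI (standard MPI C API; the payload values are arbitrary):

  #include <mpi.h>
  #include <cstdio>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      double v = 1.0;
      MPI_Bcast(&v, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);    // broadcast in a group
      MPI_Barrier(MPI_COMM_WORLD);                        // barrier
      double mine = rank + 1.0, sum = 0.0;
      MPI_Reduce(&mine, &sum, 1, MPI_DOUBLE, MPI_SUM, 0,  // reduction
                 MPI_COMM_WORLD);
      if (rank == 0) printf("sum over %d ranks = %g\n", size, sum);
      MPI_Finalize();
      return 0;
  }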

Eric mentioned barrier synchronizations. In the field of PDES, barriers
are also used in some second-generation conservative sync algorithms (synchronous
sync algorithms).

I suppose that these algorithms could also be applied in an HLA implementation. At present, CERTI uses the CMB algorithm (null messages) for conservative synchronization. A synchronous sync algorithm can natively handle zero lookahead. Probably it could handle small-lookahead
federates much faster than CMB.

I hope that someone finds this discussion useful. For me it is a good opportunity to
order my own thoughts :).






