paper on replication of simulations (comments welcome)


From: Theodore C. Belding
Subject: paper on replication of simulations (comments welcome)
Date: Fri, 14 Jan 2000 04:49:35 -0500 (EST)

Exact Numerical Replication of Computer Simulations:
Some Pitfalls and How To Avoid Them.

Theodore C. Belding <address@hidden>

January 14, 2000

DRAFT --- Comments welcome!

Archived at: 
http://www-personal.umich.edu/~streak/papers/numerical-replication-20000114.txt

ABSTRACT

A computer simulation that uses IEEE standard floating-point
arithmetic may not produce exactly the same results in two different
runs, even if it is rerun on the same computer with the same input and
random number seeds. Researchers should not simply assume that the
results from one run replicate those from another but should verify
this by actually comparing the data. However, researchers who are
aware of this pitfall can reliably replicate simulations, in
practice. This paper discusses the problem and suggests solutions.

THE PROBLEM

When we perform computer simulations, it is often useful to be able to
replicate a run exactly, so that those results of the second run that
we care about are exactly the same as those of the first. For
instance, if we notice a strange result in a run, it is useful to be
able to redo the run exactly, using the same parameter settings and
random number seeds, but this time collecting additional data in order
to test hypotheses about what is causing the strange results.

However, a program may produce different results on two different
computers, even if the same input and random number seeds are used.
This is because different computers may perform floating point
calculations differently, even if the computers all follow the IEEE
754 floating-point standard [16].  This may happen, for instance, when
we run a simulation once on a Hewlett-Packard (HP) workstation running
HPUX and once on an Intel x86 computer running Linux. The HP
workstation uses IEEE single-precision (32-bit) and double-precision
(64-bit) floating point arithmetic, while the x86 uses IEEE
extended-precision (80-bit) floating-point arithmetic. (The Motorola
680x0 is another common CPU family that uses extended precision.) The
x86 floating-point unit (FPU) can be switched into "single-precision"
or "double-precision" mode; unfortunately, even when it is in one of
these modes, it will still produce different results than an HP
workstation would, since its internal registers will still use more
bits of precision for the exponents (15 bits instead of 8 or 11) [8].
To ensure that the x86 FPU always produces the same results as the HP
FPU, intermediate results must be transferred from the FPU to memory
and back on the x86. Also, if the x86 FPU is performing multiplication
or division, one of the operands must be scaled down before the
operation, and the result scaled back up afterwards [8].
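
The register-precision differences described above cannot be reproduced
directly in a high-level language such as Perl, which computes in IEEE
double precision throughout; however, the underlying phenomenon, namely
that mathematically equivalent floating-point computations can round to
different values depending on the exact sequence of operations, is easy
to demonstrate. The following is a minimal illustrative sketch (any
IEEE double-precision platform should show the same effect):

    #!/usr/bin/perl -w
    # Illustrative sketch: two mathematically equivalent sums of IEEE
    # doubles round differently depending on the order of operations.
    # An analogous effect occurs when one machine keeps intermediate
    # results in 80-bit registers and another rounds them to 64 bits.
    use strict;

    my $a = (0.1 + 0.2) + 0.3;   # rounds 0.1 + 0.2 first
    my $b = 0.1 + (0.2 + 0.3);   # rounds 0.2 + 0.3 first

    printf "%.17g\n%.17g\n", $a, $b;
    print $a == $b ? "equal\n" : "different\n";   # prints "different"

On an IEEE double-precision machine this prints 0.60000000000000009
and 0.59999999999999998, a difference of one unit in the last place
that a sensitive simulation can amplify into divergent trajectories.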

If the results of a simulation depend on many floating-point
calculations, this difference in precision may cause the two runs to
produce wildly different results. This is particularly likely in
simulations of complex systems, such as a genetic algorithm, where the
simulation's precise trajectory is highly sensitive to the initial
conditions and to the stream of random numbers. Even if the different
runs produce the same qualitative results, the numeric results may
differ.

This may occur with any program that uses native IEEE floating-point
arithmetic, written in any language, on any computer or operating
system.  These discrepancies may also occur in integer arithmetic, but
only if a program makes unwarranted assumptions about the size or
representation of integer variables (for example, assuming that C
variables of type "int" are 32 bits long).
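
As a quick sanity check, Perl itself records the native integer width
it was built with; querying this build-time configuration (a sketch,
assuming a Perl recent enough that its Config module reports "ivsize")
makes the platform dependence visible:

    #!/usr/bin/perl -w
    # Sketch: report the platform's native integer width, as recorded
    # in Perl's build-time configuration.  Programs that silently
    # assume a particular width are not portable.
    use strict;
    use Config;

    printf "native integers here are %d bytes (%d bits)\n",
        $Config{ivsize}, 8 * $Config{ivsize};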

In fact, a program may produce different results when run twice on the
same computer, even if the same input and random number seeds are
used.  This is because the results produced by a program depend not
only on the computer's floating-point unit and operating system but
also on the compiler, the compile-time options, the compile-time and
run-time libraries installed, and the input (here I include the date,
the run-time environment, and the random number seeds). For instance,
the discrepancy may occur if we run a simulation twice on an x86
computer, where the simulation is compiled the first time to store
intermediate floating-point results to memory, and the second time to
keep the intermediate results in the CPU floating-point unit. It is
even technically possible that the results may depend on what other
programs are running on the computer, or on bugs in the program,
compiler, or libraries --- this is especially true if the program is
not carefully designed and implemented. Therefore, each time a
simulation is run, it is prudent to act as if it were run on a
different computer, even if the computer is in fact always exactly the
same.

Guaranteeing that two runs of a program will produce exactly the same
results is extremely difficult and may be impossible, in practice.
Every component that might affect the results would have to be
guaranteed to be the same for both runs; no component on the computer
could ever be changed or upgraded.  On the one hand, determining and
recording all of this data would be extremely expensive in time and
storage --- for instance, it might entail calculating the MD5 message
digest for every RPM package installed on a PC running Red Hat Linux
and storing these results with the data.  On the other hand, it would
be extremely difficult to weed out false positive results when testing
whether two computers have different components --- the fact that one of
two otherwise identical computers has a copy of the game Quake
installed probably will not affect whether a simulation will produce
identical results on the two machines. Finally, the date is always
changing, and this might have unforeseen effects on a program's
behavior. (Consider the recent Y2K problem, or a bug that depended on
the phase of the moon [13].)

RECOMMENDATIONS

If guaranteeing that we can replicate a run is not an option, what can
we do? I suggest that instead of asking how we can guarantee
replication, we should ask a different question: What is the
worst-case result that can occur because of this problem, and how can
we avoid it? The worst thing that can happen is that we mistakenly
believe that two sets of simulation results are the same, when they
are in fact different. Our main concern should be to avoid this
mistake. Luckily, there is an easy way to avoid it: simply compare the
data sets. If they are empirically identical, we are done. (Of course,
if we do not record enough data from each run, it is possible that the
runs' actual trajectories may be different, even though the data are
the same.)  If the results are not identical, practical experience
suggests that we can usually still replicate the original simulation
by tweaking a few common compile-time or run-time parameters.

Therefore, we need a set of easy-to-use tools to compare results from
two runs, and we should use these tools even if the runs were done on
the same computer. In some cases, where entire files need to match
exactly, the Unix diff command may suffice. In other cases, I suggest
using Rivest's [17] MD5 message digest algorithm.  This algorithm
produces a short string (called a hash) that is easy to store with the
data that it is computed from.  It is easy to write a short Perl
program [15,19] to compute this string from a data file, excluding
comments and data that we do not need to replicate, such as the date
of the run. If it is necessary to ensure that a dataset has not been
tampered with in any way, there are cryptographically secure methods,
such as signing the data set with PGP or GnuPG [12], or using another
message digest algorithm, such as RIPEMD-160 [2,3]. (MD5 should not be
used for this purpose [18]!)
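
As a concrete sketch of such a tool, the following short program
computes an MD5 hash of a data file while skipping comment lines, so
that run-specific metadata such as the date does not affect the hash.
It assumes that comments begin with "#" and uses the Digest::MD5
module, a successor to Winton's MD5.pm [19]; adapt the comment
convention to your own data format.

    #!/usr/bin/perl -w
    # datahash.pl -- print an MD5 hash of a data file, ignoring
    # comment lines so that metadata we do not need to replicate
    # (such as the date of the run) does not affect the hash.
    # Assumes comments begin with "#"; uses Digest::MD5 [19].
    use strict;
    use Digest::MD5;

    my $md5 = Digest::MD5->new;
    while (my $line = <>) {
        next if $line =~ /^#/;    # skip comments and metadata
        $md5->add($line);
    }
    print $md5->hexdigest, "\n";

Running "perl datahash.pl run1.dat" on each machine and comparing the
two short strings then takes the place of shipping entire datasets
between sites.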

Often, this will be all that is necessary to replicate a run.
Sometimes, however, we will not get identical results just by using
the same input and random number seed. In this case, we can almost
always replicate the results by tweaking a few special compile-time or
run-time parameters (such as whether intermediate floating-point
results are stored to memory from the floating-point unit).
Experience suggests that replication is very easy to achieve in
practice, even though it may be impossible to guarantee. In addition
to the techniques for comparing results mentioned previously, we need
a set of heuristics for replicating results, such as a list of
compile-time and run-time parameters that often need to be
tweaked. Currently, I use only one heuristic: To replicate the results
of a simulation run on an IEEE single- or double-precision
floating-point platform (such as an HP or PowerPC workstation) on an
IEEE extended-precision platform (such as x86), compile the simulation
with gcc (the GNU Compiler Collection [4]) with the "-ffloat-store"
option [5]. This will cause intermediate floating-point results to be
stored in memory and then reloaded into the FPU. Note that this will
not guarantee replication unless additional rescaling is also done
[8], and it may slow down a floating-point-intensive simulation by a
factor of 2 to 4. To make replication easier, the compile-time and
run-time parameters that are used should be stored with the data,
along with information such as the program and compiler versions, the
date, the name of the machine being used, the platform and operating
system, etc. We need tools that make storing this kind of information
easy, as well. (Perl [15] is an example of a program that stores a great
deal of configuration information at compile time; the information is
accessible under Unix by running "perl -V".)
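
As a sketch of the recording side, the following program emits run
metadata as "#" comment lines to be placed at the head of a data file,
so that the information is stored with the data but ignored by a
hashing tool that skips comments (such as the one sketched above).
The field names, and the convention of passing the random seed as the
first argument, are illustrative assumptions.

    #!/usr/bin/perl -w
    # Sketch: write run metadata as "#" comment lines, to be prepended
    # to a data file.  Stored with the data, skipped by the hash.
    use strict;
    use Config;          # Perl's build-time configuration
    use Sys::Hostname;   # standard module for the machine name

    print "# date:     ", scalar localtime, "\n";
    print "# host:     ", hostname(), "\n";
    print "# platform: $Config{archname} ($Config{osname} $Config{osvers})\n";
    print "# perl:     $]\n";
    print "# seed:     $ARGV[0]\n" if @ARGV;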

CONCLUSION

In summary, these are problems that everyone doing computer
simulations should be aware of, but they are not insurmountable. In
practice, a few simple techniques should be sufficient to avoid
problems. First, we should never assume that the results from two runs
using the same parameters and random number seed are identical, even
if they are run on the same computer. We should always verify this,
either by comparing the relevant results directly or by comparing the
MD5 hash string from the two datasets. This verification process
should be made so convenient that there is no reason not to do
it. Second, we should compile a knowledge base of likely parameters
that can be tweaked to achieve replication, if simply redoing a run
with the same input and random seeds does not suffice.

FURTHER READING

For a technical discussion of this problem, see Priest [16] and the
Java Grande Forum Numerics Working Group's draft report [8]. For a
gentle introduction to floating-point arithmetic in general, see
Patterson and Hennessy [14] or Goldberg [7]; for a more technical
discussion, see Goldberg [6] or Knuth [11]. Kahan and Darcy [10] argue
that it is undesirable to enforce exact replicability across all
computing platforms, and Kahan [9] gives an example of differences in
floating-point arithmetic in different versions of Matlab on various
platforms.  Axtell, et al. [1] discuss the differing degrees to which
a simulation can be replicated.  See Rivest [17] and Robshaw [18] for
information on the MD5 message digest algorithm; for a Perl interface
to MD5, see Winton [19]. For information on RIPEMD-160, a more secure
replacement for MD5, see Bosselaers [2] or Dobbertin [3].

ACKNOWLEDGEMENTS

I thank the members of the egcs and gcc mailing lists for answering
questions and providing information and references.

REFERENCES

[1] Axtell, Robert L., Robert M. Axelrod, Joshua M. Epstein, and Michael
D. Cohen.  (1996). Aligning simulation models: A case study and results.
Computational and Mathematical Organization Theory 1:123-41.
Earlier version available as Santa Fe Institute Working Paper
95-07-065. 
http://www.santafe.edu/sfi/publications/Working-Papers/95-07-065.ps

[2] Bosselaers, Antoon. (1999). The RIPEMD-160 Page.
http://www.esat.kuleuven.ac.be/%7ebosselae/ripemd160.html

[3] Dobbertin, H., A. Bosselaers, and B. Preneel. (1996). RIPEMD-160, a
strengthened version of RIPEMD. In Gollmann, D., Ed., Fast Software
Encryption. LNCS 1039.  Springer-Verlag. pp. 71-82.

[4] GNU Project. (1999). GCC [GNU Compiler Collection] Home
Page. http://gcc.gnu.org/

[5] GNU Project. (1999). Using and Porting GNU Fortran.
http://gcc.gnu.org/onlinedocs/g77_toc.html or execute "info g77" on a
computer that has a recent version of g77 installed.

[6] Goldberg, David. (1991). What every computer scientist should know
about floating-point arithmetic. Computing Surveys 23(1):5-48.
http://www.validgh.com/goldberg/paper.ps 
This online version includes the addendum by Priest [16].

[7] Goldberg, David. (1996). Computer Arithmetic. Appendix A in
Hennessy, John L., and David A. Patterson, Computer Architecture: A
Quantitative Approach. 2nd ed. Morgan Kaufmann.

[8] Java Grande Forum Numerics Working Group. Improving Java for
Numerical Computation. Public draft, October 30, 1998. II: Discussion
of Critical Issues. Issue 4: Use of floating-point hardware.
http://math.nist.gov/javanumerics/reports/jgfnwg-01.html#floating-point

[9] Kahan, William. (1998). Matlab's Loss is Nobody's
Gain. Unpublished.  http://www.cs.berkeley.edu/~wkahan/MxMulEps.pdf

[10] Kahan, William, and Joseph D. Darcy. (1998). How JAVA's
Floating-Point Hurts Everyone Everywhere. Talk presented at ACM 1998
Workshop on Java for High-Performance Network Computing, Stanford
University.  http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf

[11] Knuth, D. (1998). The Art of Computer Programming, Vol. II:
Seminumerical Algorithms. 3rd Edition. Addison-Wesley. ISBN
0-201-89684-2.

[12] Koch, W. GnuPG [GNU Privacy Guard]. http://www.gnupg.org/

[13] Linuxcare. (1999). Richard Stallman: The Coder. Linuxcare
interview, Dec. 14, 1999.
http://www.linuxcare.com/news_columns/interviews/1999/12-14-99.epl

[14] Patterson, David A., and John L. Hennessy. (1997). Computer
Organization and Design: The Hardware/Software Interface. 2nd ed.
Morgan Kaufmann. ISBN 1-55860-428-6. Chapter 4: Arithmetic for
Computers.

[15] Perl homepage. http://www.perl.com/

[16] Priest, Doug. Differences Among IEEE 754 Implementations. 
Addendum to Goldberg [6].
http://www.validgh.com/goldberg/addendum.html 
It is also included in the online version of Goldberg [6] at
http://www.validgh.com/goldberg/paper.ps

[17] Rivest, R. (1992). The MD5 Message-Digest Algorithm. RFC 1321.
http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1321.txt

[18] Robshaw, M.J.B. (1996). On Recent Results for MD2, MD4 and
MD5. RSA Laboratories' Bulletin Number 4. Nov. 12, 1996.
ftp://ftp.rsa.com/pub/pdfs/bulletn4.pdf

[19] Winton, Neil. MD5.pm (MD5 Perl module).
http://www.perl.com/CPAN-local/modules/by-module/MD5/

[fixme: add definition of floating point vs. integer arithmetic]
[fixme: define types of replication]
[fixme: cite IEEE 754]

