lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [lmi] Problem of the week: testing a testing tool


From: Ericksberg, Richard
Subject: RE: [lmi] Problem of the week: testing a testing tool
Date: Mon, 8 Jan 2007 14:56:38 -0500

On 2006-12-24 12:20 Zulu, Greg Chicares wrote:

> make unit_tests build_type=mpatrol unit_test_targets=input_test
> [add '.exe' at the end for msw]
> 
> Running input_test:
>   Speed tests...
>   Overhead: [3.931e+000] 1 iteration took 3930 milliseconds
>   Vector  : [9.064e+000] 1 iteration took 9063 milliseconds
>   Read    : [1.#IOe+000] 0 iterations took 15031 milliseconds
>   Write   : [7.772e+000] 1 iteration took 7771 milliseconds
>   'cns' io: [1.#IOe+000] 0 iterations took 54502 milliseconds
>   'ill' io: [1.#IOe+000] 0 iterations took 14409 milliseconds
> 
> .... 27 tests succeeded
> no errors detected
>   mpatrol: total warnings:    0
>   mpatrol: total errors:      0
> 
> 0. What's obviously wrong here on the face of it?

[A layperson's view - before looking at the code]

a) Labels are non-descriptive? [i.e. What kind of speed test
does 'Overhead:' or any of the others really describe?]

b) What does '1.#IO' mean? It looks like a possible truncation
or overflow on the scientific notation that happens on the
numbers that exceed 4 numerals. 

c) On those that don't have the '1.#IO' and appear correct,
scientific notation and millisecond counts don't match exactly. 

d) That 0 iterations were run.

e) That running 0 iterations took time.

> 1. Which revisions introduced defects observable above?

a) Not a defect
The rest - timer.hpp revision 1.5
 
For b), d) and e) above, a description of the problem:

From timer.hpp:

template<typename F>
std::string AliquotTimer<F>::operator()()
{
  double const v =
    (0.0 != initial_trial_time_)
    ? max_seconds_ / initial_trial_time_
    : max_seconds_ * timer_.frequency_
    ;

  double const w = std::min(std::log10(v),
static_cast<double>(ULONG_MAX));
  unsigned long int const x = static_cast<unsigned long int>(w);
  double const volatile y = std::pow(10.0, static_cast<double>(x));
  unsigned long int const z = static_cast<unsigned long int>(y);
  [...]
}

The erroneous output happens when the result of std::log10(v)
is negative and then cast to an unsigned long int.
Example [test using 0.001 for max_seconds_ and no mpatrol]:

[Overhead]
          v: 0.0501144
 trial_time: 0.0199543
max_seconds: 0.001
  frequency: 3192150000

  ULONG_MAX: 4294967295
          v: 0.0501144
   log10(v): -1.30004
          w: -1.30004
  cast of w: 4294967295
          x: 4294967295
          y: 1.#INF
          z: 0

In this case, when the negative result of std::log10(v) is
cast as an unsigned long int, it's getting the largest value
of an unsigned long int [ULONG_MAX]. This happens when the
integer value is one or greater. When the integer value is
zero, you get 0 from the cast. For instance:

[Read]
          v: 0.148825
 trial_time: 0.0067193
max_seconds: 0.001
  frequency: 3192150000

  ULONG_MAX: 4294967295
          v: 0.148825
   log10(v): -0.827324
          w: -0.827324
  cast of w: 0
          x: 0
          y: 1
          z: 1

Speed tests...
Overhead: [3.302e-003] 1 iteration took 3 milliseconds
Vector  : [2.561e-003] 1 iteration took 2 milliseconds
Read    : [1.#IOe+000] 0 iterations took 11 milliseconds
Write   : [6.719e-003] 1 iteration took 6 milliseconds
'cns' io: [1.#IOe+000] 0 iterations took 53 milliseconds
'ill' io: [1.#IOe+000] 0 iterations took 19 milliseconds

When the number of iterations [z] is 0, you get the #IO
from dividing by 0 in:

  oss
    << std::scientific << std::setprecision(3)
    << "[" << timer_.elapsed_usec() / z << "] "
    << z
    << " iteration" << ((1 == z) ? "" : "s") << " took "
    << timer_.elapsed_msec_str()
    ; 

One with a positive std::log10(v) result [using 1.0
for max_seconds_ and no mpatrol] looks like this:

[Overhead]
          v: 52.3017
 trial_time: 0.0191198
max_seconds: 1
  frequency: 3192150000

  ULONG_MAX: 4294967295
          v: 52.3017
   log10(v): 1.71852
          w: 1.71852
  cast of w: 1
          x: 1
          y: 10
          z: 10 

The higher the initial_trial_time_ gets in relation to
max_seconds_, the likelihood increases that std::log10(v)
will result in a negative. mpatrol showed this problem
up due to it's significantly higher initial_trial_time_
values [using 1.0 for max_seconds_ WITH mpatrol]:

          v: 0.0696166
 trial_time: 14.3644
max_seconds: 1
  frequency: 3192150000

  ULONG_MAX: 4294967295
          v: 0.0696166
   log10(v): -1.15729
          w: -1.15729
  cast of w: 4294967295
          x: 4294967295
          y: 1.#INF
          z: 0

All this shows that having 1.0 for max_seconds_ could be
considered arbitrary. If executing timing routines under
certain circumstances [e.g. using mpatrol] causes them
to run longer than expected, those variations need to be
taken into account.

For c) Mismatched values are result of differing methods of
mathematical manipulation. The scientific notation keeps its
floating-point value and is divided by the number of iterations
[z] where the milliseconds are multiplied by 1000.0 and cast as
an int. 

For d) and e) Iteration(s) really were run and took time. The
negative number problem described above caused the display of
iterations [z] to show as one or zero as:

10.0 ^ 0 is 1 [as any number ^ 0 is 1.]
10.0 ^ ULONG_MAX ends up as 0 [apparently because of overflow.]

> 2. How could those defects have been detected automatically?

First line of defense could be to trap the problem immediately
before the attempted cast and issue a warning for the condition.

  if(w < 0.0)
    {
    warning()
      << "**** Warning: Attempting to cast "
      << w
      << " as <unsigned long int> ****\n"
      << LMI_FLUSH;
    }

Produces:

Running input_test:
**** Warning: Attempting to cast -1.28502 as <unsigned long int> ****
[file c:/opt/lmi/src/lmi/timer.hpp, line 215]
**** Warning: Attempting to cast -1.72052 as <unsigned long int> ****
[file c:/opt/lmi/src/lmi/timer.hpp, line 215]
**** Warning: Attempting to cast -0.852249 as <unsigned long int> ****
[file c:/opt/lmi/src/lmi/timer.hpp, line 215]
**** Warning: Attempting to cast -1.02904 as <unsigned long int> ****
[file c:/opt/lmi/src/lmi/timer.hpp, line 215]
**** Warning: Attempting to cast -0.39764 as <unsigned long int> ****
[file c:/opt/lmi/src/lmi/timer.hpp, line 215]
**** Warning: Attempting to cast -0.516193 as <unsigned long int> ****
[file c:/opt/lmi/src/lmi/timer.hpp, line 215]

  Speed tests...
  Overhead: [3.282e-003] 1 iteration took 3 milliseconds
  Vector  : [2.498e-003] 1 iteration took 2 milliseconds
  Read    : [1.#IOe+000] 0 iterations took 10 milliseconds
  Write   : [7.116e-003] 1 iteration took 7 milliseconds
  'cns' io: [1.#IOe+000] 0 iterations took 52 milliseconds
  'ill' io: [1.#IOe+000] 0 iterations took 19 milliseconds

http://www.testing.com/writings/reviews/maguire-solid.html
| "adding more elaborate integrity checking to subsystems in order
| to catch more bugs."

Since it's probably desirable to see all occurrences to determine
how widespread, don't fail the test immediately.

Also routinely trap any possible divides by 0.

> 3. How could those defects have been prevented?

Classify, standardize, document, disseminate and utilize rules
[protocols if you like that better] for various implementation
situations. Ex: "Be sure a numeric value is not negative before
casting as unsigned."

Rigorous unit testing at coding time [e.g. matrix of all possible
conditions encountered for that operation.]
 
Use an interactive debugger.
http://www.testing.com/writings/reviews/maguire-solid.html
| "4. the virtues of stepping through every line of code using
| the debugger." 

Code review by others.
 
> 4. How should those defects be removed?

Reference the spec from AliquotTimer in timer.hpp:

/// If the operation took longer than the specified interval in the
/// initial calibration trial, then just report how long that took.
/// Rationale: if it is desired to spend one second testing an
/// operation, but the operation takes ten seconds, then it's not
/// appropriate to spend another ten seconds for a single iteration.

There is no direct specific code that addresses this condition.
It is implicit in:

  if(1 < z)
    {
    timer_.restart();
    for(unsigned long int j = 0; j < z; j++)
       {
       f_();
       }
    timer_.stop();
    }

but this does not handle the situation where z ends up as 0 from
the error situation previously described.

Recommend comparing max_seconds_ to initial_trial_time_ and if
max_seconds_ is less, skip the intermediate calculations and set
iteration [z] to 1.



---------------------------------------------------------
This e-mail transmission may contain information that is proprietary, 
privileged and/or confidential and is intended exclusively for the person(s) to 
whom it is addressed. Any use, copying, retention or disclosure by any person 
other than the intended recipient or the intended recipient's designees is 
strictly prohibited. If you are not the intended recipient or their designee, 
please notify the sender immediately by return e-mail and delete all copies. 

---------------------------------------------------------





reply via email to

[Prev in Thread] Current Thread [Next in Thread]