
Re: [Texmacs-dev] Efficiency of C++ switch-statements


From: Gabriel Dos Reis
Subject: Re: [Texmacs-dev] Efficiency of C++ switch-statements
Date: 11 Nov 2003 15:45:22 +0100

Joris van der Hoeven <address@hidden> writes:

| Hi,
| 
| Thanks for your detailed reply.
| 
| On 11 Nov 2003, Gabriel Dos Reis wrote:
| >   As far as GCC/g++ is concerned, g++ shares the same middle-end and
| > back-end as the C front-end (and Fortran and other supported
| > languages). Here is what the GCC source says (gcc/stmt.c):
| >
| >    Switch statements can be output in one of two forms.  A branch table
| >    is used if there are more than a few labels and the labels are  dense
| >    within the range between the smallest and largest case value.  If a
| >    branch table is used, no further manipulations are done with the case
| >    node chain.
| 
| This is a bit vague: what is dense? 25% full? 50% full? 99% full?

  It was deliberately vague: no number is given, so that users do not
come to depend on internal details of the compiler -- which may change
as we implement new optimizing algorithms.  Before elaborating, I
would like to stress that this is about a particular compiler (namely
GCC) and that other optimizing compilers may implement different
strategies (I don't know what VC++ or the Intel compilers do, for
example).

  "Density" in the current GCC implementation is the ratio of the
number of case labels to the range spanned by the case values.  For
example, when optimizing for size the switch might be considered
dense if that ratio is greater than 1/3, whereas when optimizing for
speed the threshold might be closer to 1/10.

  To illustrate this point, I modified (twice) my previous example.

    extern int e;

    int main()
    {
      switch(e)
        {
        case 6:
          e = 2;
          break;

        case 7:
          ++e;
          break;

        case 2:
          e = 10;
          break;

        case 3:
          e = 0;
          break;

        case 5:
          e = 4;
          break;

        case 1:
          --e;
          break;
        }
    }

The SPARC assembly output is

             .file      "t.C"
             .section   ".text"
             .align 4
             .global main
             .type      main, #function
             .proc      04
     main:
     .LLFB3:
             !#PROLOGUE# 0
             !#PROLOGUE# 1
             sethi      %hi(e), %o3
             ld [%o3+%lo(e)], %g1
             cmp        %g1, 7
             bgu        .LL2
             sll        %g1, 2, %g1
             sethi      %hi(.LL9), %o5
             or %o5, %lo(.LL9), %o5
             ld [%o5+%g1], %o4
             jmp        %o4
              nop
     .LL3:
             mov        2, %g1
             b  .LL2
             st %g1, [%o3+%lo(e)]
     .LL4:
             ld [%o3+%lo(e)], %g1
             add        %g1, 1, %g1
             b  .LL2
             st %g1, [%o3+%lo(e)]
     .LL8:
             ld [%o3+%lo(e)], %g1
             add        %g1, -1, %g1
             b  .LL2
             st %g1, [%o3+%lo(e)]
     .LL5:
             mov        10, %g1
             b  .LL2
             st %g1, [%o3+%lo(e)]
     .LL6:
             b  .LL2
             st %g0, [%o3+%lo(e)]
     .LL7:
             mov        4, %g1
             st %g1, [%o3+%lo(e)]
     .LL2:
             retl
             mov        0, %o0
             .align 4
             .subsection        -1
             .align 4
     .LL9:
             .word      .LL2
             .word      .LL8
             .word      .LL5
             .word      .LL6
             .word      .LL2
             .word      .LL7
             .word      .LL3
             .word      .LL4
             .previous
     .LLFE3:
             .size      main, .-main
             .ident     "GCC: (GNU) 3.4 20030330 (experimental)"

Again a table is used.

Now, if I add a "case 100:" for instance, GCC emits a series of compares
instead of a branch table:

             .file      "t.C"
             .section   ".text"
             .align 4
             .global main
             .type      main, #function
             .proc      04
     main:
     .LLFB3:
             !#PROLOGUE# 0
             !#PROLOGUE# 1
             sethi      %hi(e), %o5
             ld [%o5+%lo(e)], %g1
             cmp        %g1, 5
             be,a       .LL15
             mov        4, %g1
             bg .LL10
             cmp        %g1, 7
             cmp        %g1, 2
             be,a       .LL15
             mov        10, %g1
             bg .LL13
             cmp        %g1, 3
             cmp        %g1, 1
     .LL13:
             be,a       .LL2
             st %g0, [%o5+%lo(e)]
             b,a        .LL2
     .LL10:
             be .LL4
             cmp        %g1, 7
             bl,a       .LL15
             mov        2, %g1
             cmp        %g1, 100
             bne        .LL2
             mov        104, %g1
     .LL12:
     .LL15:
             b  .LL2
             st %g1, [%o5+%lo(e)]
     .LL4:
             b  .LL12
             mov        8, %g1
     .LL2:
             retl
             mov        0, %o0
     .LLFE3:
             .size      main, .-main
             .ident     "GCC: (GNU) 3.4 20030330 (experimental)"


| > This is to be taken with a grain of salt.  This may change in the
| > future as the compiler is improving and more optimizations are being
| > implemented (e.g. tree-ssa branch).
| 
| I hope so! Maybe you have to give the user some control
| over the optimization. If the code is optimized for small size,

   Well, a big concern with GCC is that it already has far too many
options, and not all combinations of them are fully tested.  Every new
user-level option adds combinatorial complexity.  So there is some
resistance to exposing an option for every single decision made by the
compiler...

| then a binary branch table (small tables) and an automatically
| generated hash table (big tables) would probably be best.
| If the code is optimized for speed, then a 10% full table
| should still be considered as dense in my opinion.

   Certainly GCC distinguishes between -Os and -On.  In the former
case, a switch is implemented as a branch table only if the density is
somewhere around, say, 1/3; whereas in the latter case the threshold
is somewhere above 1/10.  Some characteristics of the target also
weigh in, but the above is a good picture.

| Don't forget that
|   1) The number of large switch statements (more than 10 items)
|      is usually not that large: they happen at strategic dispatching
|      points in the code.

  The decision to implement a switch as a branch table is made for
each particular switch at hand, not based on the frequency of large
switches.

|   2) A table with 200 items is not that large in memory for
|      present standards.

  Well, it is not clear offhand that, even when optimizing for speed,
a branch table of size 200 with only 5 relevant entries will be faster
than a series of fewer than five compares.  It depends on other
factors such as instruction costs or cache lines, or... all the other
low-level details you usually don't care about.

| Finally, I think that if you switch over an enum,
| then the likelihood of table lookup should be increased.

  I don't know.  I don't believe it makes any difference for C enums.
C++ enums are a bit different, since the C++ standard allows for short
ranges -- and g++ does take advantage of that in some minor cases.

| > | Another question: if I do not put the cases in the same order
| > | as the enumeration, does this alter the efficiency?
| >
| >   No, as far as GCC is concerned; see the comment above.
| 
| OK, so you are sure that I don't need to worry about occasional
| permutations of cases?

  In the case of GCC, that does not matter.  I don't know what other
compilers actually implement.  My personal take is not to worry about
the order -- assuming it is not relevant to the algorithm's semantics.
GCC itself is a big consumer of switches...

| > And what appears inefficient today may be very efficient tomorrow given
| > that the compiler is rapidly improving.  My rule of thumb in this area
| > is: design the algorithm, code, measure, optimize.
| 
| The point is that measuring/optimizing is harder for a program
| like TeXmacs than for, say, a mathematical library.

Yes, I know.  Actually the example I had in mind is GCC itself (as I
mentioned in my previous message), and I don't take it to be an
example of a mathematical library :-)  We've developed various
measuring routines in GCC to report various performance bits (time
spent in particular passes, total time, memory consumption, etc.).
The name lookup speedup work I did on GCC-3.3.x was largely based on
  (1) identifying all the name lookup functions and timing them;
  (2) running the compiler on a "large" and "interesting" piece of
      software (as far as name lookup is concerned) -- I mostly used
      Qt-3.1.1;
  (3) identifying hot spots;
  (4) occasionally using gcov.

I'm not suggesting you should do something like that.  I'm just
reporting actual experience on something as complex as GCC.  I got a
net speedup.  And I'm not alone in doing this -- most core GCC
developers now care about efficiency and use at least -ftime-report as
a first baby step when trying to optimize the compiler.  I don't know
whether TeXmacs has routines to report the time it spends in its
various parts...


-- Gaby




