Re: [Texmacs-dev] Cache profiling of TeXmacs 1.0.3.9


From: David MENTRE
Subject: Re: [Texmacs-dev] Cache profiling of TeXmacs 1.0.3.9
Date: Fri, 21 May 2004 10:52:34 +0200
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)

David MENTRE <address@hidden> writes:

>>   2. TeXmacs has many cache misses, but they seem to happen all over the
>>      code. So, the issue are either general design problems, or are
>>      faults in commonly used data structures.
>>
>>      Here, my usual suspects are needless indirections. They might make
>>      not a big difference. But that's something which should be tried.
>
> I think I can give figure on that. (to be continued)

I have produced annotated sources for DATA_CACHE_ACCESSES and
DATA_CACHE_MISSES events. The sources are available at:
http://www.linux-france.org/~dmentre/texmacs/perf-texmacs/annotated-texmacs-1.3.0.9.tar.gz
 

I generated those sources with the following command:

$ opannotate event:DATA_CACHE_MISSES,DATA_CACHE_ACCESSES \
    --base-dirs=/home/david/00-poubelle/TeXmacs-1.0.3.9-src/src \
    --search-dirs=/home/david/00-poubelle/TeXmacs-1.0.3.9-src/src \
    --source \
    --output-dir=./annotated-texmacs-1.3.0.9 \
    /home/david/00-poubelle/texmacs/libexec/TeXmacs/bin/texmacs.bin


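Taking src/Classes/Atomic/string.cpp as an example (on each line, the
first two columns are the sample count and percentage for
DATA_CACHE_ACCESSES, the next two are the same for DATA_CACHE_MISSES):

  1821  0.1042     0 0.0e+00   :string::operator == (string a) { /* string::operator==(string) total:  23198  1.3268    48  1.5620 */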
                               :  register int i;
  3240  0.1853     9  0.2929   :  if (rep->n!=a->n) return false;
  6425  0.3675    16  0.5207   :  for (i=0; i<rep->n; i++)
  9471  0.5417    18  0.5857   :    if (rep->a[i]!=a->a[i]) return false;
   945  0.0541     1  0.0325   :  return true;
  1296  0.0741     4  0.1302   :}
                               :
                               :bool
   108  0.0062     0 0.0e+00   :string::operator != (string a) { /* string::operator!=(string) total:   8957  0.5123    46  1.4969 */
                               :  register int i;
   489  0.0280     3  0.0976   :  if (rep->n!=a->n) return true;
  2410  0.1378     8  0.2603   :  for (i=0; i<rep->n; i++)
  4523  0.2587    35  1.1390   :    if (rep->a[i]!=a->a[i]) return true;
   791  0.0452     0 0.0e+00   :  return false;
   636  0.0364     0 0.0e+00   :}
[...]
/* 
 * Total samples for file : "Classes/Atomic/string.cpp"
 * 
 * 437346 25.0147   325 10.5760
 */


So, for this particular example, of the 325 cache misses sampled in
string.cpp, 48 fall in operator== and 46 in operator!=. Together those
two routines account for 94/325, or about 29%, of the file's misses.


Taking another example, src/Classes/Atomic/tree.hpp:

 42406  2.4255    48  1.5620   :inline tree::tree (const tree& x): rep (x.rep) { rep->ref_count++; } /* tree::tree(tree const&) total:  42406  2.4255    48  1.5620 */
 16103  0.9210     9  0.2929   :inline tree::~tree () { /* tree::~tree() total:  40978  2.3438    45  1.4644 */
 25238  1.4435    40  1.3017   :  if ((--rep->ref_count)==0) { destroy_tree_rep (rep); rep= NULL; } }
  2869  0.1641     6  0.1952   :inline atomic_rep* tree::operator -> () { /* tree::operator->() total:   4015  0.2296     6  0.1952 */
                               :  CHECK_ATOMIC (*this, "tree::operator ->");
  2075  0.1187     1  0.0325   :  return static_cast<atomic_rep*> (rep); }
   286  0.0164     1  0.0325   :inline tree& tree::operator = (tree x) { /* tree::operator=(tree) total:   4380  0.2505     9  0.2929 */
   140  0.0080     0 0.0e+00   :  x.rep->ref_count++;
  2420  0.1384     3  0.0976   :  if ((--rep->ref_count)==0) destroy_tree_rep (rep);
   786  0.0450     1  0.0325   :  rep= x.rep;
  1232  0.0705     1  0.0325   :  return *this; }
                               :
  2219  0.1269    15  0.4881   :inline tree::tree (): /* tree::tree() total:   5340  0.3054    22  0.7159 */
  3121  0.1785     7  0.2278   :  rep (new atomic_rep (string ())) {}
   175  0.0100     0 0.0e+00   :inline tree::tree (char *s): /* tree::tree(char*) total:   4054  0.2319     2  0.0651 */
  3032  0.1734     1  0.0325   :  rep (new atomic_rep (s)) {}
   246  0.0141     0 0.0e+00   :inline tree::tree (string s): /* tree::tree(string) total:    608  0.0348     4  0.1302 */
   370  0.0212     4  0.1302   :  rep (new atomic_rep (s)) {}

[...]

  7623  0.4360    24  0.7810   :  return N ((static_cast<compound_rep*> (t.rep))->a); }
                               :inline int arity (tree t) { /* arity(tree) total:      1 5.7e-05     0 0.0e+00 */
                               :  if (t.rep->op == STRING) return 0;
  1078  0.0617    12  0.3905   :  else return N ((static_cast<compound_rep*> (t.rep))->a); }
     1 5.7e-05     0 0.0e+00   :inline int right_index (tree t) { /* right_index(tree) total:      3 1.7e-04     0 0.0e+00 */
     2 1.1e-04     0 0.0e+00   :  return is_atomic (t)? N(t->label): 1; }
  4578  0.2618    23  0.7485   :inline tree_label L (tree t) { /* L(tree) total:  11077  0.6336    43  1.3993 */
  6499  0.3717    20  0.6508   :  return t.rep->op; }
  1859  0.1063     0 0.0e+00   :inline array<tree> A (tree t) { /* A(tree) total:   3750  0.2145     1  0.0325 */
                               :  CHECK_COMPOUND (t, "A (tree)");
  1891  0.1082     1  0.0325   :  return (static_cast<compound_rep*> (t.rep))->a; }
                               :inline array<tree>& AR (tree t) {
                               :  CHECK_COMPOUND (t, "AR (tree)");
                               :  return (static_cast<compound_rep*> (t.rep))->a; }
                               :
  1436  0.0821    11  0.3580   :inline bool is_atomic (tree t) { return (t.rep->op == STRING); } /* is_atomic(tree) total:   2513  0.1437    23  0.7485 */

[...]

  4847  0.2772    28  0.9112   :  return (t.rep->op == STRING) && (t->label == s); }

[...]
/* 
 * Total samples for file : "Classes/Atomic/tree.hpp"
 * 
 * 185898 10.6327   291  9.4696
 */

In this particular example, reference counting in "tree::tree (const
tree& x)" (48 misses) and on the ref_count decrement line of
"tree::~tree" (40 misses) accounts for about 30% of the file's cache
misses ((48+40)/291).


I have also produced (partial) annotated assembly (see assembly.txt in
the archive; just add --assembly to the previous command). Taking
"tree::~tree()" as an example:

08051bf4 <_ZN4treeD1Ev>: /* tree::~tree() total:  40978  2.3438    45  1.4644 */
  5893  0.3371     3  0.0976   : 8051bf4:       push   %ebp
 10210  0.5840     6  0.1952   : 8051bf5:       mov    %esp,%ebp
                               : 8051bf7:       sub    $0x8,%esp
                               : 8051bfa:       mov    0x8(%ebp),%eax
  5697  0.3258     2  0.0651   : 8051bfd:       mov    (%eax),%eax
    54  0.0031     0 0.0e+00   : 8051bff:       decl   0x4(%eax)
  3822  0.2186     3  0.0976   : 8051c02:       cmpl   $0x0,0x4(%eax)
  1810  0.1035     2  0.0651   : 8051c06:       jne    8051c1e <_ZN4treeD1Ev+0x2a>
                               : 8051c08:       mov    0x8(%ebp),%eax
  3593  0.2055    14  0.4556   : 8051c0b:       mov    (%eax),%eax
   176  0.0101     0 0.0e+00   : 8051c0d:       mov    %eax,(%esp,1)
                               : 8051c10:       call   829a3cc <_Z16destroy_tree_repP8tree_rep>
   444  0.0254     0 0.0e+00   : 8051c15:       mov    0x8(%ebp),%eax
   638  0.0365     0 0.0e+00   : 8051c18:       movl   $0x0,(%eax)
     1 5.7e-05     0 0.0e+00   : 8051c1e:       leave  
  8640  0.4942    15  0.4881   : 8051c1f:       ret    


As far as I understand this assembly, most of the cache misses come from
loading "rep" to pass it as the argument to "destroy_tree_rep" (14
misses at 8051c0b) and from the final "ret" (15 misses; I don't know
why).

I don't know if it helps much to understand perf issues. :-)

Yours,
d.
-- 
 address@hidden



