Re: [Texmacs-dev] Cache profiling of TeXmacs 1.0.3.9
From: David MENTRE
Date: Fri, 21 May 2004 10:52:34 +0200
User-agent: Gnus/5.1006 (Gnus v5.10.6) Emacs/21.3 (gnu/linux)
David MENTRE <address@hidden> writes:
>> 2. TeXmacs has many cache misses, but they seem to happen all over the
>> code. So, the issue are either general design problems, or are
>> faults in commonly used data structures.
>>
>> Here, my usual suspects are needless indirections. They might make
>> not a big difference. But that's something which should be tried.
>
> I think I can give figure on that. (to be continued)
I have produced annotated sources for DATA_CACHE_ACCESSES and
DATA_CACHE_MISSES events. The sources are available at:
http://www.linux-france.org/~dmentre/texmacs/perf-texmacs/annotated-texmacs-1.3.0.9.tar.gz
I generated those sources with the following command:
$ opannotate event:DATA_CACHE_MISSES,DATA_CACHE_ACCESSES \
    --base-dirs=/home/david/00-poubelle/TeXmacs-1.0.3.9-src/src \
    --search-dirs=/home/david/00-poubelle/TeXmacs-1.0.3.9-src/src \
    --source \
    --output-dir=./annotated-texmacs-1.3.0.9 \
    /home/david/00-poubelle/texmacs/libexec/TeXmacs/bin/texmacs.bin
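Taking src/Classes/Atomic/string.cpp as an example (on each line, the
first count and percentage are DATA_CACHE_ACCESSES samples, the second
count and percentage are DATA_CACHE_MISSES samples):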
1821 0.1042 0 0.0e+00 :string::operator == (string a) { /* string::operator==(string) total: 23198 1.3268 48 1.5620 */
: register int i;
3240 0.1853 9 0.2929 : if (rep->n!=a->n) return false;
6425 0.3675 16 0.5207 : for (i=0; i<rep->n; i++)
9471 0.5417 18 0.5857 : if (rep->a[i]!=a->a[i]) return false;
945 0.0541 1 0.0325 : return true;
1296 0.0741 4 0.1302 :}
:
:bool
108 0.0062 0 0.0e+00 :string::operator != (string a) { /* string::operator!=(string) total: 8957 0.5123 46 1.4969 */
: register int i;
489 0.0280 3 0.0976 : if (rep->n!=a->n) return true;
2410 0.1378 8 0.2603 : for (i=0; i<rep->n; i++)
4523 0.2587 35 1.1390 : if (rep->a[i]!=a->a[i]) return true;
791 0.0452 0 0.0e+00 : return false;
636 0.0364 0 0.0e+00 :}
[...]
/*
* Total samples for file : "Classes/Atomic/string.cpp"
*
* 437346 25.0147 325 10.5760
*/
So, for this particular example, out of the 325 cache-miss samples
attributed to the source in string.cpp, 48 fall in operator== and 46 in
operator!= (94 in total, or about 29%, so a bit under 1/3 for those two
routines).
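To make the arithmetic explicit, here is a minimal sketch that recomputes that share from the figures in the listing. The helper names (share_percent, string_compare_miss_share) are made up for illustration; they are not part of TeXmacs or oprofile.

```cpp
// Hypothetical helper: percentage of a file's samples that fall in
// some subset of its routines. Returns 0 for an empty file total.
double share_percent(long routine_samples, long file_samples) {
    if (file_samples <= 0) return 0.0;
    return 100.0 * static_cast<double>(routine_samples)
                 / static_cast<double>(file_samples);
}

// Figures copied from the string.cpp listing in this message.
double string_compare_miss_share() {
    const long eq_misses   = 48;   // string::operator==(string) total
    const long ne_misses   = 46;   // string::operator!=(string) total
    const long file_misses = 325;  // all of Classes/Atomic/string.cpp
    return share_percent(eq_misses + ne_misses, file_misses);
}
```

With those inputs the result is a little under 29%.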
Taking another example, src/Classes/Atomic/tree.hpp:
42406 2.4255 48 1.5620 :inline tree::tree (const tree& x): rep (x.rep) { rep->ref_count++; } /* tree::tree(tree const&) total: 42406 2.4255 48 1.5620 */
16103 0.9210 9 0.2929 :inline tree::~tree () { /* tree::~tree() total: 40978 2.3438 45 1.4644 */
25238 1.4435 40 1.3017 : if ((--rep->ref_count)==0) { destroy_tree_rep (rep); rep= NULL; } }
2869 0.1641 6 0.1952 :inline atomic_rep* tree::operator -> () { /* tree::operator->() total: 4015 0.2296 6 0.1952 */
: CHECK_ATOMIC (*this, "tree::operator ->");
2075 0.1187 1 0.0325 : return static_cast<atomic_rep*> (rep); }
286 0.0164 1 0.0325 :inline tree& tree::operator = (tree x) { /* tree::operator=(tree) total: 4380 0.2505 9 0.2929 */
140 0.0080 0 0.0e+00 : x.rep->ref_count++;
2420 0.1384 3 0.0976 : if ((--rep->ref_count)==0) destroy_tree_rep (rep);
786 0.0450 1 0.0325 : rep= x.rep;
1232 0.0705 1 0.0325 : return *this; }
:
2219 0.1269 15 0.4881 :inline tree::tree (): /* tree::tree() total: 5340 0.3054 22 0.7159 */
3121 0.1785 7 0.2278 : rep (new atomic_rep (string ())) {}
175 0.0100 0 0.0e+00 :inline tree::tree (char *s): /* tree::tree(char*) total: 4054 0.2319 2 0.0651 */
3032 0.1734 1 0.0325 : rep (new atomic_rep (s)) {}
246 0.0141 0 0.0e+00 :inline tree::tree (string s): /* tree::tree(string) total: 608 0.0348 4 0.1302 */
370 0.0212 4 0.1302 : rep (new atomic_rep (s)) {}
[...]
7623 0.4360 24 0.7810 : return N ((static_cast<compound_rep*> (t.rep))->a); }
:inline int arity (tree t) { /* arity(tree) total: 1 5.7e-05 0 0.0e+00 */
: if (t.rep->op == STRING) return 0;
1078 0.0617 12 0.3905 : else return N ((static_cast<compound_rep*> (t.rep))->a); }
1 5.7e-05 0 0.0e+00 :inline int right_index (tree t) { /* right_index(tree) total: 3 1.7e-04 0 0.0e+00 */
2 1.1e-04 0 0.0e+00 : return is_atomic (t)? N(t->label): 1; }
4578 0.2618 23 0.7485 :inline tree_label L (tree t) { /* L(tree) total: 11077 0.6336 43 1.3993 */
6499 0.3717 20 0.6508 : return t.rep->op; }
1859 0.1063 0 0.0e+00 :inline array<tree> A (tree t) { /* A(tree) total: 3750 0.2145 1 0.0325 */
: CHECK_COMPOUND (t, "A (tree)");
1891 0.1082 1 0.0325 : return (static_cast<compound_rep*> (t.rep))->a; }
:inline array<tree>& AR (tree t) {
: CHECK_COMPOUND (t, "AR (tree)");
: return (static_cast<compound_rep*> (t.rep))->a; }
:
1436 0.0821 11 0.3580 :inline bool is_atomic (tree t) { return (t.rep->op == STRING); } /* is_atomic(tree) total: 2513 0.1437 23 0.7485 */
[...]
4847 0.2772 28 0.9112 : return (t.rep->op == STRING) && (t->label == s); }
[...]
/*
* Total samples for file : "Classes/Atomic/tree.hpp"
*
* 185898 10.6327 291 9.4696
*/
On this particular example, reference counting in "tree::tree (const
tree& x)" and "tree::~tree" accounts for about 1/3 of the cache misses
((40+48)/291, roughly 30%).
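To illustrate where that reference-counting traffic comes from, here is a minimal sketch of a reference-counted handle in the same style. The names node_rep, node, g_count_writes and the helper functions are hypothetical stand-ins, not the TeXmacs classes. The sketch only counts how often a ref_count is written when a handle is passed by value (as in "L (tree t)" or "is_atomic (tree t)" above) compared with passing by const reference. Whether such a change would reduce cache misses in TeXmacs is an assumption that would need to be measured.

```cpp
// Hypothetical stand-in for tree_rep: the shared, heap-allocated part.
struct node_rep {
    int ref_count;
    int op;
    explicit node_rep(int o) : ref_count(1), op(o) {}
};

// Counts writes to a ref_count made by copies and destructors.
// Illustration only; each such write goes through the rep pointer.
static long g_count_writes = 0;

// Hypothetical stand-in for tree: a handle holding one pointer.
class node {
    node_rep* rep;
public:
    explicit node(int op) : rep(new node_rep(op)) {}
    node(const node& x) : rep(x.rep) { rep->ref_count++; g_count_writes++; }
    node& operator=(const node&) = delete;  // not needed for this sketch
    ~node() {
        g_count_writes++;
        if (--rep->ref_count == 0) delete rep;
    }
    int label() const { return rep->op; }
};

// By value: the argument is copied (ref_count++) and the copy is
// destroyed afterwards (--ref_count).
int label_by_value(node t) { return t.label(); }

// By const reference: no copy, so no ref_count write at all.
int label_by_ref(const node& t) { return t.label(); }

// Number of ref_count writes caused by one call of each kind.
long writes_for_by_value() {
    node n(7);
    const long before = g_count_writes;
    label_by_value(n);
    return g_count_writes - before;
}

long writes_for_by_ref() {
    node n(7);
    const long before = g_count_writes;
    label_by_ref(n);
    return g_count_writes - before;
}
```

In this sketch a by-value call costs two writes to the shared ref_count (one increment, one decrement), while the by-reference call costs none.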
I have also produced (partial) annotated assembler output (see
assembly.txt in the archive; to get it, just add --assembly to the
previous command). Taking "tree::~tree()" as an example:
08051bf4 <_ZN4treeD1Ev>: /* tree::~tree() total: 40978 2.3438 45 1.4644 */
5893 0.3371 3 0.0976 : 8051bf4: push %ebp
10210 0.5840 6 0.1952 : 8051bf5: mov %esp,%ebp
: 8051bf7: sub $0x8,%esp
: 8051bfa: mov 0x8(%ebp),%eax
5697 0.3258 2 0.0651 : 8051bfd: mov (%eax),%eax
54 0.0031 0 0.0e+00 : 8051bff: decl 0x4(%eax)
3822 0.2186 3 0.0976 : 8051c02: cmpl $0x0,0x4(%eax)
1810 0.1035 2 0.0651 : 8051c06: jne 8051c1e <_ZN4treeD1Ev+0x2a>
: 8051c08: mov 0x8(%ebp),%eax
3593 0.2055 14 0.4556 : 8051c0b: mov (%eax),%eax
176 0.0101 0 0.0e+00 : 8051c0d: mov %eax,(%esp,1)
: 8051c10: call 829a3cc <_Z16destroy_tree_repP8tree_rep>
444 0.0254 0 0.0e+00 : 8051c15: mov 0x8(%ebp),%eax
638 0.0365 0 0.0e+00 : 8051c18: movl $0x0,(%eax)
1 5.7e-05 0 0.0e+00 : 8051c1e: leave
8640 0.4942 15 0.4881 : 8051c1f: ret
From my understanding of this assembler, the cache misses come from
passing the "rep" argument to the function "destroy_tree_rep" (the load
at 8051c0b, 14 samples) and from the final "ret" (15 samples; I don't
know why).
I don't know if it helps much to understand perf issues. :-)
Yours,
d.