Re: [Texmacs-dev] Performance questions, proposals, patches
From: Josef Weidendorfer
Subject: Re: [Texmacs-dev] Performance questions, proposals, patches
Date: Thu, 14 Oct 2004 18:53:09 +0200
User-agent: KMail/1.7.1
Hi Joris,
On Thursday 14 October 2004 13:08, Joris van der Hoeven wrote:
> Hi Josef,
>
> On Wed, 13 Oct 2004, Josef Weidendorfer wrote:
> > I wonder why with GCC >=2.96 on Linux/FreeBSD, the default compilation
> > flags for texmacs include "-fno-default-inline -fno-inline"?
> > At least here, if compiling with inlining, the code gains at least 25%
> > speedup without any negative effects.
>
> Certain versions of GCC 3.* are bugged and caused segmentation faults
> in combination with inlining. Maybe we can put inlining in again for
> the most recent version, if you did not notice any suspicious behaviour.
> Any patch for configure.in is welcome.
AFAIK, SuSE has been using gcc 3.3.3 for some time now (on 9.0 and on the latest
9.1), and as I said, I see no problems with inlining switched on.
===================================================
--- configure.in.orig 2004-10-14 17:54:27.208601456 +0200
+++ configure.in 2004-10-14 17:56:04.009885424 +0200
@@ -781,6 +781,8 @@
optimize_default="yes"
case "$GXX_VERSION" in
+ 3.3.3)
+ ;;
2.96 | 3.0 | 3.0.* | 3.1 | 3.1.* | 3.2 | 3.2.* | 3.3 | 3.3.*)
case "${host}" in
i*86-*-linux-gnu* | i*86-*-freebsd*)
==================================================
> may still be optimized a bit further. I also noticed another possible
> optimization for arrays (and strings): instead of allocating an array
> of a size which depends on the size of the array (which is used for
> the << operator), it might be better to systematically allocate an array
> of the same size and only use over-allocation when the << operator
> is explicitly used. This might reduce the memory requirements of TeXmacs
> quite a lot.
Possible. I only looked at the problems that appeared at the top of my profile,
in different use cases (loading, scrolling, ...).
Quite some time is spent in the Scheme library (especially in garbage
collection), but that is difficult to change for the better.
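By the way, the over-allocation idea quoted above could be sketched roughly like
this (plain C++ with hypothetical names; the real TeXmacs array/string types are
different):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical sketch: a buffer that allocates exactly the requested
// size up front, and only starts over-allocating (geometric growth)
// once operator<< is actually used to append. Arrays that are never
// appended to then waste no memory on spare capacity.
struct buf {
  char* data;
  int   n;    // used length
  int   cap;  // allocated capacity
  explicit buf (int size):
    data (new char[size > 0 ? size : 1]),
    n (size), cap (size > 0 ? size : 1) {}
  ~buf () { delete[] data; }
  buf& operator<< (char c) {
    if (n == cap) {                  // grow only when appending
      int   ncap= 2 * cap;
      char* d   = new char[ncap];
      std::memcpy (d, data, n);
      delete[] data;
      data= d; cap= ncap;
    }
    data[n++]= c;
    return *this;
  }
};
```

A buffer created with size 3 here really allocates 3 bytes; only the first
`<<` doubles the capacity.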
> You probably may use some of the testing routines in analyze.hpp
> for this kind of purpose too.
Ah, I had only looked in string.hpp. Yes, with search_forwards() it's shorter:
===============================================
--- /home/weidendo/SW/CVS-SOFT/texmacs/src/src/Plugins/Ghostscript/ghostscript.cpp 2003-10-24 12:43:48.000000000 +0200
+++ ./Plugins/Ghostscript/ghostscript.cpp 2004-10-14 18:31:28.021986504 +0200
@@ -43,10 +43,16 @@
static string
encapsulate_postscript (string s) {
int i, n=N(s);
- string r;
- for (i=0; i<n; ) {
- if ((i<(n-8)) && (s(i,i+8)=="showpage")) {i+=8; continue;}
- r << s[i++];
+ int last_begin = 0;
+ string r, showpage("showpage");
+ while(1) {
+ i = search_forwards(showpage, last_begin, s);
+ if (i<0) {
+ r << s(last_begin, n);
+ break;
+ }
+ r << s(last_begin, i);
+ last_begin = i+8;
}
return r;
}
===============================
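For checking the logic of the loop, here is the same scan in plain standard C++
(std::string stands in for the TeXmacs string type, and std::string::find for
search_forwards(); the function name strip_showpage is made up):

```cpp
#include <cassert>
#include <string>

// Sketch of the patched loop: copy s while skipping every occurrence
// of "showpage". find() plays the role of search_forwards(); npos
// corresponds to the i < 0 "not found" case in the patch.
static std::string
strip_showpage (const std::string& s) {
  const std::string showpage ("showpage");
  std::string r;
  std::size_t last= 0;
  while (true) {
    std::size_t i= s.find (showpage, last);
    if (i == std::string::npos) {
      r+= s.substr (last);           // copy the tail and stop
      break;
    }
    r+= s.substr (last, i - last);   // copy up to the match
    last= i + showpage.size ();      // skip over "showpage"
  }
  return r;
}
```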
> > least_upper_bound (rectangles l) {
> No problem, I can do that. Is it really here that we get the stack
> overflow? In that case, we might have to check why we do get such long
> lists of rectangles.
Originally, the stack overflow happened in requires_update() for me, but after
I made that function iterative, it happened in least_upper_bound().
I'm not sure why the list gets so long, but that should be easy to find out.
Perhaps it's better to compact the list when it grows longer than a given
threshold.
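An iterative version of such a list fold could look like the following sketch
(rect and node are made-up stand-ins here; the TeXmacs rectangle and list types
are different):

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical singly linked list of rectangles.
struct rect { int x1, y1, x2, y2; };
struct node { rect r; node* next; };

// Iterative least upper bound (bounding box of all rectangles):
// a plain loop over the list instead of recursion, so arbitrarily
// long lists cannot overflow the stack. Assumes l is non-empty.
static rect
least_upper_bound (node* l) {
  rect b= l->r;
  for (node* p= l->next; p != 0; p= p->next) {
    b.x1= std::min (b.x1, p->r.x1);
    b.y1= std::min (b.y1, p->r.y1);
    b.x2= std::max (b.x2, p->r.x2);
    b.y2= std::max (b.y2, p->r.y2);
  }
  return b;
}
```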
> A better solution might be to use something like
>
> if ((nr_painted%10 == 9) && dev->check_event (INPUT_EVENT)) return;
>
> I need to check though whether nr_painted cannot be increased during
> such interruptions of the painting process though.
Yes, that may be better.
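Just to illustrate the effect of the proposed throttling (count_checks is a
made-up helper, not TeXmacs code):

```cpp
#include <cassert>

// Sketch of the proposed rule: in a loop painting n rectangles, only
// poll for pending input every 10th iteration, so the (possibly
// expensive) event check does not dominate the paint loop.
// Returns how many event polls such a paint would perform.
static int
count_checks (int n) {
  int checks= 0;
  for (int nr_painted= 0; nr_painted < n; nr_painted++)
    if (nr_painted % 10 == 9)        // dev->check_event (INPUT_EVENT)
      checks++;
  return checks;
}
```

So painting 100 rectangles costs only 10 event checks instead of 100.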
> Thanks a lot for all the work! I will apply your patches soon.
Actually, I'm interested in making my profiling tool and its visualization
better, and TeXmacs is my current victim ;-)
The tool is based on the instrumentation framework Valgrind, which is used for
on-the-fly cache simulation and for building up the call graph of unmodified
binaries. Here, I only use the number of x86 instructions executed in each
function to get a sorted list of hot spots.
Of course, optimizations should afterwards be checked for real run-time
improvement. But in contrast to OProfile alone, I get exact call counts and
call arcs, and in contrast to gprof this works with shared libraries. So I can
actually see, e.g., how often a loop body in XCheckMaskEvent is executed, by
looking at the annotated assembler in the visualization.
Cheers,
Josef
>
> Best wishes, Joris