From: Ralf Wildenhues
Subject: speed up large library linking
Date: Mon, 9 May 2005 22:58:01 +0200
User-agent: Mutt/1.5.9i

First off: My laptop was broken last week, then some of our department's
hardware was destroyed, so: no mail reading, thus no patch checks, no
1.5.18 release.  Also I'll most likely have little to no net connection
for a yet unspecified time to come, and will surely miss some mails.

OTOH, that meant time for libjava over the weekend, so here we go, in
reverse logical order:

Results:
--------

Improvements so far for link mode only:
(timings all done on a fast linux dual machine)

linking libgcj0_convenience (roughly 2450 objects):

old GCC libtool:
68.79user 42.56system 1:50.63elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6159792minor)pagefaults 0swaps

old GCC libtool with -objectlist:
50.99user 38.50system 1:27.78elapsed 101%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+5453057minor)pagefaults 0swaps

HEAD after optimizations:
11.24user 0.98system 0:12.71elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+80210minor)pagefaults 0swaps

HEAD after optimizations, with -objectlist:
3.99user 3.51system 0:08.55elapsed 87%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+273791minor)pagefaults 0swaps

same, but dry run (i.e. the libtool overhead):
1.86user 0.76system 0:02.80elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+71211minor)pagefaults 0swaps

libtool overhead:             33 %
libtool overhead improvement: 97 %

linking libgcj.la (composed of some convenience archives, e.g. above):

old GCC libtool:
57.12user 24.04system 1:21.12elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k

HEAD after optimizations:
10.86user 5.13system 0:18.16elapsed 88%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+371214minor)pagefaults 0swaps

same, but dry run:
0.78user 1.11system 0:02.66elapsed 71%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+71053minor)pagefaults 0swaps

libtool overhead:             15 %
libtool overhead improvement: 96 %

linking libgcj.la with reloading forced
(disabled whole_archive, disabled GNU ld script):

old GCC libtool:
41.09user 8.82system 0:50.25elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1205606minor)pagefaults 0swaps

old HEAD with other optimizations above:
33.50user 12.83system 0:50.47elapsed 91%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+1340357minor)pagefaults 0swaps

same, but dry run:
22.93user 7.34system 0:30.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+949194minor)pagefaults 0swaps

HEAD, now with reload optimization:
8.68user 4.20system 0:18.17elapsed 70%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+276950minor)pagefaults 0swaps

same, but dry run:
1.33user 1.82system 0:03.08elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+110107minor)pagefaults 0swaps

libtool overhead:             17 %
libtool overhead improvement: 91 %

Discussion:
-----------

While the overhead improvements look nice, all there is to them is a
complexity reduction.  For links with a small number of objects, there
will hardly be any improvement, but possibly a small (but constant)
degradation.  Also we are far from Bob's 1% demand, but oh well. :-)

Immediate consequence for the libjava folks: Use of -objectlist is to be
preferred, with their (ancient!) libtool as well as with HEAD after the
changes below.

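The overhead figures in the results can apparently be reproduced from
the elapsed times.  The shell sketch below is an illustration only: the
helper names overhead_pct and improvement_pct are made up here, and the
improvement formula (old overhead taken as old elapsed time minus the
real link work) is an assumption that happens to match the quoted
97 %, 96 % and 91 %, not something stated in this mail.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of libtool.

# overhead_pct DRY TOTAL
# Share of the elapsed time spent in libtool itself (the dry-run time),
# printed as a rounded percentage.
overhead_pct () {
  awk -v dryt="$1" -v total="$2" \
    'BEGIN { printf "%.0f\n", 100 * dryt / total }'
}

# improvement_pct OLD NEW DRY
# ASSUMPTION: real link work = NEW - DRY, and the old libtool overhead
# is OLD minus that link work.  Prints the relative overhead reduction.
improvement_pct () {
  awk -v oldt="$1" -v newt="$2" -v dryt="$3" \
    'BEGIN { work = newt - dryt
             printf "%.0f\n", 100 * (1 - dryt / (oldt - work)) }'
}

# libgcj0_convenience, elapsed seconds:
# old GCC libtool 1:50.63, HEAD with -objectlist 8.55, dry run 2.80
overhead_pct 2.80 8.55
improvement_pct 110.63 8.55 2.80
```

With the libgcj0_convenience timings this prints 33 and 97.
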
I have another optimization idea for -objectlist which will kill some of
the 2.8s left, but it needs more work, and might not be immediately
necessary.

Changes:
--------

My patches break one assumption held in libtool so far: that --dry-run
will cause no file changes.  I broke it because dry run will be of no
value if you just skip all your arguments then.  In order to make up for
the breakage, I chose to create a new temporary directory and store all
files in there.  I'm open to suggestions whether it should remain under
${TMPDIR-/tmp} as it is now, or be stored somewhere below .libs / _libs.
Problem with the latter is that, when we create the directory, we do not
know the output directory just yet.  Also I'm unsure whether it is ok to
just remove the temp dir with a trap on signal 0 -- we might leave some
unwanted leftovers here(?).

My patches will require the build system to have working (SUSv3
conforming) join, fold, paste, and split utilities.  Are any problems
with these known (and not mentioned in autoconf.texi)?  Does MinGW
provide them?  From a cursory glance I could not find join and paste --
we might have to keep the old, slow algorithm for renaming as special
case or think of a fast one without them (or convince the MinGW people
to include these tools. :-)

It will also require that `tr' works on non-text files, i.e. files with
long lines.  I believe we have relied on this before, and do not know of
any problems here, but I think this is not covered by POSIX.

For the time being, I require $ECHO to be builtin.  This has been
implicitly assumed in several places already, but becomes visible only
when command line length is exceeded.  I'm working on a fix, but as of
now I have only a patch to mark all occurrences I could find.

My changes will require you to either not use `\' in path and file names
or have a shell that understands `read -r'.  IOW, if you try to cross
compile from Solaris for Cygwin, force use of bash instead of ksh.

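To illustrate why `read -r' matters for file names containing
backslashes, here is a minimal sketch.  The function names probe_read_r
and copy_names are hypothetical and not taken from the patches; the
actual detection in speedup-features2.diff may well look different.

```shell
#!/bin/sh
# Hypothetical sketch; the real feature test in the patch may differ.

# Print "yes" if this shell's `read -r' keeps backslashes intact,
# "no" if the option is rejected or the backslash gets eaten.
probe_read_r () {
  if (printf '%s\n' 'a\b' |
        { read -r probe_line && test "$probe_line" = 'a\b'; }) 2>/dev/null
  then
    echo yes
  else
    echo no
  fi
}

# Copy a list of file names (one per line) from stdin to stdout
# without mangling backslashes or surrounding whitespace.
copy_names () {
  while IFS= read -r name; do
    printf '%s\n' "$name"
  done
}

probe_read_r
printf '%s\n' 'dir\sub\foo.o' | copy_names
```

With a plain `read' (no -r) the backslashes in the second example would
be treated as escape characters and dropped from the name.
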
Also, newlines in file names are forbidden (but that is nothing new).

Patches:
--------

- Factor out detection of `read -r' support and POSIX or pre-POSIX `sort'.
- Add FIXMEs to all places which implicitly assume builtin $ECHO.

complexity reductions:
- rewrite argument parsing to use temp files for long argument lists.
- rewrite partial linking
- rewrite duplicate object renaming
- rewrite piecewise old archive linking; adjust pdemo test

All of this has only been tested on a couple of systems (with pdemo), so
many bugs ought to be left in there, and feedback is very much welcome.
We'll probably also find some oddities in systems' file utils.  IOW:
These patches most likely ought to carry "break frobnozzle" instead of
"fix frobnozzle" as log entries. :-)

OK to apply them all to HEAD?

Regards,
Ralf

speedup-features2.diff
Description: Text document
speedup-fixme2.diff
Description: Text document
speedup-parseargs2.diff
Description: Text document
speedup-reload2.diff
Description: Text document
speedup-rename2.diff
Description: Text document
speedup-piecewise-oldlibs2.diff
Description: Text document