From: Jonathan Kinsey
Subject: Re: [Bug-gnubg] Removal of non-threaded code
Date: Tue, 16 Jun 2009 21:30:20 +0000

I've written some code to try to minimise the single-threaded performance drop
in the multi-threaded version. I've attached the numbers, which look good but
are not exhaustive. I ran a couple of tests in single-threaded and
multi-threaded builds, with the existing code and with my change.

I haven't checked in the change yet as it's a bit messy. If anyone wants to try
it out first, I could send them a patch. If anyone has an idea for a good test,
I'll give it a go.

Jon

Jonathan Kinsey wrote:
> I've got access to a single threaded machine and just did the same quick test
> with a few different builds (all on windows with gcc 3.4.5 -O3):
>
>   set gnubgid 4HPwATDgc/ABMA:cAkNAAAAAAAA
>   eval
>
> And I got these results:
>
>           st       mt       mt(1)    mt(2)    mt(3)
>           54.254   55.259   54.000   55.094   53.379
>           54.098   54.972   53.974   54.755   53.096
>           ------   ------   ------   ------   ------
>           54.176   55.116   53.987   54.925   53.238
> % diff             1.73%    -0.35%   1.38%    -1.73%
>
> Notes
> 1 Cache locking code commented out to establish if this is cause of slower
>   times
> 2 Cache locking switched off by function pointer (attempt to speed up mt
>   times)
> 3 Same as 2 but with experimental position key code
>
> The first two rows are separate tests and then the average below, a quick look
> shows that they aren't particularly accurate but it gives a rough idea.
>
> The problem with (2) is that this would slow down multiple thread runs. I think
> we should be optimising for multiple core use. We could duplicate even more
> code and then either do an evaluation with/without cache locking (depending on
> the number of eval threads) - and this should give the same performance as the
> single threaded builds. Maybe some clever use of the preprocessor could
> minimise the amount of duplicated source.
>
> My rewrite of the PositionKey functions seems to give about a 3% increase so
> with the new sigmoid function we might have a compelling reason for people to
> upgrade to the latest version.
>
> Jon
>
> Christian Anthon wrote:
>> I have timed some simple evaluations of the opening positions using
>> various compile settings. The following times are reported for each of
>> the compile settings.
>>
>> A. 3x 4ply evaluation (clearing the cache in between with a command that
>>    is not in the present code)
>> B. 3x clearing the cache without any evaluation
>> C. 1000x 2ply evaluation (clearing the cache in between)
>> D. 1000x clearing the cache
>>
>> The lost time is from locking and unlocking the cache, I believe.
>>
>> threaded
>> 146.307531
>> 0.011090
>> 104.297596
>> 3.803742
>>
>> non-threaded
>> 138.310104
>> 0.010516
>> 92.876412
>> 3.614214
>>
>> threaded-sigmoidSSE
>> 139.664481
>> 0.011588
>> 95.686871
>> 3.824007
>>
>> non-threaded-sigmoidSSE
>> 131.947215
>> 0.010806
>> 87.237141
>> 3.605156
>>
>> from timeit import Timer
>>
>> gnubg.command("set gnubgid 4HPwATDgc/ABMA:cAkNAAAAAAAA")
>>
>> gnubg.command("set evaluation cube evaluation plies 4")
>> t = Timer('gnubg.command("clear cache"); gnubg.command("eval")', 'import gnubg')
>> print("%f" % t.timeit(3))
>>
>> t = Timer('gnubg.command("clear cache")', 'import gnubg')
>> print("%f" % t.timeit(3))
>>
>> gnubg.command("set evaluation cube evaluation plies 2")
>> t = Timer('gnubg.command("clear cache"); gnubg.command("eval")', 'import gnubg')
>> print("%f" % t.timeit(1000))
>>
>> t = Timer('gnubg.command("clear cache")', 'import gnubg')
>> print("%f" % t.timeit(1000))
>>
>> On Wed, Apr 29, 2009 at 2:38 PM, Massimiliano Maini wrote:
>>
>> Jonathan Kinsey wrote on 29/04/2009 12:54:26:
>>
>>> Massimiliano Maini wrote:
>>>>
>>>> Christian Anthon wrote on 29/04/2009 10:23:59:
>>>>
>>>>> On Wed, Apr 29, 2009 at 10:04 AM, Massimiliano Maini
>>>>> <address@hidden> wrote:
>>>>>
>>>>> bug-gnubg-bounces+massimiliano.maini=amadeus.com@gnu.org wrote on
>>>>> 28/04/2009 22:01:23:
>>>>>
>>>>> MaX build with single thread : ~32400 eval/s
>>>>> MaX build with MT code, 1 thread : ~24800 eval/s
>>>>> MaX build with MT code, 2 threads : ~34600 eval/s
>>>>>
>>>>> However, a quick rollout (648 trials, expert, full, 2 top moves of
>>>>> position t60BYCButycAAA:cAnnAWAASAAA) has shown the following:
>>>>>
>>>>> MaX build with single thread : 2m04s
>>>>> MaX build with MT code, 1 thread : 2m04s
>>>>> MaX build with MT code, 2 threads : 1m48s
>>>>>
>>>>> I'm much more worried about the last two numbers here. MT code
>>>>> should give close to twice the speed, or we are doing something wrong.
>>>>
>>>> Here at office the PC is single core, don't know if this explains the
>>>> "poor" result. I'll check at home (dual core).
>>>
>>> You did say the pc was "1 core, 2 threads", does this mean it's a
>>> hyper-threaded machine? That would match a small increase for 2 threads,
>>
>> Yes, 1 core with hyper-thread. I wasn't really surprised by the
>> small increase.
>>
>>> note also that the 1 thread test will be using 2 threads (one for the
>>> gui and one for the evaluations - the gui thread will only be redrawing
>>> the screen).
>>
>> I run the calibrate on the command line version and the rollout in the gui
>> one. Not sure it's a big deal however ... just a progress bar and a few
>> numbers updated from time to time ...
>>
>>> The best test would be on a simple single core/processor machine, these
>>> are getting quite rare, all the pcs I see are multi-core now.
>>
>> MaX.
>
> _______________________________________________
> Bug-gnubg mailing list
> address@hidden
> http://lists.gnu.org/mailman/listinfo/bug-gnubg

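
For anyone who wants to check the arithmetic in the quoted results table, here
is a minimal sketch that recomputes the average row and the "% diff" row from
the two timing rows. The timings are copied from the table; the names (runs,
mean, percent_diff, averages, diffs) are only illustrative and are not part of
gnubg. It assumes "% diff" means the relative change of each build's average
against the st average, which is what the quoted figures match.

```python
# Timings in seconds, copied from the quoted results table (two runs per build).
runs = {
    "st":    [54.254, 54.098],
    "mt":    [55.259, 54.972],
    "mt(1)": [54.000, 53.974],
    "mt(2)": [55.094, 54.755],
    "mt(3)": [53.379, 53.096],
}


def mean(values):
    """Arithmetic mean of a non-empty list of timings."""
    return sum(values) / len(values)


def percent_diff(value, baseline):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Average per build, then the change of each build relative to "st".
averages = {name: mean(times) for name, times in runs.items()}
baseline = averages["st"]
diffs = {
    name: percent_diff(avg, baseline)
    for name, avg in averages.items()
    if name != "st"
}

for name, avg in averages.items():
    if name == "st":
        print(f"{name:6s} {avg:7.3f}")
    else:
        print(f"{name:6s} {avg:7.3f} {diffs[name]:+6.2f}%")
```

With only two runs per build the spread between runs is of the same order as
the differences between builds, so the percentages are a rough guide only.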
figures.pdf
Description: Binary data
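
The idea in note 2 of the quoted message (switching cache locking off through a
function pointer, chosen by the number of eval threads) can be sketched roughly
as below. This is a Python analogue for illustration only, not gnubg's C code:
EvalCache, eval_threads, lookup, store and uses_locking are hypothetical names,
and the cost of locking in Python says nothing about the cost in the C builds
timed above.

```python
import threading
from contextlib import nullcontext


class EvalCache:
    """Toy evaluation cache (hypothetical; not gnubg's cache implementation)."""

    def __init__(self, eval_threads):
        # Choose the guard once, up front: a real lock when several evaluation
        # threads share the cache, a no-op context manager when only one does.
        self.uses_locking = eval_threads > 1
        self._guard = threading.Lock() if self.uses_locking else nullcontext()
        self._entries = {}

    def lookup(self, key):
        # The same call site works for both guards, so the cache code itself
        # is not duplicated.
        with self._guard:
            return self._entries.get(key)

    def store(self, key, value):
        with self._guard:
            self._entries[key] = value


# Single evaluation thread: no locking on the cache path.
single = EvalCache(eval_threads=1)
single.store("4HPwATDgc/ABMA", 0.5)
print(single.uses_locking, single.lookup("4HPwATDgc/ABMA"))

# Several evaluation threads: every cache access takes the lock.
shared = EvalCache(eval_threads=4)
workers = [threading.Thread(target=shared.store, args=(i, i * i)) for i in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(shared.uses_locking, [shared.lookup(i) for i in range(4)])
```

The choice is made once per cache rather than on every access. The quoted
message reports that the function-pointer variant was slower than plain mt in
multi-thread runs, so this shows the selection mechanism only and makes no
performance claim.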