Re: [Bug-gnubg] Re: Getting gnubg to use all available cores


From: Christian Anthon
Subject: Re: [Bug-gnubg] Re: Getting gnubg to use all available cores
Date: Fri, 7 Aug 2009 09:36:10 +0200

Hi Michael,

Thanks for investigating this. My answer would have been along the lines
of "beyond our control". Jon coded most of the threading stuff, and I
believe that MAX_NUMTHREADS is indeed somewhat arbitrary. However,
I believe there is a bit of memory consumption, and possibly also extra
CPU time, involved in setting it higher. Hopefully Jon will pitch in.

Christian.
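
A minimal sketch of why a higher compile-time cap might cost some memory,
assuming (hypothetically) that per-thread state lives in a table sized by
the cap. Only the macro name MAX_NUMTHREADS comes from this thread; the
thread_slot struct, its fields, and the helper functions are illustrative
and are not taken from gnubg's multithread.h.

```c
#include <assert.h>
#include <stddef.h>

#ifndef MAX_NUMTHREADS
#define MAX_NUMTHREADS 16   /* value discussed in this thread */
#endif

/* Hypothetical per-thread bookkeeping; NOT gnubg's real structure. */
typedef struct {
    int id;
    int busy;
    double scratch[256];    /* stand-in for per-thread working memory */
} thread_slot;

/* Statically sized table: its size is fixed by the cap at compile time. */
static thread_slot slots[MAX_NUMTHREADS];

/* Memory reserved up front, whether or not the threads are ever started. */
size_t reserved_bytes(void) { return sizeof slots; }

/* Memory the same table would need under a different cap. */
size_t bytes_for_cap(size_t cap) { return cap * sizeof(thread_slot); }

/* Clamp a user-requested thread count to the compiled-in cap. */
int clamp_threads(int requested) {
    assert(MAX_NUMTHREADS >= 1);
    if (requested < 1) return 1;
    if (requested > MAX_NUMTHREADS) return MAX_NUMTHREADS;
    return requested;
}
```

Under this assumption the cost grows linearly with the cap: going from 16
to 64 quadruples the table. Whether gnubg's real structures behave this way
is something Jon would have to confirm.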

On Fri, Aug 7, 2009 at 6:55 AM, Michael Petch <address@hidden> wrote:
> Howdy Louis,
>
> I think that MAX_NUMTHREADS was an artificial limit set by the hardware of
> the day. Christian can likely tell you why it is 16 specifically, but I am
> assuming that it was a somewhat arbitrary (and reasonable) value based on
> the number of cores available on most systems.
>
> On to your OS/X issue. I did a bit of research, and my original suggestion
> of waiting for Snow Leopard may actually be all that is required.
>
> Nehalem processors diverge from the previous generation of Intel processors
> because they are no longer based on SMP (Symmetric MultiProcessor) designs.
> In an SMP system, generally all processors have access to main memory (RAM)
> via a single data bus. The problem, of course, is that the more cores you
> have, the more contention there is for the memory reads and writes that
> have to occur on that one bus.
>
> Intel decided that SMP designs likely will not scale properly in the future
> when dealing with large core counts (32, 64, 128 cores etc.), so they moved
> their Nehalem design to NUMA-type systems instead of SMP. NUMA is Non-Uniform
> Memory Access. In this type of design, cores may not necessarily be
> able to share memory with other processors without some help. I'm not going
> to get into the gory details, but the bus system Intel is pushing is the QPI
> (QuickPath Interconnect) bus. This literally replaces the good old FSB
> (Front Side Bus).
>
> NUMA architectures do allow for the concept of "Remote" and "Local" data.
> Shared data may not be directly available to a processor; it can be
> retrieved remotely, but that will be slower. Operating system kernels need
> NUMA support in order for shared data access on different buses to work
> properly.
>
> So you're asking, why tell me all this? Well, the answer is simple. Apple, in
> their infinite wisdom, started using new QPI/NUMA hardware without actually
> fully implementing NUMA in their current kernel! This hasn't been well
> documented by Apple, but it was discovered when companies started running
> Xserve on the new QPI/Nehalem systems.
>
> Without proper NUMA support, processors can't arbitrarily share memory with
> all other processors, which seems to be the case here with Gnubg. Gnubg
> launches in a single process and then asks OS/X to create threads (with
> shared memory requirements). It appears that by default each processor is
> considered a separate entity without sharing (on OS/X Leopard). The
> exception is that each core appears as 2 virtual cores. Virtual cores are on
> the same processor, and thus the same bus, so one can share memory across them.
>
> It seems when Gnubg launches, all the threads are created on one processor
> (the processor is originally chosen by OS/X) and accessible by 2 virtual
> cores (Using Hyperthreading). It seems Apple did this so they could put out
> new equipment before the next OS (Snow Leopard) was released.
>
> So what does Snow Leopard have that Leopard doesn't? NUMA support.
>
> My guess is that if you got your hands on Snow Leopard you may find that
> what you are seeing changes. Apparently this very problem exists for people
> using CS4 (Adobe's Creative Suite 4).
>
> Linux supports NUMA; if you are feeling adventurous, you might try
> installing Linux on your Apple hardware and see what happens.
>
> Your chess program may work because of the way it splits up tasks (it may
> even use a combination of POSIX threads and separate process spaces). I
> haven't seen the source code, so it's very hard to say.
>
> Michael Petch
>
> On 06/08/09 10:29 AM, "Louis Zulli" <address@hidden> wrote:
>
>> Hi,
>>
>> I put
>>
>> #define MAX_NUMTHREADS 64
>>
>> in multithread.h and rebuilt.
>>
>> In Settings-->Options-->Other, I put Eval Threads to 64.
>>
>> I then let gnubg analyze a game using 4-ply analysis.
>>
>> According to my Unix top command, gnubg had 69 threads and was using
>> 188% CPU. So apparently all the threads were running (into each other!)
>> in one physical core.
>>
>> In any case, increasing the max number of threads above 16 seems
>> trivial to do, unless I'm missing something.
>>
>> Louis
>>
>>
>> On Aug 6, 2009, at 11:34 AM, Ingo Macherius wrote:
>>
>>> Do you use the calibrate command or a batch analysis of matchfiles? The
>>> former was shown to be of no value for benchmarks, see here:
>>> http://lists.gnu.org/archive/html/bug-gnubg/2009-08/msg00006.html
>>>
>>> With calibrate I had the very same effect of high idle times during
>>> benchmarks, unless I used at least 8 threads per physical core.
>>>
>>> I am doing a benchmark on a 4 core machine which iterates over #threads
>>> (1..6) and cache size (2^1 .. 2^27). It should be posted in, say, 3 hours;
>>> it literally is still running :)
>>>
>>> Ingo
>>>
>>>> -----Original Message-----
>>>> From: address@hidden
>>>> [mailto:address@hidden On
>>>> Behalf Of Louis Zulli
>>>> Sent: Thursday, August 06, 2009 3:21 PM
>>>> To: Michael Petch
>>>> Cc: address@hidden
>>>> Subject: [Bug-gnubg] Re: Getting gnubg to use all available cores
>>>>
>>>>
>>>>
>>>> On Aug 5, 2009, at 4:02 PM, Michael Petch wrote:
>>>>
>>>>> I'm unsure how the architecture is deployed and how OS/X handles the
>>>>> physical cores, but it almost sounds like one physical core is being
>>>>> used (using Hyperthreads to run 2 threads simultaneously). I wonder if
>>>>> the memory is shared across all the cores? A friend of mine was
>>>>> suggesting that people may have to wait for Snow Leopard to come out
>>>>> before OS/X properly utilizes the Nehalem architecture (whether that
>>>>> is true or not, I don't know).
>>>>>
>>>>> Anyway, as an experiment: if you run 2 copies of Gnubg at the same
>>>>> time (using multiple threads), do you get 400% CPU usage?
>>>>>
>>>>
>>>>
>>>> Hi Mike,
>>>>
>>>> Sorry for the delay. I just had two copies of gnubg analyze the same
>>>> game, using 3 ply analysis. Each instance of gnubg used 200% CPU. Each
>>>> copy was set to use 4 evaluation threads.
>>>>
>>>> So what's the verdict here? Is Leopard simply not directing threads
>>>> correctly?
>>>>
>>>> Louis
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bug-gnubg mailing list
>>>> address@hidden http://lists.gnu.org/mailman/listinfo/bug-gnubg
>>>
>>
>



