bug-gnubg

Re: [Bug-gnubg] Vectorization of neural nets just commited into CVS


From: Jim Segrave
Subject: Re: [Bug-gnubg] Vectorization of neural nets just commited into CVS
Date: Sun, 1 May 2005 10:35:04 +0200
User-agent: Mutt/1.4.2.1i

On Thu 28 Apr 2005 (22:51 +0200), Øystein Johansen wrote:
> Hi,
> 
> I've just added code that vectorizes the neural net evaluation for
> SSE. To compile: #define USE_SSE_VECTORIZE 1
> 
> If someone can add the Makefile magic, that would be great.
> 
> I have vectorized both Evaluate and EvaluateFromBase. I have not
> vectorized the evaluation with pruning nets.
> 
> Also, I have not aligned the ar and arInput arrays in neuralnet. This
> may lead to some problems.

I'm getting core dumps from this when analysing a match.

Core was generated by `gnubg'.
Program terminated with signal 10, Bus error.

#0  Evaluate128 (pnn=0x8650060, arInput=0xbfbfa4f0, ar=0xbfbfa2c0,
    arOutput=0xbfbfaf60, saveAr=0x8703410) at
    /usr/include/xmmintrin.h:852
#1  0x08124038 in NeuralNetEvaluate128 (pnn=0x8650060,
    arInput=0xbfbfa4f0,
    arOutput=0xbfbfaf60, t=140836960) at neuralnet.c:1256
#2  0x0807bee9 in EvalRace (anBoard=0xbfbfaf80, arOutput=0xbfbfaf60,
    bgv=VARIATION_STANDARD) at eval.c:2175
#3  0x0807ccc3 in EvaluatePositionFull (anBoard=0xbfbfaf80,
    arOutput=0xbfbfaf60, pci=0xbfbfaf20, pec=0x82419a4, nPlies=0,
    pc=CLASS_RACE) at eval.c:2899
#4  0x0807cf60 in EvaluatePositionCache (anBoard=0xbfbfaf80,
    arOutput=0xbfbfaf60, pci=0xbfbfaf20, pecx=0x82419a4, nPlies=0,
    pc=CLASS_RACE) at eval.c:3061
#5  0x0807d10f in EvaluatePosition (anBoard=0xbfbfaf80,
    arOutput=0xbfbfaf60,
    pci=0xbfbfaf20, pec=0x0) at eval.c:3125

(gdb) f 1
      case NNEVAL_SAVE:
      {
        memcpy(pnn->savedIBase, arInput, pnn->cInput * sizeof(*ar));
=>      Evaluate128(pnn, arInput, ar, arOutput, pnn->savedBase);
        break;
      }
      case NNEVAL_FROMBASE:
      {
        int i;


(gdb) f 0
#0  Evaluate128 (pnn=0x8650060, arInput=0xbfbfa4f0, ar=0xbfbfa2c0,
    arOutput=0xbfbfaf60, saveAr=0x8703410) at
    /usr/include/xmmintrin.h:852

/* Load four SPFP values from P.  The address must be 16-byte
aligned.  */
static __inline __m128
_mm_load_ps (float const *__P)
{
=>return (__m128) __builtin_ia32_loadaps (__P);
}

The inlining makes it hard to tell which _mm_load_ps call caused the
failure, but looking at the code for Evaluate128, one possibility is
that prWeight is not aligned on a 16-byte boundary:

(gdb) p pr
$4 = (float *) 0xbfbfa2c0
(gdb) p prWeight
$5 = (float *) 0x2947944c
(gdb) p ari
$6 = 1.44269502
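
As a quick check of the two pointers printed above, here is a minimal
sketch that tests whether an address is a multiple of 16, which is what
_mm_load_ps requires. The helper names offset16 and is_aligned16 are
made up for illustration and are not part of gnubg.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: byte offset of addr within its 16-byte block. */
unsigned offset16(uintptr_t addr)
{
    return (unsigned)(addr & 0xF);
}

/* Hypothetical helper: 1 if addr is 16-byte aligned, 0 otherwise. */
int is_aligned16(uintptr_t addr)
{
    return offset16(addr) == 0;
}

/* Print one line describing the alignment of a named address. */
void report_alignment(const char *name, uintptr_t addr)
{
    printf("%-8s 0x%08" PRIxPTR "  offset %2u  %s\n",
           name, addr, offset16(addr),
           is_aligned16(addr) ? "aligned" : "NOT aligned");
}
```

Calling report_alignment("pr", 0xbfbfa2c0) and
report_alignment("prWeight", 0x2947944c) with the values from gdb shows
that pr sits on a 16-byte boundary while prWeight is 12 bytes past one.
Since prWeight only ever advances by 4 floats (16 bytes) in the loop,
its offset never changes, which is consistent with the bus error.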

1091        /* Calculate activity at hidden nodes */
1092        memcpy(ar, pnn->arHiddenThreshold, HIDDEN_NODES * sizeof(float));
1093
1094        prWeight = pnn->arHiddenWeight;
1095
1096            for (i = 0; i < pnn->cInput; i++)
1097            {
1098                    float const ari = arInput[i];
1099
(gdb) 
1100                    if (ari)
1101                    {
1102                            float *pr = ar;
1103                            if (ari == 1.0f)
1104                            {
1105                                    for( j = 32; j; j--, pr += 4, prWeight += 4 )
1106                                    {
1107                       vec0 = _mm_load_ps( pr );
1108                       vec1 = _mm_load_ps( prWeight );
1109                       sum =  _mm_add_ps(vec0, vec1);
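
If the weights do turn out to be the misaligned operand, one possible
workaround is to keep them in a 16-byte aligned buffer. The sketch
below is only an assumption about how that could be done, using POSIX
posix_memalign; the function name copy_floats_aligned16 is invented
here and I have not checked how gnubg actually allocates arHiddenWeight.
The other option would be to load with _mm_loadu_ps, which accepts
unaligned addresses at some cost in speed.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy n floats from src into a newly allocated
 * buffer whose address is a multiple of 16.
 * Returns NULL if n is 0 or the allocation fails.
 * The caller releases the buffer with free().
 */
float *copy_floats_aligned16(const float *src, size_t n)
{
    void *p = NULL;

    if (n == 0)
        return NULL;

    /* posix_memalign returns 0 on success and stores the block in p. */
    if (posix_memalign(&p, 16, n * sizeof(float)) != 0)
        return NULL;

    memcpy(p, src, n * sizeof(float));
    return (float *)p;
}
```

A pointer obtained this way can be passed to _mm_load_ps as long as it
is only advanced in steps of 4 floats, as the inner loop above does.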


-- 
Jim Segrave           address@hidden




