discuss-gnuradio
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Discuss-gnuradio] Segfault in volk_32fc_x2_multiply_32fc_a_avx2_fma


From: West, Nathan
Subject: Re: [Discuss-gnuradio] Segfault in volk_32fc_x2_multiply_32fc_a_avx2_fma
Date: Wed, 9 Mar 2016 11:06:20 -0500

Good news!

That branch now belongs in GNU Radio.

Cheers,
Nathan

On Wed, Mar 9, 2016 at 8:45 AM, devin kelly <address@hidden> wrote:
Thanks for the help, I don't think I could have figured this out on my own.

This is because I'm on RHEL7 (argh!).  My libfftw.so doesn't contain any references to AVX. For me there are a couple of options for fixing this:

1) Use Nathan's branch.
2) Rebuild fftw with AVX support
3) Rebuild GR and Volk without AVX.

I tried 2) first and noticed this in the spec file that was in the source RPM I was trying to rebuild:

%ifarch %{ix86} x86_64
# Enable SSE2 support for x86 and x86_64
# (no avx as it is claimed to drastically slower)
for((i=0;i<2;i++)); do
 prec_flags[i]+=" --enable-sse2"
done
%endif

Is the spec file author right?  Now I'm a little confused about the approach I should take.  I'll probably just go with 1) in the mean time.

Thanks again Nathan,
Devin

On Wed, Mar 9, 2016 at 1:06 AM, West, Nathan <address@hidden> wrote:
The a and c vectors come from gr:fft objects' internal buffers. These are internally created with fftwf_malloc (lines 152/156 of gr-fft/lib/fft.cc). fftwf_malloc is obviously not generating buffers with proper alignment so you're seeing a 50% (per buffer) that this segfaults. I'll note that this is also only an issue with fftwf buffers when fftwf isn't built with AVX support (and therefore nothing in fftwf requires  a 32-byte aligned buffer).

Andy Walls (thanks!) pointed out on IRC that we had a similar issue years ago with a QT sink.

I have a branch that should fix this (https://github.com/n-west/gnuradio/tree/fft-avx-alignment). I also suggest you look in to getting a version of fftwf built with AVX. I don't know if there's a good way to tell, but if I run readelf -a on my libfftw3.so I see some functions with avx in the name.

Cheers,
nw


On Tue, Mar 8, 2016 at 1:31 PM, devin kelly <address@hidden> wrote:
OK, here's my C program:

#include <stdio.h>
#include <stdlib.h>
#include <volk/volk.h>
#include <stdint.h>

int main() {

    size_t alignment = volk_get_alignment();

    uint8_t* ptr;

    ptr = (uint8_t*)volk_malloc(1000 * sizeof(uint8_t), alignment);
    printf("alignment = %lu, ptr = %x, *ptr = %u\n", alignment, ptr, *ptr);
    volk_free((void*)ptr);
    ptr = NULL;


    return 0;
}


Compile:

$ gcc volk_test.c -o volk_test -lvolk -L/local_disk/gr_3.7.9_debug/lib

It's output:

$ ./volk_test
Using Volk machine: avx2_64_mmx_orc
alignment = 32, ptr = 151b040, *ptr = 00

Also, I've attached the output from the preprocessor, this command:

$ /usr/bin/cc  -DHAVE_AVX_CVTPI32_PS -DHAVE_CPUID_H -DHAVE_DLFCN_H -DHAVE_FENV_H -DHAVE_POSIX_MEMALIGN -DHAVE_XGETBV -Wall -fvisibility=hidden -g -I/local_disk/gr_3.7.9_src/volk/build_debug/include -I/local_disk/gr_3.7.9_src/volk/include -I/local_disk/gr_3.7.9_src/volk/kernels -I/local_disk/gr_3.7.9_src/volk/build_debug/lib -I/local_disk/gr_3.7.9_src/volk/lib -I/usr/include/orc-0.4  -E  -fPIC -o volk_malloc_preprocessed   -c /local_disk/gr_3.7.9_src/volk/lib/volk_malloc.c

I just found the compiler step from from doing 'VERBOSE=1 make' then changed the output and added -E.  I attached volk_malloc_preprocessed as well.

It looks like this is my volk_malloc():


void *volk_malloc(size_t size, size_t alignment)
{
  void *ptr;




  if (alignment == 1)
    return malloc(size);

  int err = posix_memalign(&ptr, alignment, size);
  if(err == 0) {
    return ptr;
  }
  else {
    fprintf(stderr,
            "VOLK: Error allocating memory "
            "(posix_memalign: error %d: %s)\n", err, strerror(err));
    return ((void *)0);
  }
}



Devin



On Tue, Mar 8, 2016 at 11:37 AM, West, Nathan <address@hidden> wrote:

On Tue, Mar 8, 2016 at 10:58 AM, devin kelly <address@hidden> wrote:
Calling 'info variables' (or args or locals) the last few frames didn't give me any real info so I built a copy of GR/Volk with debug symbols.  I ran the FG again, this time from GDB, here's my back trace.  In this backtrace you can see the arguments passed in each call.  I have an i7-5600U CPU @ 2.60GHz, the volk_profile is appended at the bottom.

Excellent. Thanks for going through that extra step. It really helps.
 

I don't think so. I'm not familiar with the details of the fft_filter implementations, but usually these things will take in some history if they don't have enough points to operate on (in this case 512).

The much more worrying thing is your vector addresses.
 

Thanks,
Devin

(gdb) bt
#0  0x00007fffdcaccb57 in volk_32fc_x2_multiply_32fc_a_avx2_fma (__P=0x3b051b0)
    at /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/avxintrin.h:835
#1  0x00007fffdcaccb57 in volk_32fc_x2_multiply_32fc_a_avx2_fma (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)

0x3b1f770 % 32 = 16 (bad)
0x3b051b0 % 32 = 16 (bad)
0x3b240e0 % 32 = 0 (good)

Unfortunately it looks like volk_get_alignment is returning the wrong thing or there's a bug in volk_malloc. Can you tell us what volk_get_alignment returns? The easiest thing is probably to write a simple C program that prints out the result (hmm, I should add that to volk-config-info). I'd also like to know which volk_malloc implementation you're using. Unfortunately I don't think we have an easy way to discover that (hmm, something else that should be added to volk-config-info). I think the best way might be to look at volk_malloc.c intermediate files after the preprocessor has done its work.

If you want to move on while we figure this out then you can edit ~/.volk/volk_config and replace the avx2_fma with sse3 on the line that has this kernel name on it.
 
    at /local_disk/gr_3.7.9_src/volk/kernels/volk/volk_32fc_x2_multiply_32fc.h:242
#2  0x00007fffdc945a75 in __volk_32fc_x2_multiply_32fc_a (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
    at /local_disk/gr_3.7.9_src/volk/build_debug/lib/volk.c:7010
#3  0x00007fffd3f8e360 in gr::filter::kernel::fft_filter_ccc::filter(int, std::complex<float> const*, std::complex<float>*) (this=0x3b02f40, address@hidden, address@hidden, address@hidden)
    at /local_disk/gr_3.7.9_src/gnuradio/gr-filter/lib/fft_filter.cc:323
#4  0x00007fffd42910df in gr::digital::corr_est_cc_impl::work(int, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) (this=0x3b01560, noutput_items=257, input_items=..., output_items=std::vector of length 1, capacity 1 = {...})
    at /local_disk/gr_3.7.9_src/gnuradio/gr-digital/lib/corr_est_cc_impl.cc:237
#5  0x00007fffdd064907 in gr::sync_block::general_work(int, std::vector<int, std::allocator<int> >&, std::vector<void const*, std::allocator<void const*> >&, std::vector<void*, std::allocator<void*> >&) (this=0x3b015b8, noutput_items=<optimized out>, ninput_items=..., input_items=..., output_items=...) at /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/sync_block.cc:66
#6  0x00007fffdd02f70f in gr::block_executor::run_one_iteration() (address@hidden)
    at /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/block_executor.cc:438
#7  0x00007fffdd06da8a in gr::tpb_thread_body::tpb_thread_body(boost::shared_ptr<gr::block>, int) (this=0x7fff83ffedb0, block=..., max_noutput_items=<optimized out>) at /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/tpb_thread_body.cc:122
#8  0x00007fffdd062761 in boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>, void>::invoke(boost::detail::function::function_buffer&) (this=0x3bc3ec0)
    at /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/lib/scheduler_tpb.cc:44
#9  0x00007fffdd062761 in boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>, void>::invoke(boost::detail::function::function_buffer&) (this=0x3bc3ec0)
    at /local_disk/gr_3.7.9_src/gnuradio/gnuradio-runtime/include/gnuradio/thread/thread_body_wrapper.h:51
#10 0x00007fffdd062761 in boost::detail::function::void_function_obj_invoker0<gr::thread::thread_body_wrapper<gr::tpb_container>, void>::invoke(boost::detail::function::function_buffer&) (function_obj_ptr=...) at /usr/include/boost/function/function_template.hpp:153
#11 0x00007fffdd016cd0 in boost::detail::thread_data<boost::function0<void> >::run() (this=<optimized out>)
    at /usr/include/boost/function/function_template.hpp:767
#12 0x00007fffdd016cd0 in boost::detail::thread_data<boost::function0<void> >::run() (this=<optimized out>)
    at /usr/include/boost/thread/detail/thread.hpp:117
#13 0x00007fffdbe4f24a in thread_proxy () at /lib64/libboost_thread-mt.so.1.53.0
#14 0x00007ffff7800dc5 in start_thread () at /lib64/libpthread.so.0
#15 0x00007ffff6e2528d in clone () at /lib64/libc.so.6

Here are the locals on the last few frames:

(gdb) f 0
#0  0x00007fffdcaccb57 in _mm256_load_ps (__P=0x3b051b0) at /usr/lib/gcc/x86_64-redhat-linux/4.8.5/include/avxintrin.h:835
835       return *(__m256 *)__P;
(gdb) info locals
No locals.
(gdb) f 1
#1  volk_32fc_x2_multiply_32fc_a_avx2_fma (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
    at /local_disk/gr_3.7.9_src/volk/kernels/volk/volk_32fc_x2_multiply_32fc.h:242
242         const __m256 x = _mm256_load_ps((float*)a); // Load the ar + ai, br + bi as ar,ai,br,bi
(gdb) info locals
y = {-4.87433296e+17, 4.59163468e-41, -3.92813517e+17, 4.59163468e-41, 5.15677835e-43, 0, 5.26888223e-43, 0}
tmp2x = {6.389921e-43, 0, -512.314453, 4.59163468e-41, 1.26116862e-44, 0, -4.87433296e+17, 4.59163468e-41}
x = {-512.314453, 4.59163468e-41, 0, 0, 2.76102662, -3.64918089, -4.92134571, -1.06491208}
yl = {4.14784345e-43, 0, 1.26116862e-44, 0, -4.87442367e+17, 4.59163468e-41, -4.87439343e+17, 4.59163468e-41}
yh = {-1674752, 4.59163468e-41, 0, 0, -1.50397414e-36, 4.59163468e-41, -3.31452625e+17, 4.59163468e-41}
tmp2 = {6.72623263e-44, 1.2751816e-43, 2.24207754e-44, 0, 7.17464814e-43, 0, -3.31440427e+17, 4.59163468e-41}
z = {0.794147611, 0, 0.263988227, 0, -0.380019426, 0, -0.953325868, 0}
number = 0
quarterPoints = 128
c = 0x3b1f770
a = 0x3b051b0
b = 0x3b240e0
(gdb) f 2
#2  0x00007fffdc945a75 in __volk_32fc_x2_multiply_32fc_a (cVector=0x3b1f770, aVector=0x3b051b0, bVector=0x3b240e0, num_points=512)
    at /local_disk/gr_3.7.9_src/volk/build_debug/lib/volk.c:7010
7010        volk_32fc_x2_multiply_32fc_a(cVector, aVector, bVector, num_points);
(gdb) info locals
No locals.
(gdb) f 3
#3  0x00007fffd3f8e360 in gr::filter::kernel::fft_filter_ccc::filter (this=0x3b02f40, address@hidden,
    address@hidden, address@hidden) at /local_disk/gr_3.7.9_src/gnuradio/gr-filter/lib/fft_filter.cc:323
323               volk_32fc_x2_multiply_32fc_a(c, a, b, d_fftsize);
(gdb) info locals
a = <optimized out>
b = <optimized out>
c = <optimized out>
i = 0
dec_ctr = 0
j = <optimized out>
ninput_items = 257

My volk profile results:

$  volk_profile -R 32fc_x2_multiply
Using Volk machine: avx2_64_mmx_orc
RUN_VOLK_TESTS: volk_32fc_x2_multiply_32fc(131071,1987)
u_avx2_fma completed in 220ms
u_avx completed in 220ms
u_sse3 completed in 240ms
generic completed in 2810ms
a_avx2_fma completed in 200ms
a_avx completed in 220ms
a_sse3 completed in 230ms
a_generic completed in 2810ms
u_orc completed in 280ms
Best aligned arch: a_avx2_fma
Best unaligned arch: u_avx2_fma
RUN_VOLK_TESTS: volk_32fc_x2_multiply_conjugate_32fc(131071,1987)
u_avx completed in 230ms
u_sse3 completed in 230ms
generic completed in 2790ms
a_avx completed in 220ms
a_sse3 completed in 230ms
a_generic completed in 2800ms
Best aligned arch: a_avx
Best unaligned arch: u_avx
Writing "/home/devin/.volk/volk_config"...


Well I'm both jealous and happy that AVX2 is actually an improvement on newer processors. Also matches the folklore that these new technologies are usually not faster in the first silicon products that they come out in. 


_______________________________________________
Discuss-gnuradio mailing list
address@hidden
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio




_______________________________________________
Discuss-gnuradio mailing list
address@hidden
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio



reply via email to

[Prev in Thread] Current Thread [Next in Thread]