octave-bug-tracker
From: John W. Eaton
Subject: [Octave-bug-tracker] [bug #57591] Segmentation faults when running the test suite
Date: Sat, 17 Jul 2021 20:10:02 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0

Update of bug #57591 (project octave):

             Open/Closed:                  Closed => Open                   

    _______________________________________________________

Follow-up Comment #167:

I'm reopening this report, at least temporarily, because after all this time
I finally hit a crash in the gmres.m tests while running "make check" from the
command line, and I happened to have core files enabled:


  ...
  sparse/gmres.m .................................................fatal:
caught signal Segmentation fault -- stopping myself...
/bin/bash: line 1: 2189756 Segmentation fault      (core dumped) /bin/bash
../run-octave --no-init-file --silent --no-history -p
/home/jwe/build/octave/test/mex /home/jwe/src/octave/test/fntests.m
/home/jwe/src/octave/test
make[3]: *** [Makefile:32312: check-local] Error 139
make[3]: Leaving directory '/net/devnull/scratch/jwe/build/octave'
make[2]: *** [Makefile:28396: check-am] Error 2
make[2]: Leaving directory '/net/devnull/scratch/jwe/build/octave'
make[1]: *** [Makefile:28098: check-recursive] Error 1
make[1]: Leaving directory '/net/devnull/scratch/jwe/build/octave'
make: *** [Makefile:28398: check] Error 2

If this were a problem with the Linux memory manager I would expect the
process to be killed, not to stop with a segfault.
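
The exit status in the make output points the same way.  As a small sketch of
the arithmetic, assuming the usual shell convention that a process terminated
by signal N is reported as status 128 + N, and the Linux signal numbers: 139
decodes to signal 11 (SIGSEGV), whereas a process killed by the kernel's
out-of-memory handling receives SIGKILL and would normally be reported as 137.
The helper name below is made up for illustration:

```cpp
#include <cassert>
#include <csignal>

// Hypothetical helper: recover the signal number from a shell-style
// exit status.  Shells report "killed by signal N" as 128 + N; any
// status of 128 or below is treated here as a normal exit (returns 0).
int
signal_from_shell_status (int status)
{
  assert (status >= 0 && status <= 255);

  return status > 128 ? status - 128 : 0;
}
```

Under those assumptions, signal_from_shell_status (139) equals SIGSEGV and
signal_from_shell_status (137) equals SIGKILL.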

I have the following OpenBLAS packages installed on the system where the crash
happened:


ii  libopenblas-dev:amd64         0.3.13+ds-3  amd64        Optimized BLAS
(linear algebra) library (dev, meta)
ii  libopenblas-pthread-dev:amd64 0.3.13+ds-3  amd64        Optimized BLAS
(linear algebra) library (dev, pthread)
ii  libopenblas0:amd64            0.3.13+ds-3  amd64        Optimized BLAS
(linear algebra) library (meta)
ii  libopenblas0-pthread:amd64    0.3.13+ds-3  amd64        Optimized BLAS
(linear algebra) library (shared lib, pthread)
ii  libopenblas0-serial:amd64     0.3.13+ds-3  amd64        Optimized BLAS
(linear algebra) library (shared lib, serial)


These should be the same on all of my buildbot systems.

Here is the top of the call stack at the point of the crash:


(gdb) where
#0  0x00007fd339cf804c in dgemm_incopy_PILEDRIVER () at
/usr/lib/x86_64-linux-gnu/libopenblas.so.0
#1  0x00007fd33ac45830 in gotoblas () at
/usr/lib/x86_64-linux-gnu/libopenblas.so.0
#2  0x0000000000000001 in  ()
#3  0x00007fd338b7b345 in dgemm_tn () at
/usr/lib/x86_64-linux-gnu/libopenblas.so.0
#4  0x00007fd33c2c5908 in dgemm_ () at /usr/lib/x86_64-linux-gnu/libblas.so.3
#5  0x00007fd33cb99f1c in dlarfb_ () at
/usr/lib/x86_64-linux-gnu/liblapack.so.3
#6  0x00007fd33cbd7909 in dormqr_ () at
/usr/lib/x86_64-linux-gnu/liblapack.so.3
#7  0x00007fd33cbd560a in dormbr_ () at
/usr/lib/x86_64-linux-gnu/liblapack.so.3
#8  0x00007fd33cb1e778 in dgelsd_ () at
/usr/lib/x86_64-linux-gnu/liblapack.so.3
#9  0x00007fd33ff37b7e in Matrix::lssolve(Matrix const&, long&, long&,
double&) const
    (this=0x7fd2f3fe1fb0, b=<optimized out>, info=@0x7fd2f3fe1f30: 0,
rank=@0x7fd2f3fe1e68: 0, rcon=@0x7fd2f3fe1f38: -1)
    at /home/jwe/src/octave/liboctave/array/dMatrix.cc:2051
#10 0x00007fd33ff3829b in Matrix::solve(MatrixType&, Matrix const&, long&,
double&, void (*)(double), bool, blas_trans_type) const
    (this=this@entry=0x7fd2f3fe1fb0, mattype=..., b=
    ..., info=@0x7fd2f3fe1f30: 0, rcon=@0x7fd2f3fe1f38: -1,
sing_handler=0x7fd3418606f0 <solve_singularity_warning(double)>,
singular_fallback=true, transt=blas_no_trans) at
/home/jwe/src/octave/liboctave/array/dMatrix.cc:1625
#11 0x00007fd341862093 in xleftdiv(Matrix const&, Matrix const&, MatrixType&,
blas_trans_type) (a=..., b=..., typ=..., transt=transt@entry=blas_no_trans)
    at /home/jwe/src/octave/libinterp/corefcn/xdiv.cc:353
#12 0x00007fd340ebc7df in oct_binop_ldiv(octave_base_value const&,
octave_base_value const&) (a1=<optimized out>, a2=...)
    at /home/jwe/src/octave/libinterp/operators/op-m-m.cc:91
#13 0x00007fd341210f34 in octave::binary_op(octave::type_info&,
octave_value::binary_op, octave_value const&, octave_value const&)
    (ti=..., op=octave_value::op_ldiv, v1=..., v2=...) at
/home/jwe/src/octave/libinterp/octave-value/ov.h:1417
#14 0x00007fd3412b4413 in
octave::tree_binary_expression::evaluate(octave::tree_evaluator&, int)
(this=0x7fd1bd2b2e40, tw=...)
    at /home/jwe/src/octave/libinterp/parse-tree/pt-binop.cc:140
#15 0x00007fd3412b02aa in
octave::tree_simple_assignment::evaluate(octave::tree_evaluator&, int)
(this=0x7fd1bd2b2ed0, tw=...)
    at /home/jwe/src/octave/libinterp/parse-tree/pt-assign.cc:101
#16 0x00007fd3412cd89d in
octave::tree_evaluator::visit_statement(octave::tree_statement&)
(this=0x7fd2e8005878, stmt=<optimized out>)
    at /home/jwe/src/octave/libinterp/parse-tree/pt-eval.cc:3772
#17 0x00007fd3412bbc04 in octave::tree_statement::accept(octave::tree_walker&)
(tw=..., this=0x7fd1bd2b2f10)
    at /home/jwe/src/octave/libinterp/parse-tree/pt-stmt.h:124
#18 octave::tree_evaluator::visit_statement_list(octave::tree_statement_list&)
(this=0x7fd2e8005878, lst=...)
    at /home/jwe/src/octave/libinterp/parse-tree/pt-eval.cc:3857
#19 0x00007fd3412ccc5f in
octave::tree_statement_list::accept(octave::tree_walker&) (tw=...,
this=<optimized out>)
    at /home/jwe/src/octave/libinterp/parse-tree/pt-stmt.h:201
#20 octave::tree_evaluator::visit_while_command(octave::tree_while_command&)
(this=0x7fd2e8005878, cmd=...)
    at /home/jwe/src/octave/libinterp/parse-tree/pt-eval.cc:4190
#21 0x00007fd3412cd806 in
octave::tree_evaluator::visit_statement(octave::tree_statement&)
(this=0x7fd2e8005878, stmt=...)
    at /home/jwe/src/octave/libinterp/parse-tree/pt-eval.cc:3747
#22 0x00007fd3412bbc04 in octave::tree_statement::accept(octave::tree_walker&)
(tw=..., this=0x7fd1bd2ba5f0)
    at /home/jwe/src/octave/libinterp/parse-tree/pt-stmt.h:124
#23 octave::tree_evaluator::visit_statement_list(octave::tree_statement_list&)
(this=0x7fd2e8005878, lst=...)
    at /home/jwe/src/octave/libinterp/parse-tree/pt-eval.cc:3857
#24 0x00007fd3412c4ef7 in
octave::tree_statement_list::accept(octave::tree_walker&) (tw=...,
this=0x7fd1bd25fc30)
    at /home/jwe/src/octave/libinterp/parse-tree/pt-stmt.h:201
#25 octave::tree_evaluator::execute_user_function(octave_user_function&, int,
octave_value_list const&) (this=this@entry=0x7fd2e8005878, user_function=
    ..., nargout=nargout@entry=2, xargs=...) at
/home/jwe/src/octave/libinterp/parse-tree/pt-eval.cc:3503


The Matrix::lssolve stack frame shows


#9  0x00007fd33ff37b7e in Matrix::lssolve (this=0x7fd2f3fe1fb0, b=...,
info=@0x7fd2f3fe1f30: 0, rank=@0x7fd2f3fe1e68: 0, rcon=@0x7fd2f3fe1f38: -1)
    at /home/jwe/src/octave/liboctave/array/dMatrix.cc:2051
2051              F77_XFCN (dgelsd, DGELSD, (m, n, nrhs, tmp_data, m, pretval,
(gdb) list
2046              rcon = octave::numeric_limits<double>::NaN ();
2047              retval = Matrix (n, b_nc, octave::numeric_limits<double>::NaN
());
2048            }
2049          else
2050            {
2051              F77_XFCN (dgelsd, DGELSD, (m, n, nrhs, tmp_data, m, pretval,
2052                                         maxmn, ps, rcon, tmp_rank,
2053                                         work.fortran_vec (), lwork,
2054                                         piwork, tmp_info));
2055
(gdb) p m
$1 = 76
(gdb) p n
$2 = 75
(gdb) p nrhs
$3 = 1
(gdb) p tmp_data
$4 = <optimized out>
(gdb) p m
$5 = 76
(gdb) p maxmn
$6 = 76
(gdb) p ps
$7 = <optimized out>
(gdb) p rcon
$8 = (double &) @0x7fd2f3fe1f38: -1
(gdb) p lwork
$9 = 6601
(gdb) p piwork
$10 = <optimized out>


In lssolve, I see that we query the LAPACK functions for the workspace
requirements, but the comments there say LAPACK's calculation is broken in
some versions, so we also compute the sizes ourselves.  Are our formulas
correct?  I can't tell from the info on the stack after the crash whether we
ended up using our own calculated values or the ones that LAPACK provides.
Our calculation is complicated, so it seems quite possible that it is
incorrect, or that the requirements have changed since this code was written.
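
One way to sanity-check the sizes is to evaluate the workspace bounds given in
the comments of the reference DGELSD source for the dimensions seen in gdb.
The following is only a sketch under stated assumptions: the formulas are
transcribed from those comments as best understood, SMLSIZ is assumed to be 25
(the documentation says "usually about 25"; the real value comes from ILAENV
and was not read from the core file), and the struct and function names are
invented for illustration.  It does not call LAPACK.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Sizes from the workspace notes in the reference DGELSD comments.
// smlsiz normally comes from ILAENV; it is a plain parameter here so
// the sketch needs no LAPACK at link time.
struct dgelsd_sizes
{
  long nlvl;    // levels in the divide-and-conquer tree
  long lwork;   // documented sufficient length of WORK
  long liwork;  // documented minimum length of IWORK
};

dgelsd_sizes
dgelsd_documented_sizes (long m, long n, long nrhs, long smlsiz)
{
  assert (m > 0 && n > 0 && nrhs > 0 && smlsiz > 0);

  long minmn = std::min (m, n);

  // NLVL = MAX (0, INT (LOG_2 (MIN (M,N) / (SMLSIZ+1))) + 1)
  // The cast truncates toward zero, like Fortran's INT.
  double ratio = static_cast<double> (minmn)
                 / static_cast<double> (smlsiz + 1);
  long nlvl = std::max (0L, static_cast<long> (std::log2 (ratio)) + 1);

  // 12*K + 2*K*SMLSIZ + 8*K*NLVL + K*NRHS + (SMLSIZ+1)**2, K = min (M,N)
  long lwork = 12*minmn + 2*minmn*smlsiz + 8*minmn*nlvl + minmn*nrhs
               + (smlsiz + 1)*(smlsiz + 1);

  // 3*MINMN*NLVL + 11*MINMN
  long liwork = std::max (1L, 3*minmn*nlvl + 11*minmn);

  return {nlvl, lwork, liwork};
}
```

For m = 76, n = 75, nrhs = 1 and the assumed smlsiz = 25, this gives nlvl = 2,
lwork = 6601 and liwork = 1275.  The lwork figure equals the value gdb printed
above, which is consistent with a formula of this form having been used, but
it does not by itself show which code path set lwork or whether that size is
adequate for this OpenBLAS build.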

If it is not a problem with the workspace size that results in a memory fault,
then the problem begins to look more like a bug in DGELSD or OpenBLAS.

This crash occurred after my changes here:

http://hg.savannah.gnu.org/hgweb/octave/rev/3ab696e02f55

but I don't think those have anything to do with this crash as they are
completely unrelated to matrix calculations.

With some effort, I might be able to use gdb to extract the data necessary to
recreate the call to lssolve that resulted in the crash, but if this is a
random memory issue the crash may still not be easily reproducible.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?57591>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



