Hello,
I reproduced the issues on a publicly available machine.
    tar xf lightning-2.1.3.tar.gz
    cd lightning-2.1.3
    ./configure --enable-assertions
    make
    DYLD_LIBRARY_PATH=$PWD/lib/.libs ./doc/.libs/rfib
    # The above invocation exits with an assertion failure about MAP_FAILED
    sed -i '' s/-O2// configure
    sed -i '' 's/MAP_ANON,/MAP_JIT | &/' lib/lightning.c
    ./configure --enable-assertions
    make
    DYLD_LIBRARY_PATH=$PWD/lib/.libs ./doc/.libs/rfib
    # The above invocation exits with a bus error
I will continue to try when I can to debug the issue, but maybe someone who has access to cfarm and who knows lightning will be able to see what I am missing.
Darren Kulp
Hello again,
I think I have learned that my original problem is that MAP_JIT seems to be required on M1 Macs (at least on my macOS 11.2.2) when combining PROT_WRITE and PROT_EXEC, but there might also be another issue.
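To make the MAP_JIT point concrete, here is a minimal sketch of a read/write/execute mapping that adds MAP_JIT only where the platform headers define it and reports errno when mmap() fails. This is not lightning's code: alloc_code_buffer and JIT_MAP_FLAGS are hypothetical names used only for illustration, and the sketch passes -1 as the file descriptor where lightning passes mmap_fd.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/* MAP_JIT is not defined on every platform, so only request it
 * when the headers provide it (macOS does). */
#ifdef MAP_JIT
#define JIT_MAP_FLAGS (MAP_PRIVATE | MAP_ANON | MAP_JIT)
#else
#define JIT_MAP_FLAGS (MAP_PRIVATE | MAP_ANON)
#endif

/* Hypothetical helper (not part of lightning): map `length` bytes as
 * readable, writable and executable. Returns NULL and prints the errno
 * text on failure, rather than handing MAP_FAILED back to the caller. */
static void *alloc_code_buffer(size_t length)
{
    void *ptr = mmap(NULL, length,
                     PROT_EXEC | PROT_READ | PROT_WRITE,
                     JIT_MAP_FLAGS, -1, 0);
    if (ptr == MAP_FAILED) {
        fprintf(stderr, "mmap(%zu bytes, RWX) failed: %s\n",
                length, strerror(errno));
        return NULL;
    }
    return ptr;
}
```

Calling alloc_code_buffer(4096) corresponds to the first allocation size mentioned in this message. Printing the errno text should show why the mapping was refused even in a build without assertions.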
I did not originally understand how to build correctly with debugging (since `./configure --help` does not seem to show anything related to debugging), but after I compiled with `./configure --enable-assertions`, I found that the mmap() call was actually failing the first time (with a _jit->code.length of 4096) :
    kulp@ego lightning-2.1.3 % DYLD_LIBRARY_PATH=$PWD/lib/.libs ./doc/.libs/rfib
    Assertion failed: (_jit->code.ptr != MAP_FAILED), function _jit_emit, file lightning.c, line 2027.
    zsh: abort      DYLD_LIBRARY_PATH=$PWD/lib/.libs ./doc/.libs/rfib
I found out that macOS has a MAP_JIT flag for mmap() in order to allow combining PROT_WRITE and PROT_EXEC :
See also comments in this pull request I found :
When I added the MAP_JIT flag to the affected mmap() call like this :
_jit->code.ptr = mmap(NULL, _jit->code.length, PROT_EXEC | PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON | MAP_JIT, mmap_fd, 0);
then I no longer saw that assertion. Instead I see a bus error later :
    * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x1001d0000)
        frame #0: 0x0000000100128ec0 liblightning.1.dylib`_emit_code [inlined] _oxxx7(_jit=0x00000001002069f0, Op=-1451229184, Rt=29, Rt2=30, Rn=31, Simm7=-20) at jit_aarch64-cpu.c:1027:5 [opt]
       1024         i.Rt2.b = Rt2;
       1025         i.Rn.b = Rn;
       1026         i.imm7.b = Simm7;
    -> 1027         ii(i.w);
but the debugger seems to get mismatching DWARF info when optimizations are enabled.
    (lldb) p i
    error: Couldn't materialize: couldn't get the value of variable i: DW_OP_piece for offset 1 but top of stack is of size 9
    error: errored out in DoExecute, couldn't PrepareToExecuteJITExpression
    (lldb) frame variable
    (jit_state_t *) _jit = 0x0000000100304160
    (jit_int32_t) Op = -1451229184
    (jit_int32_t) Rt = 29
    (jit_int32_t) Rt2 = 30
    (jit_int32_t) Rn = 31
    (jit_int32_t) Simm7 = -20
    (instr_t) i = <DW_OP_piece for offset 1 but top of stack is of size 9>
I edited `configure` to remove `-O2` and rebuilt. I still get the same bus error, but now with more information, which I have attached as “debugger-state.txt”.
<debugger-state.txt>
I also attached some build logs in case they are helpful (these are with -O2 still enabled).
    kulp@ego lightning-2.1.3 % ./configure --enable-assertions &> configure.output
    kulp@ego lightning-2.1.3 % make V=1 &> make.output
<make.output><config.log><configure.output>
When I get some more time I will look into this further, since I am sure it is hard for others to debug it with this information.
Darren Kulp
Thanks for your response. I picked a busy time for me (starting a new job in a new city) so it will take me a bit longer to get back to this than I hoped, but I expect to get you a fuller response within a few weeks.
Darren Kulp
On Sun, Mar 7, 2021 at 19:17, Darren Kulp <darren@kulp.ch> wrote:
Hello,
Hi,

Thank you for GNU lightning. It is a great tool and I have appreciated how things generally just work. Right now, I am seeing a rare exception to that rule: when I build GNU lightning 2.1.3 on my M1 MacBook (arm64 architecture), the built example programs appear to hang inside jit_emit().
The script below shows what I know so far.
    curl -O http://ftp.gnu.org/gnu/lightning/lightning-2.1.3.tar.gz
    tar xf lightning-2.1.3.tar.gz
    cd lightning-2.1.3
    CFLAGS=-g3 LDFLAGS=-g3 ./configure && make
    DYLD_LIBRARY_PATH=$PWD/lib/.libs ./doc/.libs/rfib &
    sleep 5 # arbitrary sleep time to allow to get stuck
    lldb -p $(pgrep rfib)
After those commands, an LLDB backtrace shows this :
    (lldb) bt
    * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
      * frame #0: 0x000000018e6259c8 libsystem_kernel.dylib`__mmap + 8
        frame #1: 0x000000018e625954 libsystem_kernel.dylib`mmap + 52
        frame #2: 0x000000010025b870 liblightning.1.dylib`_jit_emit(_jit=0x0000000127e06970) at lightning.c:2065:23
        frame #3: 0x0000000100237d2c rfib`main(argc=1, argv=0x000000016fbcb940) at rfib.c:47:9
        frame #4: 0x000000018e679f34 libdyld.dylib`start + 4
Stepping through the code, it appears that the code is stuck in the loop starting at line 2033 of lightning.c, and that emit_code() continues to return NULL each time it is called.
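For readers following along, the general shape of a loop that retries while an emitter keeps returning NULL can be sketched roughly as below. This is not lightning's code: toy_emit, emit_with_retry, GROW_STEP and the max_attempts cap are all hypothetical, and the 4096-byte step is an assumption chosen for illustration. The sketch only shows how such a loop can spin indefinitely when each attempt fails and nothing bounds the number of retries.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

#define GROW_STEP 4096 /* assumed growth increment per retry */

/* Hypothetical emitter: succeeds only when `needed` bytes fit in the
 * buffer, otherwise returns NULL to request a larger buffer. */
static void *toy_emit(void *buf, size_t len, size_t needed)
{
    return needed <= len ? buf : NULL;
}

/* Grow the buffer by GROW_STEP and retry until the emitter succeeds.
 * Returns the buffer length that worked, or 0 if mmap() failed or the
 * attempt cap was reached. Without the cap and the MAP_FAILED check,
 * an emitter that always returns NULL would keep this loop running. */
static size_t emit_with_retry(size_t needed, int max_attempts)
{
    size_t len = GROW_STEP;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANON, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap");
            return 0;
        }
        void *code = toy_emit(buf, len, needed);
        munmap(buf, len);
        if (code != NULL)
            return len;
        len += GROW_STEP;
    }
    return 0;
}
```

With these definitions, emit_with_retry(10000, 8) succeeds on the third attempt with a 12288-byte buffer, while a request that never fits within the cap returns 0 instead of looping forever.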
This should only happen if, while generating JIT code, it notices that the instruction pointer would overflow the mmap'ed area.

Can you run under a debug environment? It would be very valuable to know what value _jit.length has in the first call to emit_code. It should have calculated a sane value, but to enter an infinite loop it probably has a negative and very small value, as it increments the size by 4k at a time and tries again. It really should not even loop, as it should never miscalculate that badly.

To debug this issue, it should be enough to set a breakpoint in _jit_emit, then a watchpoint on *(long*)_jit->code.length

You should also check what value _jit->code.end has, as it might also be getting an incorrect value somehow, but in all cases it should be due to bad code generation. Can you also share all build logs? Maybe the compiler is giving a hint about some issue that my test environment on Linux with gcc did not have.

I noticed this problem first when I tried to use Homebrew to install GNU lightning on my Mac. Homebrew has “bottles” (binary distributions) compiled for Intel platforms including macOS Big Sur, but not for M1 Macs as of this writing.
    kulp@ego /tmp % brew install lightning
    Error: lightning: no bottle available!
    You can try to install from source with:
      brew install --build-from-source lightning
When I tried to install using `brew install --build-from-source lightning`, I noticed that the `check` process took 100% CPU for a long time (over an hour), so I guessed it must be in an infinite loop, and tried to build it myself as I had previously done successfully on Intel Macs. That is when I discovered the details I show above.
I would have tried to reproduce the issue on master, but I get stuck with autoconf (my autoconf 2.69 rejects some directives in configure.ac).
In a few days I will regain access to my Intel Mac (with older macOS version of High Sierra instead of Big Sur), for comparison. Until then, can anyone suggest something else I could try in order to narrow things down ?
Darren Kulp
Thanks!
Paulo