qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] KVM: x86: do not fail if software breakpoint has already bee


From: Maxim Levitsky
Subject: Re: [PATCH] KVM: x86: do not fail if software breakpoint has already been removed
Date: Thu, 11 Mar 2021 14:53:15 +0200
User-agent: Evolution 3.36.5 (3.36.5-2.fc32)

On Thu, 2021-03-04 at 11:15 +0100, Stefano Garzarella wrote:
> On Wed, Mar 03, 2021 at 02:07:24PM +0200, Maxim Levitsky wrote:
> > On Tue, 2021-03-02 at 15:52 +0100, Stefano Garzarella wrote:
> > > On Mon, Mar 01, 2021 at 02:56:40PM +0200, Maxim Levitsky wrote:
> > > > On Mon, 2021-03-01 at 12:17 +0100, Paolo Bonzini wrote:
> > > > > If kvm_arch_remove_sw_breakpoint finds that a software breakpoint 
> > > > > does not
> > > > > have an INT3 instruction, it fails.  This can happen if one sets a
> > > > > software breakpoint in a kernel module and then reloads it.  gdb then
> > > > > thinks the breakpoint cannot be deleted and there is no way to add it
> > > > > back.
> > > > > 
> > > > > Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > > > ---
> > > > >  target/i386/kvm/kvm.c | 9 +++++++--
> > > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > > > > index 0b5755e42b..c8d61daf68 100644
> > > > > --- a/target/i386/kvm/kvm.c
> > > > > +++ b/target/i386/kvm/kvm.c
> > > > > @@ -4352,8 +4352,13 @@ int kvm_arch_remove_sw_breakpoint(CPUState 
> > > > > *cs, struct kvm_sw_breakpoint *bp)
> > > > >  {
> > > > >      uint8_t int3;
> > > > > 
> > > > > -    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0) || int3 != 0xcc 
> > > > > ||
> > > > > -        cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 
> > > > > 1, 1)) {
> > > > > +    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0)) {
> > > > > +        return -EINVAL;
> > > > > +    }
> > > > > +    if (int3 != 0xcc) {
> > > > > +        return 0;
> > > > > +    }
> > > > > +    if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 
> > > > > 1, 1)) {
> > > > >          return -EINVAL;
> > > > >      }
> > > > >      return 0;
> > > > 
> > > > There still remains a philosopical question if 
> > > > kvm_arch_remove_sw_breakpoint
> > > > should always return 0, since for the usual case of kernel debugging 
> > > > where
> > > > a breakpoint is in unloaded module, the above will probably still fail
> > > > as paging for this module is removed as well by the kernel.
> > > > It is still better though so:
> > > > 
> > > > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > 
> > > > Note that I managed to make lx-symbols to work in a very stable way
> > > > with attached WIP patch I hacked on this Sunday.
> > > > I will send a cleaned up version of it to upstream when I have time.
> > > > 
> > > > Since I make gdb unload the symbols, it works even without this patch.
> > > > 
> > > > Added Stefano Garzarella to CC as I know that he tried to make this 
> > > > work as well.
> > > > https://lkml.org/lkml/2020/10/5/514
> > > 
> > > Thanks Maxim for CCing me!
> > > 
> > > Just a report when I tried these patches, but I'm not sure they are
> > > related.
> > > 
> > > I found that gdb 10 has some problem with QEMU:
> > > 
> > >      $ gdb --version
> > >      GNU gdb (GDB) Fedora 10.1-2.fc33
> > > 
> > >      (gdb) lx-symbols
> > >      loading vmlinux
> > >      scanning for modules in linux/build
> > >      ../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.
> > > 
> > > With gdb 9 'lx-symbols' works well, but I still have some issue when I
> > > put a breakpoint to a symbol not yet loaded (vsock_core_register in this
> > > example), then I load the module (vsock_loopback in this example) in the
> > > guest.
> > > 
> > > Whit your patch gdb stuck after loading the symbols of the first new
> > > module:
> > >      (gdb) b vsock_core_register
> > >      Function "vsock_core_register" not defined.
> > >      Make breakpoint pending on future shared library load? (y or [n]) y
> > >      Breakpoint 1 (vsock_core_register) pending.
> > >      (gdb) c
> > >      Continuing.
> > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > > 
> > > Without your patch, gdb loops infinitely reloading the new module:
> > >      refreshing all symbols to reload module 'vsock'
> > >      loading vmlinux
> > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > >      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
> > >      loading @0xffffffffc007e000: linux/build/net/bridge/bridge.ko
> > >      loading @0xffffffffc0077000: linux/build/net/802/stp.ko
> > >      loading @0xffffffffc0007000: linux/build/net/llc/llc.ko
> > >      loading @0xffffffffc0013000: linux/build/net/sunrpc/sunrpc.ko
> > >      loading @0xffffffffc000d000: 
> > > linux/build/net/ipv4/netfilter/ip_tables.ko
> > >      loading @0xffffffffc0000000: linux/build/net/netfilter/x_tables.ko
> > >      refreshing all symbols to reload module 'vsock'
> > >      loading vmlinux
> > >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> > >      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
> > >      ...
> > > 
> > > I'll try to get a better look at what's going on.
> > 
> > Let me then explain what I found:
> > 
> > First of all initial execution of lx-symbols works and always worked for me
> > (I use gdb 9 though from fedora 32 (9.1-7.fc32))
> > 
> > 
> > Then a special breakpoint is placed on do_init_module
> > (it is hidden from the user)
> > 
> > Once it is hit two things can happen:
> > 
> > 1. if a not yet seen module is loaded,
> >   (module that wasn't loaded last time all symbols were reloaded)
> >   its symbols are loaded to gdb with 'add-symbol-file' command.
> > 
> > 2. if module that was already loaded to gdb, is loaded (see above),
> > then 'big hammer' is used:
> > 
> > a. all existing breakpoints (including the hidden one)
> >    are disabled since reloading everything
> >    indeed messes up the gdb state
> > 
> > b. the executable's symbols (the kernel) are reloaded,
> >    which makes gdb unload all symbols, and then all symbols are loaded
> >    again (in the same way as lx-symbols works),
> >    including the symbols of currently loading module.
> > 
> > c. all breakpoints are enabled again
> > 
> > 
> > Now the issue that you originally reported on LKML is because the (1)
> > apparently also messes up the software breakpoints state in gdb,
> > and that makes gdb to not to temporary remove the breakpoint
> > in do_init_module once the execution is resumed, and then
> > the guest starts executing garbage (bytes after 'int3' instruction).
> > 
> > The second issue is that (2), aka the big hammer isn't really needed.
> > GDB does have (maybe this is recent command?) a 'remove-symbol-file'
> > command.
> > 
> > So it is possible to do remove-symbol-file/add-symbol-file'
> > on known module reload instead of reloading everything.
> > 
> > But this has a small issue. The issue is that such known module
> > was already reloaded, so all int3 instructions that gdb placed into
> > it are already gone, so sofware breakpoints placed to it won't work
> > This is what the patch that Paulo sent fixes.
> > 
> > However it is even better to create another hidden breakpoint on module 
> > removal
> > path (I used free_module for that) and unload the symbols there.
> > This allows the gdb to cleanly remove all software breakpoints
> > in that module, show them again as pending, and even re-install
> > them again once the module is loaded again.
> > 
> > So those are the two changes I made:
> > 
> > 1. I added a breakpoint on module removal which also does
> >  a. disable all breakpoints
> >  b. unload the module's symbols
> >  c. enable all breakpoints
> > 
> > 2. On module load I also do
> >  a. disable all breakpoints
> >  b. load the module's symbols
> >  c. enable all breakpoints
> > 
> > 
> > There is still an issue that both 'load' and 'unload' breakpoint
> > can hit more that once.
> > This happens because these are software breakpoints and
> > load/unload code in the kernel is executed with interrupts enabled.
> > 
> > So what is happening is that once the hidden breakpoint is hit, gdb script
> > attached to it is done, and VM is resumed, gdb does more or less this:
> > 
> > a. remove the breakpoint
> > b. do a single step
> > c. re-install the breakpoint.
> > 
> > However the single step more often than not, lands us into an interrupt
> > handler, and so once the handler is finished we end up re-executing the
> > instruction on which breakpoint was set.
> > On a single vCPU it works more or less (with several tries) on my machine,
> > but with many vCPUs, I can end up in live lock like state
> > when the above prevents the VM from progressing.
> > 
> > I think we can fix this on kvm side by not injecting interrupts
> > when single step is done.
> 
> Thank you so much for this detailed description, very much appreciated!
> 
> > In fact the below patch works for me,
> > Not only it fixes the live lock but makes these hidden breakpoints
> > hit only once in my testing.
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index eec62c0dafc36..8b7a4e27bcf66 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -8501,6 +8501,12 @@ static void inject_pending_event(struct kvm_vcpu 
> > *vcpu, bool *req_immediate_exit
> >                        goto busy;
> >        }
> > 
> > +       /*
> > +        * Don't inject interrupts while single stepping to make guest 
> > debug easier
> > +        */
> > +       if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> > +               can_inject = false;
> > +
> > 
> > With this patch lx-symbols works almost perfectly for me (on AMD).
> 
> I'll try this patch!

Note that the patch to disable interrupt injection while single
stepping is wrong, due to some whatever mistake I made while rebasing things.
I attached an updated version.

With it (and the patch to lx-symbols itself which should be applied to the 
guest kernel),
I was able to run lx-symbols very well on both intel and AMD.

Note that I compile the guest kernel in the guest and install, but then I copy 
it
to the host, and I run the gdb from there. This is my debug.sh script:


vm adm hmp "gdbserver tcp::2345"
gdb ~/Kernel/vm-fedora/src/vmlinux \
        -ex "target remote :2345" \
        -ex "cd /home/mlevitsk/Kernel/vm-fedora/src" \
        -ex "lx-symbols" \
        -ex "cont"

Note that the debug must start after guest kernel was loaded.

Best regards,
        Maxim Levitsky


> 
> > The only remaing issue (that might be not easy to fix)
> > is that I still can't place breakpoints in __init module code.
> > That code is physically removed from the kernel once the module is done 
> > loading,
> > and it seems that its debug info isn't correct to even make a hardware
> > breakpoint work on it.
> > (gdb shows very small addresses for these symbols).
> > 
> > As for why this doesn't work for you I have 3 theories:
> > 
> > 1. The whole 'reload symbols on breakpoint' is forbidden
> > by gdb in the manual, that is one of the reasons that I had
> > to disable software breakpoints prior to doing this.
> > There might be other things that break in different gdb versions.
> > I don't see a way to make it work without doing this.
> 
> Maybe this can also be the reason why gdb 10 doesn't work for me.
> I should investigate further.
> 
> > 2. Maybe you test that on Intel and something is broken there
> >   on KVM level (I tested only AMD).
> 
> Yes, I'm on Intel.
> 
> > 3. Maybe you debug a nested guest? I haven't tested if guest debug
> > works fine in this configuration.
> 
> Nope, I'm not debugging nested guest in that case.
> 
> Thanks for your help,
> Stefano
> 

Attachment: 0001-gdb-rework-gdb-debug-script.patch
Description: Text Data

Attachment: 0002-KVM-x86-Guest-debug-don-t-inject-interrupts-while-si.patch
Description: Text Data


reply via email to

[Prev in Thread] Current Thread [Next in Thread]