qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] KVM: x86: do not fail if software breakpoint has already bee


From: Stefano Garzarella
Subject: Re: [PATCH] KVM: x86: do not fail if software breakpoint has already been removed
Date: Fri, 12 Mar 2021 12:59:52 +0100

On Thu, Mar 11, 2021 at 02:53:15PM +0200, Maxim Levitsky wrote:
On Thu, 2021-03-04 at 11:15 +0100, Stefano Garzarella wrote:
On Wed, Mar 03, 2021 at 02:07:24PM +0200, Maxim Levitsky wrote:
> On Tue, 2021-03-02 at 15:52 +0100, Stefano Garzarella wrote:
> > On Mon, Mar 01, 2021 at 02:56:40PM +0200, Maxim Levitsky wrote:
> > > On Mon, 2021-03-01 at 12:17 +0100, Paolo Bonzini wrote:
> > > > If kvm_arch_remove_sw_breakpoint finds that a software breakpoint does 
not
> > > > have an INT3 instruction, it fails.  This can happen if one sets a
> > > > software breakpoint in a kernel module and then reloads it.  gdb then
> > > > thinks the breakpoint cannot be deleted and there is no way to add it
> > > > back.
> > > >
> > > > Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > > ---
> > > >  target/i386/kvm/kvm.c | 9 +++++++--
> > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > > > index 0b5755e42b..c8d61daf68 100644
> > > > --- a/target/i386/kvm/kvm.c
> > > > +++ b/target/i386/kvm/kvm.c
> > > > @@ -4352,8 +4352,13 @@ int kvm_arch_remove_sw_breakpoint(CPUState *cs, 
struct kvm_sw_breakpoint *bp)
> > > >  {
> > > >      uint8_t int3;
> > > >
> > > > -    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0) || int3 != 0xcc ||
> > > > -        cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 
1)) {
> > > > +    if (cpu_memory_rw_debug(cs, bp->pc, &int3, 1, 0)) {
> > > > +        return -EINVAL;
> > > > +    }
> > > > +    if (int3 != 0xcc) {
> > > > +        return 0;
> > > > +    }
> > > > +    if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 1, 
1)) {
> > > >          return -EINVAL;
> > > >      }
> > > >      return 0;
> > >
> > > There still remains a philosopical question if 
kvm_arch_remove_sw_breakpoint
> > > should always return 0, since for the usual case of kernel > > > debugging where
> > > a breakpoint is in unloaded module, the above will probably still fail
> > > as paging for this module is removed as well by the kernel.
> > > It is still better though so:
> > >
> > > Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> > >
> > > Note that I managed to make lx-symbols to work in a very stable way
> > > with attached WIP patch I hacked on this Sunday.
> > > I will send a cleaned up version of it to upstream when I have time.
> > >
> > > Since I make gdb unload the symbols, it works even without this patch.
> > >
> > > Added Stefano Garzarella to CC as I know that he tried to make this work 
as well.
> > > https://lkml.org/lkml/2020/10/5/514
> >
> > Thanks Maxim for CCing me!
> >
> > Just a report when I tried these patches, but I'm not sure they are
> > related.
> >
> > I found that gdb 10 has some problem with QEMU:
> >
> >      $ gdb --version
> >      GNU gdb (GDB) Fedora 10.1-2.fc33
> >
> >      (gdb) lx-symbols
> >      loading vmlinux
> >      scanning for modules in linux/build
> >      ../../gdb/dwarf2/frame.c:1085: internal-error: Unknown CFA rule.
> >
> > With gdb 9 'lx-symbols' works well, but I still have some issue when I
> > put a breakpoint to a symbol not yet loaded (vsock_core_register in this
> > example), then I load the module (vsock_loopback in this example) in the
> > guest.
> >
> > Whit your patch gdb stuck after loading the symbols of the first new
> > module:
> >      (gdb) b vsock_core_register
> >      Function "vsock_core_register" not defined.
> >      Make breakpoint pending on future shared library load? (y or [n]) y
> >      Breakpoint 1 (vsock_core_register) pending.
> >      (gdb) c
> >      Continuing.
> >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> >
> > Without your patch, gdb loops infinitely reloading the new module:
> >      refreshing all symbols to reload module 'vsock'
> >      loading vmlinux
> >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> >      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
> >      loading @0xffffffffc007e000: linux/build/net/bridge/bridge.ko
> >      loading @0xffffffffc0077000: linux/build/net/802/stp.ko
> >      loading @0xffffffffc0007000: linux/build/net/llc/llc.ko
> >      loading @0xffffffffc0013000: linux/build/net/sunrpc/sunrpc.ko
> >      loading @0xffffffffc000d000: 
linux/build/net/ipv4/netfilter/ip_tables.ko
> >      loading @0xffffffffc0000000: linux/build/net/netfilter/x_tables.ko
> >      refreshing all symbols to reload module 'vsock'
> >      loading vmlinux
> >      loading @0xffffffffc00a1000: linux/build/net/vmw_vsock/vsock.ko
> >      loading @0xffffffffc00ad000: linux/build/drivers/net/tun.ko
> >      ...
> >
> > I'll try to get a better look at what's going on.
>
> Let me then explain what I found:
>
> First of all initial execution of lx-symbols works and always worked for me
> (I use gdb 9 though from fedora 32 (9.1-7.fc32))
>
>
> Then a special breakpoint is placed on do_init_module
> (it is hidden from the user)
>
> Once it is hit two things can happen:
>
> 1. if a not yet seen module is loaded,
>   (module that wasn't loaded last time all symbols were reloaded)
>   its symbols are loaded to gdb with 'add-symbol-file' command.
>
> 2. if module that was already loaded to gdb, is loaded (see above),
> then 'big hammer' is used:
>
> a. all existing breakpoints (including the hidden one)
>    are disabled since reloading everything
>    indeed messes up the gdb state
>
> b. the executable's symbols (the kernel) are reloaded,
>    which makes gdb unload all symbols, and then all symbols are loaded
>    again (in the same way as lx-symbols works),
>    including the symbols of currently loading module.
>
> c. all breakpoints are enabled again
>
>
> Now the issue that you originally reported on LKML is because the (1)
> apparently also messes up the software breakpoints state in gdb,
> and that makes gdb to not to temporary remove the breakpoint
> in do_init_module once the execution is resumed, and then
> the guest starts executing garbage (bytes after 'int3' instruction).
>
> The second issue is that (2), aka the big hammer isn't really > needed.
> GDB does have (maybe this is recent command?) a 'remove-symbol-file'
> command.
>
> So it is possible to do remove-symbol-file/add-symbol-file'
> on known module reload instead of reloading everything.
>
> But this has a small issue. The issue is that such known module
> was already reloaded, so all int3 instructions that gdb placed into
> it are already gone, so sofware breakpoints placed to it won't work
> This is what the patch that Paulo sent fixes.
>
> However it is even better to create another hidden breakpoint on module 
removal
> path (I used free_module for that) and unload the symbols there.
> This allows the gdb to cleanly remove all software breakpoints
> in that module, show them again as pending, and even re-install
> them again once the module is loaded again.
>
> So those are the two changes I made:
>
> 1. I added a breakpoint on module removal which also does
>  a. disable all breakpoints
>  b. unload the module's symbols
>  c. enable all breakpoints
>
> 2. On module load I also do
>  a. disable all breakpoints
>  b. load the module's symbols
>  c. enable all breakpoints
>
>
> There is still an issue that both 'load' and 'unload' breakpoint
> can hit more that once.
> This happens because these are software breakpoints and
> load/unload code in the kernel is executed with interrupts enabled.
>
> So what is happening is that once the hidden breakpoint is hit, gdb script
> attached to it is done, and VM is resumed, gdb does more or less this:
>
> a. remove the breakpoint
> b. do a single step
> c. re-install the breakpoint.
>
> However the single step more often than not, lands us into an interrupt
> handler, and so once the handler is finished we end up re-executing the
> instruction on which breakpoint was set.
> On a single vCPU it works more or less (with several tries) on my machine,
> but with many vCPUs, I can end up in live lock like state
> when the above prevents the VM from progressing.
>
> I think we can fix this on kvm side by not injecting interrupts
> when single step is done.

Thank you so much for this detailed description, very much appreciated!

> In fact the below patch works for me,
> Not only it fixes the live lock but makes these hidden breakpoints
> hit only once in my testing.
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index eec62c0dafc36..8b7a4e27bcf66 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8501,6 +8501,12 @@ static void inject_pending_event(struct > kvm_vcpu *vcpu, bool *req_immediate_exit
>                        goto busy;
>        }
>
> +       /*
> +        * Don't inject interrupts while single stepping to make guest debug 
easier
> +        */
> +       if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
> +               can_inject = false;
> +
>
> With this patch lx-symbols works almost perfectly for me (on AMD).

I'll try this patch!

Note that the patch to disable interrupt injection while single
stepping is wrong, due to some whatever mistake I made while rebasing things.
I attached an updated version.

With it (and the patch to lx-symbols itself which should be applied to the 
guest kernel),
I was able to run lx-symbols very well on both intel and AMD.

Note that I compile the guest kernel in the guest and install, but then I copy 
it
to the host, and I run the gdb from there. This is my debug.sh script:


vm adm hmp "gdbserver tcp::2345"
gdb ~/Kernel/vm-fedora/src/vmlinux \
        -ex "target remote :2345" \
        -ex "cd /home/mlevitsk/Kernel/vm-fedora/src" \
        -ex "lx-symbols" \
        -ex "cont"

Note that the debug must start after guest kernel was loaded.


Yeah! The patches fix my issues on Intel cpu!

I applied the kvm patch to my 5.10.21-200.fc33.x86_64 (just to rebuild only the kvm module) and the gdb-script patch to the source of my guest kernel (v5.12-rc2) and I can set a breakpoint to a symbol of a module not yet loaded, then when I load the module the execution stop on it.

Thank you very much for that!

Feel free to add my Tested-by tag when you send them.

Thanks,
Stefano




reply via email to

[Prev in Thread] Current Thread [Next in Thread]