From: Artem Pisarenko
Subject: Re: [Qemu-devel] [PATCH v6 00/25] Fixing record/replay and adding reverse debugging
Date: Tue, 9 Oct 2018 18:59:01 +0600

Applying this patch wasn't easy, because the version you pointed to had
compilation problems and the mail archive distorted the patch content,
but I finally got it working :)

With this patch applied, all my tests finally succeed... almost :)

Now qemu may hang at a random moment of emulation, but not hard. The
symptoms look like the ones I've already reported here:
https://bugs.launchpad.net/qemu/+bug/1790460 . So this isn't
record/replay-specific. Without the rr= option I wasn't able to trigger
the issue, but that doesn't mean much, given how unstable the issue is
and how hard it is to reproduce.

Commit I tested against (with patches
applied): 53a19a9a5f9811a911e9b69ef36afb0d66b5d85c .


On Tue, 9 Oct 2018 at 17:26, Pavel Dovgalyuk <address@hidden> wrote:

> Maybe this will help?
>
>
>
> https://www.mail-archive.com/address@hidden/msg560780.html
>
>
>
> Pavel Dovgalyuk
>
>
>
> *From:* Artem Pisarenko [mailto:address@hidden
> *Sent:* Tuesday, October 09, 2018 2:24 PM
> *To:* Pavel Dovgalyuk
>
>
> *Cc:* address@hidden; address@hidden
> *Subject:* Re: [Qemu-devel] [PATCH v6 00/25] Fixing record/replay and
> adding reverse debugging
>
>
>
> (Since all previous patches are already merged to master, I'm running
> tests against the latest (almost) version of the master branch. The
> following results are based on master commit
> dafd95053611aa14dda40266857608d12ddce658 .)
>
>
>
> Applying this patch made Tests 1 and 2 succeed (at least I wasn't able to
> achieve failures in several attempts).
>
> I've also tried a few tests without the sleep=off and/or rtc base options.
> All of them succeed too, except one case: removing sleep=off (regardless
> of the -rtc option values, or whether -rtc is present at all) causes qemu
> to hang hard in recording mode right at startup. The process needs to be
> killed.
>
>
>
> Some info from debugger:
>
>     qemu-system-x86_64 [13231] [cores: 2,4,5,7]
>       Thread #1 [qemu-system-x86] 13231 [core: 2] (Suspended : Container)
>         __lll_lock_wait() at lowlevellock.S:135 0x7f00b116626d
>         __GI___pthread_mutex_lock() at pthread_mutex_lock.c:80 0x7f00b115fdbd
>         qemu_mutex_lock_impl() at qemu-thread-posix.c:66 0x947ac4
>         replay_mutex_lock() at replay-internal.c:206 0x7f3dea
>         os_host_main_loop_wait() at main-loop.c:235 0x94335e
>         main_loop_wait() at main-loop.c:497 0x943429
>         main_loop() at vl.c:1,853 0x5be70f
>         main() at vl.c:4,575 0x5c56e0
>       Thread #2 [qemu-system-x86] 13282 [core: 4] (Suspended : Container)
>       Thread #3 [qemu-system-x86] 13283 [core: 5] (Suspended : Container)
>       Thread #4 [qemu-system-x86] 13284 [core: 7] (Suspended : Step)
>         cpu_get_icount_raw() at cpus.c:301 0x45a0a0
>         replay_get_current_step() at replay.c:67 0x7f2f14
>         replay_save_instructions() at replay-internal.c:225 0x7f3ea0
>         replay_save_clock() at replay-time.c:24 0x7f483d
>         icount_warp_rt() at cpus.c:512 0x45a745
>         qemu_account_warp_timer() at cpus.c:690 0x45ad55
>         qemu_tcg_rr_cpu_thread_fn() at cpus.c:1,498 0x45c554
>         qemu_thread_start() at qemu-thread-posix.c:504 0x9485cf
>         start_thread() at pthread_create.c:333 0x7f00b115d6ba
>         clone() at clone.S:109 0x7f00b0e9341d
>     gdb (7.11.1)
>
>
>
> Threads #2,3 are just waiting in poll or similar. Nothing extraordinary.
>
>
>
> Thread #4 cycles inside do {} while() loop of cpu_get_icount_raw()
> function:
>
>     do {
>         start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
>         icount = cpu_get_icount_raw_locked();
>     } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));
>
>
>
> Value of timers_state.vm_clock_seqlock.sequence is always 3.
>
>
>
> On Tue, 9 Oct 2018 at 15:04, Pavel Dovgalyuk <address@hidden> wrote:
>
> Please try the following patch.
>
> There was a problem with the rtc option in record/replay mode.
>
>
>
> diff --git a/vl.c b/vl.c
> index 40d5d0f..afe1c20 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2885,6 +2885,7 @@ int main(int argc, char **argv, char **envp)
>      DisplayState *ds;
>      QemuOpts *opts, *machine_opts;
>      QemuOpts *icount_opts = NULL, *accel_opts = NULL;
> +    QemuOpts *rtc_opts = NULL;
>      QemuOptsList *olist;
>      int optind;
>      const char *optarg;
> @@ -3691,12 +3692,11 @@ int main(int argc, char **argv, char **envp)
>                  warn_report("This option is ignored and will be removed soon");
>                  break;
>              case QEMU_OPTION_rtc:
> -                opts = qemu_opts_parse_noisily(qemu_find_opts("rtc"), optarg,
> -                                               false);
> -                if (!opts) {
> +                rtc_opts = qemu_opts_parse_noisily(qemu_find_opts("rtc"),
> +                                                   optarg, false);
> +                if (!rtc_opts) {
>                      exit(1);
>                  }
> -                configure_rtc(opts);
>                  break;
>              case QEMU_OPTION_tb_size:
>  #ifndef CONFIG_TCG
> @@ -3907,6 +3907,9 @@ int main(int argc, char **argv, char **envp)
>      loc_set_none();
>      replay_configure(icount_opts);
> +    if (rtc_opts) {
> +        configure_rtc(rtc_opts);
> +    }
>      if (incoming && !preconfig_exit_requested) {
>          error_report("'preconfig' and 'incoming' options are "
>
>
> Pavel Dovgalyuk
>
>
>
> *From:* Artem Pisarenko [mailto:address@hidden
> *Sent:* Thursday, October 04, 2018 4:16 PM
> *To:* dovgaluk
> *Cc:* address@hidden; address@hidden
> *Subject:* Re: [Qemu-devel] [PATCH v6 00/25] Fixing record/replay and
> adding reverse debugging
>
>
>
> No, it didn't change the test results, at least for
> https://github.com/ispras/qemu/tree/rr-180911 . Even the step values it
> gets stuck on are the same for most runs.
>
> Playing with master and my own branch gives different results for tests
> without sleep=off and -rtc base. It seems that the patch you mentioned
> didn't change them very much.
>
> The only thing that can be said for sure is that this patch does not fix
> the issues completely. But it MAY fix them partially or in some other
> specific cases...
>
>
>
> On Wed, 3 Oct 2018 at 12:47, dovgaluk <address@hidden> wrote:
>
> Can you try applying this patch?
> https://www.mail-archive.com/address@hidden/msg563798.html
>
> I also encountered problems with x86_64 replay and found a misprint in
> the code, which was fixed after the series had been sent to the mailing
> list.
>
> Pavel Dovgalyuk
>
>
> Artem Pisarenko wrote on 2018-10-02 10:02:
> > I've added the "-monitor stdio" option to the command line of Test 1
> > and repeatedly entered the command during execution:
> >
> >   QEMU 3.0.50 monitor - type 'help' for more information
> >   (qemu) info replay
> >   Replaying execution 'icount_rr_capture.bin': current step = 311736195
> >   (qemu) info replay
> >   Replaying execution 'icount_rr_capture.bin': current step = 318198367
> >   (qemu) info replay
> >   Replaying execution 'icount_rr_capture.bin': current step = 324737211
> >   (qemu) info replay
> >   Replaying execution 'icount_rr_capture.bin': current step = 329890795
> >   (qemu) info replay
> >   Replaying execution 'icount_rr_capture.bin': current step = 607069789
> >   (qemu) info replay
> >   Replaying execution 'icount_rr_capture.bin': current step = 607069789
> >   (qemu) info replay
> >   Replaying execution 'icount_rr_capture.bin': current step = 607069789
> >   ...
> >
> > Some notes on the step value it gets stuck on:
> > - mostly it's the same (even across different record-replay pairs);
> > - stressing the host during replay may cause it to change even for the
> > same record-replay pair (i.e. different replay executions of the same
> > recorded file).
> >
> > This specific case seems to be stable to reproduce.
> >
> > On Tue, 2 Oct 2018 at 0:22, Artem Pisarenko
> > <address@hidden> wrote:
> >
> >> I've posted a bug report with extended tests (incl. the case without
> >> sleep=off). You can find the guest image (kernel) in the bug
> >> description.
> >> https://bugs.launchpad.net/qemu/+bug/1795369 [1]
> >>
> >> The most annoying thing is that some issues are almost not
> >> reproducible. There are definitely race conditions somewhere in qemu
> >> code. Running the 'stress-ng' utility with CPU and I/O stressors in
> >> parallel with qemu execution greatly reduces the number of attempts
> >> I need to trigger some of the issues I encounter.
> >>
> >> I'll try the 'info replay' monitor command tomorrow, but no
> >> guarantees that I'll be able to reproduce the issue again.
> >>
> >> Speaking about '-nographic' and SDL... I've noticed that having a UI
> >> greatly reduces the likelihood of hanging (but doesn't avoid it
> >> completely) when using icount in general, so this effect isn't
> >> rr-specific. I've already reported this bug too.
> >>
> >> On Mon, 1 Oct 2018, 20:14 dovgaluk <address@hidden> wrote:
> >>
> >>> Artem Pisarenko wrote on 2018-09-30 14:01:
> >>>> Feature still broken :(
> >>>
> >>> Thanks for testing.
> >>>
> >>>>
> >>>> Brief description of my tests.
> >>>>
> >>>> Guest image is Linux, which just powers off after kernel boots
> >>>> (instead of proceeding to user-space /init or /sbin/init).
> >>>> Base cmdline:
> >>>> qemu-system-x86_64 -nodefaults -machine pc,accel=tcg -m 2048 -cpu
> >>>> qemu64 -rtc clock=vm,base=2000-01-01T00:00:00 -kernel bzImage -initrd
> >>>> rootfs -append 'nokaslr console=ttyS0 rdinit=/init_poweroff'
> >>>> -nographic -serial SERIAL_VALUE -icount
> >>>> 1,sleep=off,rr=RR_VALUE,rrfile=icount_rr_capture.bin
> >>>
> >>> I've never tried it with sleep=off. Can you remove it and try again?
> >>>
> >>> We have also seen a problem with '-nographic'. When we remove this
> >>> option and QEMU runs with an SDL window, everything is ok. There is
> >>> some problem with the main loop, which may sleep when there is no GUI
> >>> to update, or something like that. We couldn't fix it yet.
> >>>
> >>>>
> >>>> Test 1. When SERIAL_VALUE=none
> >>>> Running with RR_VALUE=record completes successfully.
> >>>> Running with RR_VALUE=replay doesn't complete. The qemu process just
> >>>> eats ~100% cpu, and memory usage stops growing after some moment. I
> >>>> don't see what happens because of problem no.2 (see below).
> >>>
> >>> Try the 'info replay' monitor command. Does the instruction counter
> >>> increase?
> >>>
> >>>>
> >>>> Test 2. When SERIAL_VALUE=stdio
> >>>> Running with RR_VALUE=record completes successfully.
> >>>>
> >>>> Running with RR_VALUE=replay causes exit with error:
> >>>>
> >>>> "qemu-system-x86_64: Missing character write event in the replay log"
> >>>>
> >>>> These problems are the same with qemu 2.12 (both vanilla and with
> >>>> previous versions of these patches applied). Furthermore, I consider
> >>>> the whole icount mode broken and determinism unachievable.
> >>>> The irony is that I don't actually need the record/replay feature.
> >>>> I've tried to use it only as an instrument to debug failing
> >>>> determinism in qemu code. But since the record/replay feature itself
> >>>> relies on determinism, which is broken, it's no wonder that it fails
> >>>> too (I just hoped to bypass it).
> >>>>
> >>>> Contact me if you need more details. I'm just very tired of trying
> >>>> to get all these things working... Hope is leaving me...
> >>>
> >>> Can you share the kernel in case icount is still broken?
> >>>
> >>> Pavel Dovgalyuk
> >> --
> >>
> >> Best regards,
> >> Artem Pisarenko
> >  --
> >
> > Best regards,
> >   Artem Pisarenko
> >
> > Links:
> > ------
> > [1] https://bugs.launchpad.net/qemu/+bug/1795369
>
> --
>
> Best regards,
>   Artem Pisarenko
>
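For reference, here is why a sequence value stuck at 3 keeps the
do {} while () loop in cpu_get_icount_raw() (quoted above) spinning. This
is only a minimal, single-threaded sketch: the toy_* names are
hypothetical and are not QEMU's API, atomics and memory barriers are left
out, and it assumes the usual seqlock convention (an odd sequence means a
write is in progress, and the reader clears the low bit of its snapshot),
which as far as I can tell is what QEMU's seqlock does too. The loop is
bounded here so that it terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a sequence lock (not QEMU's API).
 * Real implementations add atomics and memory barriers, omitted here. */
typedef struct {
    unsigned sequence; /* even: no write in progress; odd: write in progress */
} toy_seqlock;

/* A writer bumps the counter once before and once after its update. */
static inline void toy_write_begin(toy_seqlock *sl) { sl->sequence++; }
static inline void toy_write_end(toy_seqlock *sl)   { sl->sequence++; }

/* Reader snapshot with the low bit cleared, so an odd (in-progress)
 * sequence can never compare equal in toy_read_retry(). */
static inline unsigned toy_read_begin(const toy_seqlock *sl)
{
    return sl->sequence & ~1u;
}

/* True if the counter no longer matches the snapshot: read again. */
static inline bool toy_read_retry(const toy_seqlock *sl, unsigned start)
{
    return sl->sequence != start;
}

/* Bounded form of the do {} while () reader loop.
 * Returns the attempt number that succeeded, or -1 if none did. */
static inline int toy_read_attempts(const toy_seqlock *sl, int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        unsigned start = toy_read_begin(sl);
        /* ...the protected value would be read here... */
        if (!toy_read_retry(sl, start)) {
            return attempt;
        }
    }
    return -1;
}
```

With sequence == 3, toy_read_attempts() returns -1 no matter how many
attempts are allowed; after a matching toy_write_end() the sequence is 4
and the first attempt succeeds. So an odd value that never changes
suggests that the write side began and never finished. Whether that is
related to Thread #1 waiting in replay_mutex_lock() is only a guess; I
haven't verified it.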
-- 

Best regards,
  Artem Pisarenko

