Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM


From: Gabriel L. Somlo
Subject: Re: [Qemu-devel] [PATCH v3 4/5] Enable fw_cfg DMA interface for ARM
Date: Thu, 22 Oct 2015 17:22:16 -0400
User-agent: Mutt/1.5.24 (2015-08-30)

On Sat, 19 Sep 2015, Laszlo Ersek wrote:
> Got some good news: with those two fixups in place (register block
> size corrected, and dma_enabled set via device property), I could
> test the AAVMF / ArmVirtPkg / <insert your favorite synonym here>
> patches.
>
> On my APM Mustang, downloading a decompressed kernel (14,475,776
> bytes), a decompressed initrd (18,177,264), and a cmdline (104 bytes :)),
> in total 32,653,144 bytes, takes approx. 24 seconds with the 8-byte wide
> MMIO data register. (Yeah, it's *really* slow.)
>
> Using the DMA interface, the same takes about 52 milliseconds, and
> that still includes one progress message per 1 MB downloaded :)
>
> It's a factor of approx. 450. Not bad. Not bad. :)

So I've been catching up (after a several-week-long day-job related detour :)
with the latest developments in fw_cfg -- and the DMA stuff looks good, and
makes for a very educational read!

I was re-reading the documentation for fw_cfg_add_file_callback(),
and noticed that non-DMA read operations check for the presence
of a callback (and call it if present) for *every* *single* *byte*,
even on 64-bit MMIO reads. That's also what the documentation says
(in docs/specs/fw_cfg.txt, being moved into fw_cfg.h as per
 http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05315.html).

During DMA reads, however, the callback is only checked once before
each chunk, effectively once per DMA read operation.
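To make the difference concrete, here is a minimal toy model of the two read paths. It is only a sketch of the behavior described above, not QEMU's code: every name in it (ToyItem, toy_read_byte, toy_read_mmio64, toy_read_dma, toy_count_cb) is hypothetical, and the DMA path is reduced to a plain memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: none of these names are QEMU's. */
typedef void (*toy_read_cb)(void *opaque, uint32_t offset);

typedef struct {
    const uint8_t *data;
    uint32_t len;
    uint32_t cur_offset;
    toy_read_cb cb;        /* may be NULL */
    void *cb_opaque;
} ToyItem;

/* Non-DMA path: the callback is checked (and called) for each byte. */
static uint8_t toy_read_byte(ToyItem *it)
{
    uint8_t ret = 0;

    if (it->cur_offset < it->len) {
        if (it->cb) {
            it->cb(it->cb_opaque, it->cur_offset);
        }
        ret = it->data[it->cur_offset++];
    }
    return ret;
}

/* A 64-bit MMIO read, modelled as eight single-byte reads. */
static uint64_t toy_read_mmio64(ToyItem *it)
{
    uint64_t value = 0;

    for (int i = 0; i < 8; i++) {
        value = (value << 8) | toy_read_byte(it);
    }
    return value;
}

/* DMA path: the callback is checked once, then the chunk is copied. */
static uint32_t toy_read_dma(ToyItem *it, uint8_t *dst, uint32_t want)
{
    assert(it->cur_offset <= it->len);

    uint32_t avail = it->len - it->cur_offset;
    uint32_t n = want < avail ? want : avail;

    if (n == 0) {
        return 0;
    }
    if (it->cb) {
        it->cb(it->cb_opaque, it->cur_offset);
    }
    memcpy(dst, it->data + it->cur_offset, n);
    it->cur_offset += n;
    return n;
}

/* Example callback: only counts how often it was invoked. */
static void toy_count_cb(void *opaque, uint32_t offset)
{
    (void)offset;
    ++*(unsigned *)opaque;
}
```

In this model, reading a 16-byte item through two 64-bit MMIO reads invokes the callback 16 times, while a single 16-byte DMA read invokes it once.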

Now, typical callbacks I found throughout the QEMU source tend to return
immediately except for the first time they're invoked, but I wonder if
skipping all those extra per-byte "do I have a callback, if so call it,
mostly so it can return without doing anything" operations
accounts for a significant part of the dramatically faster transfers?

I'm not sure how I'd test for that. Besides my not having anything
resembling a viable ARM setup, I'm not sure whether limiting the callbacks
to be invoked only if (s->cur_offset == 0) would make sense, even just as
a test?
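As a rough illustration of that experiment, the per-byte path could be gated as in the sketch below. This is a standalone toy, not a patch against QEMU: GatedItem, gated_read_byte, and gated_count_cb are hypothetical names, and only the (cur_offset == 0) condition comes from the idea above. Any callback that needs to act on a non-zero offset would stop working under this change, so it would presumably only be useful for timing measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: none of these names are QEMU's. */
typedef void (*gated_read_cb)(void *opaque, uint32_t offset);

typedef struct {
    const uint8_t *data;
    uint32_t len;
    uint32_t cur_offset;
    gated_read_cb cb;      /* may be NULL */
    void *cb_opaque;
} GatedItem;

/*
 * Per-byte read with the callback limited to offset 0, i.e. it fires
 * only before the first byte of the item is returned.
 */
static uint8_t gated_read_byte(GatedItem *it)
{
    uint8_t ret = 0;

    if (it->cur_offset < it->len) {
        if (it->cb && it->cur_offset == 0) {
            it->cb(it->cb_opaque, it->cur_offset);
        }
        ret = it->data[it->cur_offset++];
    }
    return ret;
}

/* Example callback: only counts how often it was invoked. */
static void gated_count_cb(void *opaque, uint32_t offset)
{
    (void)offset;
    ++*(unsigned *)opaque;
}
```

Timing a large per-byte transfer with and without the extra condition would show how much of the MMIO cost is callback overhead, as opposed to the cost of the MMIO accesses themselves.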

Either way, I'll send out a v2 of my fw_cfg function-call doc patch
to additionally say something like:

  * structure residing at key value FW_CFG_FILE_DIR, containing the item name,
  * data size, and assigned selector key value.
  * Additionally, set a callback function (and argument) to be called each
- * time a byte is read by the guest from this particular item.
+ * time a byte is read by the guest from this particular item, or once per
+ * each DMA guest read operation.
  * NOTE: In addition to the opaque argument set here, the callback function
  * takes the current data offset as an additional argument, allowing it the
  * option of only acting upon specific offset values (e.g., 0, before the

Let me know what you all think...

Thanks much,
--Gabriel


