[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] Re: [PATCH v5 4/5] qed: Read/write support
From: |
Stefan Hajnoczi |
Subject: |
Re: [Qemu-devel] Re: [PATCH v5 4/5] qed: Read/write support |
Date: |
Fri, 3 Dec 2010 20:29:47 +0000 |
On Fri, Dec 3, 2010 at 4:06 PM, Kevin Wolf <address@hidden> wrote:
> Am 24.11.2010 12:11, schrieb Stefan Hajnoczi:
>> +static void qed_aio_cancel(BlockDriverAIOCB *blockacb)
>> +{
>> + QEDAIOCB *acb = (QEDAIOCB *)blockacb;
>> + bool finished = false;
>> +
>> + /* Wait for the request to finish */
>> + acb->finished = &finished;
>> + while (!finished) {
>> + qemu_aio_wait();
>> + }
>> +}
>
> Hm, you don't even try to cancel? I wonder how useful the individual
> bdrv_aio_cancel implementations actually are when nobody implements
> cancellation. It seems to be pretty hard to do it right.
>
> Maybe we should consider implementing a default bdrv_aio_cancel
> implementation in block.c that waits for completion?
Cancellation is difficult to implement correctly. Even
qemu_aio_wait() is bad because IDE/scsi will block until the request
completes. We really should have something better.
Also from what I can tell only USB gadget fs implements it in Linux
native aio! In posix-aio-compat.c we can only cancel requests if they
haven't been issued yet.
So in summary, cancel is a nasty interface :).
>> +/**
>> + * Construct an iovec array for a given length
>> + *
>> + * @acb: I/O request
>> + * @len: Maximum number of bytes
>> + *
>> + * This function can be called several times to build subset iovec arrays of
>> + * acb->qiov. For example:
>> + *
>> + * acb->qiov->iov[] = {{0x100000, 1024},
>> + * {0x200000, 1024}}
>> + *
>> + * qed_acb_build_qiov(acb, 512) =>
>> + * {{0x100000, 512}}
>> + *
>> + * qed_acb_build_qiov(acb, 1024) =>
>> + * {{0x100200, 512},
>> + * {0x200000, 512}}
>> + *
>> + * qed_acb_build_qiov(acb, 512) =>
>> + * {{0x200200, 512}}
>> + */
>> +static void qed_acb_build_qiov(QEDAIOCB *acb, size_t len)
>> +{
>> + struct iovec *iov_end = &acb->qiov->iov[acb->qiov->niov];
>> + size_t iov_offset = acb->cur_iov_offset;
>> + struct iovec *iov = acb->cur_iov;
>> +
>> + while (iov != iov_end && len > 0) {
>> + size_t nbytes = MIN(iov->iov_len - iov_offset, len);
>> +
>> + qemu_iovec_add(&acb->cur_qiov, iov->iov_base + iov_offset, nbytes);
>> + iov_offset += nbytes;
>> + len -= nbytes;
>> +
>> + if (iov_offset >= iov->iov_len) {
>> + iov_offset = 0;
>> + iov++;
>> + }
>> + }
>> +
>> + /* Stash state for next time */
>> + acb->cur_iov = iov;
>> + acb->cur_iov_offset = iov_offset;
>> +}
>
> Wouldn't it be much simpler to just store the offset into acb->qiov and
> to use qemu_iovec_copy to get the subset qiov?
Good point, fixed in v6.
>> +/**
>> + * Write data to the image file
>> + */
>> +static void qed_aio_write_main(void *opaque, int ret)
>> +{
>> + QEDAIOCB *acb = opaque;
>> + BDRVQEDState *s = acb_to_s(acb);
>> + uint64_t offset = acb->cur_cluster;
>> + BlockDriverCompletionFunc *next_fn;
>> + BlockDriverAIOCB *file_acb;
>> +
>> + trace_qed_aio_write_main(s, acb, ret, offset, acb->cur_qiov.size);
>
> Why does the trace use a different offset...
>
>> +
>> + if (ret) {
>> + qed_aio_complete(acb, ret);
>> + return;
>> + }
>> +
>> + if (acb->find_cluster_ret == QED_CLUSTER_FOUND) {
>> + next_fn = qed_aio_next_io;
>> + } else {
>> + if (s->bs->backing_hd) {
>> + next_fn = qed_aio_write_flush_before_l2_update;
>> + } else {
>> + next_fn = qed_aio_write_l2_update;
>> + }
>> + }
>> +
>> + offset += qed_offset_into_cluster(s, acb->cur_pos);
>
> ...than the write uses?
>
> I missed this line at first because offset is initialized above, so I
> didn't expect that this was only half of its initialization.
Fixed in v6.
>> +/**
>> + * Populate back untouched region of new data cluster
>> + */
>> +static void qed_aio_write_postfill(void *opaque, int ret)
>> +{
>> + QEDAIOCB *acb = opaque;
>> + BDRVQEDState *s = acb_to_s(acb);
>> + uint64_t start = acb->cur_pos + acb->cur_qiov.size;
>> + uint64_t len = qed_start_of_cluster(s, start + s->header.cluster_size -
>> 1) - start;
>> + uint64_t offset = acb->cur_cluster + qed_offset_into_cluster(s,
>> acb->cur_pos) + acb->cur_qiov.size;
>
> This look like more than 80 characters.
Fixed in v6.
Stefan