Re: [Qemu-devel] [PATCH 3/8] quorum: Implement .bdrv_co_readv/writev


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH 3/8] quorum: Implement .bdrv_co_readv/writev
Date: Tue, 22 Nov 2016 12:32:51 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On 21.11.2016 at 18:58, Eric Blake wrote:
> On 11/21/2016 11:31 AM, Kevin Wolf wrote:
> > This converts the quorum block driver from implementing callback-based
> > interfaces for read/write to coroutine-based ones. This is the first
> > step that will allow us further simplification of the code.
> > 
> > Signed-off-by: Kevin Wolf <address@hidden>
> > ---
> >  block/quorum.c | 192 ++++++++++++++++++++++++++++++++++-----------------------
> >  1 file changed, 115 insertions(+), 77 deletions(-)
> > 
> 
> > @@ -174,14 +162,14 @@ static bool quorum_64bits_compare(QuorumVoteValue *a, QuorumVoteValue *b)
> >  static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
> >                                     QEMUIOVector *qiov,
> >                                     uint64_t sector_num,
> > -                                   int nb_sectors,
> > -                                   BlockCompletionFunc *cb,
> > -                                   void *opaque)
> > +                                   int nb_sectors)
> >  {
> >      BDRVQuorumState *s = bs->opaque;
> > -    QuorumAIOCB *acb = qemu_aio_get(&quorum_aiocb_info, bs, cb, opaque);
> > +    QuorumAIOCB *acb = g_new(QuorumAIOCB, 1);
> 
> Worth using g_new0() here...
> 
> >      int i;
> >  
> > +    acb->co = qemu_coroutine_self();
> > +    acb->bs = bs;
> >      acb->sector_num = sector_num;
> >      acb->nb_sectors = nb_sectors;
> >      acb->qiov = qiov;
> > @@ -191,6 +179,7 @@ static QuorumAIOCB *quorum_aio_get(BlockDriverState *bs,
> >      acb->rewrite_count = 0;
> >      acb->votes.compare = quorum_sha256_compare;
> >      QLIST_INIT(&acb->votes.vote_list);
> > +    acb->has_completed = false;
> >      acb->is_read = false;
> >      acb->vote_ret = 0;
> 
> ...to eliminate 0-assignments here? Not a show-stopper to leave it
> as-is, though.

Not in this patch anyway. I could add a cleanup patch at the end of the
series or as a follow-up, though. As you probably know by now, my style
of writing this in new code would use a compound literal:

    QuorumAIOCB *acb = g_new(QuorumAIOCB, 1);
    *acb = (QuorumAIOCB) {
        ...
    };
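
As a rough illustration of the compound-literal style mentioned above, here is a minimal sketch. DemoAIOCB and demo_aio_get() are hypothetical stand-ins, not the real QuorumAIOCB or quorum_aio_get(), and plain malloc() is used instead of g_new() only so that the example builds without GLib. What it is meant to show is that members not named in the initializer are zero-initialized, so the explicit 0/false assignments are no longer needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for QuorumAIOCB (not the real struct). */
typedef struct DemoAIOCB {
    uint64_t sector_num;
    int nb_sectors;
    int rewrite_count;
    bool has_completed;
    bool is_read;
    int vote_ret;
} DemoAIOCB;

/*
 * Allocate a request and initialise it with a compound literal.
 * Only the interesting members are named; every other member of the
 * struct is implicitly set to zero/false by the assignment.
 */
static DemoAIOCB *demo_aio_get(uint64_t sector_num, int nb_sectors)
{
    DemoAIOCB *acb;

    assert(nb_sectors >= 0);

    /* Assumption: QEMU code would use g_new(DemoAIOCB, 1) here instead. */
    acb = malloc(sizeof(*acb));
    if (!acb) {
        return NULL;
    }

    *acb = (DemoAIOCB) {
        .sector_num = sector_num,
        .nb_sectors = nb_sectors,
    };
    return acb;
}
```

With this pattern, adding a new member to the struct does not require touching the allocation function unless the member needs a non-zero initial value.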

> > -static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb);
> > +static int read_fifo_child(QuorumAIOCB *acb);
> >  
> >  static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
> >  {
> > @@ -272,14 +261,14 @@ static void quorum_report_bad_acb(QuorumChildRequest *sacb, int ret)
> >      QuorumAIOCB *acb = sacb->parent;
> >      QuorumOpType type = acb->is_read ? QUORUM_OP_TYPE_READ : QUORUM_OP_TYPE_WRITE;
> >      quorum_report_bad(type, acb->sector_num, acb->nb_sectors,
> > -                      sacb->aiocb->bs->node_name, ret);
> > +                      sacb->bs->node_name, ret);
> >  }
> >  
> > -static void quorum_fifo_aio_cb(void *opaque, int ret)
> > +static int quorum_fifo_aio_cb(void *opaque, int ret)
> >  {
> >      QuorumChildRequest *sacb = opaque;
> >      QuorumAIOCB *acb = sacb->parent;
> > -    BDRVQuorumState *s = acb->common.bs->opaque;
> > +    BDRVQuorumState *s = acb->bs->opaque;
> >  
> >      assert(acb->is_read && s->read_pattern == QUORUM_READ_PATTERN_FIFO);
> >  
> > @@ -288,8 +277,7 @@ static void quorum_fifo_aio_cb(void *opaque, int ret)
> >  
> >          /* We try to read next child in FIFO order if we fail to read */
> >          if (acb->children_read < s->num_children) {
> > -            read_fifo_child(acb);
> > -            return;
> > +            return read_fifo_child(acb);
> >          }
> 
> Question unrelated to this patch: in FIFO mode, are we doing work
> sequentially or in parallel?  That is, does the quorum code kick off all
> children simultaneously, then wait until the first child answers with
> success (and abort all remaining children) or failure (at which point
> moving to the second child may already have an answer)?  Or does it only
> kick off the first child, wait for a response, and not start the second
> child until after the first child fails?

It's the latter. This is quite easy to see in the new model (at the
end of this patch series) because in FIFO mode, reads don't spawn
coroutines, but just have a loop of bdrv_co_preadv() calls.
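
To make the sequential behaviour concrete, here is a minimal sketch of such a loop outside of QEMU. Everything in it is hypothetical: DemoChild, demo_read_fifo() and the two demo read callbacks are invented for illustration, and an ordinary function pointer stands in for the bdrv_co_preadv() call on each child. It is not the code from the series, only the shape of the control flow.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical read callback: 0 on success, negative errno-style on failure. */
typedef int (*demo_read_fn)(void *opaque, void *buf, size_t len);

/* Hypothetical stand-in for one quorum child. */
typedef struct DemoChild {
    demo_read_fn read;
    void *opaque;
} DemoChild;

/*
 * FIFO read: try the children strictly one after another and stop at the
 * first one that succeeds. A later child is only started after the
 * previous one has failed, so at most one read is in flight at a time.
 * Returns the result of the last attempted read; *children_read is set to
 * the number of children that were tried.
 */
static int demo_read_fifo(DemoChild *children, int num_children,
                          void *buf, size_t len, int *children_read)
{
    int ret = -1;
    int n;

    assert(num_children > 0);
    *children_read = 0;

    for (n = 0; n < num_children; n++) {
        (*children_read)++;
        ret = children[n].read(children[n].opaque, buf, len);
        if (ret >= 0) {
            break;      /* success: the remaining children are never touched */
        }
    }
    return ret;
}

/* Demo child that always fails with an I/O-error-like code. */
static int demo_read_fail(void *opaque, void *buf, size_t len)
{
    (void)opaque; (void)buf; (void)len;
    return -5;
}

/* Demo child that always succeeds and fills the buffer with 0xab. */
static int demo_read_ok(void *opaque, void *buf, size_t len)
{
    (void)opaque;
    memset(buf, 0xab, len);
    return 0;
}
```

With the children ordered as (fail, ok, ok), the loop tries two children and never issues a read to the third. That is the latency-for-less-wasted-work trade-off discussed in this thread.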

> I guess one way has more
> potentially wasted work (and a stress test of our ability to cancel work
> on secondary children), while the other has higher latencies, so maybe
> it is something that a future quorum patch may want to make configurable?

Our ability to cancel work barely exists, so I'm not too sure whether
the other way would really be worth implementing.

> >  
> > -static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb)
> > +static int read_fifo_child(QuorumAIOCB *acb)
> >  {
> > -    BDRVQuorumState *s = acb->common.bs->opaque;
> > +    BDRVQuorumState *s = acb->bs->opaque;
> >      int n = acb->children_read++;
> > +    int ret;
> >  
> > -    acb->qcrs[n].aiocb = bdrv_aio_readv(s->children[n], acb->sector_num,
> > -                                        acb->qiov, acb->nb_sectors,
> > -                                        quorum_fifo_aio_cb, &acb->qcrs[n]);
> > +    acb->qcrs[n].bs = s->children[n]->bs;
> > +    ret = bdrv_co_preadv(s->children[n], acb->sector_num * BDRV_SECTOR_SIZE,
> > +                         acb->nb_sectors * BDRV_SECTOR_SIZE, acb->qiov, 0);
> > +    ret = quorum_fifo_aio_cb(&acb->qcrs[n], ret);
> 
> somewhat answering myself - it looks like the current fifo approach is
> high-latency rather than parallel, in that at most one child is being
> run at a time.

Yes, you can already see it in this patch, though it becomes even clearer
at the end of the series.

Kevin
