From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH v6 2/2] block: Support GlusterFS as a QEMU block backend
Date: Tue, 14 Aug 2012 10:29:26 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:13.0) Gecko/20120605 Thunderbird/13.0

On 14.08.2012 06:38, Bharata B Rao wrote:
> Kevin, thanks for your review. I will address all of your comments
> in the next iteration, but I have a few questions/comments on the others...
> 
> On Mon, Aug 13, 2012 at 02:50:29PM +0200, Kevin Wolf wrote:
>>> +static int parse_server(GlusterURI *uri, char *server)
>>> +{
>>> +    int ret = -EINVAL;
>>> +    char *token, *saveptr;
>>> +    char *p, *q = server;
>>> +
>>> +    p = strchr(server, '[');
>>> +    if (p) {
>>> +        /* [ipv6] */
>>> +        if (p != server) {
>>> +            /* [ not in the beginning */
>>> +            goto out;
>>> +        }
>>> +        q++;
>>> +        p = strrchr(p, ']');
>>> +        if (!p) {
>>> +            /* No matching ] */
>>> +            goto out;
>>> +        }
>>> +        *p++ = '\0';
>>> +        uri->server = g_strdup(q);
>>> +
>>> +        if (*p) {
>>> +            if (*p != ':') {
>>> +                /* [ipv6] followed by something other than : */
>>> +                goto out;
>>> +            }
>>> +            uri->port = strtoul(++p, NULL, 0);
>>> +            if (uri->port < 0) {
>>> +                goto out;
>>> +            }
>>
>> This accepts inputs where the colon isn't followed by any number.
> 
> Yes, and that will result in port=0, which is the default. So this is to
> cater for cases like gluster://[1:2:3:4:5]:/volname/image

So you consider this a valid URL? I would have expected it to be invalid.
But let me see, there must be some official definition of a URL...

Alright, so RFC 3986 says that having no digits after the colon is
valid. It also says that you shouldn't generate such URLs, and it
doesn't say what an empty port means... The common interpretation seems
to be that it's treated as if it wasn't specified, i.e. the default port
for the scheme is used.

So if 0 is the default port for glusterfs, your code looks okay. But it
doesn't seem to be a very useful default port number.
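
If you keep your own parser, one way to get both behaviours at once (a bare
":" selects the default, but trailing junk is still rejected) is to check
strtoul()'s end pointer. A self-contained example of what I mean; parse_port
and the 65535 limit are just for illustration, not taken from your patch:

#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse the text that follows the ':' separator.
 * An empty string selects the default port; anything that is not a
 * plain decimal number is rejected. */
static int parse_port(const char *p, int *port)
{
    char *end;
    unsigned long val;

    if (*p == '\0') {
        *port = 0;              /* ":" with no digits -> default port */
        return 0;
    }
    val = strtoul(p, &end, 10);
    if (*end != '\0' || val > 65535) {
        return -EINVAL;         /* trailing junk or out of range */
    }
    *port = (int)val;
    return 0;
}

Whether that ends up as a helper or open-coded doesn't matter much.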

> In any case, let me see if I can get rid of this altogether and reuse
> qemu-sockets.c:inet_parse().

Cool, thanks!

>>> +        if (token) {
>>> +            uri->port = strtoul(token, NULL, 0);
>>> +            if (uri->port < 0) {
>>> +                goto out;
>>> +            }
>>> +        } else {
>>> +            uri->port = 0;
>>> +        }
>>
>> The port parsing code is duplicated in IPv4 and IPv6, even though it's
>> really the same.
> 
> Being such a small piece of code, I didn't think it deserved to be made into a
> function of its own and reused. But if you really want it that way, I can do that.

It's not worth a separate function, but it can just be code outside the
if statement.
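
Roughly like this, as a sketch only (GlusterURI and g_strdup() as in your
patch, error handling trimmed); the point is that the port parsing appears
exactly once, after the if statement:

static int parse_server_sketch(GlusterURI *uri, char *server)
{
    char *port_str = NULL;
    char *p;

    if (server[0] == '[') {
        /* [ipv6] with an optional :port suffix */
        p = strrchr(server, ']');
        if (!p) {
            return -EINVAL;                  /* no matching ] */
        }
        *p++ = '\0';
        uri->server = g_strdup(server + 1);  /* text between [ and ] */
        if (*p == ':') {
            port_str = p + 1;
        } else if (*p != '\0') {
            return -EINVAL;                  /* junk after [ipv6] */
        }
    } else {
        /* ipv4 address or hostname with an optional :port suffix */
        p = strchr(server, ':');
        if (p) {
            *p++ = '\0';
            port_str = p;
        }
        uri->server = g_strdup(server);
    }

    /* Port handling written once, shared by both branches. */
    if (port_str && *port_str) {
        uri->port = strtoul(port_str, NULL, 0);
    } else {
        uri->port = 0;
    }
    return 0;
}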

>>> +static struct glfs *qemu_gluster_init(GlusterURI *uri, const char *filename)
>>> +{
>>> +    struct glfs *glfs = NULL;
>>> +    int ret;
>>> +
>>> +    ret = qemu_gluster_parseuri(uri, filename);
>>> +    if (ret < 0) {
>>> +        error_report("Usage: file=gluster://server[:port]/volname/image"
>>> +            "[?transport=socket]");
>>
>> Is 'socket' really the only valid transport and will it stay like this
>> without changes to qemu?
> 
> There are others like 'unix' and 'rdma'. I will fix this error message to
> reflect that.
> 
> However QEMU needn't change for such transport types because I am not
> interpreting the transport type in QEMU but instead passing it on directly
> to GlusterFS.

Maybe then just specify "[?transport=...]" instead of giving a specific
option value?
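
That is, something like:

        error_report("Usage: file=gluster://server[:port]/volname/image"
            "[?transport=...]");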

>>> +
>>> +static void qemu_gluster_aio_event_reader(void *opaque)
>>> +{
>>> +    BDRVGlusterState *s = opaque;
>>> +    GlusterAIOCB *event_acb;
>>> +    int event_reader_pos = 0;
>>> +    ssize_t ret;
>>> +
>>> +    do {
>>> +        char *p = (char *)&event_acb;
>>> +
>>> +        ret = read(s->fds[GLUSTER_FD_READ], p + event_reader_pos,
>>> +                   sizeof(event_acb) - event_reader_pos);
>>
>> So you're reading in a pointer address from a pipe? This is fun.
>>
>>> +        if (ret > 0) {
>>> +            event_reader_pos += ret;
>>> +            if (event_reader_pos == sizeof(event_acb)) {
>>> +                event_reader_pos = 0;
>>> +                qemu_gluster_complete_aio(event_acb);
>>> +                s->qemu_aio_count--;
>>> +            }
>>> +        }
>>> +    } while (ret < 0 && errno == EINTR);
>>
>> In case of a short read the read data is just discarded? Maybe
>> event_reader_pos was supposed to static?
> 
> In earlier versions event_reader_pos was part of BDRVGlusterState; I made
> it local in subsequent versions, and that is what is causing this problem. Will fix.

Ah, yes, it needs to be in BDRVGlusterState; static doesn't work when
you have multiple gluster drives.
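
For what it's worth, this is roughly what I mean (a sketch; the field names
are only a suggestion, and the rest of BDRVGlusterState stays as in your
patch). Note that the partially read pointer has to survive across calls as
well, so event_acb moves into the state, too:

typedef struct BDRVGlusterState {
    /* ... the existing fields from the patch (glfs, fd, fds[],
     * qemu_aio_count, ...) ... */
    int event_reader_pos;       /* resume offset after a short pipe read */
    GlusterAIOCB *event_acb;    /* pointer currently being assembled */
} BDRVGlusterState;

static void qemu_gluster_aio_event_reader(void *opaque)
{
    BDRVGlusterState *s = opaque;
    ssize_t ret;

    do {
        char *p = (char *)&s->event_acb;

        /* A short read just leaves event_reader_pos set; the next
         * invocation of this handler continues where this one stopped. */
        ret = read(s->fds[GLUSTER_FD_READ], p + s->event_reader_pos,
                   sizeof(s->event_acb) - s->event_reader_pos);
        if (ret > 0) {
            s->event_reader_pos += ret;
            if (s->event_reader_pos == sizeof(s->event_acb)) {
                s->event_reader_pos = 0;
                qemu_gluster_complete_aio(s->event_acb);
                s->qemu_aio_count--;
            }
        }
    } while (ret < 0 && errno == EINTR);
}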

>>> +
>>> +static void qemu_gluster_aio_cancel(BlockDriverAIOCB *blockacb)
>>> +{
>>> +    GlusterAIOCB *acb = (GlusterAIOCB *)blockacb;
>>> +
>>> +    acb->common.cb(acb->common.opaque, -ECANCELED);
>>> +    acb->canceled = true;
>>> +}
>>
>> After having called acb->common.cb you must not write any longer to the
>> memory pointed at by the qiov. Either you can really cancel the request,
>> or you need to wait until it completes.
> 
> I don't think I can cancel a request that has already been sent to the
> gluster server. block/qed.c introduces acb->finished and waits on it in
> qed_aio_cancel. Do you suggest I follow that model?

This is a possible solution, yes.
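
Sketched out, assuming GlusterAIOCB grows a bool *finished member like
QEDAIOCB has, and that your completion path sets *acb->finished = true after
it has called the callback:

static void qemu_gluster_aio_cancel(BlockDriverAIOCB *blockacb)
{
    GlusterAIOCB *acb = (GlusterAIOCB *)blockacb;
    bool finished = false;

    acb->finished = &finished;
    while (!finished) {
        /* Spin the event loop until the request really completes; it
         * may still write to the caller's qiov memory until then. */
        qemu_aio_wait();
    }
}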

>>> +
>>> +static AIOPool gluster_aio_pool = {
>>> +    .aiocb_size = sizeof(GlusterAIOCB),
>>> +    .cancel = qemu_gluster_aio_cancel,
>>> +};
>>> +
>>> +static int qemu_gluster_send_pipe(BDRVGlusterState *s, GlusterAIOCB *acb)
>>> +{
>>> +    int ret = 0;
>>> +    while (1) {
>>> +        fd_set wfd;
>>> +        int fd = s->fds[GLUSTER_FD_WRITE];
>>> +
>>> +        ret = write(fd, (void *)&acb, sizeof(acb));
>>> +        if (ret >= 0) {
>>> +            break;
>>> +        }
>>> +        if (errno == EINTR) {
>>> +            continue;
>>> +        }
>>> +        if (errno != EAGAIN) {
>>> +            break;
>>> +        }
>>> +
>>> +        FD_ZERO(&wfd);
>>> +        FD_SET(fd, &wfd);
>>> +        do {
>>> +            ret = select(fd + 1, NULL, &wfd, NULL, NULL);
>>> +        } while (ret < 0 && errno == EINTR);
>>
>> What's the idea behind this? While we're hanging in this loop no one will
>> read anything from the pipe, so it's unlikely that it magically becomes
>> ready.
> 
> I write to the pipe and wait for the reader to read it. The reader
> (qemu_gluster_aio_event_reader) is already waiting on the other end of the
> pipe.

qemu_gluster_aio_event_reader() isn't called while we're looping here. It
will only be called from the main loop, after this function has returned.

>>
>>> +    }
>>> +    return ret;
>>> +}
>>> +
>>> +static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret, void *arg)
>>> +{
>>> +    GlusterAIOCB *acb = (GlusterAIOCB *)arg;
>>> +    BDRVGlusterState *s = acb->common.bs->opaque;
>>> +
>>> +    acb->ret = ret;
>>> +    if (qemu_gluster_send_pipe(s, acb) < 0) {
>>> +        /*
>>> +         * Gluster AIO callback thread failed to notify the waiting
>>> +         * QEMU thread about IO completion. Nothing much can be done
>>> +         * here but to abruptly abort.
>>> +         *
>>> +         * FIXME: Check if the read side of the fd handler can somehow
>>> +         * be notified of this failure paving the way for a graceful exit.
>>> +         */
>>> +        error_report("Gluster failed to notify QEMU about IO completion");
>>> +        abort();
>>
>> In the extreme case you may choose to make this disk inaccessible
>> (something like bs->drv = NULL), but abort() kills the whole VM and
>> should only be called when there is a bug.
> 
> There have been concerns raised about this earlier too. I settled for this
> since I couldn't see a better way out and I could see a precedent
> for this in posix-aio-compat.c.
> 
> So I could just do the necessary cleanup, set bs->drv to NULL and return from
> here? But how do I wake up the QEMU thread that is waiting on the read side
> of the pipe? Without that, the QEMU thread that waits on the read side of the
> pipe is still hung.

There is no other thread. But you're right, you should probably
unregister the aio_fd_handler and any other pending callbacks.
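
For illustration, the error path in gluster_finish_aiocb() could then look
roughly like this (a sketch only; it ignores locking against the main loop
and assumes the current qemu_aio_set_fd_handler()/qemu_aio_release()
interfaces):

    if (qemu_gluster_send_pipe(s, acb) < 0) {
        BlockDriverState *bs = acb->common.bs;

        error_report("Gluster failed to notify QEMU about IO completion");

        /* Fail this request and take the disk offline instead of
         * killing the whole VM with abort(). */
        acb->common.cb(acb->common.opaque, -EIO);
        qemu_aio_release(acb);
        s->qemu_aio_count--;

        /* Stop watching the read end of the pipe so the main loop isn't
         * left waiting for completions that will never arrive. */
        qemu_aio_set_fd_handler(s->fds[GLUSTER_FD_READ], NULL, NULL, NULL,
                                NULL);
        close(s->fds[GLUSTER_FD_READ]);
        close(s->fds[GLUSTER_FD_WRITE]);

        bs->drv = NULL;     /* make the disk inaccessible */
    }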

Kevin


