qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v11 1/2] virtio-crypto: Add virtio crypto device


From: gong lei
Subject: Re: [Qemu-devel] [PATCH v11 1/2] virtio-crypto: Add virtio crypto device specification
Date: Tue, 4 Oct 2016 12:24:02 +0000

On 2016/10/4 17:05, Stefan Hajnoczi wrote:
> On Tue, Oct 04, 2016 at 02:53:25AM +0000, gong lei wrote:
>> Hi Stefan,
>>
>> Thanks for your comments.
>>
>>> On Wed, Sep 28, 2016 at 05:08:24PM +0800, Gonglei wrote:
>>>> /+For scatter/gather list support, a buffer can be represented by /
>>>> /virtio_crypto_iovec structure./
>>>> /+/
>>>> /+The structure is defined as follows:/
>>>> /+/
>>>> /+\begin{lstlisting}/
>>>> /+struct virtio_crypto_iovec {/
>>>> /+ /* Guest physical address *//
>>>> /+ le64 addr;/
>>>> /+ /* Length of guest physical address *//
>>>> /+ le32 len;/
>>>> /+/* This marks a buffer as continuing via the next field *//
>>>> /+#define VIRTIO_CRYPTO_IOVEC_F_NEXT 1/
>>>> /+ /* The flags as indicated above VIRTIO_CRYPTO_IOVEC_F_*. *//
>>>> /+ le32 flags;/
>>>> /+ /* Pointer to next struct virtio_crypto_iovec if flags & NEXT *//
>>>> /+ le64 next_iovec;/
>>>> /+};/
>>>> /+\end{lstlisting}/
>>> The vring already provides scatter-gather I/O.  It is usually not
>>> necessary to define scatter-gather I/O at the device level. Addresses
>>> can be translated by the virtio transport (PCI, MMIO, CCW).  For example
>>> PCI bus addresses with IO-MMU.  Therefore it's messy to use guest
>>> physical addresses in the device specification.
>>>
>>>> /+Each data request uses virtio_crypto_hash_data_req structure to
>>>> store /
>>>> /information/
>>>> /+used to run the HASH operations. The request only occupies one entry/
>>>> /+in the Vring Descriptor Table in the virtio crypto device's dataq,
>>>> which /
>>>> /improves/
>>>> /+the throughput of data transmitted for the HASH service, so that the
>>> virtio /
>>>> /crypto/
>>>> /+device can be better accelerated./
>>> Indirect vrings also only require one entry in the descriptor table.  I
>>> don't understand why you are reinventing scatter-gather I/O.
>>>
>>> Am I missing something?
>> Yes, I knew indirect vring. But for virtio-crypto device' request, each
>> crypto request include
>> many buffers. Take algorithm chain' request as an examle, the driver
>> need to transmit source
>> data, dstination data, initializaion vector, additional authentication
>> data, digest result
>> data etc. to the device. In those data, the source data and destionation
>> data etc. may be composed
>> by scatter-gather I/O.  That's the background.
>>
>> In virtio-crypto spec, we use a structure to store all those contents so
>> that we can put all those data
>> into one entry in the Descriptor Table, otherwise the effect of
>> acceleration is not good. We
>> thought other methods to inprove the performance as well, such as
>> increasing the virng size
>> of dataq, but the maxinum is 1024 at present, and it maybe influence the
>> latency if the vring
>> size is too big.
>>
>> For the indirect ving in existing Virtio spec, is only used for one
>> buffer, which can't cover
>> our situation.
> Indirect vring uses 1 descriptor in the vring descriptor table, but that
> descriptor points to a separate table that can have many descriptors.
> You are not limited to just 1 scatter-gather element.
>
> Also, I've noticed that the structs in the spec are mixing
> device-readable and device-writable fields in structs.  This is not
> allowed since virtio requires that all device-readable data comes before
> all all device-writable data.
The spec 2.4.5 says something related, but not forced.

A device MUST NOT write to a device-readable buffer, and a device SHOULD 
NOT read a device-writable
buffer (it MAY do so for debugging or diagnostic purposes).
> I think you'll be able to use indirect vring with the same performance
> as customer scatter-gather I/O once you think through the layout of
> device-readable and device-writable data.
>
> In order to lay out requests correctly the device-readable headers
> struct must contain the length of all variable-sized fields.
>
> For example, the header would have an iv_len field and the device will
> expect that number of bytes after the header struct:
>
> Device-readable:
> [header]
> [iv]
> [input data]
>
> Device-writeable:
> [output data]
> [return/error values]
>
> One more thing to keep in mind is that according to the VIRTIO
> specification the device must not rely on the framing (or boundaries of
> scatter-gather elements).  In the example above it means that the [iv]
> could either be a single vring_desc or it could be multiple descs, or it
> could share vring_descs with [header] and/or [input data].
>
> In other words, the device emulation code must not assume elem->in[0] is
> [header], elem->in[1] is [iv], and elem->in[2] is [input data].  It has
> to process without giving special meaning to scatter-gather buffer size.
>
> Stefan

Yes, I knew that and I did that last year, but I didn't get the good 
performance unfortunately.
Because we need to handle multiple buffers respectively on one request, 
which add the overhead
of software layer. It's obviously for vhost-user cryptodev backend in DPDK.

Maybe I didn't express my meaning clearly. Let's assume that a have 6 
buffers
in one crypto request:

BufA: output data, a single buffer
BufB: output data, a scatter-gather I/O
BufC: output data, a single buffer
BufD: in data, a scatter-gather I/O
BufE: in data, a single buffer
BufF: in data, a single buffer

We need following steps to translate the request to the device:

1. add_outbuf() for BufA, which will occupy one entry in the descryptor 
table
2.add_outbuf() for BufB, which will occupy one entry in the descryptor 
table if using
    indirect vring, or multiple entries if not using indirecting vring
3. add_outbuf() for BufC, which will occupy one entry in the descryptor 
table
4. add_inbuf() for BufD, which will occupy one entry in the descryptor 
table if using
    indirect vring, or multiple entries if not using indirecting vring
5. add_inbuf() for BufE, which will occupy one entry in the descryptor table
6. add_outbuf() for BufF, which will occupy one entry in the descryptor 
table

It means that one request will occupy 6 entries at least. Am I right?

-- 
Regards,
-Gonglei




reply via email to

[Prev in Thread] Current Thread [Next in Thread]