qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 4/6] crypto: convert xts_tweak_encdec to use xts


From: Alberto Garcia
Subject: Re: [Qemu-devel] [PATCH 4/6] crypto: convert xts_tweak_encdec to use xts_uint128 type
Date: Thu, 11 Oct 2018 14:16:24 +0200
User-agent: Notmuch/0.18.2 (http://notmuchmail.org) Emacs/24.4.1 (i586-pc-linux-gnu)

On Tue 09 Oct 2018 05:31:32 PM CEST, Daniel P. Berrangé wrote:
> On Tue, Oct 09, 2018 at 05:30:25PM +0200, Alberto Garcia wrote:
>> >> >      for (i = 0; i < lim; i++) {
>> >> > -        xts_tweak_encdec(datactx, decfunc, src, dst, (uint8_t *)&T);
>> >> > +        xts_uint128 S, D;
>> >> > +
>> >> > +        memcpy(&S, src, XTS_BLOCK_SIZE);
>> >> > +        xts_tweak_encdec(datactx, decfunc, &S, &D, &T);
>> >> > +        memcpy(dst, &D, XTS_BLOCK_SIZE);
>> >> 
>> >> Why do you need S and D?
>> >
>> > I think src & dst pointers can't be guaranteed to be aligned
>> > sufficiently for int64 operations, if we just cast from uint8t*.
>> 
>> I see. I did a quick test without the memcpy() calls and it doesn't
>> seem to have a visible effect on performance, but if it turns out
>> that it does then maybe this is worth investigating further. I
>> suspect all buffers received by this code are allocated with
>> qemu_try_blockalign() anyway, so it should be safe.
>
> The extra memcpy() calls certainly had a perf impact when I added
> them, so if we can determine that we can safely do without, that would
> be desirable.

So I was having a look at this. From the block layer everything comes
properly aligned. Then there's VirtioCrypto, which seems to allow XTS
mode but I couldn't quite tell from virtio_crypto_sym_op_helper() if all
buffers are guaranteed to be aligned.

What you could do is a runtime check (with QEMU_PTR_IS_ALIGNED()) and
decide what implementation to use.

A couple of additional thoughts:

- x86_64 (and others) allow unaligned memory accesses, and that might be
  faster than copying the buffer using memcpy(). I haven't measured it
  however.

- qcrypto_block_{encrypt,decrypt}_helper() (used for encrypted block
  I/O) use the same buffer for input and output, so maybe it's worth
  exploring if this fact allows for additional optimization if you still
  need to use memcpy().

Berto



reply via email to

[Prev in Thread] Current Thread [Next in Thread]