As I understand the above algorithm, we find a vfio_dma
overlapping the request and populate the bitmap for that range. Then
we go back and put_user() for each byte that we touched. We could
instead simply work on a one byte buffer as we enumerate the requested
range and do a put_user() ever time we reach the end of it and have bits
set. That would greatly simplify the above example. But I would expect
that we're a) more likely to get asked for ranges covering a single
vfio_dma
QEMU ask for single vfio_dma during each iteration.
If we restrict this ABI to cover single vfio_dma only, then it
simplifies the logic here. That was my original suggestion. Should we
think about that again?
But we currently allow unmaps that overlap multiple vfio_dmas as long
as no vfio_dma is bisected, so I think that implies that an unmap while
asking for the dirty bitmap has even further restricted semantics. I'm
also reluctant to design an ABI around what happens to be the current
QEMU implementation.
If we take your example above, ranges {0x0000,0xa000} and
{0xa000,0x10000} ({start,end}), I think you're working with the
following two bitmaps in this implementation:
00000011 11111111b
00111111b
And we need to combine those into:
11111111 11111111b
Right?
But it seems like that would be easier if the second bitmap was instead:
11111100b
Then we wouldn't need to worry about the entire bitmap being shifted by
the bit offset within the byte, which limits our fixes to the boundary
byte and allows us to use copy_to_user() directly for the bulk of the
copy. So how do we get there?
I think we start with allocating the vfio_dma bitmap to account for
this initial offset, so we calculate bitmap_base_iova as:
(iova & ~((PAGE_SIZE << 3) - 1))
We then use bitmap_base_iova in calculating which bits to set.
The user needs to follow the same rules, and maybe this adds some value
to the user providing the bitmap size rather than the kernel
calculating it. For example, if the user wanted the dirty bitmap for
the range {0xa000,0x10000} above, they'd provide at least a 1 byte
bitmap, but we'd return bit #2 set to indicate 0xa000 is dirty.
Effectively the user can ask for any iova range, but the buffer will be
filled relative to the zeroth bit of the bitmap following the above
bitmap_base_iova formula (and replacing PAGE_SIZE with the user
requested pgsize). I'm tempted to make this explicit in the user
interface (ie. only allow bitmaps starting on aligned pages), but a
user is able to map and unmap single pages and we need to support
returning a dirty bitmap with an unmap, so I don't think we can do that.