qemu-stable
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 1/1] ide: check DMA transfer size in ide_dma_cb() to preve


From: Alexander Popov
Subject: Re: [PATCH v2 1/1] ide: check DMA transfer size in ide_dma_cb() to prevent qemu DoS from quests
Date: Sat, 30 Nov 2019 13:04:51 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.1.1

On 27.11.2019 01:09, Kevin Wolf wrote:
> Am 26.11.2019 um 22:24 hat Alexander Popov geschrieben:
>> Hello Kevin,
>>
>> Thanks for your review,
>>
>> On 21.11.2019 18:03, Kevin Wolf wrote:
>>> Am 14.11.2019 um 18:25 hat Alexander Popov geschrieben:
>>>> The commit a718978ed58a from July 2015 introduced the assertion which
>>>> implies that the size of successful DMA transfers handled in ide_dma_cb()
>>>> should be multiple of 512 (the size of a sector). But guest systems can
>>>> initiate DMA transfers that don't fit this requirement.
>>>>
>>>> PoC for Linux that uses SCSI_IOCTL_SEND_COMMAND to perform such an ATA
>>>> command and crash qemu:
>> ...
>>>
>>> It would be nicer to turn the reproducer into a test case for
>>> tests/ide-test.c.
>>
>> Yes, I can do that.
>>
>>>> For fixing that let's check the number of bytes prepared for the transfer
>>>> by the prepare_buf() handler. If it is not a multiple of 512 then end
>>>> the DMA transfer with an error.
>>>>
>>>> That also fixes the I/O stall in guests after a DMA transfer request
>>>> for less than the size of a sector.
>>>>
>>>> Signed-off-by: Alexander Popov <address@hidden>
>>>
>>> This patch makes ide-test fail:
>>>
>>>   TEST    check-qtest-x86_64: tests/ide-test
>>> **
>>> ERROR:tests/ide-test.c:469:test_bmdma_short_prdt: assertion failed (status 
>>> == 0): (0x00000004 == 0x00000000)
>>> ERROR - Bail out! ERROR:tests/ide-test.c:469:test_bmdma_short_prdt: 
>>> assertion failed (status == 0): (0x00000004 == 0x00000000)
>>
>> Thanks for the notice.
>> Yes, I can reproduce it too with `make check-qtest-i386`.
>>
>>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>>>> index 754ff4dc34..85aac614f0 100644
>>>> --- a/hw/ide/core.c
>>>> +++ b/hw/ide/core.c
>>>> @@ -849,6 +849,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>>>      int64_t sector_num;
>>>>      uint64_t offset;
>>>>      bool stay_active = false;
>>>> +    int32_t prepared = 0;
>>>>  
>>>>      if (ret == -EINVAL) {
>>>>          ide_dma_error(s);
>>>> @@ -892,12 +893,10 @@ static void ide_dma_cb(void *opaque, int ret)
>>>>      n = s->nsector;
>>>>      s->io_buffer_index = 0;
>>>>      s->io_buffer_size = n * 512;
>>>> -    if (s->bus->dma->ops->prepare_buf(s->bus->dma, s->io_buffer_size) < 
>>>> 512) {
>>>> -        /* The PRDs were too short. Reset the Active bit, but don't raise 
>>>> an
>>>> -         * interrupt. */
>>>> -        s->status = READY_STAT | SEEK_STAT;
>>>> -        dma_buf_commit(s, 0);
>>>> -        goto eot;
>>>> +    prepared = s->bus->dma->ops->prepare_buf(s->bus->dma, 
>>>> s->io_buffer_size);
>>>> +    if (prepared % 512) {
>>>> +        ide_dma_error(s);
>>>
>>> Which I assume is because you changed the error mode here compared to
>>> the old version.
>>
>> Yes, you are right.
>>
>>> I'm not sure offhand what the correct behaviour is for non-aligned
>>> values > 512. I think we actually have two cases here: Either a short or
>>> a long PRD. The commit message should explain this with spec references
>>> and a test case should be added for both cases.
>>
>> I've found the "Programming Interface for Bus Master IDE Controller" 
>> (revision
>> 1.0 5/16/94). The chapter 3.1 (Status Bit Interpretation) provides some 
>> answers.
> 
> Yes, I think that's the same that I've used before. I assume it's the
> relevant spec.
> 
>> It says that:
>>   1. If PRD's specified a smaller size than the IDE transfer size, then the
>> Interrupt and Active bits in the Controller status register are not set.
> 
>>   2. If the size of the physical memory regions was larger than the IDE 
>> device
>> transfer size, the Interrupt and Active bits in the Controller status 
>> register
>> are both set to 1.
>>
>> So my changing of the error mode in short PRD's case was wrong, and the
>> test_bmdma_short_prdt() is correct.
> 
> Yes, I think 1. is implemented correctly for PRDs that are too small and
> smaller than a sector.
> 
> I think the assumption may have been that if the PRDT contains at least
> one more full sector, we'll do that one sector before coming back to the
> same place and hitting the code path for a too short PRDT.
> 
> However, the code neglects to actually use the return value of
> .prepare_buf() to limit the number of sectors accessed. So if we ask for
> a scatter/gather list for 5 sectors and we get 3 sectors, we still
> assume we can write to all 5. This is obviously wrong.
> 
>> Now let's think about the proper fix of the qemu crash.
>>
>> Currently I don't really understand how ide_dma_cb() emulates the logic
>> described in Status Bit Interpretation chapter. I don't see any comparison
>> between the DMA transfer size and PRD's size.
>>
>> We only have this check against the size of a sector (512 bytes), which 
>> doesn't
>> catch all short PRD's cases (PRD in my PoC is 1337 bytes).
> 
> I think for making the above assumption work, we'd have to check the
> return value, which gets us something like:
> 
>     ret = s->bus->dma->ops->prepare_buf()
>     if (ret < 512) {
>         ... short PRDT code ...
>     } else if (ret < n * 512) {
>         n = ret / 512;
>     }
> 
> Instead of doing the extra iteration and executing I/O for the first
> part of the request, maybe this would work, too:
> 
>     ret = s->bus->dma->ops->prepare_buf()
>     if (ret < n * 512) {
>         ... short PRDT code ...
>     }
> 
> We need to check in the spec whether we're supposed to actually do
> partial I/O for short PRDTs. I couldn't find a clear answer with a quick
> look, but I'm leaning towards doing the partial I/O (i.e. implementing
> the first pseudo-code piece above).
> 
> 
> As for handling long PRDTs, we have code that looks like it's meant to
> handle the case:
> 
>     n = s->io_buffer_size >> 9;
>     if (n > s->nsector) {
>         /* The PRDs were longer than needed for this request. Shorten them so
>          * we don't get a negative remainder. The Active bit must remain set
>          * after the request completes. */
>         n = s->nsector;
>         stay_active = true;
>     }
> 
> bmdma_prepare_buf() does potentially set s->io_buffer_size to a value
> larger than the passed limit, so maybe this is already correct. We have
> a basic test for it in test_bmdma_long_prdt(), but I can't rule out that
> there are more complicated cases where it fails.
> 
> I'm pretty sure we must handle the long PRDT case only after doing I/O
> (like we currently do) because the operation is supposed to have
> completed by the time we signal that the PRDT was long, so the guest can
> trust that a read has actually read something and a write has reached
> the disk. The spec says "This is a valid completion case".
> 
> 
> Does this make sense to you?

Thanks a lot, Kevin!

First of all I'll improve the unit-tests to cover all cases.

Then I'll try both approaches you described and return with the results.

I'll also try to find more info about partial I/O behavior in other datasheets.

Best regards,
Alexander



reply via email to

[Prev in Thread] Current Thread [Next in Thread]