Re: [RFC 0/4] POC: Generating realistic block errors
From: Kevin Wolf
Subject: Re: [RFC 0/4] POC: Generating realistic block errors
Date: Tue, 26 Nov 2019 20:28:52 +0100
User-agent: Mutt/1.12.1 (2019-06-15)
On 26.11.2019 at 19:19, Tony Asleson wrote:
> On 11/21/19 4:30 AM, Stefan Hajnoczi wrote:
> > blkdebug can inject EIO when a specific LBA is accessed. Is that
> > enough for what you want to do? Then you can reuse and maybe extend
> > blkdebug.
>
> Not exactly. For SCSI, I would like to be able to return different
> types of device errors on reads (e.g. 03/1101, 03/1600) and on writes. The
> SCSI sense data needs to include the first block in error for the
> transfer. It would be good to also have the ability to include things
> like SCSI check conditions with recoverable errors too.
>
> I've been experimenting with blkdebug, to learn more and to see how it
> would need to be extended. One thing that I was trying to understand is
> how an EIO from blkdebug gets translated into a bus/device specific
> error. At the moment I'm not sure. I've been trying to figure out the
> layering. I think that blkdebug sits between the device specific model
> and the underlying block representation on disk. Thus it injects error
> return values when accessing the underlying data, but that could be
> incorrect. If it is correct I should see some code that translates the
> EIO to something transport/device specific.
The point where the device calls into the generic block layer is where
the functions that start with blk_ are called (blk_aio_pwritev() and
blk_aio_preadv() are probably the most interesting ones).
The callback path in scsi-disk is not that easy to follow, but in the
end, error returns should result in scsi_handle_rw_error() being called,
which is where error codes are translated into SCSI sense codes.
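To make that translation step concrete, here is a minimal, self-contained sketch of the idea: an errno is mapped to a sense key/ASC/ASCQ triple, and the result is packed into fixed-format sense data that carries the first failing block in the INFORMATION field. The names `struct sense`, `errno_to_sense()` and `build_fixed_sense()` are hypothetical, and the particular errno-to-sense pairings are assumptions chosen for illustration. This is not the code in scsi-disk.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sense triple for illustration; not QEMU's own type. */
struct sense {
    uint8_t key;   /* sense key */
    uint8_t asc;   /* additional sense code */
    uint8_t ascq;  /* additional sense code qualifier */
};

/*
 * Illustrative errno-to-sense table. The pairings are assumptions made
 * for this sketch and may differ from what scsi-disk actually reports.
 */
struct sense errno_to_sense(int err)
{
    switch (err) {
    case EINVAL:   /* ILLEGAL REQUEST, invalid field in CDB */
        return (struct sense){ 0x05, 0x24, 0x00 };
    case ENOMEM:   /* HARDWARE ERROR, internal target failure */
        return (struct sense){ 0x04, 0x44, 0x00 };
    case ENOSPC:   /* DATA PROTECT, space allocation failed */
        return (struct sense){ 0x07, 0x27, 0x07 };
    case EIO:
    default:       /* MEDIUM ERROR, unrecovered read error */
        return (struct sense){ 0x03, 0x11, 0x00 };
    }
}

/*
 * Fill an 18-byte fixed-format sense buffer (response code 0x70).
 * The INFORMATION field is only 32 bits wide in this format, so the
 * first failing LBA has to fit in 32 bits.
 */
void build_fixed_sense(uint8_t buf[18], struct sense s, uint32_t first_bad_lba)
{
    memset(buf, 0, 18);
    buf[0] = 0x80 | 0x70;                     /* VALID bit + current error */
    buf[2] = s.key & 0x0f;                    /* sense key in the low nibble */
    buf[3] = (uint8_t)(first_bad_lba >> 24);  /* INFORMATION, big-endian */
    buf[4] = (uint8_t)(first_bad_lba >> 16);
    buf[5] = (uint8_t)(first_bad_lba >> 8);
    buf[6] = (uint8_t)first_bad_lba;
    buf[7] = 10;                              /* additional length: bytes 8..17 */
    buf[12] = s.asc;
    buf[13] = s.ascq;
}
```

A caller would do something like `build_fixed_sense(buf, errno_to_sense(EIO), lba)`. Returning codes such as 03/1101 or 03/1600 would need more information than a bare errno can carry, which is the gap being discussed in this thread.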
> Although I don't understand how returning an ENOSPC from read_aio in
> blkdebug would get translated for a SCSI disk as it doesn't make sense
> to me (one of the examples in the documentation). Actually I don't
> know how getting ENOSPC on a read could happen?
That scenario doesn't make a lot of sense to me either, but blkdebug can
just inject any error code, even nonsensical ones.
> During my blkdebug experimentation, I've been using lsi53c895a with
> scsi-disk and thus far I've not been able to generate a read error back
> to the guest kernel. I've managed to abort qemu with an assert and hang
> qemu without being able to get an error back to the guest kernel. I
> wrote up one of them: https://bugs.launchpad.net/qemu/+bug/1853898 .
> Specifying a specific sector hasn't worked for me yet. I'm still trying
> to figure out how to enable tracing/debugging etc. to see what I'm doing
> incorrectly.
Note that depending on the rerror/werror options, QEMU may not deliver
errors to the guest, but stop VMs instead. If the monitor is still
responsive, it's likely that you just got a stopped VM rather than a
hanging QEMU.
The default is that the VM is stopped for ENOSPC and other errors are
delivered to the guest.
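As a rough model of how the rerror/werror setting decides between stopping the VM and reporting to the guest, consider the following sketch. The enum and function names are hypothetical and the logic is a simplification of the behaviour described above, not QEMU's implementation. The policy names mirror the option values ignore, stop, report and enospc.

```c
#include <errno.h>

/* Hypothetical names modelling the rerror/werror option values. */
enum on_error {
    ON_ERROR_REPORT,   /* deliver the error to the guest */
    ON_ERROR_IGNORE,   /* pretend the request succeeded */
    ON_ERROR_STOP,     /* stop the VM on any error */
    ON_ERROR_ENOSPC    /* stop on ENOSPC, report everything else */
};

enum error_action {
    ACTION_REPORT,
    ACTION_IGNORE,
    ACTION_STOP
};

/* Simplified decision: which action follows from a policy and an errno. */
enum error_action pick_action(enum on_error policy, int err)
{
    switch (policy) {
    case ON_ERROR_ENOSPC:
        return err == ENOSPC ? ACTION_STOP : ACTION_REPORT;
    case ON_ERROR_STOP:
        return ACTION_STOP;
    case ON_ERROR_IGNORE:
        return ACTION_IGNORE;
    case ON_ERROR_REPORT:
    default:
        return ACTION_REPORT;
    }
}
```

Under this model, an EIO injected by blkdebug with the enospc policy is reported to the guest, while an injected ENOSPC stops the VM. With the stop policy, every injected error leaves the VM paused, which can look like a hang from inside the guest.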
Kevin