qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 2/2] file-posix: add sg_get_max_segments that actually wor


From: Maxim Levitsky
Subject: Re: [PATCH v4 2/2] file-posix: add sg_get_max_segments that actually works with sg
Date: Sun, 06 Sep 2020 15:53:11 +0300
User-agent: Evolution 3.36.3 (3.36.3-1.fc32)

On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote:
> Maybe you want to add some condition for this:
> https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659
> Or not clamp it at all.
> 
> On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.ty89@gmail.com> wrote:
> > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky appears
> > to have assumed that the only "SCSI Passthrough" is `-device
> > scsi-generic`, while the fact is there's also `-device scsi-block`
> > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting
> > max_sectors is necessary to it (more precisely, hw_max_sectors might
> > what matters, but BLKSECTGET reports max_sectors, so).
> > 
> > I'm unsure about how qemu-nbd works, but the commit clearly wasn't the
> > right approach to fix the original issue it addresses. (It should for
> > example ignore the max_transfer if it will never matter in to it, or
> > overrides it in certain cases; when I glimpsed over
> > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see how
> > it could be file-posix problem when it is reporting the right thing,
> > regardless of whether "removing" the code helps.)
> > 
> > I don't think we want to "mark" `-device scsi-block` as sg either. It
> > will probably bring even more unexpected problems, because they are
> > literally different sets of things behind the scene / in the kernel.

Yes, I did overlook the fact that scsi-block is kind of hybrid passthough 
device,
doing SG_IO for some things and regular IO for others.

I don't think that my approach was wrong way to fix the problem, but as you 
found
out, abusing 'bs->sg' hack (which I would be very happy to remove completely)
wasn't a good idea.
I actualy was aware of scsi-block and that it does SG_IO but it 
was forgotten some where on the way.

So in summary what the problem is and what I think is the right solution:


Each qemu block driver exposes block limits and assumes that they are the same 
for two IO interfaces the block driver can expose:

1. Regular qemu blk_pread/pwrite alike functions
2. blk_ioctl (tiny wrapper against SG_IO), supported by posix-raw on 
   host block devices/sg char devices, and by iscsi

The problem is that these two interfaces can have different block limits.

I don't know about iscsi, but for files, doing regular IO is always unlimited,
since it passes through the kernel block layer and segemented there on 
demand which is faster that doing it in userspace, while SG_IO is passed as is
to the underlying SCSI device and lacks this segmentation.

Regardless of how NBD uses these limits, I think that these limits should be 
correctly
exposed by the block drivers, and therefore I propose is that each qemu block 
driver 
would expose a pair of block limits.
One for the regular block IO, and other for SG_IO.

Then block driver clients (like scsi devices that you mention, nbd, etc) 
can choose which limit to use, depending on which IO api they use.
The scsi-generic, and scsi-block can use the SG_IO limits, 
while the rest  can use the normal (unlimited for file I/O) limits, including 
the NBD.

Best regards,
        Maxim Levitsky

> > 
> > On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.ty89@gmail.com> wrote:
> > > sg devices have different major/minor than their corresponding
> > > block devices. Using sysfs to get max segments never really worked
> > > for them.
> > > 
> > > Fortunately the sg driver provides an ioctl to get sg_tablesize,
> > > which is apparently equivalent to max segments.
> > > 
> > > Signed-off-by: Tom Yan <tom.ty89@gmail.com>
> > > ---
> > >  block/file-posix.c | 17 ++++++++++++++++-
> > >  1 file changed, 16 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > index 411a23cf99..9e37594145 100644
> > > --- a/block/file-posix.c
> > > +++ b/block/file-posix.c
> > > @@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd)
> > >  #endif
> > >  }
> > > 
> > > +static int sg_get_max_segments(int fd)
> > > +{
> > > +#ifdef SG_GET_SG_TABLESIZE
> > > +    long max_segments = 0;
> > > +
> > > +    if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) {
> > > +        return max_segments;
> > > +    } else {
> > > +        return -errno;
> > > +    }
> > > +#else
> > > +    return -ENOSYS;
> > > +#endif
> > > +}
> > > +
> > >  static int get_max_transfer_length(int fd)
> > >  {
> > >  #if defined(BLKSECTGET) && defined(BLKSSZGET)
> > > @@ -1268,7 +1283,7 @@ static void hdev_refresh_limits(BlockDriverState 
> > > *bs, Error **errp)
> > >          bs->bl.max_transfer = pow2floor(ret);
> > >      }
> > > 
> > > -    ret = get_max_segments(s->fd);
> > > +    ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
> > >      if (ret > 0) {
> > >          bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
> > >                                             ret * 
> > > qemu_real_host_page_size);
> > > --
> > > 2.28.0
> > > 





reply via email to

[Prev in Thread] Current Thread [Next in Thread]