>>On 22.03.21 10:25, ChangLimin wrote:
>>> For Linux 5.10/5.11, qemu write zeros to a multipath device using
>>> ioctl(fd, BLKZEROOUT, range) with cache none or directsync return -EBUSY
>>> permanently.
>>
>>So as far as I can track back the discussion, Kevin asked on v1 why we’d
>>set has_write_zeroes to false, i.e. whether the EBUSY might not go away
>>at some point, and if it did, whether we shouldn’t retry BLKZEROOUT then.
>>You haven’t explicitly replied to that question (as far as I can see),
>>so it kind of still stands.
>>
>>Implicitly, there are two conflicting answers in this patch: On one
>>hand, the commit message says “permanently”, and this is what you told
>>Nir as a realistic case where this can occur.
>
>For Linux 5.10/5.11, the EBUSY is permanently, the reproduce step is below.
>For other Linux version, the EBUSY may be temporary.
>Because Linux 5.10/5.11 is not used widely, so do not set has_write_zeroes to false.
>
>>I'm afraid ChangLimin did not answer my question. I'm looking for real
>>world used case when qemu cannot write zeros to multipath device, when
>>nobody else is using the device.
>>
>>I tried to reproduce this on Fedora (kernel 5.10) with qemu-img convert,
>>once with a multipath device, and once with logical volume on a vg created
>>on the multipath device, and I could not reproduce this issue.
>
>The following is steps to reproduct the issue on Fedora 34.
>
># uname -a
>Linux fedora-34 5.11.3-300.fc34.x86_64 #1 SMP Thu Mar 4 19:03:18 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>
>Is this the most recent kernel? I have 5.11.7 in fedora 32.
>
>
># qemu-img -V
>qemu-img version 5.2.0 (qemu-5.2.0-5.fc34.1)
>
>1. Login in an ISCSI LUN created using targetcli on ubuntu 20.04
># iscsiadm -m discovery -t st -p 192.169.1.109
>
># iscsiadm -m node -l -T iqn.2003-01.org.linux-iscsi:lio-lv100
># iscsiadm -m session
>
>2. start multipathd service
># mpathconf --enable
># systemctl start multipathd
>
>3. add multipath path
># multipath -a `/lib/udev/scsi_id -g /dev/sdb` # sdb means the ISCSI LUN
>wwid '36001405b76856e4816b48b99c6a77de3' added
>
># multipathd add path /dev/sdb
>ok
>
># multipath -ll # /dev/dm-1 is the multipath device based on /dev/sdb
>mpatha (36001405bebfc3a0522541cda30220db9) dm-1 LIO-ORG,lv102
>size=1.0G features='0' hwhandler='1 alua' wp=rw
>`-+- policy='service-time 0' prio=50 status=active
> `- 5:0:0:0 sdd 8:48 active ready running
>
>You are using user_friendly_names which is (sadly) the default.
>But I don't think it should matter.
>
>4. qemu-img return EBUSY both to dm-1 and sdb
># qemu-img convert -O raw -t none cirros-0.4.0-x86_64-disk.img /dev/dm-1
>qemu-img: error while writing at byte 0: Device or resource busy
>
># qemu-img convert -O raw -t none cirros-0.4.0-x86_64-disk.img /dev/sdb
>qemu-img: error while writing at byte 0: Device or resource busy
>
>5. blkdiscard also return EBUSY both to dm-1 and sdb
># blkdiscard -o 0 -l 4096 /dev/dm-1
>blkdiscard: cannot open /dev/dm-1: Device or resource busy
>
># blkdiscard -o 0 -l 4096 /dev/sdb
>blkdiscard: cannot open /dev/sdb: No such file or directory
>
>6. dd write zero is good, because it does not use blkdiscard
># dd if=/dev/zero of=/dev/dm-1 bs=1M count=100 oflag=direct
>100+0 records in
>100+0 records out
>104857600 bytes (105 MB, 100 MiB) copied, 2.33623 s, 44.9 MB/s
>
>7. The LUN should support blkdiscard feature, otherwise it will not write zero
>with ioctl(fd, BLKZEROOUT, range)
>
>Thanks!
>
>I could not reproduce this with kernel 5.10, but now I'm no 5.11:
># uname -r
>5.11.7-100.fc32.x86_64
>
># qemu-img --version
>qemu-img version 5.2.0 (qemu-5.2.0-6.fc32.1)
>Copyright (c) 2003-2020 Fabrice Bellard and the QEMU Project developers
>
># cat /etc/multipath.conf
>defaults {
>user_friendly_names no
>find_multipaths no
>}
>
>blacklist_exceptions {
> property "(SCSI_IDENT_|ID_WWN)"
>}
>
>blacklist {
>}
>
># multipath -ll 36001405e884ab8ff4b44fdba6901099c
>36001405e884ab8ff4b44fdba6901099c dm-8 LIO-ORG,3-09
>size=6.0G features='0' hwhandler='1 alua' wp=rw
>`-+- policy='service-time 0' prio=50 status=active
> `- 1:0:0:9 sdk 8:160 active ready running
>
>$ lsblk /dev/sdk
>NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>sdk 8:160 0 6G 0 disk
>└─36001405e884ab8ff4b44fdba6901099c 253:13 0 6G 0 mpath
>
>$ virt-builder fedora-32 -o disk.img
>[ 3.8] Planning how to build this image
>[ 3.8] Uncompressing
>[ 11.1] Opening the new disk
>[ 16.1] Setting a random seed
>[ 16.1] Setting passwords
>virt-builder: Setting random password of root to TcikQqRxAaIqS1kF
>[ 17.0] Finishing off
> Output file: disk.img
> Output size: 6.0G
> Output format: raw
> Total usable space: 5.4G
> Free space: 4.0G (74%)
>
>$ qemu-img info disk.img
>image: disk.img
>file format: raw
>virtual size: 6 GiB (6442450944 bytes)
>disk size: 1.29 GiB
>
># qemu-img convert -p -f raw -O raw -t none -T none disk.img /dev/mapper/36001405e884ab8ff4b44fdba6901099c
> (100.00/100%)
>
>Works.
>
># lsblk /dev/sdk
>NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
>sdk 8:160 0 6G 0 disk
>└─36001405e884ab8ff4b44fdba6901099c 253:13 0 6G 0 mpath
> ├─36001405e884ab8ff4b44fdba6901099c1 253:14 0 1M 0 part
> ├─36001405e884ab8ff4b44fdba6901099c2 253:15 0 1G 0 part
> ├─36001405e884ab8ff4b44fdba6901099c3 253:16 0 615M 0 part
> └─36001405e884ab8ff4b44fdba6901099c4 253:17 0 4.4G 0 part
>
># qemu-img convert -p -f raw -O raw -t none -T none disk.img /dev/mapper/36001405e884ab8ff4b44fdba6901099c
> (100.00/100%)
>
>Works, wiping the partitions.
>
># mount /dev/mapper/36001405e884ab8ff4b44fdba6901099c4 /tmp/mnt
>
># mount | grep /dev/mapper/36001405e884ab8ff4b44fdba6901099c4
>/dev/mapper/36001405e884ab8ff4b44fdba6901099c4 on /tmp/mnt type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,noquota)
>
>This is invalid use, converting to device with mounted file system, but let's try:
>
># qemu-img convert -p -f raw -O raw -t none -T none disk.img /dev/mapper/36001405e884ab8ff4b44fdba6901099c
> (100.00/100%)
>
>Still works?!
>
># wipefs -a /dev/mapper/36001405e884ab8ff4b44fdba6901099c
>wipefs: error: /dev/mapper/36001405e884ab8ff4b44fdba6901099c: probing initialization failed: Device or resource busy
>
># blkdiscard /dev/mapper/36001405e884ab8ff4b44fdba6901099c
>
>Works.
>
>This the configuration of the LUN on the server side:
>
> {
> "aio": false,
> "alua_tpgs": [
> {
> "alua_access_state": 0,
> "alua_access_status": 0,
> "alua_access_type": 3,
> "alua_support_active_nonoptimized": 1,
> "alua_support_active_optimized": 1,
> "alua_support_offline": 1,
> "alua_support_standby": 1,
> "alua_support_transitioning": 1,
> "alua_support_unavailable": 1,
> "alua_write_metadata": 0,
> "implicit_trans_secs": 0,
> "name": "default_tg_pt_gp",
> "nonop_delay_msecs": 100,
> "preferred": 0,
> "tg_pt_gp_id": 0,
> "trans_delay_msecs": 0
> }
> ],
> "attributes": {
> "block_size": 512,
> "emulate_3pc": 1,
> "emulate_caw": 1,
> "emulate_dpo": 1,
> "emulate_fua_read": 1,
> "emulate_fua_write": 1,
> "emulate_model_alias": 1,
> "emulate_pr": 1,
> "emulate_rest_reord": 0,
> "emulate_tas": 1,
> "emulate_tpu": 1,
> "emulate_tpws": 1,
> "emulate_ua_intlck_ctrl": 0,
> "emulate_write_cache": 1,
> "enforce_pr_isids": 1,
> "force_pr_aptpl": 0,
> "is_nonrot": 0,
> "max_unmap_block_desc_count": 1,
> "max_unmap_lba_count": 8192,
> "max_write_same_len": 65335,
> "optimal_sectors": 16384,
> "pi_prot_format": 0,
> "pi_prot_type": 0,
> "pi_prot_verify": 0,
> "queue_depth": 128,
> "unmap_granularity": 1,
> "unmap_granularity_alignment": 0,
> "unmap_zeroes_data": 0
> },
> "dev": "/target/3/09",
> "name": "3-09",
> "plugin": "fileio",
> "size": 6442450944,
> "write_back": true,
> "wwn": "e884ab8f-f4b4-4fdb-a690-1099c072c86d"
> },
>
>Maybe this upstream change is not in all downstream 5.11 kernels, or 5.11.7
>already includes the fix?
Linux 5.10.24/5.11.7 already merged the fix on 2021-03-17 by Greg Kroah-Hartman.
# git describe d44c9780ed40db88626c9354868eab72159c7a7f
v5.10.23-184-gd44c9780ed40
# git describe 7e0815797656f029fab2edc309406cddf931b9d8
v5.11.6-178-g7e0815797656