qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [libvirt patch] qemu: adds support for virtfs 9p argume


From: Christian Schoenebeck
Subject: Re: [Qemu-devel] [libvirt patch] qemu: adds support for virtfs 9p argument 'vii'
Date: Fri, 17 May 2019 22:53:41 +0200

On Freitag, 17. Mai 2019 16:47:46 CEST Greg Kurz wrote:
> Potentially yes if another approach is satisfying enough, as I wouldn't
> want to over-engineer too much around this 9p imposed limitation. The
> right thing to do would be to come up with a new version of the protocol
> with variable sized QIDs and call it a day.

Yes, but that's the long-term solution which will still take a very long 
while. This patch set is already a functional solution until that happens, and 
this 9p issue is IMO quite severe (as minor as data corruption, data loss and 
exists for several years already).

> > plus it completely destroys the fundamental idea about 9p, which is about
> > transparency of the higher level(s).
> 
> That's a point indeed, even if again I'm not sure if this is a frequent
> case to share a directory tree spanning over multiple devices.

When the use case is mass storage then it is very likely that you will have 
several devices. Times have changed. With modern file systems like ZFS it is 
very common to create a large amount of "data sets" for all kinds of 
individual purposes and mount points which is cheap to get. Each fs data set 
is a separate "device" from OS (i.e. Linux) point of view, even if all those 
FS data sets are in the same FS pool and even on the same physical IO device.

Background: The concept of FS "data sets" combines the benefits of classical  
partitions (e.g. logical file space separation, independent fs configurations 
like compression on/off/algorithm, data deduplication on/off, snapshot 
isolation, snapshots on/off) without the disadvantages of classical real 
partitions (physical space is dynamically shared, no space wasted on fixed 
boundaries; physical device pool management is transparent for all data sets, 
configuration options can be inherited from parent data sets).

> I don't have that much time to spend on 9p maintainership, for both
> reviewing and fixing bugs (CVEs most of the time). So, yes it may
> sound like I want to drop the patchset, but it's just I need to be
> convinced I won't regret having merged a huge change like this...
> when I'll have to support it alone later ;-)

Actually I already assumed this to be the actual root cause.

I see that you are currently the only maintainer, and my plan was not to just 
provide a one time shot but eventually hang along helping with maintaining it 
due my use of 9p and its huge potential I see (as universal and efficient root 
file system for all guests, not just for exotically sharing a small tree 
portion between guests). I also have plans for supporting native file forks 
BTW. But if you are seriously suggesting to simply omit a fundamental issue in 
9p, then my plans would obviously no longer make sense. :)

I mean I am completely tolerant to all kinds of religious views on bit level, 
languages, code style, libs, precise implementation details, parameters, 
source structure, etc.; but saying to simply leave a fundamental bug open for 
years to come, that's where I have to drop out.

> For the moment, I'm not convinced by the "vii" solution. It even
> motivated my suggestion of having several devices actually, since
> the paths you would put in there are known before starting QEMU.

Well, that "vii" configuration and the QID persistency are 2 optional patches 
on top of the core fixes. It is a huge difference to suggest dropping these 2 
patches than saying to completely drop the entire patch set and to leave this 
bug open.

The mandatory core fixes that I see (for the short/mid term) are at least 
Antonios' patches plus my variable length prefix patch, the latter 
significantly 
reduces the inode numbers on guest and significantly increases the inode name 
space, and hence also significanlty reduces the propability that 9p might ever 
need to kick in with any kind of expensive remapping actions (or even 
something worse like stat fatally returning -ENFILE to the user).

About the "vii" configuration, even though you don't like the idea: there is 
also a big difference giving the user the _optional_ possibility to define e.g. 
one path (not device) on guest said to be sensitive regarding high inode 
numbers on guest; and something completely different telling the user that he 
_must_ configure every single device from host that is ever supposed to pop up 
with 9p on guest and forcing the user to update that configuration whenever a 
new device is added or removed on host. The "vii" configuration feature does 
not require any knowledge of how many and which kinds of devices are actually 
ever used on host (nor on any higher level host in case of nested 
virtualization), nor does that "vii" config require any changes ever when host 
device setup changes. So 9p's core transparency feature would not be touched 
at all.

> It might take me some more rounds of discussion to decide. I understand
> it is frustrating but bear with me :)

Let me make a different suggestion: how about putting these fixes into a 
separate C unit for now and making the default behaviour (if you really want) 
to not use any of that code by default at all. So the user would just get an 
error message in the qemu log files by default if he tries to export several 
devices with one 9p device, suggesting him either to map them as separate 9p 
devices (like you suggested) and informing the user about the alternative of 
enabling support for the automatic inode remapping code (from that separate C 
unit) instead by adding one convenient config option if he/she really wants.

That way we would have a fix now, not in years, people can decide to use the 
automatic and hardware transparent solution, instead of being forced to write 
dozens of config lines for each single guest, or they might decide to stick 
with your "solution" ;-).

And once the long term solution for this issue with variable length QIDs is in 
place, the inode remapping code can very simply be dropped from the code base 
completley by just deleting the C unit and about a hand full of lines in 9p.c 
or so, and hence this fix would not bloat the existing 9p units nor cause 
maintenance nightmares of any kind.

Best regards,
Christian Schoenebeck



reply via email to

[Prev in Thread] Current Thread [Next in Thread]