qemu-devel

RE: [PATCH v4 04/16] hw/9pfs: Implement Windows specific xxxdir() APIs


From: Shi, Guohuai
Subject: RE: [PATCH v4 04/16] hw/9pfs: Implement Windows specific xxxdir() APIs
Date: Tue, 7 Feb 2023 17:55:41 +0000


> -----Original Message-----
> From: Christian Schoenebeck <qemu_oss@crudebyte.com>
> Sent: Tuesday, February 7, 2023 18:12
> To: Greg Kurz <groug@kaod.org>; qemu-devel@nongnu.org
> Cc: Meng, Bin <Bin.Meng@windriver.com>; Marc-André Lureau
> <marcandre.lureau@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>; Shi,
> Guohuai <Guohuai.Shi@windriver.com>
> Subject: Re: [PATCH v4 04/16] hw/9pfs: Implement Windows specific xxxdir() APIs
> 
> On Monday, February 6, 2023 6:37:16 AM CET Shi, Guohuai wrote:
> [...]
> > > I know, it's an n-square performance issue and what I already wrote
> > > in the summary of the linked original suggestion [1] in v3 before, quote:
> > >
> > >   + Relatively straight-forward to implement.
> > >
> > >   + No (major) changes in 9pfs code base required.
> > >
> > >   - Still n-square performance issue (negligible for landing Windows
> > >     host support IMO).
> > >
> > >   o Consistency assured for "most" cases, except one: if hardlinks are
> > >     inserted in between then it might fail
> >
> > readdir() on a Linux host may also return deleted entries.
> > And the POSIX specification does not mention the consistency issue.
> 
> POSIX explicitly specifies that 1. new and 2. deleted entries may or may not
> appear in the result and leaves that implementation specific. That was never
> our concern.
> 
> And yes, POSIX does not explicitly discuss consistency concerning entries
> that have neither been added nor removed, but this expectation is implied. In
> practice double entries are probably less of an issue, the client might be
> able to handle that without misbehaviour (haven't checked this at all yet),
> but if the implementation could lead to entries *never* appearing to clients
> at all, even after refreshing periodically, I mean how could you work with a
> file system like that?
> 
> > The NTFS file id is the $MFT index id. It will remain unique until the
> > file is deleted.
> > But the index id may be reused if many files are deleted and re-created.
> >
> > Saving the file id instead of the name will make consistency better, but
> > may not cover all cases, because reading a directory is not an "atomic"
> > operation.
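
(As a side note, the file id here is the 24-byte FILE_ID_INFO structure
mentioned further down, i.e. an 8-byte volume serial number plus a 16-byte
file id. A minimal sketch of how it can be queried on Windows 8 or later;
the helper name is made up:)

    #include <windows.h>

    /* Query the 24-byte file id of an open handle:
     * 8-byte VolumeSerialNumber + 16-byte FILE_ID_128 FileId. */
    static BOOL query_file_id(HANDLE h, FILE_ID_INFO *info)
    {
        return GetFileInformationByHandleEx(h, FileIdInfo,
                                            info, sizeof(*info));
    }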
> 
> I don't see an issue with that, because these are entries that were either
> added or removed, we don't care about them. And their file IDs would not
> affect fetching the other directory entries that have not been touched in
> between.
> 
> And we are also not questioning atomicity here, but consistency.
> 
> > > [1] https://lore.kernel.org/qemu-devel/2468168.SvRIHAoRfs@silver/
> > >
> > > The idea was to use that just as a starting point to land Windows
> > > host support ASAP, slower on large dirs compared to other solutions,
> > > yes, but with guaranteed correct and deterministic behaviour. And
> > > then in the long run we would of course replace that with a more
> > > performant solution.
> > >
> > > I mean, this is really simple to implement, so I would at least test
> > > it. If it really runs horribly slow we could still discuss faster
> > > solutions, which are however all much more tricky.
> > >
> >
> > I did a basic test on a Windows host; here is the code:
> >
> >     clock_t st, ed;          /* start/end timestamps */
> >     struct dirent *pEnt;     /* current directory entry */
> >     DIR *pDir;
> >     int totals = 0;          /* number of entries read */
> >
> >     st = clock();
> >     pDir = opendir_win32(TEST_DIR);
> >
> >     if (pDir == NULL)
> >         return -1;
> >
> >     while ((pEnt = readdir_win32(pDir)) != NULL)
> >     {
> >         totals++;
> >     }
> >     closedir_win32(pDir);
> >     ed = clock();
> >
> >     printf("total = %d clocks = %ld %ld\n", totals,
> >            (long)(ed - st), (long)CLOCKS_PER_SEC);
> >
> > My local storage is an SSD.
> >
> > I ran this test with 100, 1000, and 10000 entries.
> > For the file name cache solution, the time cost is 2, 9, and 44 ms.
> > For the file id cache solution, the time cost is 3, 438, and 4338 ms.
> > I already used OpenFileById() instead of CreateFile() to make it faster;
> > with CreateFile() it needs more than 80 seconds.
> >
> > The performance does not look good, and it would actually be worse in 9pfs,
> > because in the current design 9pfs may seek forward and backward several
> > times while reading a directory, which makes the performance even worse.
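
(For reference, the file id cache test has to reopen every entry by its id,
roughly like the sketch below; this is a minimal, untested illustration with
made-up names, not the exact test or patch code:)

    #include <windows.h>

    /* Reopen a directory entry by its NTFS file id. hVolumeHint is a handle
     * to any file on the same volume. One such call per readdir() entry is
     * what makes the file id cache variant slow. */
    static HANDLE reopen_by_file_id(HANDLE hVolumeHint, LONGLONG file_id)
    {
        FILE_ID_DESCRIPTOR fid = { 0 };

        fid.dwSize = sizeof(fid);
        fid.Type = FileIdType;
        fid.FileId.QuadPart = file_id;

        return OpenFileById(hVolumeHint, &fid, FILE_READ_ATTRIBUTES,
                            FILE_SHARE_READ | FILE_SHARE_WRITE |
                            FILE_SHARE_DELETE,
                            NULL, FILE_FLAG_BACKUP_SEMANTICS);
    }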
> 
> Poor performance, yes, probably a bit worse than I would have expected.
> 
> So it is about choosing your poison (potential crash vs. poor performance).
> 
> I mean, I am not keen on suggesting any kind of bike shedding for you on
> this issue, but if this is merged, then people expect it to behave reliably
> and not allow a guest to crash the QEMU host process by simply creating a
> large number of directory entries on the guest.
>
> I was also considering making it a QEMU option, but OTOH, this is a
> temporary situation and those options would be wiped once we have an
> appropriate solution a bit later.
>
> I am open to suggestions. Could we perhaps just mark Windows host support
> as experimental for now, and is that even allowed by QEMU policies?

Yes, it is hard to choose:

a) One file id entry is 24 bytes; to reduce memory fragmentation, I used an
array to store the file ids.
b) One file name entry is ~300 bytes, stored in a linked list.

If there are 1 million files in one directory, a) needs a 24 MB contiguous
memory buffer, and b) needs 300 MB of memory (not necessarily contiguous).
If there are 10 million files in one directory, a) needs a 240 MB contiguous
memory buffer, and b) needs 3 GB of memory (not necessarily contiguous).

Both a) and b) need more and more memory as the directory grows. If there is
no more free memory, opendir() will fail.
However, is it a normal situation for a directory to contain more than
1 million files?
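
Roughly, the two cache layouts look like this (a simplified sketch using
types from <windows.h>, with made-up names; not the actual patch code):

    /* a) file id cache: one contiguous array, 24 bytes per entry.
     *    1,000,000 entries * 24 bytes ~= 24 MB of contiguous memory. */
    typedef struct FileIdCache {
        FILE_ID_INFO *ids;   /* grown with realloc(), so it must be contiguous */
        size_t count;
    } FileIdCache;

    /* b) file name cache: linked list, roughly 300 bytes per entry
     *    (node pointer + MAX_PATH name buffer), so 1,000,000 entries
     *    ~= 300 MB, but the allocations do not need to be contiguous. */
    typedef struct FileNameNode {
        struct FileNameNode *next;
        char name[MAX_PATH];
    } FileNameNode;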

I will prepare a new version of this commit that stores file ids; the rough
direction is sketched below.
The new patch should be ready tomorrow.
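
For seekdir()/telldir() on top of the file id cache, the idea is roughly the
following (hypothetical type and field names reusing the FileIdCache sketch
above; not the final patch):

    /* Hypothetical directory stream holding the file id cache. */
    typedef struct Win32Dir {
        FileIdCache cache;   /* file ids collected at opendir_win32() time */
        size_t offset;       /* index of the next entry to return          */
    } Win32Dir;

    static long telldir_win32(Win32Dir *dir)
    {
        return (long)dir->offset;
    }

    static void seekdir_win32(Win32Dir *dir, long pos)
    {
        if (pos >= 0 && (size_t)pos <= dir->cache.count) {
            /* The next readdir_win32() would reopen cache.ids[pos] by id. */
            dir->offset = (size_t)pos;
        }
    }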

Thanks
Guohuai

> 
> > > > So I think storing all name entries would be better than storing all
> > > > file IDs.
> > >
> > > As already discussed, NTFS allows up to (2^32 - 1) = 4,294,967,295
> > > entries per directory. So caching only one directory (entirely) in
> > > RAM can already exceed the available RAM, which would crash QEMU.
> > > Multiplied by the expected number of directory lookups by the client,
> > > we get into much higher categories, even with much smaller individual
> > > directory sizes.
> > >
> >
> > The Windows file id structure is 24 bytes, which is not a small structure.
> > If you think the performance is acceptable, I can rework this commit based
> > on file id.
> 



