Re: [PATCH v4 10/11] 9pfs: T_readdir latency optimization
From: Christian Schoenebeck
Subject: Re: [PATCH v4 10/11] 9pfs: T_readdir latency optimization
Date: Thu, 23 Jan 2020 13:57:23 +0100
On Thursday, 23 January 2020 12:33:42 CET Greg Kurz wrote:
> On Tue, 21 Jan 2020 01:30:10 +0100
>
> Christian Schoenebeck <address@hidden> wrote:
> > Make top half really top half and bottom half really bottom half:
> >
> > Each T_readdir request handling is hopping between threads (main
> > I/O thread and background I/O driver threads) several times for
> > every individual directory entry, which sums up to huge latencies
> > for handling just a single T_readdir request.
> >
> > Instead of doing that, collect now all required directory entries
> > (including all potentially required stat buffers for each entry) in
> > one rush on a background I/O thread from fs driver, then assemble
> > the entire resulting network response message for the readdir
> > request on main I/O thread. The fs driver is still aborting the
> > directory entry retrieval loop (on the background I/O thread) as
> > soon as it would exceed the client's requested maximum R_readdir
> > response size. So we should not have any performance penalty by
> > doing this.
> >
> > Signed-off-by: Christian Schoenebeck <address@hidden>
> > ---
>
> Ok so this is it. Not reviewed this huge patch yet but I could at
> least give a try. The gain is impressive indeed:
Tsk tsk, so much scepticism. :)
> [greg@bahia qemu-9p]$ (cd .mbuild-$(stg branch)/obj ; export
> QTEST_QEMU_BINARY='x86_64-softmmu/qemu-system-x86_64'; make all
> tests/qtest/qos-test && for i in {1..100}; do tests/qtest/qos-test -p
> $(tests/qtest/qos-test -l | grep readdir/basic); done) |& awk '/IMPORTANT/
> { print $10 }' | sed -e 's/s//' -e 's/^/n+=1;x+=/;$ascale=6;x/n' | bc
> .009806
>
> instead of .055654, i.e. nearly 6 times faster ! This sounds promising :)
As mentioned in the other email, the performance improvement brought by this patch is
actually far more than a factor of 6, since you probably just dropped the n-square
driver hack in your benchmarks (which had tainted your earlier benchmark results):
Unoptimized readdir, with n-square correction hack:
Time client spent for waiting for reply from server: 0.082539s [MOST IMPORTANT]

Optimized readdir, with n-square correction hack:
Time 9p server spent on synth_readdir() I/O only (synth driver): 0.001576s
Time 9p server spent on entire T_readdir request: 0.002244s [IMPORTANT]
Time client spent for waiting for reply from server: 0.002566s [MOST IMPORTANT]
So in this particular test run the performance improvement is around factor 32,
but I have also observed factors around 40 in earlier test runs.
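(That is 0.082539s / 0.002566s ≈ 32 for the client-side wait times quoted above.)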
> Now I need to find time to do a decent review... :-\
Sure, take your time! But as you can see, it is really worth it.
And it's not just the performance improvement. This patch also reduces program
flow complexity significantly: for example, there is just one lock and one unlock,
each allocated entry name is freed immediately without any potential branch in
between, and much more. In other words: it adds safety.
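To make the idea concrete, here is a small standalone sketch (plain pthreads, with made-up
names like collect_dirents() and marshal_response(), not the actual QEMU patch code) of the
pattern described above: the bottom half collects all directory entries in one go on a worker
thread under a single lock and stops at the client's size limit, and the top half then
assembles the response on the main thread and frees each entry name right after it has been
consumed:

/*
 * Standalone illustration only, NOT the patch code. Error handling is
 * omitted for brevity; all identifiers are invented for this sketch.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct dir_entry {
    char *name;             /* entry name, heap allocated        */
    struct dir_entry *next; /* simple singly linked result list  */
};

struct readdir_job {
    const char **src;       /* stand-in for the host directory   */
    size_t src_count;
    size_t max_bytes;       /* client supplied response limit    */
    struct dir_entry *head; /* collected results                 */
    size_t nbytes;
};

/* Bottom half: runs once on a background thread, takes the (single)
 * lock once, and collects as many entries as fit into max_bytes. */
static void *collect_dirents(void *opaque)
{
    static pthread_mutex_t dir_lock = PTHREAD_MUTEX_INITIALIZER;
    struct readdir_job *job = opaque;
    struct dir_entry **tail = &job->head;

    pthread_mutex_lock(&dir_lock);          /* one lock ...       */
    for (size_t i = 0; i < job->src_count; i++) {
        size_t len = strlen(job->src[i]) + 1;
        if (job->nbytes + len > job->max_bytes) {
            break;                          /* stop at size limit */
        }
        struct dir_entry *e = malloc(sizeof(*e));
        e->name = strdup(job->src[i]);
        e->next = NULL;
        *tail = e;
        tail = &e->next;
        job->nbytes += len;
    }
    pthread_mutex_unlock(&dir_lock);        /* ... one unlock     */
    return NULL;
}

/* Top half: runs on the main thread after the worker has finished
 * and turns the collected list into the response; each name is
 * freed right after it has been consumed, no branch in between. */
static void marshal_response(struct readdir_job *job)
{
    for (struct dir_entry *e = job->head; e; ) {
        struct dir_entry *next = e->next;
        printf("R_readdir entry: %s\n", e->name);
        free(e->name);                      /* freed immediately  */
        free(e);
        e = next;
    }
}

int main(void)
{
    const char *names[] = { ".", "..", "file1", "file2", "file3" };
    struct readdir_job job = {
        .src = names, .src_count = 5, .max_bytes = 64,
    };
    pthread_t worker;

    pthread_create(&worker, NULL, collect_dirents, &job); /* bottom half */
    pthread_join(worker, NULL);
    marshal_response(&job);                                /* top half    */
    return 0;
}

In the real patch the collecting side of course runs on the fs driver's background I/O
thread via the existing coroutine machinery rather than a raw pthread; the sketch is only
meant to show the single lock/unlock and the immediate free of each entry name.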
Best regards,
Christian Schoenebeck
- [PATCH v4 00/11] 9pfs: readdir optimization, Christian Schoenebeck, 2020/01/20
- [PATCH v4 10/11] 9pfs: T_readdir latency optimization, Christian Schoenebeck, 2020/01/20
- [PATCH v4 05/11] tests/virtio-9p: added readdir test, Christian Schoenebeck, 2020/01/20
- [PATCH v4 04/11] hw/9pfs/9p-synth: added directory for readdir test, Christian Schoenebeck, 2020/01/20
- [PATCH v4 11/11] hw/9pfs/9p.c: benchmark time on T_readdir request, Christian Schoenebeck, 2020/01/20
- [PATCH v4 09/11] hw/9pfs/9p-synth: avoid n-square issue in synth_readdir(), Christian Schoenebeck, 2020/01/20
- [PATCH v4 02/11] 9pfs: require msize >= 4096, Christian Schoenebeck, 2020/01/20
- [PATCH v4 01/11] tests/virtio-9p: add terminating null in v9fs_string_read(), Christian Schoenebeck, 2020/01/20
- [PATCH v4 07/11] tests/virtio-9p: failing splitted readdir test, Christian Schoenebeck, 2020/01/20