qemu-devel

Re: Call for GSoC/Outreachy internship project ideas


From: Palmer Dabbelt
Subject: Re: Call for GSoC/Outreachy internship project ideas
Date: Thu, 01 Feb 2024 10:01:13 -0800 (PST)

On Thu, 01 Feb 2024 09:39:22 PST (-0800), alex.bennee@linaro.org wrote:
> Palmer Dabbelt <palmer@dabbelt.com> writes:
>
> > On Tue, 30 Jan 2024 12:28:27 PST (-0800), stefanha@gmail.com wrote:
> > > On Tue, 30 Jan 2024 at 14:40, Palmer Dabbelt <palmer@dabbelt.com> wrote:
> > >
> > > > On Mon, 15 Jan 2024 08:32:59 PST (-0800), stefanha@gmail.com wrote:
> > > > > Dear QEMU and KVM communities,
> > > > > QEMU will apply for the Google Summer of Code and Outreachy internship
> > > > > programs again this year. Regular contributors can submit project
> > > > > ideas that they'd like to mentor by replying to this email before
> > > > > January 30th.

> > > > It's the 30th; sorry if this is late, but I just saw it today.  +Alistair
> > > > and Daniel, as I didn't sync up with anyone about this, so I'm not sure if
> > > > someone else is looking already (we're not internally).
> > > <snip>
> > > Hi Palmer,
> > > Performance optimization can be challenging for newcomers. I wouldn't
> > > recommend it for a GSoC project unless you have time to seed the
> > > project idea with specific optimizations to implement based on your
> > > experience and profiling. That way the intern has a solid starting
> > > point where they can have a few successes before venturing out to do
> > > their own performance analysis.

> > Ya, I agree.  That's part of the reason why I wasn't sure if it's a
> > good idea.  At least for this one I think there should be some
> > easy-to-understand performance issue, as the loops that go very slowly
> > consist of only a small number of instructions, yet run a lot slower.
> >
> > I'm actually more worried about this running into a rabbit hole of
> > adding new TCG operations, or even just having no well-defined mappings
> > between RVV and AVX; those might make the project really hard.
>
> You shouldn't have a hard guest-target mapping. But are you already
> using the TCGVec types and they are not expanding to AVX when it's
> available?

Ya, sorry, I guess that was an odd way to describe it. IIUC we're doing sane stuff; it's just that RISC-V has a very different vector masking model than other ISAs. I just said AVX there because I only care about the performance on Intel servers, since that's what I run QEMU on. I'd assume we have similar performance problems on other targets; I just haven't looked.

So my worry would be that the RVV things we're doing slowly just don't have fast implementations via AVX, and thus we run into some intractable problems. That sort of stuff can be really frustrating for an intern: everything's new to them, so it can be hard to know when something's an optimization dead end.

That said, we're seeing 100x slowdowns in microbenchmarks and 10x slowdowns in real code, so I think there should be some way to do better.
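The masking difference mentioned above can be made concrete with a simplified model. This C sketch is not QEMU code: the function names are hypothetical, and it assumes SEW=32, vstart=0, and the mask-undisturbed/tail-undisturbed policies, ignoring LMUL and the agnostic policies. The first function follows the RVV rule of one packed mask bit per element in v0; the second rewrites the same operation as "add every lane, then blend", which is roughly the shape a host SIMD backend without mask registers (such as AVX2) has to use, paying for mask expansion and an extra select.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified model of a masked RVV vadd.vv (SEW=32,
 * vstart=0, mask-undisturbed, tail-undisturbed): element i of vd is
 * written only if i < vl and bit i of the packed v0 mask is set. */
void rvv_vadd_vv_u32_masked(uint32_t *vd, const uint32_t *vs2,
                            const uint32_t *vs1, const uint8_t *v0,
                            size_t vl)
{
    for (size_t i = 0; i < vl; i++) {
        /* RVV packs one mask bit per element, independent of SEW. */
        int active = (v0[i / 8] >> (i % 8)) & 1;
        if (active) {
            vd[i] = vs2[i] + vs1[i];
        }
        /* Inactive elements keep their old values; the loop never
         * touches the tail (i >= vl), so it is left undisturbed too. */
    }
}

/* The same operation in "compute then select" form: expand each packed
 * bit into an all-ones/all-zeros lane mask, add every lane up to vlmax,
 * then blend. Lanes at or beyond vl get an all-zeros mask, so they keep
 * their old values. */
void vadd_u32_blend(uint32_t *vd, const uint32_t *vs2,
                    const uint32_t *vs1, const uint8_t *v0,
                    size_t vl, size_t vlmax)
{
    for (size_t i = 0; i < vlmax; i++) {
        uint32_t lane_mask = 0;
        if (i < vl && ((v0[i / 8] >> (i % 8)) & 1)) {
            lane_mask = UINT32_MAX;
        }
        uint32_t sum = vs2[i] + vs1[i];
        vd[i] = (sum & lane_mask) | (vd[i] & ~lane_mask);
    }
}
```

For example, with v0 = 0x5B, vl = 6, and vlmax = 8, both functions write only elements 0, 1, 3, and 4 and leave the rest of vd unchanged. The blend form does the same work for every lane regardless of the mask, which is why it maps onto wide host vectors, but the bit-to-lane expansion is extra work that an ISA with per-element mask registers would not need.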

> Remember for anything float we will end up with softfloat anyway, so we
> can't use SIMD on the backend.

Yep, but we have a handful of integer slowdowns too so I think there's some meat to chew on here. The softfloat stuff should be equally slow for scalar/vector, so we shouldn't be tripping false positives there.

> > > Do you have the time to profile and add specifics to the project idea
> > > by Feb 21st? If that sounds good to you, I'll add it to the project
> > > ideas list and you can add more detailed tasks in the coming weeks.
> >
> > I can at least dig up some of the examples I ran into; there have been a
> > handful filtering in over the last year or so.
> >
> > This one
> > <https://gist.github.com/compnerd/daa7e68f7b4910cb6b27f856e6c2beba>
> > still shows far more than a 10x slowdown with vectorization, for
> > example (73ms -> 13s, roughly 180x).
> >
> > > Thanks,
> > > Stefan

> --
> Alex Bennée
> Virtualisation Tech Lead @ Linaro
