Re: Call for GSoC and Outreachy project ideas for summer 2023

On Fri, Jan 27, 2023 at 3:02 PM Stefan Hajnoczi <stefanha@gmail.com> wrote:

On Fri, 27 Jan 2023 at 12:10, Warner Losh <imp@bsdimp.com> wrote:
>
> [[ cc list trimmed to just qemu-devel ]]
>
> On Fri, Jan 27, 2023 at 8:18 AM Stefan Hajnoczi <stefanha@gmail.com> wrote:
>>
>> Dear QEMU, KVM, and rust-vmm communities,
>> QEMU will apply for Google Summer of Code 2023
>> (https://summerofcode.withgoogle.com/) and has been accepted into
>> Outreachy May 2023 (https://www.outreachy.org/). You can now
>> submit internship project ideas for QEMU, KVM, and rust-vmm!
>>
>> Please reply to this email by February 6th with your project ideas.
>>
>> If you have experience contributing to QEMU, KVM, or rust-vmm you can
>> be a mentor. Mentors support interns as they work on their project. It's a
>> great way to give back and you get to work with people who are just
>> starting out in open source.
>>
>> Good project ideas are suitable for remote work by a competent
>> programmer who is not yet familiar with the codebase. In
>> addition, they are:
>> - Well-defined - the scope is clear
>> - Self-contained - there are few dependencies
>> - Uncontroversial - they are acceptable to the community
>> - Incremental - they produce deliverables along the way
>>
>> Feel free to post ideas even if you are unable to mentor the project.
>> It doesn't hurt to share the idea!
>
>
> I've been a GSoC mentor for the FreeBSD project on and off for maybe
> 10-15 years now. I thought I'd share this for feedback here.
>
> My project idea falls between the two projects (QEMU and FreeBSD). I've been trying
> to get bsd-user reviewed and upstreamed for some time now, and the time I have
> available for upstreaming has diminished greatly lately.
> It got me thinking: upstreaming is often more than just getting patches reviewed.
> While there is a rather mechanical aspect to it (and I could likely
> automate that aspect more), the real value of going through the review process
> is that it points out things that had been done wrong, things that need to be
> redone or refactored, etc. It's often these suggestions that lead to the biggest
> investment of time on my part: Is this idea good? If I do it, does it break things?
> Is the feedback right about what's wrong, but wrong about how to fix it? etc.
> Plus the inevitable: I thought this was a good idea and implemented it, only to find
> it broke other things. How do I explain that breakage and give the reviewer
> feedback about it, to see whether it is worth pursuing further?
>
> So my idea for a project is threefold. First, create scripts to automate the
> upstreaming process: break big files into bite-sized chunks for review on
> this list. git publish does a great job from there. The current backlog to upstream
> is approximately "175 files changed, 30270 insertions(+), 640 deletions(-)", which
> is 300-600 patches at the 50-100 lines-per-patch guidance I've been given. So even
> at 0.1 hours (6 minutes) per patch (about 3x faster than I can do it by hand),
> that's ~60 hours just to create the patches. Writing the automation should take
> much less time. Realistically, it is on the order of 10-20 hours of work.
>
> Second, take the feedback from the reviews and use it to refactor
> the bsd-user code base (which will eventually land in upstream). Each quarter I often
> spend a few hours creating my patches, then about 10 hours on the 30 or so patches
> I post, processing the review feedback: refactoring other things
> (typically other architectures), checking details of other architectures (usually by
> looking at the FreeBSD kernel), looking for ways to refactor to share code with
> linux-user (though so far only the safe signals code is shared upstream; the ELF code
> could be too), chatting online about the feedback to better understand it, seeing what
> I can mine from linux-user (since the code is derived from it, but didn't pick up all the
> changes linux-user has), etc. This would be on the order of 100 hours.
>
> Third, the testing infrastructure that exists for linux-user is not well leveraged to test
> bsd-user. I've run some tests with it from time to time, but it's not in a state where it
> can be used as, say, part of a CI pipeline. In addition, the FreeBSD project has some
> very large jobs, a subset of which could be used to further ensure that critical bits of
> infrastructure don't break (or are still working, even if not run in a CI pipeline). Things
> like building and using Go, Rust and the like are constantly breaking for reasons too long
> to enumerate here. This could take as little as 50 hours for a minimal job that is complete
> enough for CI, or as much as 200 hours for a more complete set of jobs that could be used
> to bisect breakage more quickly and give good assurance that, at any given time, bsd-user
> is useful and working.
>
> That's in addition to growing the number of people who can work on this code, and
> on the *-user code in general, since they are quite similar.
>
> Some of these tasks are squarely in the QEMU realm, while others are in the FreeBSD realm,
> but that's similar to linux-user, which requires very heavy interfacing with the Linux realm. It's
> just that a lot of that work is already complete, so the ongoing needs there are substantially
> smaller. Since this does straddle the two projects, I'm unsure where to propose that this project
> be housed. But since this is a call for ideas, I thought I'd float it to see what the feedback is. I'm
> happy to write this up more formally if it would be seriously considered, but I want to get
> feedback on which areas I might want to emphasize in such a proposal.
>
> Comments?
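For illustration, the patch-count arithmetic and the file-splitting step from the first item of the proposal above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not part of the original proposal: the helper names `estimate_patches` and `chunk_at_blank_lines` are hypothetical, and the splitter assumes that blank lines roughly mark the boundaries between top-level definitions in a C source file. A real tool would need smarter boundaries and would still have to turn each chunk into a commit before handing the series to git publish.

```python
import math


def estimate_patches(insertions: int, lines_per_patch: int) -> int:
    """Patches needed if each patch adds at most lines_per_patch lines."""
    return math.ceil(insertions / lines_per_patch)


def chunk_at_blank_lines(lines: list[str], max_lines: int) -> list[list[str]]:
    """Split one file's lines into consecutive chunks of roughly max_lines.

    Hypothetical helper: a chunk is only cut at a blank line, so a single
    block longer than max_lines is kept whole (and exceeds the limit).
    Concatenating the returned chunks reproduces the input exactly.
    """
    chunks: list[list[str]] = []
    current: list[str] = []  # chunk being assembled
    block: list[str] = []    # lines since the last blank line

    def flush_block() -> None:
        nonlocal current
        # Start a new chunk if this block would push the current one past the limit.
        if current and len(current) + len(block) > max_lines:
            chunks.append(current)
            current = []
        current.extend(block)
        block.clear()

    for line in lines:
        block.append(line)
        if not line.strip():  # a blank line ends a block
            flush_block()
    if block:  # trailing lines with no blank line after them
        flush_block()
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Figures from the mail: ~30270 inserted lines, 50-100 lines per patch.
    for size in (100, 50):
        print(size, estimate_patches(30270, size))  # prints "100 303", then "50 606"
```

With the figures quoted above, 100-line patches give 303 patches and 50-line patches give 606; at 6 minutes per patch, 606 patches is 3636 minutes, or about 60.6 hours, which matches the ~60 hour estimate. Cutting only at blank lines keeps functions intact, at the cost of occasionally producing a chunk longer than the limit when a single block is oversized.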

Hi Warner,
Don't worry about it spanning FreeBSD and QEMU, you're welcome to list
the project idea through QEMU. You can have co-mentors that are not
part of the QEMU community in order to bring in additional FreeBSD
expertise.

My main thought is that getting all code upstream sounds like a
sprawling project that likely won't be finished within one internship.
Can you pick just a subset of what you described? It should be a
well-defined project that depends minimally on other people finishing
stuff or reaching agreement on something controversial. That way the
intern will be able to come up with specific tasks for their project
plan and there is little risk that they can't complete them due to
outside factors.

I like this notion of limiting the scope. There are three, or maybe four, main areas
that I can call out. I got to thinking about all the details involved in how
I've been upstreaming things, and realized that there's a lot, due to the complicated
history here...

One way to go about this might be for you to define a milestone that
involves completing, testing, and upstreaming just a subset of the
out-of-tree code. For example, it might implement a limited set of
core syscall families. The intern will then focus on delivering that
instead of worrying about the daunting task of getting everything
merged. Finishing this subset would advance bsd-user FreeBSD support
by a useful degree (e.g. ability to run certain applications).

Does that sound good?

Yes. I like this, but it's hard to know what that subset might be because many things are
hidden behind the scenes... But I'll try running a quick build to see if I can gather
enough stats to come up with a good set of tests. Maybe I'll start with building
'hello world' with clang on armv7 running on an amd64 host to see what's missing
today. I also have a set of aarch64 patches that I might try hard to get in ASAP, so that
might be the target instead (since it might be a bit more useful).

Warner

Stefan

From:	Warner Losh
Subject:	Re: Call for GSoC and Outreachy project ideas for summer 2023
Date:	Wed, 8 Feb 2023 16:01:51 -0700