From: Paolo Bonzini
Subject: Re: [PATCH experiment 00/35] stackless coroutine backend
Date: Fri, 11 Mar 2022 13:04:33 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.0

On 3/11/22 10:27, Stefan Hajnoczi wrote:
>> Not quite voluntarily, but I noticed I had to add one 0 to make them run for a decent amount of time. So yeah, it's much faster than siglongjmp.
>
> That's a nice first indication that performance will be good. I guess that deep coroutine_fn stacks could be less efficient with stackless coroutines compared to ucontext, but the cost of switching between coroutines (enter/yield) will be lower with stackless coroutines.

Note that right now I'm not placing the coroutine_fn stack on the heap, it's still allocated from a contiguous area in virtual address space. The contiguous allocation is wrapped by coroutine_stack_alloc and coroutine_stack_free, so it's really easy to change them to malloc and free.
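
To make the swap concrete, here is a minimal sketch of what such a wrapper pair could look like. Only the names coroutine_stack_alloc and coroutine_stack_free come from the text above; the CoroutineStack type, its fields, the 4096-byte size, the signatures and the USE_MALLOC switch are assumptions made for illustration, not the code from the series.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical per-coroutine stack area (type, fields and size are assumed).
struct CoroutineStack {
    alignas(std::max_align_t) unsigned char area[4096];
    std::size_t top = 0;   // offset of the first free byte in area
};

// Round a frame size up to the strictest fundamental alignment.
std::size_t round_up(std::size_t n) {
    const std::size_t a = alignof(std::max_align_t);
    return (n + a - 1) & ~(a - 1);
}

#ifndef USE_MALLOC
// Contiguous variant: frames are carved out of one area in LIFO order.
void *coroutine_stack_alloc(CoroutineStack *s, std::size_t bytes) {
    bytes = round_up(bytes);
    assert(s->top + bytes <= sizeof(s->area));
    void *frame = s->area + s->top;
    s->top += bytes;
    return frame;
}

// Frames are released in reverse order of allocation, so freeing a frame
// just moves the top of the stack back to where that frame started.
void coroutine_stack_free(CoroutineStack *s, void *frame) {
    std::size_t off =
        static_cast<std::size_t>(static_cast<unsigned char *>(frame) - s->area);
    assert(off <= s->top);
    s->top = off;
}
#else
// Heap variant: same interface, each frame is an independent allocation.
void *coroutine_stack_alloc(CoroutineStack *, std::size_t bytes) {
    return std::malloc(bytes);
}

void coroutine_stack_free(CoroutineStack *, void *frame) {
    std::free(frame);
}
#endif
```

Because callers only ever see the two wrappers, switching between the variants does not touch any coroutine_fn; in the contiguous variant a frame allocated right after a free reuses the same address.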
I also do not have to walk up the whole call stack on coroutine_fn yields, because calls from one coroutine_fn to the next are tail calls; in exchange for that, I have more indirect calls than if the code did

    if (next_call() == COROUTINE_YIELD) {
        return COROUTINE_YIELD;
    }

For now the choice was again just the one that made the translation easiest.

Today I also managed to implement a QEMU-like API on top of C++ coroutines:

    CoroutineFn<int> return_int() {
        co_await qemu_coroutine_yield();
        co_return 30;
    }

    CoroutineFn<void> return_void() {
        co_await qemu_coroutine_yield();
    }

    CoroutineFn<void> co(void *) {
        co_await return_void();
        printf("%d\n", co_await return_int());
        co_await qemu_coroutine_yield();
    }

    int main() {
        Coroutine *f = qemu_coroutine_create(co, NULL);
        printf("--- 0\n");
        qemu_coroutine_enter(f);
        printf("--- 1\n");
        qemu_coroutine_enter(f);
        printf("--- 2\n");
        qemu_coroutine_enter(f);
        printf("--- 3\n");
        qemu_coroutine_enter(f);
        printf("--- 4\n");
    }

The runtime code is absurdly obscure; my favorite bit is

    Yield qemu_coroutine_yield()
    {
        return Yield();
    }

:) However, at 200 lines of code it's certainly smaller than a source-to-source translator. It might be worth investigating a bit more. Only files that define or use a coroutine_fn (which includes callers of qemu_coroutine_create) would have to be compiled as C++.

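
Going back to the tail-call point above, the scheme can be sketched roughly as follows. This is a guess at the general shape, not the code generated by the series: only COROUTINE_YIELD is a name taken from the text, while COROUTINE_TERMINATE, Frame, Step, co_call, co_return, coroutine_create, coroutine_enter and the inner/outer functions are all hypothetical. C++ does not guarantee tail-call elimination, so the "tail calls" here are simply calls in return position.

```cpp
#include <cassert>
#include <vector>

// COROUTINE_YIELD is the name used above; everything else is hypothetical.
enum CoroutineAction { COROUTINE_YIELD, COROUTINE_TERMINATE };

struct Coroutine;
typedef CoroutineAction Step(Coroutine *co);

// One frame per active coroutine_fn; a real frame would also hold its locals.
struct Frame {
    Step *resume;   // where this coroutine_fn continues when it runs again
};

struct Coroutine {
    std::vector<Frame> stack;   // innermost coroutine_fn is stack.back()
    int result = 0;             // slot for the callee's return value
    std::vector<int> log;       // stands in for printf so it can be checked
};

// "Call" a coroutine_fn: push its frame and jump into it.
CoroutineAction co_call(Coroutine *co, Step *callee) {
    co->stack.push_back(Frame{callee});
    return callee(co);                       // call in tail position
}

// "Return" from a coroutine_fn: pop its frame and continue the caller
// through its saved pointer. This is the extra indirect call.
CoroutineAction co_return(Coroutine *co) {
    assert(!co->stack.empty());
    co->stack.pop_back();
    if (co->stack.empty()) {
        return COROUTINE_TERMINATE;
    }
    return co->stack.back().resume(co);      // indirect call in tail position
}

CoroutineAction inner_after_yield(Coroutine *co) {
    co->result = 30;
    return co_return(co);
}

CoroutineAction inner(Coroutine *co) {
    co->stack.back().resume = inner_after_yield;
    return COROUTINE_YIELD;   // no caller has to test for this value
}

CoroutineAction outer_after_inner(Coroutine *co) {
    co->log.push_back(co->result);
    return co_return(co);
}

CoroutineAction outer(Coroutine *co) {
    co->stack.back().resume = outer_after_inner;
    return co_call(co, inner);
}

void coroutine_create(Coroutine *co, Step *entry) {
    co->stack.push_back(Frame{entry});
}

// Entering always resumes the innermost frame directly.
CoroutineAction coroutine_enter(Coroutine *co) {
    assert(!co->stack.empty());
    return co->stack.back().resume(co);
}
```

After coroutine_create(&co, outer), the first coroutine_enter returns COROUTINE_YIELD with two frames on the coroutine stack. The second one resumes inner_after_yield directly, without going through outer again, records 30 in the log and returns COROUTINE_TERMINATE. The price is that every return to a caller goes through a function pointer instead of a plain return with a check of the result.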
Paolo