qemu-block

comparison of coroutine backends


From: Paolo Bonzini
Subject: comparison of coroutine backends
Date: Fri, 18 Mar 2022 09:48:37 +0100

Hi all,

Based on the previous discussions, here is a comparison of the various
possibilities for implementing coroutine backends in QEMU, with their
respective advantages and disadvantages.

I'm adding a third possibility for stackless coroutines, which is to
use the LLVM/clang builtins.  I believe that would still require a
source-to-source translator, but it would offload to the compiler the
complicated bits such as liveness analysis.

1) Stackful coroutines:
Advantages:
- no changes to current code

Disadvantages:
- portability issues regarding shadow stacks (SafeStack, CET)
- portability/nonconformance issues regarding TLS

Another possible advantage is that it allows using the same function for
both coroutine and non-coroutine context.  I'm listing this separately
because I'm not sure that's desirable, as it prevents compile-time
checking of calls to coroutine_fn.  Compile-time checking would be
possible using clang -fthread-safety if we forgo the ability to use the
same function in both scenarios.
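
As a rough illustration of the compile-time checking idea, the sketch below
uses clang's thread-safety attributes to model "running in coroutine context"
as a capability. The names TSA, CoContext, co_context, demo_coroutine_fn and
the co_* functions are all hypothetical and are not taken from QEMU; on
compilers other than clang the attributes expand to nothing, so the code still
builds but nothing is checked.

```c
#include <assert.h>

/* Expand to a clang attribute where available, and to nothing elsewhere. */
#if defined(__clang__)
#define TSA(x) __attribute__((x))
#else
#define TSA(x)
#endif

/* Hypothetical marker: "holding" co_context means "running in a coroutine". */
typedef struct TSA(capability("coroutine")) CoContext { int unused; } CoContext;
CoContext co_context;

/* Hypothetical stand-in for a coroutine-only function marker. */
#define demo_coroutine_fn TSA(requires_capability(co_context))

/* Coroutine-only leaf function (the body is a placeholder). */
static int demo_coroutine_fn co_read_block(int sector)
{
    assert(sector >= 0);
    return sector + 100;
}

/* Coroutine-only caller: allowed, because it requires the capability too. */
static int demo_coroutine_fn co_read_twice(int sector)
{
    return co_read_block(sector) + co_read_block(sector + 1);
}

/* Assumed entry point: the one place that switches into coroutine context,
 * so the analysis is turned off for it. */
static int TSA(no_thread_safety_analysis) run_in_coroutine(int sector)
{
    return co_read_twice(sector);
}

/* Ordinary function: it has to go through the entry point. */
static int plain_caller(int sector)
{
    /* Calling co_read_twice(sector) directly here is what
     * clang -Wthread-safety is expected to flag. */
    return run_in_coroutine(sector);
}
```

The trade-off mentioned above is visible here: once co_read_twice carries the
annotation, it can no longer be shared between coroutine and non-coroutine
callers without going through an entry point.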


2) "Duff's device" stackless coroutines
Advantages:
- no portability issues with either shadow stacks or TLS
- compiles to good old C code
- compile-time checking of "coroutine-only" but not awaitable functions
- debuggability: stack frames should be easy to inspect

Disadvantages:
- complex source-to-source translator
- more complex build process
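
For readers unfamiliar with the technique, here is a minimal hand-written
sketch of what a "Duff's device" coroutine looks like in plain C. It is not
the output of any existing translator; CounterFrame, counter_resume and
sum_of_squares are hypothetical names. The point is that every variable that
lives across a yield is moved into a heap- or caller-allocated frame, and a
switch on the saved state jumps back into the middle of the function.

```c
#include <assert.h>

/* Hypothetical frame: everything that must survive a yield lives here,
 * not on the C stack. */
typedef struct CounterFrame {
    int state;   /* resume point: 0 = not started, 1 = in loop, -1 = done */
    int i;       /* loop variable, hoisted out of the stack */
    int limit;
} CounterFrame;

/* Returns 1 and stores a value in *out at each yield; returns 0 when done. */
static int counter_resume(CounterFrame *f, int *out)
{
    switch (f->state) {
    case 0:
        for (f->i = 0; f->i < f->limit; f->i++) {
            *out = f->i * f->i;
            f->state = 1;
            return 1;      /* yield */
    case 1:;               /* a resume lands here, inside the loop body */
        }
        f->state = -1;
        /* fall through */
    default:
        return 0;          /* finished; further resumes are no-ops */
    }
}

/* Drive the coroutine to completion and add up what it yields. */
static int sum_of_squares(int limit)
{
    CounterFrame f = { .state = 0, .i = 0, .limit = limit };
    int value = 0, total = 0;

    assert(limit >= 0);
    while (counter_resume(&f, &value)) {
        total += value;
    }
    return total;
}
```

Because the frame is an ordinary struct, it can be inspected directly in a
debugger, which is the debuggability point in the list above. Doing this
hoisting by hand for real code is tedious, which is why a translator is needed.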


3) C++20 stackless coroutines
Advantages:
- no portability issues with either shadow stacks or TLS
- no code to write outside QEMU
- simpler build process

Disadvantages:
- requires a new compiler
- it's C++
- no compile-time checking of "coroutine-only" but not awaitable functions


4) LLVM stackless coroutines
Advantages:
- no portability issues with either shadow stacks or TLS
- no code to write outside QEMU

Disadvantages:
- relatively simple source-to-source translator
- more complex build process
- requires a new compiler and doesn't support GCC


Note that (2) would still have a build dependency on libclang.
However, the code generation could still be done with GCC and with
any compiler version.

I'll also put it in a table, though I understand that some choices
here might be debatable:

                          stackful   Duff's device   C++20   LLVM
==================================================================
Code to write/maintain    ++ [1]     ---             +++     - [2]
Changes to existing code  ++ [3]     -               --      -
Community acceptance      ++         ++              --      ?
Code or PoC exists        ++         +               -       --
==================================================================
Portability               --         ++              +       -
Debuggability             -          ++              ?       ?
Performance               -          ++ [4]          ++      ++

[1] I'm penalizing stackful coroutines here because the worse portability
has an impact on future maintainability too.

[2] This is an educated guess.

[3] If we decide to remove the possibility of using the same function for
both coroutine and non-coroutine context, the changes to existing code
would be the same as for Duff's device and LLVM coroutines.

[4] Slightly worse than C++20 coroutines for the PoC, but that is mostly due
to implementation choices that are easy to change.


Stackful coroutines are obviously pretty good, or we wouldn't have used them.
They might be a local optimum though, as shown by the negative points in terms
of portability, debuggability and performance.

Both Duff's device and LLVM would be more or less transparent to the part of
the community that doesn't care about the coroutines.  The translator would
probably be write-and-forget (though I'm not sure about the API stability of
libclang, which would be a major factor), but it would still be a substantial
amount of work to commit to.

Thanks,

Paolo



