[SCM] GNU Mach branch, master, updated. v1.8-917-gff6f2240


From:	Samuel Thibault
Subject:	[SCM] GNU Mach branch, master, updated. v1.8-917-gff6f2240
Date:	Mon, 15 Apr 2024 21:03:08 -0400 (EDT)
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU Mach".

The branch, master has been updated
       via  ff6f22408260b191b0348029025432def45736c2 (commit)
       via  db8dacb578b687574ba900298a4159c887dd18d0 (commit)
       via  cf8afe49af2c7c77dfc6701da70d6700f7e6f1b5 (commit)
       via  d37c7a3fdb2edf5d3865fbac547297651a9b1941 (commit)
       via  fd2482bb3500602b8f747ab96df195b1d5c511d5 (commit)
       via  a0ff799984cf2ed1a50c1161aabff3aaee622c64 (commit)
       via  b23e7718dd9942b27edfac9aee05f737d0e6922e (commit)
       via  b792f1eb08e34db89ac200a86ea7e6859a6b0668 (commit)
      from  782b800f2aecfe39ce5d9cdd8cc7b4c7f36ea398 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit ff6f22408260b191b0348029025432def45736c2
Author: Sergey Bugaev <bugaevc@gmail.com>
Date:   Mon Apr 15 12:01:48 2024 +0300

    Add thread_set_self_state() trap
    
    This is a new Mach trap that sets the calling thread's state to the
    passed value, as if with a call to the thread_set_state() RPC.  If the
    flavor of state being set is the one that contains the register used for
    syscall return value (i386_THREAD_STATE or i386_REGS_SEGS_STATE on x86,
    AARCH64_THREAD_STATE on AArch64), the set register value is *not*
    overwritten with KERN_SUCCESS when the state gets set successfully, yet
    errors do get reported if the syscall fails.
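    
    As a rough illustration of that rule, here is a toy model in portable C.
    Every name in it (model_thread, model_thread_set_self_state, the
    four-register state) is made up for this sketch; only the numeric values
    of KERN_SUCCESS and KERN_INVALID_ARGUMENT are taken from Mach.  It is not
    the kernel implementation, just the observable behavior described above.

```c
#include <stdint.h>
#include <string.h>

/* Values as in Mach's kern_return.h; everything else here is hypothetical. */
enum { MODEL_KERN_SUCCESS = 0, MODEL_KERN_INVALID_ARGUMENT = 4 };

/* Toy register file; regs[0] plays the role of the syscall return register. */
enum { MODEL_STATE_COUNT = 4 };

struct model_thread {
    uint64_t regs[MODEL_STATE_COUNT];
};

/* What an ordinary trap does on exit: store the result in the return register. */
static void model_set_syscall_return(struct model_thread *t, int kr)
{
    t->regs[0] = (uint64_t)kr;
}

/*
 * Model of the new trap.  On failure the error code is reported in the
 * return register as usual.  On success the freshly set state is left
 * alone, so regs[0] keeps the caller-supplied value instead of 0.
 */
static int model_thread_set_self_state(struct model_thread *t,
                                       const uint64_t *new_state,
                                       unsigned count)
{
    if (new_state == NULL || count != MODEL_STATE_COUNT) {
        model_set_syscall_return(t, MODEL_KERN_INVALID_ARGUMENT);
        return MODEL_KERN_INVALID_ARGUMENT;
    }
    memcpy(t->regs, new_state, sizeof t->regs);
    return MODEL_KERN_SUCCESS;   /* deliberately not written to regs[0] */
}
```

    In this model, a successful call leaves regs[0] holding whatever the
    saved state contained, while a call with a wrong count leaves the old
    state in place and reports the error code in regs[0].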
    
    Although the trap is intended to enable userland to implement sigreturn
    functionality in the AArch64 port (more on which below), the trap itself
    is architecture-independent, and fully implemented in terms of the
    existing kernel routines (thread_setstatus & thread_set_syscall_return).
    
    This trap's functionality is similar to sigreturn() on Unix or
    NtContinue() on NT.  The use case for all of these is restoring the
    local state of an interrupted thread in the following set-up:
    
    1. A thread is running some arbitrary code.
    2. An event happens that deserves the thread's immediate attention,
       analogous to a hardware interrupt request.  This might be caused by
       the thread itself (e.g. running into a Mach exception that was
       arranged to be handled by the same thread), or by external events
       (e.g. receiving a Unix SIGCHLD).
    3. Another thread (or perhaps the kernel, although this is not the case
       on Mach) suspends the thread, saves its state at the point of
       interruption, alters its state to execute some sort of handler for
       the event, and resumes the thread again, now running the handler.
    4. Once the thread is done running the handler, it wants to return to
       what it was doing at the time it was interrupted.  To do this, it
       needs to restore the state as saved at the moment of interruption.
    
    Unlike with setjmp()/longjmp(), we cannot rely on the interrupted logic
    collaborating in any way, as it's not aware that it's being interrupted.
    This means that we have to fully restore the state, including values of
    all the general-purpose registers, as well as the stack pointer, program
    counter, and any state flags.
    
    Depending on the instruction set, this may or may not be possible to do
    fully in userland, simply by loading all the registers with their saved
    values.  It should be more or less easy to load the saved values into
    general-purpose registers, but state flags and the program counter can
    be more of a challenge.  Loading the program counter value (in other
    words, performing an indirect jump to the interrupted instruction) has
    to be the very last thing we do, since we don't control the program flow
    after that.  The only real place the program counter can be loaded from
    is the stack, since all general-purpose registers would already contain
    their restored values by that point, and using global storage is
    incompatible with another interruption of the same kind happening just
    as we are about to return.  For the same reason, the saved program
    counter cannot really be stored outside of the "active" stack area (such
    as below the stack pointer), since otherwise it can get clobbered by
    another interruption.
    
    This means that to support fully-userland returns, the instruction set
    must provide a single instruction that loads an address from the stack,
    adjusts the stack pointer, and performs an indirect jump to the loaded
    address.  The instruction must also either preserve (previously
    restored) state flags, or load state flags from the stack in addition
    to the jump address.
    
    On x86, 'ret' is such an instruction: it pops an address from the stack,
    adjusting the stack pointer without modifying flags, and performs an
    indirect jump to the address.  On x86_64, where the ABI mandates a red
    zone, one can use the 'ret imm16' variant to additionally adjust the
    stack pointer by the size of the red zone, atomically restoring the
    value of the stack pointer at the time of the interruption while loading
    the return address from outside the red zone.  This is how sigreturn is
    implemented in glibc for the Hurd on x86.
    
    On ARM AArch32, 'pop {pc}' (alternatively written 'ldr pc, [sp], #4') is
    such an instruction: since SP and PC are just general-purpose, directly
    accessible registers (r13 and r15), it is possible to perform a load
    from the address pointed to by SP into PC, with a post-increment of SP.
    It is, in fact, possible to restore all the other general-purpose
    registers too in a single instruction this way: 'pop {r0-r12, r14, r15}'
    will do that; here r13, the stack pointer, gets incremented after all
    the other registers get loaded from the stack.  This also preserves the
    CPSR flags, which would need to be restored just prior to the 'pop'.
    
    On ARM AArch64 however, PC is no longer a directly accessible general-
    purpose register (and SP is only accessible that way by some of the
    instructions); so it is no longer possible to load PC from memory in a
    single instruction.  The only way to perform an indirect jump is by
    using one of the dedicated branching instructions ('br', 'blr', or
    'ret').  All of them accept the address to branch to in a general-
    purpose register, which is incompatible with our use case.
    
    Moreover, with the BTI extension, there is a BTYPE field in PSTATE that
    tracks which type (if any) of an indirect branch was the last executed
    instruction; this is then used to raise an exception if the instruction
    the indirect branch lands on was not intended to be a target of an
    indirect branch (of a matching type).  It is important to restore the
    BTYPE (among the other state) when returning to an interrupted context;
    failing to do that will either cause an unexpected BTI failure exception
    (if the last executed instruction before the interruption was not an
    indirect branch, but the last instruction of the restoration logic is),
    or open up a window for exploitation (if the last executed instruction
    before the interruption was an indirect branch, but the last instruction
    of the restoration logic is not -- note that 'ret' is not considered an
    indirect branch for the purposes of BTI).
    
    So, it is not possible to fully restore the state of an interrupted
    context in userland on AArch64.  The kernel can do that however (and is
    in fact doing just that every time it handles a fault or an IRQ): the
    'eret' instruction for returning from an exception is accessible to EL1
    (the kernel), but not EL0 (the user).  'eret' atomically restores PC
    from the ELR_EL1 system register, and PSTATE from the SPSR_EL1 system
    register (and does other things); both of these system registers are
    inaccessible from userland, and so couldn't have been used by the
    interrupted context for any purpose, meaning their values don't need
    to be restored.  (They can be used by the kernel code, which presents an
    additional complication when it's the kernel context that gets
    interrupted and has to be returned to.  To make this work, the kernel
    masks interrupt requests and avoids doing anything that could cause a
    fault when using those registers.)
    
    The above justifies the need for a kernel API to atomically restore
    saved userland state on AArch64 (and possibly other platforms that
    aren't x86).  Mach already has an API to set state of a thread, namely
    the thread_set_state() RPC; however, a thread calling thread_set_state()
    on itself is explicitly disallowed.  We have previously relaxed this
    restriction to allow setting i386_DEBUG_STATE and i386_FSGS_BASE_STATE
    on the current thread, so one way to address the need for such an API on
    AArch64 would be to also allow setting AARCH64_THREAD_STATE on the
    current thread.  That is what I originally proposed and
    implemented.  Like the thread_set_self_state() trap implemented by this
    patch, the implementation of setting AARCH64_THREAD_STATE on the current
    thread needs to ensure that the set value of the x0 register does not
    get immediately overwritten with the return value of the mach_msg()
    trap.
    
    However, it's not only the return value of the mach_msg() trap that is
    important, but also the RPC reply message.  The thread_set_state() RPC
    should not generate a reply message when used for returning to an
    interrupted context, since there'd be nobody expecting the message.
    This could be achieved by special-casing that in the kernel as well, or
    (simpler) by userland not passing a valid reply port in the first place.
    Note that the implementation of sigreturn in glibc already uses the
    strategy of passing an invalid reply port for the last RPC it does
    before returning to the interrupted context (which is deallocating the
    reply port used by the signal handler).
    
    Not passing a valid reply port and consequently not blocking on awaiting
    the reply message works, since the way Mach is implemented, kernel RPCs
    are always executed synchronously when userland sends the request
    message (unless the routine implementation includes explicit asynchrony,
    as device RPCs do, and gsync_wait() should do, but currently doesn't),
    meaning the RPC caller never has to *wait* for the reply message, as one
    is produced immediately.  In other words, the mere act of invoking a
    kernel RPC (that does not involve explicit asynchrony) is enough to
    ensure it completes when mach_msg() returns, even if a reply message is
    not received (whether because an invalid reply port has been specified,
    or because MACH_RCV_MSG wasn't passed to mach_msg(), or because a
    message other than the kernel RPC's reply was received by the call).
    
    However, the same is not true when interposing is involved, and the
    thread's self port does not in fact point directly to the kernel, but to
    a userspace proxy of some sort.  The two primary examples of this are
    Hurd's rpctrace tool, which interposes all the task's ports and proxies
    all RPCs after tracing them, and Mach's old netmsg/netname server, which
    proxies ports and messages over the network.  In this case, the actual
    implementation only runs once the request reaches the actual kernel, and
    not once the request message has been sent by the original caller, so it
    *is* necessary for the caller to await the reply message if it wants to
    make sure that the requested action has been completed.  This does not
    cause many issues for deallocation of a reply port on the sigreturn code
    path in glibc, since that only delays when the port is deallocated, but
    does not otherwise change the program behavior.  With
    thread_set_state(mach_thread_self()), however, this would be quite
    catastrophic, since the message-send would return back to the caller
    without changing its state, and the actual change of state would only
    happen at some later point.
    
    This issue is avoided nicely by turning the functionality into an
    explicit Mach trap rather than an RPC.  As it's not an RPC, it doesn't
    involve messaging, and doesn't need a reply port or a reply message.  It
    is always a direct call to the kernel (and not to any interposer), and
    it's always guaranteed to have completed synchronously once the trap
    returns.  That also means that the thread_set_self_state() call won't be
    visible to rpctrace or forwarded over the network by netmsg, but this is
    fine, since all it does is set thread state (i.e. register values); the
    thread could do the same on its own by issuing the relevant machine
    instructions without involving any Mach abstractions (traps or RPCs) at
    all, if it weren't for the need for atomicity.
    
    Finally, this new trap is unfortunately somewhat of a security concern
    (as any sigreturn-like functionality is in general), since it would
    potentially allow an attacker who already has a way to invoke a function
    with 3 controlled argument values to set the values of all registers to
    any desired values (sigreturn-oriented programming).  There is currently
    no mitigation for this other than the generic ones such as PAC and stack
    check guards.
    
    The limit of 150 used in the implementation has been chosen to be large
    enough to fit the largest thread state flavor so far, namely
    AARCH64_FLOAT_STATE, but small enough to not overflow the 4K stack.  If
    a new thread state flavor is added that is too big to fit on the stack,
    the implementation should be switched to use kalloc instead of on-stack
    storage.
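    
    A minimal sketch of the on-stack bounce buffer this paragraph describes,
    assuming (this is an assumption of the sketch, not something stated
    above) that the limit counts 32-bit state words, in which case 150 words
    come to 600 bytes.  All names are hypothetical, and the word sum merely
    stands in for handing the buffer to the real state-setting routine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: thread state is counted in 32-bit words. */
typedef uint32_t model_natural_t;

enum { MODEL_STATE_MAX = 150 };          /* the limit from the text */

/* Bytes of stack the bounce buffer occupies under the assumption above. */
static size_t model_buffer_bytes(void)
{
    return sizeof(model_natural_t[MODEL_STATE_MAX]);
}

/*
 * Copy a caller-supplied state into an on-stack buffer, refusing anything
 * that would not fit.  Returns 0 on success and -1 if count is too large.
 */
static int model_copy_in_state(const model_natural_t *user_state,
                               unsigned count, uint64_t *sum_out)
{
    model_natural_t buffer[MODEL_STATE_MAX];
    uint64_t sum = 0;

    if (count > MODEL_STATE_MAX)
        return -1;
    memcpy(buffer, user_state, count * sizeof buffer[0]);
    for (unsigned i = 0; i < count; i++)
        sum += buffer[i];
    *sum_out = sum;
    return 0;
}
```

    The size check has to come before the copy; a flavor that outgrows the
    limit is rejected rather than allowed to overrun the stack buffer.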
    Message-ID: <20240415090149.38358-9-bugaevc@gmail.com>

commit db8dacb578b687574ba900298a4159c887dd18d0
Author: Sergey Bugaev <bugaevc@gmail.com>
Date:   Mon Apr 15 12:01:47 2024 +0300

    aarch64: Add thread state types
    
    Notes:
    * TPIDR_EL0, the TLS pointer, is included in the generic state directly.
    * TPIDR2_EL0, part of the SME extension, is not included in the generic
      state.  If we add SME support, it will be a part of something like
      aarch64_sme_state.
    * CPSR is not a real register in AArch64 (unlike in AArch32), but a
      collection of individually accessible bits and pieces from PSTATE.
      Due to how the kernel accesses user mode's PSTATE (via SPSR), it's
      convenient to represent PSTATE as a pseudo-register in the same
      format as SPSR.  This is also what QEMU and XNU do.
    * There is no hardware-enforced 'natural' order to place the registers
      in, since no registers get pushed onto the stack on exception entry.
      Saving and restoring registers from an instance of struct
      aarch64_thread_state is implemented entirely in software, and the
      format is essentially arbitrary.
    * aarch64_float_state includes registers of a 128-bit type; this may
      create issues for compilers other than GCC.
    * fp_reserved is not a register, but a placeholder.  If and when Arm
      adds another floating-point meta-register, this will be changed to
      represent it, and that would not be considered a compatibility break,
      so don't access fp_reserved by name, or its value, from userland.
      Instead, memset the whole structure to 0 if starting from scratch, or
      memcpy an existing structure.
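    
    The last note can be sketched as follows.  The structure below is a
    hypothetical stand-in, with invented field names and layout, and not the
    real aarch64_float_state; the point is only that the placeholder is
    never touched by name, just zeroed or copied along with everything else.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for struct aarch64_float_state; the field names
 * and layout are invented for this sketch.
 */
struct example_float_state {
    unsigned char v[32][16];   /* 32 registers of 128 bits each */
    uint64_t status;
    uint64_t control;
    uint64_t placeholder;      /* plays the role of fp_reserved */
};

/* Starting from scratch: zero everything, placeholder included. */
static void example_float_state_init(struct example_float_state *s)
{
    memset(s, 0, sizeof *s);
}

/* Deriving from an existing state: copy the structure wholesale. */
static void example_float_state_copy(struct example_float_state *dst,
                                     const struct example_float_state *src)
{
    memcpy(dst, src, sizeof *dst);
}
```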
    
    More thread state types could be added in the future, such as
    aarch64_debug_state, aarch64_virt_state (for hardware-accelerated
    virtualization), potentially ones for PAC, SVE/SME, etc.
    Message-ID: <20240415090149.38358-8-bugaevc@gmail.com>

commit cf8afe49af2c7c77dfc6701da70d6700f7e6f1b5
Author: Sergey Bugaev <bugaevc@gmail.com>
Date:   Mon Apr 15 12:01:46 2024 +0300

    aarch64: Add exception type definitions
    
    A few yet-unimplemented codes are also sketched out; these are included
    so you know roughly what to expect once the missing functionality gets
    implemented, but are not in any way stable or usable.
    Message-ID: <20240415090149.38358-7-bugaevc@gmail.com>

commit d37c7a3fdb2edf5d3865fbac547297651a9b1941
Author: Sergey Bugaev <bugaevc@gmail.com>
Date:   Mon Apr 15 12:01:45 2024 +0300

    aarch64: Add mach_aarch64 API
    
    This currently contains a single RPC to get Linux-compatible hwcaps,
    as well as the values of MIDR_EL1 and REVIDR_EL1 system registers.
    
    In the future, this is expected to host the APIs to manage PAC keys,
    and possibly some sort of AArch64-specific APIs for userland IRQ
    handlers.
    Message-ID: <20240415090149.38358-6-bugaevc@gmail.com>

commit fd2482bb3500602b8f747ab96df195b1d5c511d5
Author: Sergey Bugaev <bugaevc@gmail.com>
Date:   Mon Apr 15 12:01:44 2024 +0300

    aarch64: Add vm_param.h
    
    And make it so that the generic vm_param.h doesn't require the machine-
    specific one to define PAGE_SIZE etc.  We *don't* want a PAGE_SIZE
    constant to be statically exported to userland; instead userland should
    initialize vm_page_size by querying vm_statistics(), and then use
    vm_page_size.
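    
    A sketch of the intended userland pattern, with the kernel query replaced
    by a stub so that it runs anywhere.  The names are hypothetical, the
    value 4096 is made up for the example, and the rounding helpers assume
    the page size is a power of two; a real program would fill the variable
    from vm_statistics() instead.

```c
#include <stdint.h>

/* Plays the role of vm_page_size: a run-time variable, not a constant. */
static uintptr_t example_page_size;

/* Stub standing in for the kernel query; 4096 is an arbitrary example. */
static uintptr_t example_query_page_size(void)
{
    return 4096;
}

/* Call once at start-up, before any of the helpers below. */
static void example_page_size_init(void)
{
    example_page_size = example_query_page_size();
}

/* Round an address down to the start of its page. */
static uintptr_t example_trunc_page(uintptr_t addr)
{
    return addr & ~(example_page_size - 1);
}

/* Round an address up to the next page boundary. */
static uintptr_t example_round_page(uintptr_t addr)
{
    return (addr + example_page_size - 1) & ~(example_page_size - 1);
}
```

    Because nothing here depends on a compile-time PAGE_SIZE, the same binary
    keeps working if the kernel is configured with a different page size.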
    
    We'd also like to eventually avoid exporting VM_MAX_ADDRESS, but this is
    not feasible at the moment.  To make it feasible in the future, userland
    should try to avoid relying on the definition where possible.
    Message-ID: <20240415090149.38358-5-bugaevc@gmail.com>

commit a0ff799984cf2ed1a50c1161aabff3aaee622c64
Author: Sergey Bugaev <bugaevc@gmail.com>
Date:   Mon Apr 15 12:01:43 2024 +0300

    aarch64: Add public syscall ABI
    
    We use largely the same ABI as Linux: a syscall is invoked with the
    "svc #0" instruction, passing arguments the same way as for a regular
    function call.  Specifically, up to 8 arguments are passed in the x0-x7
    registers, and the rest are placed on the stack (this is only necessary
    for the vm_map() syscall).  w8 should contain the (negative) Mach trap
    number.  A syscall preserves all registers except for x0, which upon
    returning contains the return value.
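    
    To make the argument placement concrete, here is a toy model in portable
    C of the state just before "svc #0".  The frame structure, its sizes, and
    the function name are hypothetical, and no actual trap is issued; it only
    shows which arguments land in x0-x7 and which spill to the stack.

```c
#include <stdint.h>

enum { EXAMPLE_REG_ARGS = 8, EXAMPLE_MAX_STACK_ARGS = 8 };

/* Hypothetical picture of the machine state just before "svc #0". */
struct example_svc_frame {
    uint64_t x[EXAMPLE_REG_ARGS];              /* x0-x7 */
    int32_t  w8;                               /* negative trap number */
    uint64_t stack[EXAMPLE_MAX_STACK_ARGS];    /* overflow arguments */
    unsigned n_stack;
};

/*
 * Distribute nargs arguments: the first eight go to x0-x7, the rest to
 * the stack in order.  Returns 0 on success, -1 if the trap number is
 * not negative or there are too many arguments for this toy frame.
 */
static int example_marshal(struct example_svc_frame *f, int32_t trap,
                           const uint64_t *args, unsigned nargs)
{
    if (trap >= 0 || nargs > EXAMPLE_REG_ARGS + EXAMPLE_MAX_STACK_ARGS)
        return -1;
    f->w8 = trap;
    f->n_stack = 0;
    for (unsigned i = 0; i < nargs; i++) {
        if (i < EXAMPLE_REG_ARGS)
            f->x[i] = args[i];
        else
            f->stack[f->n_stack++] = args[i];
    }
    return 0;
}
```

    With eleven arguments, for instance, the model puts eight in registers
    and three on the stack.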
    Message-ID: <20240415090149.38358-4-bugaevc@gmail.com>

commit b23e7718dd9942b27edfac9aee05f737d0e6922e
Author: Sergey Bugaev <bugaevc@gmail.com>
Date:   Mon Apr 15 12:01:42 2024 +0300

    aarch64: Add the basics
    
    This adds "aarch64" host support to the build system, along with some
    uninteresting installed headers. The empty aarch64/aarch64/ast.h header
    is also added to create the aarch64/aarch64/ directory (due to a Git
    peculiarity).
    
    With this, it should be possible to run 'configure --host=aarch64-gnu'
    and 'make install-data' successfully.
    Message-ID: <20240415090149.38358-3-bugaevc@gmail.com>

commit b792f1eb08e34db89ac200a86ea7e6859a6b0668
Author: Sergey Bugaev <bugaevc@gmail.com>
Date:   Mon Apr 15 12:01:41 2024 +0300

    Add CPU_TYPE_ARM64
    
    This is distinct from CPU_TYPE_ARM, since we're going to exclusively use
    AArch64 / A64, which CPU_TYPE_ARM was never meant to support, and to
    match EM_AARCH64, which is also separate from EM_ARM.  CPU_TYPE_X86_64
    was similarly made distinct from CPU_TYPE_I386.
    
    This is named CPU_TYPE_ARM64 rather than CPU_TYPE_AARCH64, since AArch64
    is an "execution state" (analogous to long mode on x86_64) rather than a
    CPU type.  "ARM64" here is not a name of the architecture, but simply
    means an ARM CPU that is capable of (and for our case, will only really
    be) running in the 64-bit mode (AArch64).
    
    There are no subtypes defined, and none are expected to be defined in
    the future.  Support for individual features/extensions should be
    discovered by other means, i.e. the aarch64_get_hwcaps() RPC.
    Message-ID: <20240415090149.38358-2-bugaevc@gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 Makefrag.am                                        |   3 +
 aarch64/Makefrag.am                                |  43 ++++++++
 kern/gnumach.srv => aarch64/aarch64/ast.h          |   8 +-
 .../aarch64/mach_aarch64.srv                       |   4 +-
 configfrag-first.ac => aarch64/configfrag.ac       |  26 ++---
 .../include/mach/aarch64/asm.h                     |  34 +++---
 .../include/mach/aarch64/boolean.h                 |  10 +-
 aarch64/include/mach/aarch64/exception.h           |  90 ++++++++++++++++
 .../include/mach/aarch64/kern_return.h             |  13 +--
 aarch64/include/mach/aarch64/mach_aarch64.defs     |  52 +++++++++
 aarch64/include/mach/aarch64/mach_aarch64_types.h  | 118 +++++++++++++++++++++
 .../include/mach/aarch64}/machine_types.defs       |  53 +++++----
 .../include/mach/aarch64/syscall_sw.h              |  25 +++--
 .../include/mach/aarch64/thread_status.h           |  48 +++++----
 .../include/mach/aarch64/vm_param.h                |  26 ++---
 .../include/mach/aarch64}/vm_types.h               |  70 ++++--------
 configure.ac                                       |   5 +
 include/mach/machine.h                             |   1 +
 include/mach/syscall_sw.h                          |   2 +
 include/mach/vm_param.h                            |   6 +-
 kern/ipc_mig.c                                     |  41 +++++++
 kern/ipc_mig.h                                     |   5 +
 kern/syscall_sw.c                                  |   2 +-
 tests/include/syscalls.h                           |   1 +
 24 files changed, 509 insertions(+), 177 deletions(-)
 create mode 100644 aarch64/Makefrag.am
 copy kern/gnumach.srv => aarch64/aarch64/ast.h (83%)
 copy kern/gnumach.srv => aarch64/aarch64/mach_aarch64.srv (89%)
 copy configfrag-first.ac => aarch64/configfrag.ac (65%)
 copy ipc/ipc_print.h => aarch64/include/mach/aarch64/asm.h (60%)
 copy i386/i386at/mem.h => aarch64/include/mach/aarch64/boolean.h (80%)
 create mode 100644 aarch64/include/mach/aarch64/exception.h
 copy device/device_init.h => aarch64/include/mach/aarch64/kern_return.h (74%)
 create mode 100644 aarch64/include/mach/aarch64/mach_aarch64.defs
 create mode 100644 aarch64/include/mach/aarch64/mach_aarch64_types.h
 copy {i386/include/mach/i386 => aarch64/include/mach/aarch64}/machine_types.defs (77%)
 mode change 100755 => 100644
 copy i386/i386/db_trace.h => aarch64/include/mach/aarch64/syscall_sw.h (67%)
 copy i386/i386at/model_dep.h => aarch64/include/mach/aarch64/thread_status.h (50%)
 copy ddb/db_write_cmd.h => aarch64/include/mach/aarch64/vm_param.h (60%)
 copy {i386/include/mach/i386 => aarch64/include/mach/aarch64}/vm_types.h (76%)


hooks/post-receive
-- 
GNU Mach