Re: Future Direction of GNU Hurd?

From: William ML Leslie
Subject: Re: Future Direction of GNU Hurd?
Date: Mon, 15 Mar 2021 19:58:33 +1100

On Mon, 15 Mar 2021 at 05:19, Olaf Buddenhagen <olafbuddenhagen@gmx.net> wrote:
> On Thu, Feb 25, 2021 at 06:48:11PM +1100, William ML Leslie wrote:
> > I am still hopeful that someone will figure out how to do async right
> > at some point.  The seL4 people haven't figured it out, and I've got
> > three different approaches, all with their own serious drawbacks.
> Care to elaborate on these approaches? While I personally don't consider
> it a priority to settle on a specific design for this, it certainly
> wouldn't hurt to have some food for thought :-)

It's somewhat easy to stay as close to io_uring as possible, having
ring buffers for messages ready to be sent and messages to be
received.  As a minimum, the kernel can copy messages from send rings
to receive rings before resuming the process and, when a receive
buffer is full, set up a linked list of processes waiting to send, as
it would in the synchronous case.  Some additional care needs to be
taken to meet the existing reliability and accounting guarantees that
the KeyKOS family provide, but the details aren't that interesting.
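
To make the ring scheme concrete, here is a minimal user-space model of
it.  It is only a sketch: the names (Process, flush_sends, receive) and
the fixed ring size are illustrative assumptions, and it ignores the
reliability and accounting concerns mentioned above.

```python
from collections import deque

class Process:
    """Hypothetical process with one send ring and one receive ring."""
    def __init__(self, name, ring_size):
        self.name = name
        self.ring_size = ring_size
        self.send_ring = deque()        # (target, payload) pairs queued by the process
        self.recv_ring = deque()        # payloads the "kernel" has delivered
        self.blocked_senders = deque()  # senders waiting for room in recv_ring

def flush_sends(sender):
    """Copy queued messages into the targets' receive rings.

    Returns True if the send ring drained, False if the sender had to be
    queued behind a full receive ring (the synchronous-style wait list).
    """
    while sender.send_ring:
        target, payload = sender.send_ring[0]
        if len(target.recv_ring) >= target.ring_size:
            if sender not in target.blocked_senders:
                target.blocked_senders.append(sender)
            return False
        target.recv_ring.append(payload)
        sender.send_ring.popleft()
    return True

def receive(proc):
    """Take one message, then retry the senders that were waiting for space."""
    msg = proc.recv_ring.popleft()
    waiting = list(proc.blocked_senders)
    proc.blocked_senders.clear()
    for s in waiting:
        flush_sends(s)  # re-queues s if the ring fills up again
    return msg
```

A sender whose target ring is full stays on the wait list until the
receiver drains a slot, at which point its remaining messages are
copied across.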

We're a bit spoiled in the capability space, though.  Many of our
protocols include the ability to send a message before its target or
arguments become available, as well as the ability to only have a
message sent on receipt or failure of a previous message.  This is
normally done for the sake of reducing round-trip network latency - a
different order of magnitude to the size of a context switch - but
without it, some useful performance improvements put unnecessary load
on the server.  For example, one of the operations we might want to
speed up is searching a variable like PATH for an executable.  We
might want to issue searches for each of the elements of PATH in
parallel.  This reduces the number of context switches required from 2
* N, where N is the position in PATH of the element containing the
executable, to 2,
assuming the message rings are large enough to hold all of the

However, if the filesystem server is running on a limited number of
threads, once it finds the executable, it is now checking paths and
clobbering disk cache for a client that doesn't care about the result.
It would be better if we could instruct the filesystem server not to
execute the search if a positive match has already been delivered.  In
this case, we are happy to let the filesystem server decide how it
will process the subsequent messages, but you can also imagine cases
where messages should not be delivered until the server has replied.
Delivery might, for example, be conditioned on some predicate on a
returned capability, such as "is this a real spacebank?", which is
important when acting as a system server.
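
The PATH example can be sketched as a toy comparison between one
request per round trip and a single batch in which the server skips the
remaining searches after the first hit.  Everything here is an
assumption made for illustration: the filesystem is a plain dict, the
function names are made up, and context switches are simply counted
rather than performed.

```python
def sequential_lookup(path_dirs, fs, name):
    """One request per round trip: 2 context switches per directory tried."""
    switches = 0
    for d in path_dirs:
        switches += 2  # client -> server, then server -> client
        if name in fs.get(d, ()):
            return d, switches
    return None, switches

def batched_lookup(path_dirs, fs, name):
    """All requests queued in one ring; later searches are skipped after a hit."""
    switches = 1       # client -> server, carrying every queued request
    searched = []      # directories the server actually had to look at
    hit = None
    for d in path_dirs:
        if hit is not None:
            break      # a positive match was already delivered: do no more work
        searched.append(d)
        if name in fs.get(d, ()):
            hit = d
    switches += 1      # server -> client, carrying the replies
    return hit, switches, searched

fs = {
    "/usr/local/bin": {"foo"},
    "/usr/bin": {"ls", "cc"},
    "/bin": {"sh"},
}
path = ["/usr/local/bin", "/usr/bin", "/bin"]
print(sequential_lookup(path, fs, "cc"))  # ('/usr/bin', 4)
print(batched_lookup(path, fs, "cc"))     # ('/usr/bin', 2, ['/usr/local/bin', '/usr/bin'])
```

The interesting part is the early break: without it the server would
still search "/bin" on behalf of a client that no longer cares.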

We can of course push this into the application in a few ways.  We can
avoid solving the 2 * N problem and let the application either send a
bunch of requests or send them one at a time.  We could make the
application speak to the server using a protocol that supports this
natively, such as CapTP or CapnProto, which rely on the server to
correctly discard messages that were not intended to be delivered -
one issue with this approach is that we need to handle introduction as
we share native capabilities, which is tricky to account space for.
The third solution is to add logic to the kernel to perform further
sends when a message is received, and complicating the kernel at all
is frankly a bit scary (and how should we time-bound that?).

It may be that I can get away with sending several messages in one
context switch for only one direction (e.g. client to server) but not
in the other direction.  This would be unfortunate as there isn't
really anything special about clients vs servers so far, except that
reply endpoints are fabricated by a client and failures to send are
ignored by a server.

> > I am as convinced today as I was back when Marcus first ranted about
> > ownership and shopping trolleys that he was mostly wrong (12 years
> > ago? before I kept logs).
> I'm too lazy to dig through the archive (which BTW is also available on
> the web): but from my recollection of events, it must have been around
> 2006 or 2007.
> > If you're running an operating system locally, you absolutely want
> > (and already technically have) the ability to subvert any software
> > protections, including factories.  You may as well reify this as a
> > capability so that programs you nominate can peek into attempts at
> > mutually secure collaboration.
> >
> > If the computer you're operating on is some sort of shared hosting
> > situation, you probably want your provider to be audited /not/ to be
> > able to peek inside anything that you created using a factory.
> Well, there is likely nothing we can really do with the system design
> that could prevent others from creating and certifying a setup without
> an almighty admin. Whether it's a good idea or not to provide this
> ability by default is a different question... But actually mostly
> orthogonal as far as I can tell from the question whether we want to
> provide a privileged constructor mechanism.
> Part of the problem with this discussion is that to a large extent, it's
> actually a question of how we look at things. We inevitably do need the
> ability to invoke privileged services, that deal with resources we don't
> have direct access to; and in order to prevent DoS, we also inevitably
> need the ability to furbish such privileged services with resources
> provided by the invoker, but which the invoker can't access while they
> are in use by the service. No matter what we think of this from a
> philosophical standpoint: there is really no way around that. Also, most
> likely we want to give the invoker (and its ancestors, at least up to
> the user session) some amount of control, so they can abort the request,
> and reclaim the resources provided.
> Where things get contentious is how such privileged services are
> implemented exactly; how they are presented to the user; and how much
> the general system design builds upon this facility.
> A privileged constructor, as far as I understand it, is really just a
> type of privileged service, that creates a new child process to serve
> each request, and makes that process appear as if it was a child of the
> invoking process. (The Unix SUID mechanism is pretty much the same --
> apart from the mess with the inherited client environment...)
> The first issue is arguably that the creation of a process to service
> the request is just a technical detail, that should be transparent to
> the invoker. The invoker doesn't need the ability to see the process (or
> even know whether there is one at all): it only needs the ability to
> abort the request -- whether that involves killing a process or uses
> some other mechanism.
> More importantly though, I believe that it's a bad idea to make the
> process appear as a child of the invoker: giving the impression that it
> owns that process, when in truth it doesn't. It presents a false process
> hierarchy, that doesn't reflect the actual ownership relations. Not only
> is this misleading: but I feel that it encourages a system design that
> relies on this mechanism more than it should, needlessly disempowering
> the user.
> When I'm talking about a strictly hierarchical system, I guess what I
> really mean is a system that strictly reflects the true ownership
> hierarchy.
> (I'd have to dig up and re-read the original discussion to be sure: but
> my guess would be that the point of Marcus's metaphor was simply to
> illustrate that we don't really own a process, unless we have full
> control over it...)

To a first approximation, I guess.  The constructor can take on a
similar role to a setuid binary in that it may have access to things
the user does not, but that is not unique to constructors really (it's
true of most running system services, also).  What a constructor
enables is local secure collaboration.  If we use the same machine,
then I can run a program that you've shared with me, and if you have
provided it as a constructor then I can check that it cannot leak any
data or capabilities I provide it back to you (or anyone else).
Similarly, you can know that I can't open your program up in the
debugger and force it to leak or misuse your authority.  The process
is "mine" in the sense that I have the authority to reclaim its

I don't imagine that this feature will be used that much between
different users on the same system, but more likely between two
different processes.  The *default* should be not to leak information
that isn't necessary to other applications.  For example, windowed
clients shouldn't by default get to learn about what other processes
are visible, or what brand of GPU is rendering them, or what
keybindings are not passed on by the window manager.  The aim should
be to limit any surveillance or fingerprinting to the level where it
must be explicitly operated on, namely, at the express request of the
user, not a random process running on their behalf or a broad root
user that is misused by everything under the sun.

> > > Excuse my ignorance: but isn't the single level store concept closely
> > > related to orthogonal persistence?
> I thought about it a bit more. In my understanding, "single-level store"
> simply describes the concept of treating main memory as a cache for the
> backing store.
> Technically, of course you could forego the persistence, i.e. don't keep
> the backing store (more or less) in sync with data in main memory --
> though I'm not sure whether the term still means much in that case...

Our disk drivers in the hurd have quite a few layers and libraries -
by contrast the EROS SLS is very simple, only really implementing the
block layer.  For the purpose of getting a set of hurd servers up and
connected, it's a lot easier to just use something like the SLS to
load everything needed and then demand-page everything else in without
the unix-like file abstraction.  It's a bit like an initrd.

Additionally, having pages persisted long before running out of unused
RAM means that deciding which pages to free is much easier: don't free
those that are dirty, and of those that remain, aim not to free those
recently mapped.  You don't immediately have to stop everything to do
the write-out, either.  I plan to do a little more on a per-process
basis.  A few turns before a process is scheduled, we make sure to
page-in anything the process is about to touch.  A capability provides
the means for a process to say which addresses it will soon access, in
addition to the instruction pointer.
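
The page selection rule above can be sketched roughly as follows.  The
Page record and its last_mapped tick are hypothetical names chosen for
the example, not anything taken from EROS or the Hurd.

```python
from dataclasses import dataclass

@dataclass
class Page:
    number: int
    dirty: bool        # not yet written back to the store
    last_mapped: int   # hypothetical tick at which the page was last mapped

def choose_victims(pages, count):
    """Pick up to `count` pages to free.

    Dirty pages are never chosen; among clean pages, the ones mapped
    longest ago go first.
    """
    clean = [p for p in pages if not p.dirty]
    clean.sort(key=lambda p: p.last_mapped)  # least recently mapped first
    return [p.number for p in clean[:count]]

pages = [
    Page(0, dirty=False, last_mapped=50),
    Page(1, dirty=True, last_mapped=1),
    Page(2, dirty=False, last_mapped=10),
    Page(3, dirty=False, last_mapped=30),
]
print(choose_victims(pages, 2))  # [2, 3]
```

Page 1 is the oldest mapping but is skipped because it is dirty; since
most pages have already been persisted, there are usually enough clean
candidates that nothing has to wait on a write-out.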

An aside: I absolutely want to have optional orthogonal persistence
per-application.  Imagine having emacs up and ready to go the moment
you log into your machine.  Yes please.

> > The problem, however, is fork().  Fork is tricky because how much
> > memory do you account to the child?  If you need to account everything
> > mapped COW into the address space of the parent, then large processes
> > won't be able to start other processes, even if those children would
> > immediately exec.  If you don't account everything COW in the parent,
> > then you have a situation where a process can run out of memory
> > without explicitly allocating it.
> >
> > What I would personally like is precise accounting by default (as in
> > the SLS), and lazy accounting on fork by request.  That way I could
> > still run firefox when needed, yet have a system that is much more
> > stable under memory load than anything traditional.
> I'm not convinced this would actually improve stability...
> But more to the point: why would that be harder to implement with a
> traditional filesystem? Unless I'm missing something, this seems like a
> pretty simple change to make...

It's a pretty pervasive change to make, as it impacts paging logic,
allocation, and the exec server in interesting ways.  Maybe it's not
that big a deal?  It just looks easier to me using the EROS paging

> > Also: running the hurd directly out of the SLS means I avoid the need
> > to bootstrap it the same way we do on Mach: dynamic linker needs the
> > filesystem, so filesystem must be statically linked.  TBF all my
> > programs so far are statically linked, but having a really simple
> > filesystem like SLS to strap from instead of ext2 is very convenient.
> I don't see a fundamental difference... Whether it's ext2, SLS, or some
> sort of RAM disk: in each case you need a driver that can be loaded by
> the bootloader?...

It's just a matter of complexity.  The various pieces that implement
the SLS are less than 5000 lines, whereas libstore is over 7000 on its
own; libdiskfs 12000, and then libext2 on top of that.  But yes, it's
somewhat like an initrd.

William Leslie

Q: What is your boss's password?
A: "Authentication", clearly

Likely much of this email is, by the nature of copyright, covered
under copyright law.  You absolutely MAY reproduce any part of it in
accordance with the copyright law of the nation you are reading this
in.  Any attempt to DENY YOU THOSE RIGHTS would be illegal without
prior contractual agreement.
