[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: self-paging

From: Bas Wijnen
Subject: Re: self-paging
Date: Mon, 5 Dec 2005 12:16:39 +0100
User-agent: Mutt/1.5.11

On Sun, Dec 04, 2005 at 01:49:34PM -0500, Jonathan S. Shapiro wrote:
> On Sat, 2005-12-03 at 09:30 +0100, Bas Wijnen wrote:
> > On Fri, Dec 02, 2005 at 01:41:53PM -0500, Jonathan S. Shapiro wrote:
> > > 
> > > 2. Tuning knobs: app gives jaggy video, user turns a quality knob.
> > 
> > What I mean by settings for the user really is 2, except that 2 seems very
> > window-oriented.  The point I'm making is that the user may want something
> > to get resources which isn't in the foreground.  For real-time video this
> > is obviously not the case, but computationally intensive tasks are often
> > uninteresting until they're done.  Consider a fractal generator.  If you
> > want generation to be fast, you minimize the window so it doesn't waste
> > time on drawing intermediate states.  It is not desirable if that means
> > the generation is even slower because much of its resources are revoked.
> I don't agree. First, there is no reason to imagine that tuning knobs are
> limited to visible windows, so I don't really agree with that concern.

Ok, in that case my "settings" really is your option 2.

> Second, I don't agree with the background fractal example. If the user puts
> the fractal generator in the background and then starts a movie, they are
> saying "my attention is on the movie". I think that the principle here is
> that real time guarantees have to do with responding in a timely fashion
> where the focus of attention is. The background fractal problem may be a
> motivator for a high priority, but it is *not* a motivator for a real time
> guarantee.

Oh, I wasn't talking about real-time guarantees.  I was just saying that the
user should have the option to specify that one process is the most important
of all, and it should not be hindered in the sense that it is swapped out.
There should be similar settings for processor time, but I wasn't talking
about them.  If the user specifies limits which are so high that the movie
cannot get a real-time guarantee anymore, then the movie shouldn't get it
(until the user changes the settings).

> > > I'm curious what application you have in mind where you feel that the
> > > resource should be dynamically adaptive without user intervention. I can
> > > think of some, but for most of them I end up concluding that dynamic
> > > adaptivity isn't well motivated.
> > > 
> > > So: what application are you thinking about?
> > 
> > Many applications need resources only every now and then.  For example a
> > game may need much more memory to generate a level than to play it.  It is
> > impractical if the user needs to manually change the quota every time a
> > new level is generated.  However, it is unacceptable if the game gets the
> > physical memory reservation all the time, even when there is pressure.
> This is not a good motivator. Level generation is not a real-time activity.
> At worst, it is an interactive-time activity. Why should it involve a
> residency quota at all?

In my proposal, every process (or address space, really) has a physical memory
quota.  It may be 0, which means it is fully swapped out.  A process cannot
have more pages in memory than it has quota.  When the quota shrinks, there
are immediately pages swapped out (if the process was using its full quota).
Because we don't want to ask the process which pages that should be (and in
particular, we don't want to wait for an answer), the answer has to be
prepared beforehand.  That's what the list is all about.

So the game needs a quota in this system, because it cannot run at all without
one.  This quota should enlarge at level generation (in this example where
that takes much more memory than the game itself), but it musn't be so large
all the time.

Why must this be without user intervention?  Because it'd be very annoying for
the user to turn the knobs all the time.  What the user wants to do is give
the game high priority.  Then the system should just give it the memory it
wants when it asks for it (at the cost of others, which have a lower
priority).  But when it no longer needs the memory, the quota must decrease,
thus giving the memory back to others (remember the sum of quota is never
larger than the actual physical memory in the machine) while the game is

> > This is true, but things I do in the background aren't usually movies.
> > Movies are a typical example of something which may be completely stopped
> > when not visible.  Many other applications though (computationally
> > intensive things like simulations) can do very useful things of which the
> > user will only want to see the end result.  If they need to be in the
> > foreground in order to keep their resources, that very much stimulates
> > single tasking.
> I understand that you want these things to be fast. That is not the same as
> wanting them to be real-time.

Indeed.  I wasn't talking about real-time (although giving it all the memory
it asks for may hinder other applications which do want real-time), but about
getting the largest part of resources, in particular physical memory.  (But as
I said before, it's thinkable to make the CPU priority and the memory priority
the same variable).

The difference is between ping-time and throughput on a network line.  I was
talking about giving the process high throughput, not about giving it a low

> > For example the user may start a game of mine sweeper waiting for mp32ogg
> > to finish.  This doesn't mean that mine sweeper is more important than
> > mp32ogg.
> Actually, it does. If minesweeper's interaction is inadequate, it doesn't
> satisfy. In this example, minesweeper needs a guaranteed slice (though it
> doesn't need to be very big).

If the user set the mp32ogg priority to "highest sensible value", she will be
prepared for bad performance of other processes.  If she doesn't like that,
she'll choose "very high" instead.

> The problem here underneath it all is that you are imagining a policy
> defined by the user about where you want resources to go, but you aren't
> proposing a way for the user to articulate the policy, and the system simply
> isn't able to guess the answer. If you want to be able to articulate this,
> you're going to need a tuning interface somewhere that says "it's really
> important to me that mp32ogg finishes soon, and I'm willing to tolerate a
> low render rate in doom3 until it does."

Yes, that's exactly what I propose, and I think it's also what you propose, so
I think we agree on that. :-)

> > During this thread I got more and more the idea that it isn't cumbersome
> > at all, really. :-)  The program needs to request a page anyway.  So add
> > an extra argument to that call which specifies where that page is in the
> > list...
> I think that you are not considering the kernel-level implications well
> enough.

I wasn't thinking kernel-level at all, I'd think this would all be in user
space (in physmem and some global pager).  Is it just that those tasks need to
be in the kernel for you, or are we misunderstanding each other?  I'll assume
the former for now, that you need physmem and the global pager in the kernel.
I'd be interested to know why though.

> I'm not sure which design is better. Here is what I do know:
>   2. The design that you propose is tricky, because it involves attaching
>   ordering state to each frame. This is not as easy as it sounds. What
>   happens when you and I share the same frame, and you say "put it in
>   position 4" and I say "put it in position 10"? Where does the position
>   info get stored? Hint: can't be done in the kernel, and then we are going
>   to get a reductio problem here.

First of all, note that there is a list of pages per process.  So "position 4"
is meaningless to the process who puts it at position 10 in its own list.  The
page will be stored in two lists, at its own position for each.

Every space bank has a certain number of "active" pages.  Those can be of
three types:
- currently in memory
- currently in swap
- currently nonexistant ("swapped-out" cache)
The third one exists because the process doesn't need to reallocate it when it
is "swapped out", it can just remap it, however there's no guarantee about the

I would think that the space bank is typically the place to store the priority
list, for several reasons:
- It is part of the TCB, so it is allowed to manipulate it.
- It has a list of pages anyway, the struct it holds just needs an extra

Shared memory is a pain in general.  I didn't think about it within this
scheme before, but here's an idea for how to solve it: Any process with access
to a page of shared memory gives it a position in its list of pages.  It will
simply cost it some of its quota.  The sum of all quota does in fact become
larger than the total amount of physical memory.  Now when one process
rearranges its list in a way that would swap out the shared page, and replace
it with a non-shared one, its quota will immediately drop by one if there
isn't any memory to put the new page (that is, if all pages were in use).
Later, this may be readjusted as usual.

The question is if this opens any covert channels.  The only realistic one
that I can think of is a file server giving read-only access to multiple
processes.  Then one process can see if the memory is still shared by swapping
it out and see if its quota drops.  I'm not sure how well this would work in

> > > I think the application should instead define working set contexts for
> > > regions in its virtual address space, and then define a replacement
> > > policy for each context. The association between a frame and its
> > > dominating context is maintained within the OS, and is not explicitly
> > > disclosed to the application.
> > 
> > I don't quite see what you're saying here.  If the policy is not disclosed
> > to the application, then it isn't the application which sets it I suppose?
> > Who does set it then?  It will be a property of the space bank, which
> > isn't controlled by the parents of the process.
> It is not a property of the space bank. It is a property of the kernel.
> What I would actually propose initially is that the kernel implements
> approximate LRU w.r.t. a pool, and the application manages contention by
> keeping separate pools separate.

So you do indeed assume that the kernel does the paging to swap.

I think your approach forces policy on processes, except if it's very
efficient to have one page per pool.  Otherwise processes which really don't
want LRU will be penalized (either by using LRU anyway, or by the inefficiency
of the one page per pool.

> > I'm not really sure what you mean, but I don't think I like it. ;-) Adding
> > a layer of contexts with different behaviour on top may in some cases be a
> > good idea, but I'd do that in the pager (which does most of the list
> > rearranging), only for pagers which want to make use of it.  Otherwise it
> > just adds extra baggage to processes which don't use it.
> You can't do this in the pager. The pager itself may need to remain
> resident. The residency requirement needs kernel support at the bottom
> level. The problem I have with your proposal is mainly that you haven't
> connected the dots that far and I can't see how to implement it.

I did think about it, but considered it a too small detail to mention (to keep
the discussion focussed).  Anyway, now you're asking: Every address space
should be able to specify a minimum quota.  Whenever its actual quota drops
below that value, it doesn't want to run at all.  Typically the pager (if it
is in the same address space as the process) will be somewhere below that
minimum in the list, so it can handle page faults due to quota decreases for
the process.  If a process puts its pager above it, then the pager's pager
will be called (assuming L4 style pagers), and so on until there is one in
memory, or there is none.  (If pagers don't have pagers, assume "there is
none" always.)  If there is no pager, the process will be stopped, only to be
resumed when it has enough quota again to let its pager run.


I encourage people to send encrypted e-mail (see http://www.gnupg.org).
If you have problems reading my e-mail, use a better reader.
Please send the central message of e-mails as plain text
   in the message body, not as HTML and definitely not as MS Word.
Please do not use the MS Word format for attachments either.
For more information, see

Attachment: signature.asc
Description: Digital signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]