Re: [Qemu-devel] Storing code caching

From: Martin Williams
Subject: Re: [Qemu-devel] Storing code caching
Date: Thu, 8 Jul 2004 18:57:22 +0100


My initial idea (as I understand it; please correct me if I am utterly wrong) is that when a program loads, the memory it uses starts at a certain location, and all code inside the program sits at a consistent relative address from that location. (Could someone who really knows about this sort of thing please advise?!?)

My idea is to write a program that caches the code of individual files (rather than everything). When a block starts executing, the cache would be accessed, and the address minus the base address (in other words, the offset of the block) would be used to find it within the cache. (Some algorithm is needed for efficiently storing and locating these blocks, as they will not be the same size as the originals.) Then, once qemu detects a piece of self-modifying code (by a write to a memory address), it would blacklist the block in which the write happened (is this possible?).
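To make the offset idea concrete, here is a minimal sketch in Python. It is not qemu code: the class `OffsetBlockCache` and its methods are hypothetical names invented for illustration, and it assumes the program's code really does sit at fixed offsets from a single load base. Blocks are keyed by `address - base`, and a write into a cached block's guest range blacklists that block.

```python
from dataclasses import dataclass, field


@dataclass
class OffsetBlockCache:
    """Hypothetical cache of translated blocks, keyed by offset from the load base."""
    base: int                                     # guest address where the program loaded
    blocks: dict = field(default_factory=dict)    # offset -> (guest_length, host_code)
    blacklist: set = field(default_factory=set)   # offsets that must never be reused

    def store(self, guest_addr, guest_length, host_code):
        """Remember the translation of the block starting at guest_addr."""
        offset = guest_addr - self.base
        if offset not in self.blacklist:
            self.blocks[offset] = (guest_length, host_code)

    def lookup(self, guest_addr):
        """Return cached host code for the block at guest_addr, or None."""
        offset = guest_addr - self.base
        if offset in self.blacklist:
            return None
        entry = self.blocks.get(offset)
        return entry[1] if entry else None

    def note_write(self, guest_addr):
        """Blacklist every cached block whose guest bytes cover guest_addr."""
        target = guest_addr - self.base
        hit = [off for off, (length, _) in self.blocks.items()
               if off <= target < off + length]
        for off in hit:
            self.blacklist.add(off)
            del self.blocks[off]
        return hit


cache = OffsetBlockCache(base=0x400000)
cache.store(0x401000, 16, b"host-a")
cache.note_write(0x401008)      # a write inside the block blacklists it
```

The linear scan in `note_write` is only for clarity; a real implementation would want an interval structure or per-page lists so that every guest write does not walk the whole cache.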

The program I would write would basically use the qemu core to process an entire executable, creating the blocks that are executable on the host machine, and store them. I would then start work on modifying qemu to recognise the existence of the cache file and use the blocks, and then deal with the self-modifying code issue as above ...


PS - I'm a CS undergrad, but I'm game for it anyway :)

On 8 Jul 2004, at 18:05, John R. Hogerhuis wrote:

On Thu, 2004-07-08 at 05:26, Martin Williams wrote:
Has anyone thought about trying to store the code caching on disk?

Are you talking about "save machine state", essentially "suspend/resume"?
That is certainly possible and I believe it has been discussed on the

The other possibility, that you wish to permanently associate
untranslated code with translated code by having a big cache available
on disk is in the general case "the halting problem" and there can be no
algorithm for that. So you've been warned: There Be Dragons Here...

However this is real life so there are probably some things you can do.

Some things to understand:

1. Basic blocks of code in the cache are found by their addresses in
memory, not their content. You can imagine that from one run to the next
code would load in different spots in memory. I suppose you could come
up with a set of heuristics for recognizing a basic block:
a) the location is not permanent but it might be a good clue. Perhaps
though with virtual address space programs always locate to the same
place in a virtual map though they will be different spots in physical
b) the length of the block never changes. That could be a good heuristic.
c) A checksum of the code with consideration for absolute addresses that
have been "fixed up" in the code. These addresses may be different from
run-to-run. Remember though adding in a checksum is an efficiency
tradeoff. It may not be worth it.
d) self modifying code, self modifying code, self modifying code...
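
Heuristics b) and c) can be sketched together as follows. This is an illustration under stated assumptions, not anything qemu does: `block_checksum` and `block_key` are hypothetical helpers, the caller is assumed to already know which byte offsets inside the block hold fixed-up absolute addresses, and CRC32 stands in for whatever checksum would actually be cheap enough.

```python
import zlib


def block_checksum(code, fixup_offsets, addr_size=4):
    """CRC32 of a block with fixed-up absolute addresses zeroed out.

    fixup_offsets lists the byte offsets (within the block) where the
    loader patched in an absolute address; how to find them is assumed.
    """
    masked = bytearray(code)
    for off in fixup_offsets:
        if off < 0 or off + addr_size > len(masked):
            raise ValueError("fixup lies outside the block")
        masked[off:off + addr_size] = bytes(addr_size)  # zero the address bytes
    return zlib.crc32(bytes(masked))


def block_key(code, fixup_offsets, addr_size=4):
    """Combine block length (heuristic b) with the masked checksum (heuristic c)."""
    return (len(code), block_checksum(code, fixup_offsets, addr_size))


# Same instruction bytes, but the 4-byte absolute address at offset 1 differs.
run1 = bytes([0xB8, 0x00, 0x10, 0x40, 0x00, 0xC3])
run2 = bytes([0xB8, 0x00, 0x10, 0x80, 0x00, 0xC3])
same_block = block_key(run1, [1]) == block_key(run2, [1])
```

Computing the key costs one pass over the block's bytes, which is the efficiency tradeoff mentioned above: it has to be cheaper than simply retranslating.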

In coming up with heuristics for recognizing already translated code
available in the cache, remember you are trading off against just
retranslating. Depending on the complexity/resource intensity of
computations for your heuristic it may not be worth it to do the

If you think hard about it there are probably some things you could do
efficiently to reuse basic blocks from previous runs. "User mode" QEMU
is probably an easier case than the general one of running an entire OS
image. And maybe you would want to look at load time... When given a
program to run you check your on disk cache to see if you have loaded
this program before. Checksum it once to see if you have already saved a
cache image for this program. If so, load it up. Encountering
dynamically translated (invalidated cache) portions of the code will
result in "dead areas" which should never be cached.
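
The load-time check could look roughly like this. It is a sketch only: the function names, the JSON file format and the cache directory layout are all assumptions made for illustration, and SHA-256 is just one choice of whole-file checksum. The program file is hashed once, the digest names the cache image, and "dead areas" are left out when the image is saved.

```python
import hashlib
import json
from pathlib import Path


def program_digest(program_path):
    """Checksum the whole program file once, at load time."""
    return hashlib.sha256(Path(program_path).read_bytes()).hexdigest()


def cache_path_for(program_path, cache_dir):
    """The cache image is named after the program's digest (assumed layout)."""
    return Path(cache_dir) / (program_digest(program_path) + ".json")


def save_cache(program_path, cache_dir, blocks, dead_offsets):
    """Save translated blocks (offset -> host code bytes), skipping dead areas.

    dead_offsets holds the offsets of blocks that were invalidated while
    running, for example by self-modifying code; they are never written out.
    """
    keep = {str(off): code.hex() for off, code in blocks.items()
            if off not in dead_offsets}
    path = cache_path_for(program_path, cache_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(keep))
    return path


def load_cache(program_path, cache_dir):
    """Return the saved blocks for this exact program, or None if unseen."""
    path = cache_path_for(program_path, cache_dir)
    if not path.exists():
        return None
    raw = json.loads(path.read_text())
    return {int(off): bytes.fromhex(code) for off, code in raw.items()}
```

Because the image is keyed by the digest of the file contents, any change to the program on disk simply misses the cache and falls back to normal translation. Treating a dead area as a single block offset is a simplification; a real design would probably track address ranges.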

Anyway an interesting problem for a grad student, I'd say... you have
some prototyping/analysis to do in order to come up with some heuristics
for matching up real code with cached code.

-- John.
