Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specifica

qemu-devel

From:	Anthony Liguori
Subject:	Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification
Date:	Mon, 11 Oct 2010 11:10:57 -0500
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.12) Gecko/20100915 Lightning/1.0b1 Thunderbird/3.0.8

On 10/11/2010 11:02 AM, Avi Kivity wrote:

 On 10/11/2010 05:49 PM, Anthony Liguori wrote:
On 10/11/2010 09:58 AM, Avi Kivity wrote:
A leak is unacceptable. It means an image can grow to an unbounded size. If you are a server provider offering multitenancy, then a malicious guest can potentially grow the image beyond its allotted size, causing a Denial of Service attack against another tenant.
This particular leak cannot grow, and is not controlled by the guest.
As the image gets moved from hypervisor to hypervisor, it can keep growing if given a chance to fill up the disk, then trim it all away.
In a mixed hypervisor environment, it just becomes a numbers game.
I don't see how it can grow. Both the freelist and the clusters it points to consume space, which becomes a leak once you move it to a hypervisor that doesn't understand the freelist. The older hypervisor then allocates new blocks. As soon as it performs a metadata scan (if ever), the freelist is reclaimed.


Assume you don't ever do a metadata scan (which is really our design point).

If you move to a hypervisor that doesn't support it, then move to a hypervisor that does, you create a brand new freelist and start leaking more space. This isn't a contrived scenario if you have a cloud environment with a mix of hosts.

You might not be able to get a ping-pong every time you provision, but with enough effort, you could create serious problems.
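
To make the ping-pong argument concrete, here is a toy model of it. This is only a sketch under stated assumptions: it counts clusters rather than tracking offsets, it assumes no metadata scan ever runs, and it assumes the free list does not survive a run on a host that ignores it. The names (Image, run_on_aware_host, run_on_unaware_host, ping_pong) are made up for illustration and are not QED code.

```python
class Image:
    """Toy model: counts clusters instead of tracking real offsets."""

    def __init__(self):
        self.file_clusters = 0  # clusters physically present in the image file
        self.freelist = 0       # clusters the current free list knows about

    def leaked(self):
        # No guest data is live between runs in this model, so every cluster
        # that is not on the free list is unreachable.
        return self.file_clusters - self.freelist


def run_on_aware_host(img, n):
    """Guest writes n clusters, then TRIMs all of them."""
    reused = min(img.freelist, n)
    img.freelist -= reused
    img.file_clusters += n - reused  # grow the file only for the shortfall
    img.freelist += n                # TRIM puts every cluster on the free list


def run_on_unaware_host(img):
    """Assumption: the free list does not survive a host that ignores it."""
    img.freelist = 0


def ping_pong(rounds, n):
    """Alternate between an aware and an unaware host `rounds` times."""
    img = Image()
    for _ in range(rounds):
        run_on_aware_host(img, n)
        run_on_unaware_host(img)
    return img
```

In this model every round trip strands another n clusters, while an image that stays on aware hosts keeps reusing the same n. A metadata scan on any host would reclaim the stranded clusters, which is the case deliberately left out here.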

It's really an issue of correctness. Making correctness trade-offs for the purpose of compatibility is a policy decision and not something we should bake into an image format. If a tool feels strongly that it's a reasonable trade-off to make, it can always fudge the feature bits itself.

A freelist has to be a non-optional feature. When the freelist bit is set, an older QEMU cannot read the image. If the freelist is completely used, the freelist bit can be cleared and the image is then usable by older QEMUs.
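
A minimal sketch of that non-optional feature bit behaviour, assuming a plain bitmask of features that a reader must understand before opening the image. FEATURE_FREELIST, its bit position, and both function names are hypothetical and not taken from the spec.

```python
FEATURE_FREELIST = 1 << 0  # hypothetical bit position, for illustration only


def can_open(image_features, supported_features):
    """A reader must refuse the image if any set bit is unknown to it."""
    return (image_features & ~supported_features) == 0


def update_freelist_bit(image_features, freelist_len):
    """Clear the bit once the free list is fully consumed, set it otherwise."""
    if freelist_len == 0:
        return image_features & ~FEATURE_FREELIST
    return image_features | FEATURE_FREELIST
```

An older QEMU is modelled here as supported_features == 0: it rejects the image while the bit is set and accepts it again once the free list has been used up and the bit cleared.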
Once we support TRIM (or detect zeros) we'll never have a clean freelist.
Zero detection doesn't add to the free list.
Why not? If a cluster is zero filled, you may drop it (assuming no backing image).

Sorry, I was thinking about the case of copy-on-read. When you transition from UCE -> ZCE, nothing gets added to the free list. But if you go from allocated -> ZCE, then you would add to the free list.
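
Roughly, the two transitions look like this. The sketch assumes an L2 table is a list of integers where two small sentinel values stand for UCE and ZCE and anything else is an allocated cluster offset; that encoding and the name mark_zero are assumptions for illustration, not the on-disk format.

```python
UCE = 0  # unallocated cluster entry (hypothetical encoding)
ZCE = 1  # zero cluster entry (hypothetical encoding)


def mark_zero(l2, index, freelist):
    """Turn an L2 entry into a ZCE, freeing its cluster if it had one."""
    old = l2[index]
    if old not in (UCE, ZCE):
        # allocated -> ZCE: the cluster is no longer referenced
        freelist.append(old)
    # UCE -> ZCE falls through: there was no cluster to free
    l2[index] = ZCE
```

Only the allocated -> ZCE case touches the free list, so zero detection during copy-on-read (UCE -> ZCE) leaves it unchanged.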

A potential solution here is to treat TRIM a little differently than we've been discussing.
When TRIM happens, don't immediately write an unallocated cluster entry for the L2. Leave the L2 entry intact. Don't actually write a UCE to the L2 until you actually allocate the block.
This implies a cost because you'll need to do metadata syncs to make this work. However, that eliminates leakage.
The information is lost on shutdown; and you can have a large number of unallocated-in-waiting clusters (like a TRIM issued by mkfs, or a user expecting a visit from RIAA).
A slight twist on your proposal is to have an allocated-but-may-drop bit in a L2 entry. TRIM or zero detection sets the bit (leaving the cluster number intact). A following write to the cluster needs to clear the bit; if we reallocate the cluster we need to replace it with a ZCE.
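
The allocated-but-may-drop idea can be sketched as follows. Assumptions: cluster offsets are multiples of the cluster size, so the low bit of an L2 entry is free to hold the flag; ZCE is a small sentinel value; all constants and function names here are hypothetical.

```python
CLUSTER_SIZE = 65536  # assumed cluster size; offsets are multiples of it
MAY_DROP = 1 << 0     # hypothetical flag kept in the low bit of an L2 entry
ZCE = 1 << 1          # hypothetical zero-cluster sentinel


def is_allocated(entry):
    return entry >= CLUSTER_SIZE


def trim(l2, i):
    """TRIM or zero detection: flag the entry, keep the cluster number."""
    if is_allocated(l2[i]):
        l2[i] |= MAY_DROP


def guest_write(l2, i):
    """A later write makes the cluster live again, so clear the flag."""
    if is_allocated(l2[i]):
        l2[i] &= ~MAY_DROP


def reclaim(l2, i):
    """Hand the cluster to the allocator; the old entry becomes a ZCE."""
    entry = l2[i]
    if not (is_allocated(entry) and entry & MAY_DROP):
        raise ValueError("entry is not droppable")
    l2[i] = ZCE
    return entry & ~MAY_DROP


def droppable_entries(l2):
    """The implicit free list: every L2 index whose flag is set."""
    return [i for i, e in enumerate(l2) if is_allocated(e) and e & MAY_DROP]
```

If a separate free list is lost, droppable_entries() can rebuild it by walking the L2 tables, at the cost of a scan; nothing leaks because the cluster number stays in the entry until reclaim() replaces it.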

Yeah, this is sort of what I was thinking. You would still want a free list but it becomes totally optional because if it's lost, no data is leaked (assuming that the older version understands the bit).

I was suggesting that we store that bit in the free list though because that lets us support having older QEMUs with absolutely no knowledge still work.

This makes the freelist all L2 entries with the bit set; it may be less efficient than a custom data structure though.

We still want the freelist to avoid recreating it. We also want to store the allocated-but-may-drop bit in the free list.


Regards,

Anthony Liguori
