Re: [Qemu-devel] Design of the blobstore [API of the NVRAM]


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] Design of the blobstore [API of the NVRAM]
Date: Fri, 16 Sep 2011 11:35:17 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Sep 15, 2011 at 08:34:55AM -0400, Stefan Berger wrote:
> On 09/15/2011 07:17 AM, Stefan Hajnoczi wrote:
> >On Wed, Sep 14, 2011 at 6:05 PM, Stefan Berger
> ><address@hidden>  wrote:
> >>  One property of the blobstore is that it has a certain required size
> >>for accommodating all blobs of the devices that want to store their blobs
> >>in it. The assumption is that the size of these blobs is known a priori to
> >>the writer of the device code, so all devices can register their space
> >>requirements with the blobstore during device initialization. Gathering
> >>all the registered blobs' sizes, plus knowing the overhead of the layout
> >>of the data on the disk, lets QEMU calculate the total required (minimum)
> >>size that the image has to have to accommodate all blobs in a particular
> >>blobstore.
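
Just to check that I follow the sizing: I imagine the calculation boils
down to something like the sketch below.  The struct and parameter names
here are mine, not from your patches.

    /* One registration record per device blob (hypothetical). */
    typedef struct NVRAMBlobReg {
        enum NVRAMEntryType type;
        unsigned int maxsize;   /* registered a priori by the device code */
    } NVRAMBlobReg;

    /* Minimum image size = per-store header plus the sum of the registered
     * maximum blob sizes, each padded by the per-blob layout overhead. */
    static uint64_t nvram_required_size(const NVRAMBlobReg *regs, int n,
                                        uint64_t header_overhead,
                                        uint64_t per_blob_overhead)
    {
        uint64_t total = header_overhead;
        int i;

        for (i = 0; i < n; i++) {
            total += per_blob_overhead + regs[i].maxsize;
        }
        return total;
    }
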
> >Libraries like tdb or gdbm come to mind.  We should be careful not to
> >reinvent cpio/tar or FAT :).
> Sure. As long as these dbs allow us to override open(), close(),
> read(), write() and seek() with bdrv ops, we could recycle any of
> these. Maybe we can build something smaller than those...
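
If one of those libraries gets reused, I guess the minimum would be a
small I/O vtable that the db layer calls instead of the POSIX file ops,
roughly like this (names invented; the opaque pointer would be the
drive's BlockDriverState):

    /* Hypothetical I/O hooks a small db layer could be built on, so the
     * same code can run over POSIX files or over bdrv calls. */
    typedef struct NVRAMIOOps {
        int (*pread)(void *opaque, uint64_t offset, void *buf, size_t len);
        int (*pwrite)(void *opaque, uint64_t offset, const void *buf,
                      size_t len);
        int (*flush)(void *opaque);
    } NVRAMIOOps;
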
> >What about live migration?  If each VM has a LUN assigned on a SAN
> >then these qcow2 files add a new requirement for a shared file system.
> >
> Well, one can still block-migrate these. The user has to know, of
> course, whether shared storage is set up or not and pass the
> appropriate flags to libvirt for migration. I know it works (modulo
> some problems when using encrypted QCoW2) since I've been testing
> with it.
> 
> >Perhaps it makes sense to include the blobstore in the VM state data
> >instead?  If you take that approach then the blobstore will get
> >snapshotted *into* the existing qcow2 images.  Then you don't need a
> >shared file system for migration to work.
> >
> It could be an option. However, if the user has a raw image for the
> VM, we still need the NVRAM emulation for the TPM, for example. So we
> need to store the persistent data somewhere, but raw is not prepared
> for that. Even if snapshotting doesn't work at all, we need to be
> able to persist the devices' data.
> 
> 
> >Can you share your design for the actual QEMU API that the TPM code
> >will use to manipulate the blobstore?  Is it designed to work in the
> >event loop while QEMU is running, or is it for rare I/O on
> >startup/shutdown?
> >
> Everything is kind of changing now. But here's what I have right now:
> 
>     tb->s.tpm_ltpms->nvram = nvram_setup(tpm_ltpms->drive_id, &errcode);
>     if (!tb->s.tpm_ltpms->nvram) {
>         fprintf(stderr, "Could not find nvram.\n");
>         return errcode;
>     }
> 
>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>                         NVRAM_ENTRY_PERMSTATE,
>                         tpmlib_get_prop(TPMPROP_TPM_MAX_NV_SPACE));
>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>                         NVRAM_ENTRY_SAVESTATE,
>                         tpmlib_get_prop(TPMPROP_TPM_MAX_SAVESTATE_SPACE));
>     nvram_register_blob(tb->s.tpm_ltpms->nvram,
>                         NVRAM_ENTRY_VOLASTATE,
>                         tpmlib_get_prop(TPMPROP_TPM_MAX_VOLATILESTATE_SPACE));
> 
>     rc = nvram_start(tpm_ltpms->nvram, fail_on_encrypted_drive);
> 
> The above first sets up the NVRAM using the drive's id, i.e., the
> -tpmdev ...,nvram=my-bs, parameter. This establishes the NVRAM.
> Subsequently the blobs to be written into the NVRAM are registered.
> nvram_start() then reconciles the registered NVRAM blobs with those
> found on disk; if everything fits together the result is 'rc = 0'
> and the NVRAM is ready to go. Other devices can then do the same,
> either with the same NVRAM or with another one. (It is called NVRAM
> now, after renaming from blobstore.)
> 
> Reading from NVRAM in case of the TPM is a rare event. It happens in
> the context of QEMU's main thread:
> 
>     if (nvram_read_data(tpm_ltpms->nvram,
>                         NVRAM_ENTRY_PERMSTATE,
>                         &tpm_ltpms->permanent_state.buffer,
>                         &tpm_ltpms->permanent_state.size,
>                         0, NULL, NULL) ||
>         nvram_read_data(tpm_ltpms->nvram,
>                         NVRAM_ENTRY_SAVESTATE,
>                         &tpm_ltpms->save_state.buffer,
>                         &tpm_ltpms->save_state.size,
>                         0, NULL, NULL))
>     {
>         tpm_ltpms->had_fatal_error = true;
>         return;
>     }
> 
> The above reads the data of two blobs synchronously. This happens
> during startup.
> 
> 
> Writes depend on what the user does with the TPM. The user can
> trigger lots of updates to persistent state by performing certain
> operations, e.g., persisting keys inside the TPM.
> 
>     rc = nvram_write_data(tpm_ltpms->nvram,
>                           what, tsb->buffer, tsb->size,
>                           VNVRAM_ASYNC_F | VNVRAM_WAIT_COMPLETION_F,
>                           NULL, NULL);
> 
> The above writes a TPM blob into the NVRAM. It is triggered by the
> TPM thread, which notifies the QEMU main thread to write the blob
> into the NVRAM. I do this synchronously at the moment, not using the
> last two parameters for a completion callback but the two flags
> instead: the first notifies the main thread, and the second waits
> for the completion of the request (using a condition internally).
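
So internally the flag pair amounts to something like the following, if I
read it right?  This is only my sketch; the request struct, the queue and
the field names are invented.

    /* The TPM thread queues the request, kicks the main loop via a bottom
     * half, and blocks on a condition until the main thread has done the
     * write. */
    static int nvram_queue_and_wait(VNVRAM *bs, NVRAMRequest *req)
    {
        qemu_mutex_lock(&bs->lock);
        QTAILQ_INSERT_TAIL(&bs->pending, req, next);
        qemu_bh_schedule(bs->bh);   /* VNVRAM_ASYNC_F: main thread does the I/O */
        while (!req->completed) {   /* VNVRAM_WAIT_COMPLETION_F */
            qemu_cond_wait(&bs->completion, &bs->lock);
        }
        qemu_mutex_unlock(&bs->lock);
        return req->rc;
    }
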
> 
> Here are the protos:
> 
> VNVRAM *nvram_setup(const char *drive_id, int *errcode);
> 
> int nvram_start(VNVRAM *, bool fail_on_encrypted_drive);
> 
> int nvram_register_blob(VNVRAM *bs, enum NVRAMEntryType type,
>                         unsigned int maxsize);
> 
> unsigned int nvram_get_totalsize(VNVRAM *bs);
> unsigned int nvram_get_totalsize_kb(VNVRAM *bs);
> 
> typedef void NVRAMRWFinishCB(void *opaque, int errcode, bool is_write,
>                              unsigned char **data, unsigned int len);
> 
> int nvram_write_data(VNVRAM *bs, enum NVRAMEntryType type,
>                      const unsigned char *data, unsigned int len,
>                      int flags, NVRAMRWFinishCB cb, void *opaque);
> 
> 
> As said, things are changing right now, so this is to give an impression...
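
When the callback path settles down, I picture usage looking roughly like
this; the callback itself and the type of 'tb' are my guesses, not taken
from your code:

    /* Hypothetical completion callback matching NVRAMRWFinishCB. */
    static void tpm_write_done(void *opaque, int errcode, bool is_write,
                               unsigned char **data, unsigned int len)
    {
        TPMBackend *tb = opaque;    /* whatever type 'tb' really has */

        if (errcode) {
            tb->s.tpm_ltpms->had_fatal_error = true;
        }
    }

    rc = nvram_write_data(tb->s.tpm_ltpms->nvram,
                          what, tsb->buffer, tsb->size,
                          VNVRAM_ASYNC_F,
                          tpm_write_done, tb);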

Thanks, these details are interesting.  I interpreted the blobstore as a
key-value store, but these examples show it as a stream.  No IDs or
offsets are given; the reads are just performed in order and move
through the NVRAM.  If it stays this simple then bdrv_*() is indeed a
natural way to do this - although my migration point remains, since this
feature adds a new requirement for shared storage when it would be
pretty easy to put this stuff in the vm data stream (IIUC the TPM NVRAM
is relatively small?).
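
To make the vmstate idea concrete, something along the following lines
should be enough for a small blob with a fixed maximum size.  The struct,
the field names and TPM_MAX_NV_SPACE are invented for illustration.

    /* Sketch: carry the TPM NVRAM blob in the vmstate/migration stream
     * instead of in a separate image file. */
    typedef struct TPMNVRAMState {
        uint32_t perm_state_size;
        uint8_t  perm_state[TPM_MAX_NV_SPACE];
    } TPMNVRAMState;

    static const VMStateDescription vmstate_tpm_nvram = {
        .name = "tpm-nvram",
        .version_id = 1,
        .minimum_version_id = 1,
        .fields = (VMStateField[]) {
            VMSTATE_UINT32(perm_state_size, TPMNVRAMState),
            VMSTATE_BUFFER(perm_state, TPMNVRAMState),
            VMSTATE_END_OF_LIST()
        }
    };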

Stefan


