From: Markus Armbruster
Subject: Re: [Qemu-stable] [Qemu-devel] [PATCH v3 6/7] qdev: Protect device-list-properties against broken devices
Date: Mon, 28 Sep 2015 10:11:54 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

Thomas Huth <address@hidden> writes:

> On 25/09/15 16:17, Markus Armbruster wrote:
>> Thomas Huth <address@hidden> writes:
>> 
>>> On 24/09/15 20:57, Markus Armbruster wrote:
>>>> Several devices don't survive object_unref(object_new(T)): they crash
>>>> or hang during cleanup, or they leave dangling pointers behind.
>>>>
>>>> This breaks at least device-list-properties, because
>>>> qmp_device_list_properties() needs to create a device to find its
>>>> properties.  Broken in commit f4eb32b "qmp: show QOM properties in
>>>> device-list-properties", v2.1.  Example reproducer:
>>>>
>>>>     $ qemu-system-aarch64 -nodefaults -display none -machine none -S -qmp stdio
>>>>     {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4, "major": 2}, "package": ""}, "capabilities": []}}
>>>>     { "execute": "qmp_capabilities" }
>>>>     {"return": {}}
>>>>     { "execute": "device-list-properties", "arguments": { "typename": "pxa2xx-pcmcia" } }
>>>>     qemu-system-aarch64: /home/armbru/work/qemu/memory.c:1307: memory_region_finalize: Assertion `((&mr->subregions)->tqh_first == ((void *)0))' failed.
>>>>     Aborted (core dumped)
>>>>     [Exit 134 (SIGABRT)]
>>>>
>>>> Unfortunately, I can't fix the problems in these devices right now.
>>>> Instead, add DeviceClass member cannot_even_create_with_object_new_yet
>>>> to mark them:
> ...
>>>>  static void pxa2xx_pcmcia_register_types(void)
>>>> diff --git a/hw/ppc/spapr_rng.c b/hw/ppc/spapr_rng.c
>>>> index ed43d5e..e1b115d 100644
>>>> --- a/hw/ppc/spapr_rng.c
>>>> +++ b/hw/ppc/spapr_rng.c
>>>> @@ -169,6 +169,11 @@ static void spapr_rng_class_init(ObjectClass *oc, void *data)
>>>>      dc->realize = spapr_rng_realize;
>>>>      set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>>>>      dc->props = spapr_rng_properties;
>>>> +
>>>> +    /*
>>>> +     * Reason: crashes device-introspect-test for unknown reason.
>>>> +     */
>>>> +    dc->cannot_even_create_with_object_new_yet = true;
>>>>  }
>>>
>>> Please don't do that! That breaks the help output from
>>> "-device spapr-rng,?" which should help the user to see how to use this
>>> device!
>> 
>> Well, device-introspect-test makes qemu crash, with the backtrace
>> pointing squarely to this device.  Stands to reason that device
>> introspection could crash in normal usage, too.  Until the crash is
>> debugged, we had better disable introspection of this device.
>> 
>> I quite agree that disabling introspection hurts users.  Just not as
>> much as crashes :)
>> 
>>> I tried to debug why this device breaks the test, but the test
>>> environment is giving me a hard time ... how do you best hook a gdb into
>>> that framework, so you can trace such problems?
>>> Anyway, with some trial and error, I found out that it seems like the
>>>
>>>   object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)
>>>
>>> in spapr_rng_instance_init() is causing the problems. Could it be that
>>> object_resolve_path_type is not working with the test environment?
>> 
>> I tried to figure out why this device breaks under this test, but
>> couldn't, so I posted with the "for unknown reason" comment.
>
> I've debugged this now for a while (thanks for the tip with
> MALLOC_PERTURB_, by the way!) and it seems to me that the problem is
> rather in the macio object than in spapr-rng - the latter is just the
> victim of some memory corruption caused by the former: The
> object_resolve_path_type() crashes while trying to go through the macio
> object.
>
> So could you please add the "dc->cannot_even_create_with_object_new_yet
> = true;" to macio_class_init() instead? ... that seems to fix the crash
> for me, too, and is likely the better place.

Hmm.

For most of the devices my patch marks, we have a pretty good idea of
what's wrong with them.  spapr-rng is among the exceptions.  You believe
it's actually "the macio object".  Which one?  "macio" is abstract...

You report introspecting "spapr-rng" crashes "while trying to go through
the macio object".  I wonder how omitting introspection of macio objects
(that's what marking them does to this test) could affect the object
we're going through when we crash.

> Or maybe we could get this also fixed? The problem could be the
> memory_region_init(&s->bar, NULL, "macio", 0x80000) in
> macio_instance_init() ... is this ok here? Or does this rather have to
> go to the realize() function instead?

Hmm, does creating and destroying a macio object leave the memory region
behind?

Paolo, is calling memory_region_init() in an instance_init() method
okay?

If yes, where should they be destroyed, and how?

If no, we should search for the erroneous pattern and mark the
offenders.

Some more evidence for macio's culpability: valgrind lets me happily
introspect spapr-rng as often as I want, but once I introspected
macio-newworld, further introspection of spapr-rng throws "Invalid read"
errors.
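
To make the suspected failure mode concrete, here is a toy model of the
create-then-destroy cycle that introspection performs.  It is not QEMU
code: every Toy* name is made up for illustration, and the global list
merely stands in for whatever long-lived structure a device might hook
itself into from its instance_init().  I'm not claiming this is what
macio actually does.  The point is only that the finalize side has to
undo every link the init side created; if the unlink call were missing,
the list would keep pointing at freed memory, and a later walk on behalf
of an unrelated device would read it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a global structure that outlives devices. */
typedef struct ToyRegion {
    struct ToyRegion *next;
    const char *name;
} ToyRegion;

static ToyRegion *toy_regions;      /* head of a global singly linked list */

typedef struct ToyDevice {
    ToyRegion bar;                  /* embedded in the device, freed with it */
} ToyDevice;

/* Link a region at the head of the global list. */
void toy_region_link(ToyRegion *r, const char *name)
{
    r->name = name;
    r->next = toy_regions;
    toy_regions = r;
}

/* Remove a region from the global list, if it is there. */
void toy_region_unlink(ToyRegion *r)
{
    for (ToyRegion **p = &toy_regions; *p; p = &(*p)->next) {
        if (*p == r) {
            *p = r->next;
            return;
        }
    }
}

/* Count the regions currently reachable from the global list. */
size_t toy_region_count(void)
{
    size_t n = 0;
    for (ToyRegion *r = toy_regions; r; r = r->next) {
        n++;
    }
    return n;
}

/* Analogue of instance_init(): allocate and hook into global state. */
ToyDevice *toy_device_new(const char *name)
{
    ToyDevice *d = calloc(1, sizeof(*d));
    if (d) {
        toy_region_link(&d->bar, name);
    }
    return d;
}

/* Analogue of finalize: undo the link before freeing the memory. */
void toy_device_unref(ToyDevice *d)
{
    if (!d) {
        return;
    }
    toy_region_unlink(&d->bar);
    free(d);
}

/*
 * Analogue of the introspection step: create, destroy, and check that
 * the global list is back to its previous size.  Returns 1 on success.
 */
int toy_introspect(const char *name)
{
    size_t before = toy_region_count();
    ToyDevice *d = toy_device_new(name);
    if (!d) {
        return 0;
    }
    toy_device_unref(d);
    return toy_region_count() == before;
}
```

Calling toy_introspect("macio") followed by toy_introspect("spapr-rng")
is the toy equivalent of the valgrind experiment above: with the unlink
in place, the second call walks a clean list.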


