From: Markus Armbruster
Subject: Re: [Qemu-ppc] [Qemu-devel] [PATCH v3 6/7] qdev: Protect device-list-properties against broken devices
Date: Mon, 28 Sep 2015 21:36:10 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

Markus Armbruster <address@hidden> writes:
> Thomas Huth <address@hidden> writes:
>
>> On 28/09/15 10:11, Markus Armbruster wrote:
>>> Thomas Huth <address@hidden> writes:
>>>
>>>> On 25/09/15 16:17, Markus Armbruster wrote:
>>>>> Thomas Huth <address@hidden> writes:
>>>>>
>>>>>> On 24/09/15 20:57, Markus Armbruster wrote:
>>>>>>> Several devices don't survive object_unref(object_new(T)): they crash
>>>>>>> or hang during cleanup, or they leave dangling pointers behind.
>>>>>>>
>>>>>>> This breaks at least device-list-properties, because
>>>>>>> qmp_device_list_properties() needs to create a device to find its
>>>>>>> properties. Broken in commit f4eb32b "qmp: show QOM properties in
>>>>>>> device-list-properties", v2.1. Example reproducer:
>>>>>>>
>>>>>>> $ qemu-system-aarch64 -nodefaults -display none -machine none
>>>>>>> -S -qmp stdio
>>>>>>> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 4,
>>>>>>> "major": 2}, "package": ""}, "capabilities": []}}
>>>>>>> { "execute": "qmp_capabilities" }
>>>>>>> {"return": {}}
>>>>>>> { "execute": "device-list-properties", "arguments": {
>>>>>>> "typename": "pxa2xx-pcmcia" } }
>>>>>>> qemu-system-aarch64: /home/armbru/work/qemu/memory.c:1307:
>>>>>>> memory_region_finalize: Assertion `((&mr->subregions)->tqh_first
>>>>>>> == ((void *)0))' failed.
>>>>>>> Aborted (core dumped)
>>>>>>> [Exit 134 (SIGABRT)]
>>>>>>>
>>>>>>> Unfortunately, I can't fix the problems in these devices right now.
>>>>>>> Instead, add DeviceClass member cannot_even_create_with_object_new_yet
>>>>>>> to mark them:
>>>> ...
>>>>>>> static void pxa2xx_pcmcia_register_types(void)
>>>>>>> diff --git a/hw/ppc/spapr_rng.c b/hw/ppc/spapr_rng.c
>>>>>>> index ed43d5e..e1b115d 100644
>>>>>>> --- a/hw/ppc/spapr_rng.c
>>>>>>> +++ b/hw/ppc/spapr_rng.c
>>>>>>> @@ -169,6 +169,11 @@ static void spapr_rng_class_init(ObjectClass *oc,
>>>>>>> void *data)
>>>>>>> dc->realize = spapr_rng_realize;
>>>>>>> set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>>>>>>> dc->props = spapr_rng_properties;
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * Reason: crashes device-introspect-test for unknown reason.
>>>>>>> + */
>>>>>>> + dc->cannot_even_create_with_object_new_yet = true;
>>>>>>> }
>>>>>>
>>>>>> Please don't do that! That breaks the help output from
>>>>>> "-device spapr-rng,?" which should help the user to see how to use this
>>>>>> device!
>>>>>
>>>>> Well, device-introspection-test makes qemu crash, with the backtrace
>>>>> pointing squarely to this device. Stands to reason that device
>>>>> introspection could crash in normal usage, too. Until the crash is
>>>>> debugged, we better disable introspection of this device.
>>>>>
>>>>> I quite agree that disabling introspection hurts users. Just not as
>>>>> much as crashes :)
>>>>>
>>>>>> I tried to debug why this device breaks the test, but the test
>>>>>> environment is giving me a hard time ... how do you best hook a gdb into
>>>>>> that framework, so you can trace such problems?
>>>>>> Anyway, with some trial and error, I found out that it seems like the
>>>>>>
>>>>>> object_resolve_path_type("", TYPE_SPAPR_RNG, NULL)
>>>>>>
>>>>>> in spapr_rng_instance_init() is causing the problems. Could it be that
>>>>>> object_resolve_path_type is not working with the test environment?
>>>>>
>>>>> I tried to figure out why this device breaks under this test, but
>>>>> couldn't, so I posted with the "for unknown reason" comment.
>>>>
>>>> I've debugged this now for a while (thanks for the tip with
>>>> MALLOC_PERTURB, by the way!) and it seems to me that the problem is
>>>> rather in the macio object than in spapr-rng - the latter is just the
>>>> victim of some memory corruption caused by the former: The
>>>> object_resolve_path_type() crashes while trying to go through the macio
>>>> object.
>>>>
>>>> So could you please add the "dc->cannot_even_create_with_object_new_yet
>>>> = true;" to macio_class_init() instead? ... that seems to fix the crash
>>>> for me, too, and is likely the better place.
>>>
>>> Hmm.
>>>
>>> For most of the devices my patch marks, we have a pretty good idea on
>>> what's wrong with them. spapr-rng is among the exceptions. You believe
>>> it's actually "the macio object". Which one? "macio" is abstract...
>>>
>>> You report introspecting "spapr-rng" crashes "while trying to go through
>>> the macio object". I wonder how omitting introspection of macio objects
>>> (that's what marking them does to this test) could affect the object
>>> we're going through when we crash.
>>
>> I have to correct myself: It's not going through the macio object, the
>> problem is actually the "macio[0]" property that is created during
>> memory_region_init() with object_property_add_child() ... the property
>> points to a free()d object when the crash happens.
>>
>>>> Or maybe we could get this also fixed? The problem could be the
>>>> memory_region_init(&s->bar, NULL, "macio", 0x80000) in
>>>> macio_instance_init() ... is this ok here? Or does this rather have to
>>>> go to the realize() function instead?
>>>
>>> Hmm, does creating and destroying a macio object leave the memory region
>>> behind?
>>>
>>> Paolo, is calling memory_region_init() in an instance_init() method
>>> okay?
>>
>> As Paolo mentioned, we likely need to pass an "owner" to
>> memory_region_init() or the macio memory region will get attached to
>> "/unattached" instead - and then leave a dangling link property behind
>> when the original macio object got destroyed.
>>
>> By the way, there are some more spots like this in the code, e.g. in
>> pxa2xx_fir_instance_init() in hw/arm/pxa2xx.c ...
>
> That's a memory_region_init_io(), so I should search for that pattern,
> too. Any memory_region_init*() in fact, I guess. >300 hits :(
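
To make the ownership problem discussed above concrete, here is a toy
model in C.  It is only a sketch under stated assumptions: ToyObject,
ToyDevice, toy_unattached and the toy_*() functions are hypothetical
stand-ins, not QEMU's types or API.  The model assumes that a region
initialized with a null owner becomes a child of a global container,
while the region's memory is embedded in the device and freed with it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for illustration only; these are NOT QEMU's real types. */
typedef struct ToyObject ToyObject;
struct ToyObject {
    ToyObject *children[4];     /* child links held by this object */
    int nchildren;
};

typedef struct ToyDevice {
    ToyObject obj;              /* the device object itself */
    ToyObject region;           /* region embedded in the device state */
} ToyDevice;

/* Plays the role of the global "/unattached" container. */
static ToyObject toy_unattached;

static void toy_add_child(ToyObject *parent, ToyObject *child)
{
    assert(parent->nchildren < 4);
    parent->children[parent->nchildren++] = child;
}

/* With a null owner, the region becomes a child of the global container. */
static void toy_region_init(ToyObject *region, ToyObject *owner)
{
    region->nchildren = 0;
    toy_add_child(owner ? owner : &toy_unattached, region);
}

/*
 * Create and destroy one device, then report how many links the global
 * container still holds.  Each such link points into freed memory.
 */
static int toy_stale_links_after_destroy(int pass_owner)
{
    ToyDevice *dev = calloc(1, sizeof(*dev));

    assert(dev);
    toy_unattached.nchildren = 0;
    toy_region_init(&dev->region, pass_owner ? &dev->obj : NULL);
    free(dev);                  /* links owned by the device die with it */
    return toy_unattached.nchildren;
}
```

In this model, toy_stale_links_after_destroy(0) leaves one stale link
in the global container, and toy_stale_links_after_destroy(1) leaves
none.  The function only counts the links and never follows them,
since following one would read freed memory.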

I tracked down problematic devices in two ways:

1. I made device-introspection-test run "info qom-tree", which has a
   lovely propensity to crash when a crappy device left a dangling
   pointer behind.  This led me to "cgthree", "cuda",
   "integrator_debug", "macio-oldworld", "macio-newworld",
   "pxa2xx-fir", "SUNW,tcx".  They all create memory regions without
   owner in their instance_init() method.

   "pxa2xx-pcmcia" does, too.  It's already marked in v3, because it
   actually crashes.  Perhaps it has additional problems.

2. I instrumented memory_region_init() and object_init_with_type() to
   crash when the former is called with null owner from within
   ->instance_init().  I verified this catches cases like the above.
   It doesn't catch any new ones.  This makes me reasonably confident
   I got them all.

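The instrumentation in item 2 can be sketched roughly as follows.
This is a self-contained toy, not the code I actually ran: every
toy_*() name is made up for illustration, and it counts violations
where the real instrumentation crashed, so the effect can be checked
without aborting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this only models the idea. */
typedef void (*ToyInstanceInit)(void *obj);

static int toy_init_depth;      /* > 0 while an instance_init() is running */
static int toy_violations;      /* null-owner region inits seen during init */

/* Stand-in for memory_region_init(): only the owner argument matters here. */
static void toy_memory_region_init(void *owner)
{
    if (toy_init_depth > 0 && owner == NULL) {
        toy_violations++;       /* the real instrumentation crashed here */
    }
}

/* Stand-in for object_init_with_type(): brackets the instance_init() call. */
static void toy_object_init_with_type(void *obj, ToyInstanceInit init)
{
    toy_init_depth++;
    init(obj);
    toy_init_depth--;
}

/* A device that creates its region without an owner. */
static void toy_bad_instance_init(void *obj)
{
    (void)obj;
    toy_memory_region_init(NULL);
}

/* A device that passes itself as the owner. */
static void toy_good_instance_init(void *obj)
{
    toy_memory_region_init(obj);
}

/* Initialize one dummy object and return the number of violations seen. */
static int toy_check(ToyInstanceInit init)
{
    int dummy = 0;

    toy_violations = 0;
    toy_object_init_with_type(&dummy, init);
    return toy_violations;
}
```

Here toy_check(toy_bad_instance_init) reports one violation and
toy_check(toy_good_instance_init) reports none.  A depth counter
rather than a plain flag keeps the check correct if one
instance_init() initializes further objects.
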
I'll send out v4 shortly.