
Re: [Qemu-devel] Overcommitting CPU results in all VMs offline


From: Stefan Priebe - Profihost AG
Subject: Re: [Qemu-devel] Overcommitting CPU results in all VMs offline
Date: Mon, 17 Sep 2018 11:58:02 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 17.09.2018 at 11:40, Jack Wang wrote:
> Stefan Priebe - Profihost AG <address@hidden> wrote on Mon, 17 Sep 2018 at 09:00:
>>
>> Hi,
>>
>> On 17.09.2018 at 08:38, Jack Wang wrote:
>>> Stefan Priebe - Profihost AG <address@hidden> wrote on Sun, 16 Sep 2018 at 15:31:
>>>>
>>>> Hello,
>>>>
>>>> While overcommitting CPU, I had several situations where all VMs went
>>>> offline while two VMs saturated all cores.
>>>>
>>>> I assumed that all VMs would stay online and simply be unable to use
>>>> all of their cores. Is that not the case?
>>>>
>>>> My original idea was to automate live migration under high host load,
>>>> moving VMs to another node, but that only makes sense if all VMs stay
>>>> online.
>>>>
>>>> Is this expected? Is anything special needed to achieve this?
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>> Hi, Stefan,
>>>
>>> Do you have any logs from when all the VMs went offline?
>>> Maybe the OOM killer played a role there?
>>
>> After reviewing, I think this is memory related, but OOM did not play a
>> role. All KVM processes were spinning, trying to get > 100% CPU, and I
>> could not even log in via SSH. After 5-10 minutes I was able to log in.
> So the VMs are not really offline. What is the result if you run
> query-status via QMP?

I can't check, as I can't connect to the host in that state.
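
(For reference, this is roughly what that check looks like, assuming the VM
exposes a QMP unix socket; the socket path below is only an example and
depends on how the VM was started, e.g. with
-qmp unix:/var/run/qemu/vm1.qmp,server,nowait:

# echo '{"execute":"qmp_capabilities"}{"execute":"query-status"}' | socat -t 2 - UNIX-CONNECT:/var/run/qemu/vm1.qmp

A running guest answers with something like
{"return": {"status": "running", "singlestep": false, "running": true}}.)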

>> There was about 150 GB of free memory.
>>
>> Relevant settings (no local storage involved):
>>         vm.dirty_background_ratio = 3
>>         vm.dirty_ratio = 10
>>         vm.min_free_kbytes = 10567004
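
(For reference, the equivalent runtime commands, using the values quoted
above, would be:

# sysctl -w vm.dirty_background_ratio=3
# sysctl -w vm.dirty_ratio=10
# sysctl -w vm.min_free_kbytes=10567004

and they would be persisted in /etc/sysctl.conf or an /etc/sysctl.d/
snippet.)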
>>
>> # cat /sys/kernel/mm/transparent_hugepage/defrag
>> always defer [defer+madvise] madvise never
>>
>> # cat /sys/kernel/mm/transparent_hugepage/enabled
>> [always] madvise never
>>
>> After that, I got the following traces on the host node:
>> https://pastebin.com/raw/0VhyQmAv
> 
> The call trace looks like a Ceph-related deadlock or hang.

Yes, but I can also show you traces where nothing from Ceph is involved;
the only thing they have in common is that they begin in page_fault.
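
One hypothesis for the page_fault traces: QEMU madvise(MADV_HUGEPAGE)s guest
RAM, and with the defer+madvise defrag setting quoted above, madvised regions
still do synchronous compaction in the page-fault path. A possible experiment
(not a confirmed fix) is to switch defrag to plain defer, so faults only wake
kcompactd instead of compacting synchronously, and re-test:

# echo defer > /sys/kernel/mm/transparent_hugepage/defrag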

>> Thanks!
>>
>> Greets,
>> Stefan


