spin loop 100x faster in user mode (CPL=3) than superuser (CPL=0)?

From: Garrick Toubassi
Subject: spin loop 100x faster in user mode (CPL=3) than superuser (CPL=0)?
Date: Tue, 19 Oct 2021 15:05:56 -0700


I have a mystery I haven't been able to run down and would appreciate any explanation or advice.

On an Intel Mac I am running qemu-system-x86_64 on a simple image that bootstraps into 64-bit long mode and then runs a simple spin loop (literally for (int i = 0; i < 10000000; i++) {}).  This completes in ~5 seconds of wall time.  After completion it enters user mode (CPL=3) via a fabricated interrupt stack frame and an iretq, returning to the same spin loop.  In user mode the same loop runs about 100x faster.
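For anyone following along, here is a minimal sketch of the kind of frame I mean. It only builds the five quadwords that iretq pops in 64-bit mode (RIP, CS, RFLAGS, RSP, SS) and does not execute iretq, so it compiles as ordinary C. The selector values and the names make_user_frame, selector_rpl, USER_CS and USER_SS are hypothetical and assume a GDT with user code at index 3 and user data at index 4; they are not taken from my actual image.

```c
#include <assert.h>
#include <stdint.h>

/* Frame popped by iretq in 64-bit mode, lowest address first. */
struct iretq_frame {
    uint64_t rip;
    uint64_t cs;
    uint64_t rflags;
    uint64_t rsp;
    uint64_t ss;
};

static_assert(sizeof(struct iretq_frame) == 5 * sizeof(uint64_t),
              "iretq pops exactly five quadwords");

/* Hypothetical selectors (assumption): GDT index 3 = user code,
 * index 4 = user data. The low two bits are the RPL, set to 3. */
#define USER_CS ((3u << 3) | 3u)   /* 0x1b */
#define USER_SS ((4u << 3) | 3u)   /* 0x23 */

/* RFLAGS bit 1 is reserved and always reads as 1. */
#define RFLAGS_RESERVED1 (1ull << 1)

/* The spin loop from the post; volatile keeps the compiler from
 * optimizing the empty loop away. */
void spin(void)
{
    for (volatile int i = 0; i < 10000000; i++) {}
}

/* Build a frame that would return to `entry` at CPL=3 with interrupts
 * left disabled (IF clear). */
struct iretq_frame make_user_frame(uint64_t entry, uint64_t user_stack_top)
{
    struct iretq_frame f;
    f.rip = entry;
    f.cs = USER_CS;
    f.rflags = RFLAGS_RESERVED1;
    f.rsp = user_stack_top;
    f.ss = USER_SS;
    return f;
}

/* Requested privilege level encoded in a segment selector. */
unsigned selector_rpl(uint64_t selector)
{
    return (unsigned)(selector & 3u);
}
```

In a real kernel these five values would be pushed onto the stack in reverse order (SS first, RIP last) immediately before the iretq; the RPL of the CS value loaded is what makes the CPU resume at CPL=3.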

At first I thought the TCG JIT might not be kicking in and that some pure interpretation was going on, but I've run with "-trace exec_tb -trace translate_block -d out_asm,guest_errors,nochain,int,plugin" and it does seem to be running translation blocks, just a lot more of them when running the slow loop (or, to be more precise, running one TB many more times according to the exec_tb logging).  On inspection, the relevant generated assembly is morally equivalent between the two cases as best I can tell, which implies to me that the cause is something outside of the TB.  I was thinking perhaps it's regenerating the code every time, but the logging doesn't show that.

I was also wondering whether something about the MMU implementation might account for the difference between the two privilege levels.  In this case both loops run under the same GDT and page tables, which just happen to mark all pages as "user" pages so that the code still runs after jumping to CPL=3.
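To make the "user pages" point concrete, here is a rough sketch of the check involved. On x86-64, bit 0 of a paging-structure entry is Present, bit 1 is R/W and bit 2 is U/S, and a CPL=3 access is allowed only if U/S is set at every level of the walk. The function name user_accessible and the idea of passing the walk as an array are my own illustration (an assumption), not how my image or QEMU represents it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT (1ull << 0)  /* entry is valid */
#define PTE_WRITE   (1ull << 1)  /* writes allowed */
#define PTE_USER    (1ull << 2)  /* U/S: CPL=3 access allowed */

/* entries[] holds the entry used at each level of one page walk
 * (for example PML4E, PDPTE, PDE, PTE). User-mode access requires
 * Present and U/S to be set in every one of them. */
bool user_accessible(const uint64_t *entries, size_t levels)
{
    const uint64_t need = PTE_PRESENT | PTE_USER;

    for (size_t i = 0; i < levels; i++) {
        if ((entries[i] & need) != need)
            return false;
    }
    return true;
}
```

With every entry carrying Present, R/W and U/S (low bits 0x7), the same mapping is reachable from both CPL=0 and CPL=3, which is the set-up both loops share here.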

I can package up a reproducible case if it's helpful but wanted to see if there is something obvious I am missing in terms of expected behavior before doing that.


