[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Qemu-devel] [RFC] A multithread lockless deduplication engine
From: |
XaviLi |
Subject: |
[Qemu-devel] [RFC] A multithread lockless deduplication engine |
Date: |
Wed, 20 Sep 2017 11:50:12 +0800 |
PageONE (Page Object Non-duplicate Engine) is a multithread kernel page
deduplication engine. It is based on a lock-less tree algorithm we currently
named as SD (Static and Dynamic) Tree. Normal operations such as
insert/query/delete to this tree are block-less. Adding more CPU cores can
linearly boost speed as far as we tested. Multithreading gives not only
opportunity to work faster. It also allows any CPU to donate spare time for the
job. Therefore, it reveals a way to use CPU more efficiently. PageONE is from
an open source solution named Dynamic VM:
https://github.com/baibantech/dynamic_vm.git
patch can be found here:
https://github.com/baibantech/dynamic_vm/tree/master/dynamic_vm_0.5
One work thread of PageONE can match the speed of KSM daemon. Adding more CPUs
can increase speed linearly. Here we can see a brief test:
Test environment
DELL R730
Intel® Xeon® E5-2650 v4 (2.20 GHz, of Cores 12, threads 24);
256GB RAM
Host OS: Ubuntu server 14.04 Host kernel: 4.4.1
Qemu: 2.9.0
Guest OS: Ubuntu server 16.04 Guest kernel: 4.4.76
We ran 12 VMs together. Each create 16GB data in memory. After all data is
ready we start dedup-engine and see how host-side used memory amount changes.
KSM:
Configuration: sleep_millisecs = 0, pages_to_scan = 1000000
Starting used memory: 216.8G
Result: KSM start merging pages immediately after turned on. KSM daemon took
100% of one CPU for 13:16 until used memory was reduced to 79.0GB.
PageONE:
Configuration: merge_period(secs) = 20, work threads = 12
Starting used memory: 207.3G
(Which means PageONE scans full physical memory in 20 secs period. Pages was
merged if not changed in 2 merge_periods.)
Result: In the first two periods PageONE only observe and identify unchanged
pages. Little CPU was used in this time. As the third period begin all 12
threads start using 100% CPU to do real merge job. 00:58 later used memory was
reduced to 70.5GB.
We ran the above test using the data quite easy for red-black tree of KSM.
Every difference can be detected by comparing the first 8 bytes. Then we ran
another test in which each data was begin with random zero bytes for
comparison. The average size of zero data was 128 bytes. Result is shown below:
KSM:
Configuration: sleep_millisecs = 0, pages_to_scan = 1000000
Starting used memory: 216.8G
Result: 19:49 minutes until used memory was reduced to 78.7GB.
PageONE:
Configuration: merge period(secs) = 20, work threads = 12
Starting used memory: 210.3G
Result: First 2 periods same as above. 1:09 after merge job start memory was
reduced to 72GB.
PageONE shows little difference in the two tests because SD tree search compare
each key bit just once in most cases.
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [Qemu-devel] [RFC] A multithread lockless deduplication engine,
XaviLi <=