coreutils

[PATCH] cksum: Use pclmul hardware instruction for CRC32 calculation


From: Kristoffer Brånemyr
Subject: [PATCH] cksum: Use pclmul hardware instruction for CRC32 calculation
Date: Sat, 13 Feb 2021 07:38:11 +0000 (UTC)

Hi,
I implemented another improvement for cksum to speed it up some more. It is
possible to use the x86 pclmul hardware instruction for the CRC32 calculation.
The patch detects support for this at run time using CPUID, and falls back to
the slice-by-8 algorithm if the instruction is not supported. I also added
detection in autoconf, so it will only be compiled on supported targets.
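
To illustrate the run-time check and fallback described above, here is a
minimal sketch. It is not the code from the patch: it assumes GCC or Clang and
uses the compiler builtin __builtin_cpu_supports (which queries CPUID
internally) rather than issuing CPUID directly. The names pclmul_available and
select_crc_impl are hypothetical, and the returned strings only stand in for
the two real CRC routines.

```c
#include <stdbool.h>

/* Hypothetical helper: true if the running CPU advertises the pclmul
   (carry-less multiply) instruction.  On non-x86 targets, or with a
   compiler other than GCC/Clang, report "not available".  */
bool
pclmul_available (void)
{
#if (defined __GNUC__ || defined __clang__) \
    && (defined __x86_64__ || defined __i386__)
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("pclmul") != 0;
#else
  return false;
#endif
}

/* Hypothetical dispatcher: pick the fast path when the instruction is
   present, otherwise fall back to the table-driven slice-by-8 path.
   A real implementation would return a function pointer instead.  */
const char *
select_crc_impl (void)
{
  return pclmul_available () ? "pclmul" : "slice8";
}
```

The check only needs to run once per process, so a real caller would cache
the result before entering the read loop.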

In my testing, the checksum calculation seems to be about 6x faster than with
the slice-by-8 algorithm (looking at user time). However, since the time the
process spends in the kernel reading the file (the read syscalls issued by
fread) stays the same, the actual real-time speedup is only about 3x. It would
be an interesting exercise to try async IO, so you could checksum one block
while reading the next. Maybe I will try that one day.

As a side note, x86 also has a crc32 hardware instruction, but it uses a
different polynomial than cksum does, so it cannot be used here.
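
For reference, the polynomial difference can be made concrete with a slow
bit-at-a-time sketch of the POSIX cksum CRC. POSIX cksum uses the generator
polynomial 0x04C11DB7, processes bits most significant first, appends the
input length, and complements the result; the x86 crc32 instruction implements
CRC-32C, which uses 0x1EDC6F41. The function names below are hypothetical, and
this is neither the slice-by-8 nor the pclmul code, only a readable baseline
that any faster version has to match.

```c
#include <stddef.h>
#include <stdint.h>

/* Feed LEN bytes from BUF into a running CRC, one bit at a time,
   most significant bit first, using the POSIX cksum polynomial.  */
uint32_t
crc32_posix_update (uint32_t crc, const void *buf, size_t len)
{
  const unsigned char *p = buf;
  while (len--)
    {
      crc ^= (uint32_t) *p++ << 24;
      for (int bit = 0; bit < 8; bit++)
        crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
  return crc;
}

/* Full POSIX cksum value of a buffer: CRC of the data, then of the
   length (least significant byte first, no leading zero bytes),
   then the bitwise complement.  */
uint32_t
posix_cksum (const void *buf, size_t len)
{
  uint32_t crc = crc32_posix_update (0, buf, len);
  for (size_t n = len; n != 0; n >>= 8)
    {
      unsigned char b = n & 0xFF;
      crc = crc32_posix_update (crc, &b, 1);
    }
  return ~crc;
}
```

For example, posix_cksum ("123456789", 9) returns 930766865, the same value
that cksum prints for a file containing those nine bytes.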

Some benchmarks, with the file already in the file cache.
Oldest version (byte by byte):
ztion@rita:~/coreutils/coreutils-8.32/src$ time ./cksum /disk2/download/bigfile2G

real    0m7,311s
user    0m7,039s
sys    0m0,262s

Slice-by-8 version:
ztion@rita:~/coreutils/coreutils-8.32/src$ time ./cksum.slice /disk2/download/bigfile2G

real    0m1,546s
user    0m1,267s
sys    0m0,247s

pclmul version (this patch):
ztion@rita:~/coreutils/coreutils_fork/src$ time ./cksum /disk2/download/bigfile2G

real    0m0,462s
user    0m0,191s
sys    0m0,271s



The patch is at:
https://github.com/coreutils/coreutils/pull/48

-- 
/Kristoffer Brånemyr
