coreutils

Re: wc -l AVX code 10%+10% speedup


From: Pádraig Brady
Subject: Re: wc -l AVX code 10%+10% speedup
Date: Sat, 30 Mar 2024 16:14:47 +0000
User-agent: Mozilla Thunderbird

On 30/03/2024 14:52, Evgeny Nizhibitsky wrote:
> Dear GNU coreutils maintainers,
>
> It seems that I found a way to both speed up (~10%) and simplify (13
> insertions, 43 deletions) the wc -l AVX code while playing with it, at
> least on the several-million to 1-billion row files I tested with my CPU.
>
> It mostly involves using _mm256_movemask_epi8 and __builtin_popcount
> instead of the two-accumulator handling, which allowed me to increase
> the buffer size.
>
> I also have a further ~10% improvement from using 2 separate threads
> instead of 1 to mitigate the usr time overhead, although it’s naturally
> more complicated.
>
> Whom should I discuss this potential contribution with?

You can propose the code here.
The AVX adjustments sound very interesting.
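
For anyone following the thread: the patch itself is not included in
this message, so the following is only a minimal sketch of the general
movemask/popcount idea, not the proposed change and not the current
wc code. It assumes GCC or Clang on x86-64; the function names
(count_lines_scalar, count_lines_avx2, count_lines) are hypothetical
and chosen purely for illustration.

```c
#include <immintrin.h>
#include <stddef.h>

/* Scalar fallback: count '\n' bytes one at a time. */
static size_t count_lines_scalar(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += (buf[i] == '\n');
    return n;
}

/* AVX2 path (hypothetical sketch): compare 32 bytes at once against '\n',
   collapse the comparison result to a 32-bit mask with one bit per byte,
   and add the number of set bits to the running count. */
__attribute__((target("avx2")))
static size_t count_lines_avx2(const char *buf, size_t len)
{
    const __m256i newline = _mm256_set1_epi8('\n');
    size_t n = 0;
    size_t i = 0;

    for (; i + 32 <= len; i += 32) {
        /* Unaligned load, so buf needs no particular alignment. */
        __m256i chunk = _mm256_loadu_si256((const __m256i *)(buf + i));
        __m256i eq = _mm256_cmpeq_epi8(chunk, newline);
        unsigned int mask = (unsigned int)_mm256_movemask_epi8(eq);
        n += (size_t)__builtin_popcount(mask);
    }

    /* Handle the trailing bytes that do not fill a 32-byte vector. */
    return n + count_lines_scalar(buf + i, len - i);
}

/* Pick the AVX2 path only when the running CPU supports it. */
static size_t count_lines(const char *buf, size_t len)
{
    if (__builtin_cpu_supports("avx2"))
        return count_lines_avx2(buf, len);
    return count_lines_scalar(buf, len);
}
```

Because each 32-byte block is reduced straight to a scalar count, there
are no narrow per-lane vector counters that need to be flushed
periodically. Whether that is what permits the larger buffer size in the
actual patch is an assumption here.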

The threading sounds a bit less useful TBH,
and might introduce more complexity/overhead
especially in the common case.
BTW threaded line counting was discussed at:
https://www.pixelbeat.org/docs/unix-parallel-tools.html

thanks,
Pádraig



