[PATCH] deflate.c: identify slide_Pos() for later optimization
From: John Reiser
Subject: [PATCH] deflate.c: identify slide_Pos() for later optimization
Date: Mon, 23 Jul 2012 11:06:31 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:14.0) Gecko/20120717 Thunderbird/14.0
Modern "multimedia" vectorized hardware instructions can speed up deflate().
For higher-end x86* CPUs the speedup might be 2% to 3% of total CPU time.
On a slower CPU, or where the compiler and instruction decoder together suffer
longer latency after a branch (such as gcc on some PowerPC chips),
the improvement might be 5% to 8%.
The attached patch introduces a new subroutine slide_Pos() in deflate.c
which identifies the operation that is subject to optimization.
The opportunity arises when sliding the window. The vectors head[]
and prev[] of substring indices are adjusted using saturating subtraction.
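For readers without the patch at hand, the saturating subtraction can be
sketched in portable C roughly as follows. This is only an illustration, not
the code in the attached patch: the signature of slide_Pos(), the 16-bit Pos
typedef, and NIL being 0 are all assumptions made here.

```c
#include <stddef.h>

typedef unsigned short Pos;   /* assumed: 16-bit substring index */
#define NIL 0                 /* assumed: marker for "no entry" */

/* Hypothetical signature: subtract wsize from each of the n entries
 * of p[], clamping at NIL any entry that would fall below zero. */
static void slide_Pos(Pos *p, size_t n, unsigned wsize)
{
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned m = p[i];
        p[i] = (Pos)(m >= wsize ? m - wsize : NIL);
    }
}
```

Under these assumptions the window slide would call the routine once for
head[] and once for prev[], which is where the "two internal subroutine
calls" would come from if the compiler does not inline it.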
A very good compiler should be able to recognize and vectorize the operation
from the patched source. If not, then any compiler which can inline a local
subroutine should give code which is no worse than the unmodified version.
A compiler which does not inline slide_Pos might introduce a penalty
approximately equal to the cost of two internal subroutine calls.
If there is interest, then I will follow with assembly-language versions
of slide_Pos for i686/x86_64 (with runtime selection among several variants
according to actual hardware capabilities), PowerPC AltiVec (compile-time
selection), and ARM NEON (compile-time selection).
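To give a rough idea of what a vectorized variant might look like, here is a
sketch using SSE2 intrinsics in C rather than assembly language. It is not one
of the proposed assembly versions; the name slide_pos_vec() is hypothetical,
and the sketch assumes 16-bit entries and that the "no entry" value is 0, so
that unsigned saturation at zero gives the desired result. Without SSE2 it
falls back to the scalar loop.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: p[i] = max(p[i] - delta, 0) for n 16-bit entries. */
static void slide_pos_vec(uint16_t *p, size_t n, uint16_t delta)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Broadcast delta to all eight 16-bit lanes. */
    const __m128i d = _mm_set1_epi16((short)delta);
    for (; i + 8 <= n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(p + i));
        v = _mm_subs_epu16(v, d);      /* unsigned saturating subtract */
        _mm_storeu_si128((__m128i *)(p + i), v);
    }
#endif
    /* Scalar tail, and the whole array when SSE2 is unavailable. */
    for (; i < n; i++)
        p[i] = (uint16_t)(p[i] >= delta ? p[i] - delta : 0);
}
```

The vector loop has no branch per element, which is the property that should
matter most on hardware with long latency after a branch.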
--
John Reiser, address@hidden
0002-slide_Pos-identify-for-future-optimization.patch
Description: Text Data