From: Paolo Bonzini
Subject: Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters
Date: Sun, 11 Jul 2010 16:20:14 +0200
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100621 Fedora/3.0.5-1.fc13 Lightning/1.0b2pre Thunderbird/3.0.5

On 07/07/2010 03:44 PM, Pádraig Brady wrote:
Subject: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters

* lib/unistr/u8-strchr.c (u8_strchr): Use strchr() for the single
byte case as it was measured to be 50% faster than the existing
code on x86 linux.  Also add a comment on why not to use memmem()
for the moment for the multibyte case.

If p is guaranteed to be a valid UTF-8 string, you can do better in general like this. Say [q, q+q_len) points to a UTF-8 representation of uc:

    for (; (p = strchr (p, *q)) && memcmp (p+1, q+1, q_len-1); p += q_len)
      ;
    return p;

That's because once the first byte has matched, the length of the UTF-8 character is known to be q_len. It's better than memmem if the startup cost of strchr is low enough (of course memcmp has to be inlined/unrolled/unswitched to get decent performance).

Does the argument of u8_strchr have this guarantee? If not, the above code can read arbitrary memory.
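
For illustration, the loop above can be written out as a standalone function. This is only a sketch under the same assumption, namely that the input is valid, NUL-terminated UTF-8; the name find_utf8_char and its signature are hypothetical and are not part of libunistring or of the patch under discussion.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a libunistring function): find the UTF-8
   sequence [q, q + q_len) in the NUL-terminated string s.
   ASSUMPTION: s is valid UTF-8 and q encodes exactly one character.
   Without that guarantee the memcmp and the skip below may run past
   the terminating NUL.  */
static const uint8_t *
find_utf8_char (const uint8_t *s, const uint8_t *q, size_t q_len)
{
  const char *p = (const char *) s;

  /* strchr locates the next candidate lead byte.  */
  while ((p = strchr (p, q[0])) != NULL)
    {
      /* In valid UTF-8 the lead byte fixes the sequence length, so the
         character starting at p is exactly q_len bytes long.  */
      if (memcmp (p + 1, q + 1, q_len - 1) == 0)
        return (const uint8_t *) p;

      /* Same lead byte, different character: skip it as a whole.  */
      p += q_len;
    }
  return NULL;
}
```

A caller would encode uc into a small buffer first (the patch does this with u8_uctomb_aux) and pass that buffer and its length as q and q_len.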
Paolo

---
 ChangeLog              |    4 ++++
 lib/unistr/u8-strchr.c |   19 +++++++------------
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index afcae28..8ca0bd7 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,7 @@
+2010-07-07  Pádraig Brady  <address@hidden>
+
+	* lib/unistr/u8-strchr.c (u8_strchr): Use strchr() as it's faster
+
 2010-07-04  Bruno Haible  <address@hidden>
 
 	fsusage: Clarify which code applies to which platforms.
diff --git a/lib/unistr/u8-strchr.c b/lib/unistr/u8-strchr.c
index 3be14c7..3dbd3ca 100644
--- a/lib/unistr/u8-strchr.c
+++ b/lib/unistr/u8-strchr.c
@@ -21,25 +21,20 @@
 /* Specification.  */
 #include "unistr.h"
+#include <string.h>
+
 uint8_t *
 u8_strchr (const uint8_t *s, ucs4_t uc)
 {
   uint8_t c[6];
   if (uc < 0x80)
-    {
-      uint8_t c0 = uc;
-
-      for (;; s++)
-        {
-          if (*s == c0)
-            break;
-          if (*s == 0)
-            goto notfound;
-        }
-      return (uint8_t *) s;
-    }
+    return strchr (s, uc);
   else
+    /* The following is equivalent to:
+         return memmem (s, strlen(s), c, csize);
+       but faster for long S with matching UC near the start,
+       and also memmem is sometimes buggy and inefficient.  */
     switch (u8_uctomb_aux (c, uc, 6))
       {
       case 2: