bug-gnulib

Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters


From: Bruno Haible
Subject: Re: [PATCH] unistr/u8-strchr: speed up searching for ASCII characters
Date: Sun, 18 Jul 2010 17:23:21 +0200
User-agent: KMail/1.9.9

Hi Pádraig,

> However, the first byte of a multibyte
> UTF-8 char is the same for a lot of characters

Yes. The last byte is equidistributed across the range 0x80..0xBF, whereas
the first byte is often the same for many characters in a text. I'm applying
the commit below to exploit this for speed.
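
To illustrate the idea, here is a minimal sketch for the two-byte case. The
function name find_2byte_char is made up for this example and is not part of
libunistring; it only mirrors the loop structure touched by the patch and
assumes a NUL-terminated input string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libunistring function): return a pointer to
   the first occurrence of the two-byte UTF-8 character UC
   (U+0080..U+07FF) in the NUL-terminated string S, or NULL if absent.  */
static const uint8_t *
find_2byte_char (const uint8_t *s, uint32_t uc)
{
  assert (uc >= 0x80 && uc < 0x800);

  uint8_t c0 = (uint8_t) (0xC0 | (uc >> 6));   /* lead byte */
  uint8_t c1 = (uint8_t) (0x80 | (uc & 0x3F)); /* trailing byte */

  if (*s == 0)
    return NULL;
  for (;; s++)
    {
      /* s[0] is known to be non-NUL here, so reading s[1] is safe.  */
      if (s[1] == 0)
        return NULL;
      /* Compare the trailing byte first: it varies much more than the
         lead byte, so a mismatch is usually detected with one test.  */
      if (s[1] == c1 && *s == c0)
        return s;
    }
}
```

In a text where most characters share the lead byte 0xC3, testing *s == c0
first would succeed at nearly every character and force a second comparison;
testing s[1] == c1 first rejects most positions immediately.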

> I was wondering myself about what parts of gnulib/unistring could take
> advantage of assuming valid UTF-8 strings. From my own notes on this
> function, I have:
> 
> "Some possible optimizations would need to
> be conditional on CONFIG_UNICODE_SAFETY (see u8_mblen).
> Note also u8_mbtouc_unsafe() and u8_mbtouc(), the latter
> detecting invalid utf-8 chars even without --enable-safety
> So given the above I'm assuming that most of gnulib/unistring
> assumes valid UTF-8 (which users can enforce on input with u8_check()),
> and if a safe but inefficient implementation option is possible
> then it should be within CONFIG_UNICODE_SAFETY. Note I found
> no mention of --enable-safety in the gnulib/libunistring configure scripts."

Generally, it's better to go for safety by default. --enable-safety is for
cases where a user wants to trade safety for speed. I doubt that's
reasonable in general. This is why I provided u8_mbtouc_unsafe under a
separate function name: programmers can use it in those places where they
know that the input is well-formed.
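
The distinction can be sketched as follows for two-byte sequences only. Both
functions below are hypothetical stand-ins written for this illustration; they
are not the libunistring implementations of u8_mbtouc or u8_mbtouc_unsafe, and
the choice of U+FFFD as the error result is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical: decode a two-byte UTF-8 sequence without any validation.
   The caller must guarantee that S points to a well-formed sequence.  */
static uint32_t
decode2_unsafe (const uint8_t *s)
{
  return ((uint32_t) (s[0] & 0x1F) << 6) | (uint32_t) (s[1] & 0x3F);
}

/* Hypothetical: same decoding, but reject ill-formed input.  A valid
   two-byte sequence has a lead byte in 0xC2..0xDF and a trailing byte
   in 0x80..0xBF; anything else yields U+FFFD in this sketch.  */
static uint32_t
decode2_checked (const uint8_t *s)
{
  if (s[0] >= 0xC2 && s[0] <= 0xDF && (s[1] & 0xC0) == 0x80)
    return decode2_unsafe (s);
  return 0xFFFD;
}
```

The unchecked variant silently produces a meaningless value on malformed
input, which is why it makes sense to offer it only under a distinct name.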

Bruno


2010-07-18  Bruno Haible  <address@hidden>

        unistr/u8-strchr: Optimize non-ASCII argument case.
        * lib/unistr/u8-strchr.c (u8_strchr): Compare the last byte first,
        because the first byte often matches anyway.
        Reported by Pádraig Brady <address@hidden>.

--- lib/unistr/u8-strchr.c.orig Sun Jul 18 17:16:07 2010
+++ lib/unistr/u8-strchr.c      Sun Jul 18 17:12:17 2010
@@ -68,7 +68,7 @@
             {
               if (s[1] == 0)
                 goto notfound;
-              if (*s == c0 && s[1] == c1)
+              if (s[1] == c1 && *s == c0)
                 break;
             }
           return (uint8_t *) s;
@@ -86,7 +86,7 @@
             {
               if (s[2] == 0)
                 goto notfound;
-              if (*s == c0 && s[1] == c1 && s[2] == c2)
+              if (s[2] == c2 && s[1] == c1 && *s == c0)
                 break;
             }
           return (uint8_t *) s;
@@ -105,7 +105,7 @@
             {
               if (s[3] == 0)
                 goto notfound;
-              if (*s == c0 && s[1] == c1 && s[2] == c2 && s[3] == c3)
+              if (s[3] == c3 && s[2] == c2 && s[1] == c1 && *s == c0)
                 break;
             }
           return (uint8_t *) s;


