bug-gnulib
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: please keep 'memmem' module simple


From: Eric Blake
Subject: Re: please keep 'memmem' module simple
Date: Wed, 09 Jan 2008 06:37:40 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071031 Thunderbird/2.0.0.9 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Simon Josefsson on 1/9/2008 3:16 AM:
|
| For gnutls, the strings are always the same (X.509 PEM headers), and we
| haven't seen such slowdowns even on large inputs (searching for a 10
| byte needle in a 500kb input is, while not typical, not uncommon).  What
| properties do unlucky strings have?

The quadratic nature of the brute force algorithm (which glibc currently
uses) is, in the worst case, proportional to the length of the haystack
times the length of the needle, and occurs when the needle is either not
present or is only present at the end of the haystack; it is also a factor
of how periodic the start of the needle is.  In other words, a needle of
"1234567890" will behave linearly (every character in the haystack is
checked at most twice), even on glibc, since it is non-periodic, while a
needle of "1111111112" in a haystack of 500kb of all '1' will require 10x
effort, since all nine '1' in the needle are compared in the inner loop
prior to the failing '2', for every 1-byte progression of the outer loop.
~ Meanwhile, the new Two-Way algorithm requires at most 2x effort, since it
exploits some properties that it learned about the needle in a single
linear preprocessing step, in order to advance the outer loop by steps
larger than one; it will never compare a character more than twice, no
matter how periodic the needle is.

If all your needles are short, and more importantly, non-periodic, then
you are correct that the worst-case slowdown of the brute-force algorithm
will not be triggered, so the speed difference is not noticeable.  But if
you ever have user-provided needles of arbitrary length, as is the case in
m4, then it is critical to use a non-quadratic algorithm if you want to
avoid denial-of-service attacks due to carefully chosen needles and
haystacks causing wasted processing cycles.

So I'm okay with the idea of adding a memmem-simple module that skips the
quadratic check if the system memmem is otherwise working (remember,
however, that the system memmem may be broken, as on cygwin).  But I think
it should be done solely through m4 magic - in other words, split
gl_FUNC_MEMMEM into two macros, where the behavioral check is called for
both memmem and memmem-simple, but the speed check is called only for
memmem; both modules can share the lib/memmem.c implementation.

Meanwhile, I'm installing this, to claim ownership of my implementation,
and to give gcc some compiler hints about memmem's properties.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHhM4j84KuGfSFAYARAucnAKCD2AHMrZnRnLYcSOlKwjLKKrdVJwCeJu3D
862KQvrBAKgWKno8SZN2jCI=
=F/2t
-----END PGP SIGNATURE-----
>From c01669e0970410a989f44883c4117b1a74b651e3 Mon Sep 17 00:00:00 2001
From: Eric Blake <address@hidden>
Date: Tue, 8 Jan 2008 17:15:27 -0700
Subject: [PATCH] Give gcc some memmem optimization hints.

* lib/string.in.h (memmem, memrchr, strchrnul, strnlen, strpbrk)
(strcasestr): Declare as pure.
* modules/memmem (Maintainer): Claim my implementation.

Signed-off-by: Eric Blake <address@hidden>
---
 ChangeLog       |    9 ++++++++-
 lib/string.in.h |   32 +++++++++++++++++++++++++-------
 modules/memmem  |    2 +-
 3 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index bd4daf8..975665d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2008-01-09  Eric Blake  <address@hidden>
+
+       Give gcc some memmem optimization hints.
+       * lib/string.in.h (memmem, memrchr, strchrnul, strnlen, strpbrk)
+       (strcasestr): Declare as pure.
+       * modules/memmem (Maintainer): Claim my implementation.
+
 2008-01-09  Ralf Wildenhues  <address@hidden>
 
        Support AIX 6.1 and higher.
@@ -91,7 +98,7 @@
        Suggested by Paul Eggert.
 
 2008-01-01  Sylvain Beucler  <address@hidden>
-            Bruno Haible  <address@hidden>
+           Bruno Haible  <address@hidden>
 
        Improve memory cleanup in 'relocatable' module.
        * lib/relocatable.h (compute_curr_prefix): Change return type to
diff --git a/lib/string.in.h b/lib/string.in.h
index 09205e7..355479a 100644
--- a/lib/string.in.h
+++ b/lib/string.in.h
@@ -1,6 +1,6 @@
 /* A GNU-like <string.h>.
 
-   Copyright (C) 1995-1996, 2001-2007 Free Software Foundation, Inc.
+   Copyright (C) 1995-1996, 2001-2008 Free Software Foundation, Inc.
 
    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -25,6 +25,18 @@
 #define _GL_STRING_H
 
 
+#ifndef __attribute__
+/* This feature is available in gcc versions 2.5 and later.  */
+# if __GNUC__ < 2 || (__GNUC__ == 2 && __GNUC_MINOR__ < 5) || __STRICT_ANSI__
+#  define __attribute__(Spec) /* empty */
+# endif
+/* The attribute __pure__ was added in gcc 2.96.  */
+# if __GNUC__ < 2 || (__GNUC__ == 2 && __GNUC_MINOR__ < 96)
+#  define __pure__ /* empty */
+# endif
+#endif
+
+
 /* The definition of GL_LINK_WARNING is copied here.  */
 
 
@@ -40,7 +52,8 @@ extern "C" {
 # endif
 # if ! @HAVE_DECL_MEMMEM@ || @REPLACE_MEMMEM@
 extern void *memmem (void const *__haystack, size_t __haystack_len,
-                    void const *__needle, size_t __needle_len);
+                    void const *__needle, size_t __needle_len)
+  __attribute__ ((__pure__));
 # endif
 #elif defined GNULIB_POSIXCHECK
 # undef memmem
@@ -68,7 +81,8 @@ extern void *mempcpy (void *restrict __dest, void const 
*restrict __src,
 /* Search backwards through a block for a byte (specified as an int).  */
 #if @GNULIB_MEMRCHR@
 # if ! @HAVE_DECL_MEMRCHR@
-extern void *memrchr (void const *, int, size_t);
+extern void *memrchr (void const *, int, size_t)
+  __attribute__ ((__pure__));
 # endif
 #elif defined GNULIB_POSIXCHECK
 # undef memrchr
@@ -121,7 +135,8 @@ extern char *stpncpy (char *restrict __dst, char const 
*restrict __src,
 /* Find the first occurrence of C in S or the final NUL byte.  */
 #if @GNULIB_STRCHRNUL@
 # if ! @HAVE_STRCHRNUL@
-extern char *strchrnul (char const *__s, int __c_in);
+extern char *strchrnul (char const *__s, int __c_in)
+  __attribute__ ((__pure__));
 # endif
 #elif defined GNULIB_POSIXCHECK
 # undef strchrnul
@@ -166,7 +181,8 @@ extern char *strndup (char const *__string, size_t __n);
    return MAXLEN.  */
 #if @GNULIB_STRNLEN@
 # if ! @HAVE_DECL_STRNLEN@
-extern size_t strnlen (char const *__string, size_t __maxlen);
+extern size_t strnlen (char const *__string, size_t __maxlen)
+  __attribute__ ((__pure__));
 # endif
 #elif defined GNULIB_POSIXCHECK
 # undef strnlen
@@ -192,7 +208,8 @@ extern size_t strnlen (char const *__string, size_t 
__maxlen);
 /* Find the first occurrence in S of any character in ACCEPT.  */
 #if @GNULIB_STRPBRK@
 # if ! @HAVE_STRPBRK@
-extern char *strpbrk (char const *__s, char const *__accept);
+extern char *strpbrk (char const *__s, char const *__accept)
+  __attribute__ ((__pure__));
 # endif
 # if defined GNULIB_POSIXCHECK
 /* strpbrk() assumes the second argument is a list of single-byte characters.
@@ -288,7 +305,8 @@ extern char *strsep (char **restrict __stringp, char const 
*restrict __delim);
 /* Find the first occurrence of NEEDLE in HAYSTACK, using case-insensitive
    comparison.  */
 #if ! @HAVE_STRCASESTR@
-extern char *strcasestr (const char *haystack, const char *needle);
+extern char *strcasestr (const char *haystack, const char *needle)
+  __attribute__ ((__pure__));
 #endif
 #if defined GNULIB_POSIXCHECK
 /* strcasestr() does not work with multibyte strings:
diff --git a/modules/memmem b/modules/memmem
index 2c02be9..71f770f 100644
--- a/modules/memmem
+++ b/modules/memmem
@@ -25,4 +25,4 @@ License:
 LGPLv2+
 
 Maintainer:
-libc, Simon Josefsson
+libc, Eric Blake
-- 
1.5.3.5


reply via email to

[Prev in Thread] Current Thread [Next in Thread]