bug-grep

Re: Boyer Moore overflow patch


From: Charles Levert
Subject: Re: Boyer Moore overflow patch
Date: Tue, 14 Jun 2005 23:38:37 -0400
User-agent: Mutt/1.4.1i

* On Tuesday 2005-06-14 at 22:19:26 -0400, Charles Levert wrote:
> * On Wednesday 2005-06-15 at 01:45:10 +0100, Julian Foad wrote:
> 
> I noticed there's a problem with LC_ALL still
> being set to $u from above, which it shouldn't
> be.  I'll have to investigate that separately.

It turns out that, in POSIX mode (e.g., when
bash is invoked as /bin/sh), the following
items from the bash documentation apply:

 23. Assignment statements preceding POSIX 1003.2 special builtins
     persist in the shell environment after the builtin completes.

 24. Assignment statements preceding shell function calls persist in the
     shell environment after the function returns, as if a POSIX
     special builtin command had been executed.
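
For illustration, here is a minimal sketch of the behaviour described
above. It assumes that "sh" is a POSIX-conformant shell; DEMO and f are
hypothetical names chosen for this example only. Item 23 is checked with
the special builtin ":". Item 24 is only reported, not relied upon, since
shells (and versions of the same shell) differ on function calls.

```shell
#!/bin/sh
# Item 23: an assignment placed before a special builtin (here ":")
# persists in a POSIX shell after the builtin completes.
after_builtin=$(sh -c 'unset DEMO; DEMO=kept :; printf "%s" "${DEMO-unset}"')

# Item 24: the same experiment with a function call.  The result is
# only printed, because it depends on the shell and its version.
after_function=$(sh -c 'f() { :; }; unset DEMO; DEMO=kept f; printf "%s" "${DEMO-unset}"')

echo "after special builtin: $after_builtin"
echo "after function call:   $after_function"
```

Whether the persisting variable is also exported is a separate question,
which is where the ambiguity comes from.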

Given that, it's pointless and even ambiguous
(if LC_ALL isn't already exported) to use

   LC_ALL=... function-call ...
   LC_ALL=... function-call ...

instead of

   LC_ALL=...; export LC_ALL
   function-call ...
   function-call ...

so I propose the following patch, to be applied
before the Boyer-Moore one.
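
As a minimal sketch of the second form (not the actual test file):
show_locale below is a hypothetical stand-in for grep_test that simply
reports the LC_ALL value received by a child process, and the C locale
is used only so that the sketch runs anywhere.

```shell
#!/bin/sh
# Hypothetical stand-in for grep_test: reports the LC_ALL value that a
# child process (as grep would be) receives.
show_locale() {
  sh -c 'printf "%s\n" "${LC_ALL-unset}"'
}

# Set and export once; every later call and its children inherit it.
LC_ALL=C; export LC_ALL
first=$(show_locale)
second=$(show_locale)
echo "first=$first second=$second"
```

Because the variable is exported explicitly, the result does not depend
on how the shell treats assignments that precede a function call.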

The Boyer-Moore tests should then be inserted
before the UTF-8 tests, which should always
remain the last ones in the file.

That way, the whole file can still be called
manually with some specific LC_ALL value that
will be used for the tests in the first half of
the file, if desired.



--- tests/foad1.sh      2005-06-14 09:42:17 -0400
+++ tests/foad1.sh      2005-06-14 23:21:14 -0400
@@ -82,62 +82,61 @@
 grep_test "LIN7C 55327/" "" -wF -e 5327 -e 5532
 
 
-u=cs_CZ.UTF-8
+# The rest of this file is meant to be executed under this locale.
+LC_ALL=cs_CZ.UTF-8; export LC_ALL
 # If the UTF-8 locale doesn't work, skip these tests silently.
-if LC_ALL="$u" locale -k LC_CTYPE 2>/dev/null |
-  "${GREP}" -q "charmap.*UTF-8"
-then
-  # Test character class erroneously matching a '[' character.
-  LC_ALL="$u" grep_test "[/" "" "[[:alpha:]]" -E
-
-  for mode in F G E; do
-    # Hint:  pipe the output of these tests in
-    #        "| LESS= LESSCHARSET=ascii less".
-    # LETTER N WITH TILDE is U+00F1 and U+00D1.
-    # LETTER Y WITH DIAERESIS is U+00FF and U+0178.
-    LC_ALL="$u" grep_test 'añÿb/AÑŸB/' 'ñÿ/ÑŸ/' 'ñÿ' -o -i -$mode
-    LC_ALL="$u" grep_test 'añÿb/AÑŸB/' 'ñÿ/ÑŸ/' 'ÑŸ' -o -i -$mode
-    LC_ALL="$u" grep_test 'añÿb/AÑŸB/' "a${CB}ñÿ${CE}b/A${CB}ÑŸ${CE}B/" 'ñÿ' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'añÿb/AÑŸB/' "a${CB}ñÿ${CE}b/A${CB}ÑŸ${CE}B/" 'ÑŸ' --color=always -i -$mode
-
-    # POSIX (about -i):  ... each character in the string is matched
-    # against the pattern, not only the character, but also its case
-    # counterpart (if any), shall be matched.
-    # The following were chosen because of their trickiness due to the
-    # differing UTF-8 octet length of their counterpart and to the
-    # non-reflexivity of their mapping.
-    # Beware of homographs!  Look carefully at the actual octets.
-
-    # lc(U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE) = U+0069 LATIN SMALL LETTER I
-    LC_ALL="$u" grep_test 'aİb/' "a${CB}İ${CE}b/" 'i' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'aib/' ''               'İ' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'aİb/' ''               'I' --color=always -i -$mode
-    # uc(U+0131 LATIN SMALL LETTER DOTLESS I)          = U+0049 LATIN CAPITAL LETTER I
-    LC_ALL="$u" grep_test 'aıb/' "a${CB}ı${CE}b/" 'I' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'aIb/' ''               'ı' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'aıb/' ''               'i' --color=always -i -$mode
-    # uc(U+017F LATIN SMALL LETTER LONG S)             = U+0053 LATIN CAPITAL LETTER S
-    LC_ALL="$u" grep_test 'aſb/' "a${CB}ſ${CE}b/" 'S' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'aSb/' ''               'ſ' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'aſb/' ''               's' --color=always -i -$mode
-    # uc(U+1FBE GREEK PROSGEGRAMMENI)                  = U+0399 GREEK CAPITAL LETTER IOTA
-    LC_ALL="$u" grep_test 'aιb/' "a${CB}ι${CE}b/" 'Ι' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'aΙb/' ''               'ι' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'aιb/' ''               'ι' --color=always -i -$mode
-    # lc(U+2126 OHM SIGN)                              = U+03C9 GREEK SMALL LETTER OMEGA
-    LC_ALL="$u" grep_test 'aΩb/' "a${CB}Ω${CE}b/" 'ω' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'aωb/' ''               'Ω' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'aΩb/' ''               'Ω' --color=always -i -$mode
-    # lc(U+212A KELVIN SIGN)                           = U+006B LATIN SMALL LETTER K
-    LC_ALL="$u" grep_test 'aKb/' "a${CB}K${CE}b/" 'k' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'akb/' ''               'K' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'aKb/' ''               'K' --color=always -i -$mode
-    # lc(U+212B ANGSTROM SIGN)                         = U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
-    LC_ALL="$u" grep_test 'aÅb/' "a${CB}Å${CE}b/" 'å' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'aåb/' ''               'Å' --color=always -i -$mode
-    LC_ALL="$u" grep_test 'aÅb/' ''               'Å' --color=always -i -$mode
-  done
-fi
+locale -k LC_CTYPE 2>/dev/null | "${GREP}" -q "charmap.*UTF-8" || exit $failures
+
+# Test character class erroneously matching a '[' character.
+grep_test "[/" "" "[[:alpha:]]" -E
+
+for mode in F G E; do
+  # Hint:  pipe the output of these tests in
+  #        "| LESS= LESSCHARSET=ascii less".
+  # LETTER N WITH TILDE is U+00F1 and U+00D1.
+  # LETTER Y WITH DIAERESIS is U+00FF and U+0178.
+  grep_test 'añÿb/AÑŸB/' 'ñÿ/ÑŸ/' 'ñÿ' -o -i -$mode
+  grep_test 'añÿb/AÑŸB/' 'ñÿ/ÑŸ/' 'ÑŸ' -o -i -$mode
+  grep_test 'añÿb/AÑŸB/' "a${CB}ñÿ${CE}b/A${CB}ÑŸ${CE}B/" 'ñÿ' --color=always -i -$mode
+  grep_test 'añÿb/AÑŸB/' "a${CB}ñÿ${CE}b/A${CB}ÑŸ${CE}B/" 'ÑŸ' --color=always -i -$mode
+
+  # POSIX (about -i):  ... each character in the string is matched
+  # against the pattern, not only the character, but also its case
+  # counterpart (if any), shall be matched.
+  # The following were chosen because of their trickiness due to the
+  # differing UTF-8 octet length of their counterpart and to the
+  # non-reflexivity of their mapping.
+  # Beware of homographs!  Look carefully at the actual octets.
+
+  # lc(U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE) = U+0069 LATIN SMALL LETTER I
+  grep_test 'aİb/' "a${CB}İ${CE}b/" 'i' --color=always -i -$mode
+  grep_test 'aib/' ''               'İ' --color=always -i -$mode
+  grep_test 'aİb/' ''               'I' --color=always -i -$mode
+  # uc(U+0131 LATIN SMALL LETTER DOTLESS I)          = U+0049 LATIN CAPITAL LETTER I
+  grep_test 'aıb/' "a${CB}ı${CE}b/" 'I' --color=always -i -$mode
+  grep_test 'aIb/' ''               'ı' --color=always -i -$mode
+  grep_test 'aıb/' ''               'i' --color=always -i -$mode
+  # uc(U+017F LATIN SMALL LETTER LONG S)             = U+0053 LATIN CAPITAL LETTER S
+  grep_test 'aſb/' "a${CB}ſ${CE}b/" 'S' --color=always -i -$mode
+  grep_test 'aSb/' ''               'ſ' --color=always -i -$mode
+  grep_test 'aſb/' ''               's' --color=always -i -$mode
+  # uc(U+1FBE GREEK PROSGEGRAMMENI)                  = U+0399 GREEK CAPITAL LETTER IOTA
+  grep_test 'aιb/' "a${CB}ι${CE}b/" 'Ι' --color=always -i -$mode
+  grep_test 'aΙb/' ''               'ι' --color=always -i -$mode
+  grep_test 'aιb/' ''               'ι' --color=always -i -$mode
+  # lc(U+2126 OHM SIGN)                              = U+03C9 GREEK SMALL LETTER OMEGA
+  grep_test 'aΩb/' "a${CB}Ω${CE}b/" 'ω' --color=always -i -$mode
+  grep_test 'aωb/' ''               'Ω' --color=always -i -$mode
+  grep_test 'aΩb/' ''               'Ω' --color=always -i -$mode
+  # lc(U+212A KELVIN SIGN)                           = U+006B LATIN SMALL LETTER K
+  grep_test 'aKb/' "a${CB}K${CE}b/" 'k' --color=always -i -$mode
+  grep_test 'akb/' ''               'K' --color=always -i -$mode
+  grep_test 'aKb/' ''               'K' --color=always -i -$mode
+  # lc(U+212B ANGSTROM SIGN)                         = U+00E5 LATIN SMALL LETTER A WITH RING ABOVE
+  grep_test 'aÅb/' "a${CB}Å${CE}b/" 'å' --color=always -i -$mode
+  grep_test 'aåb/' ''               'Å' --color=always -i -$mode
+  grep_test 'aÅb/' ''               'Å' --color=always -i -$mode
+done
 
 
 exit $failures
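
For readers who want to try the locale guard from the patch on its own,
here is a rough standalone sketch. is_utf8_charmap is a hypothetical
helper wrapping the same grep expression, plain grep stands in for
"${GREP}", and failures is set here only as a placeholder for the counter
that the real test file maintains.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the "locale -k LC_CTYPE" output
# read from standard input reports a UTF-8 charmap.
is_utf8_charmap() {
  grep -q "charmap.*UTF-8"
}

failures=0   # placeholder for the counter maintained by the test file

if locale -k LC_CTYPE 2>/dev/null | is_utf8_charmap; then
  echo "UTF-8 locale available: the UTF-8 tests would run"
else
  echo "UTF-8 locale unavailable: the UTF-8 tests would be skipped"
  # The patched script does the equivalent of: exit $failures
fi
```

Exiting with $failures at that point keeps the results of the earlier,
locale-independent tests as the exit status of the file.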




