bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

newly-exposed errors with invalid-MB input


From: Jim Meyering
Subject: newly-exposed errors with invalid-MB input
Date: Fri, 02 Apr 2010 09:57:19 +0200

Yesterday, just before I planned to release,
I ran all of "make check" through valgrind.
That showed up some new failures.
The first few here (main/Ecompile/GEAcompile) are not new,
but the others are.

They arise because of this code:

static void
addtok_wc (wint_t wc)
{
  unsigned char buf[MB_LEN_MAX];
  mbstate_t s;
  int i;
  memset (&s, 0, sizeof(s));
  cur_mb_len = wcrtomb ((char *) buf, wc, &s);
  addtok_mb(buf[0], cur_mb_len == 1 ? 3 : 1);
  for (i = 1; i < cur_mb_len; i++)
    {
      addtok_mb(buf[i], i == cur_mb_len - 1 ? 2 : 0);
      addtok(CAT);
    }
}

It calls addtok_mb with uninitialized buf[0] even
when wcrtomb has returned -1.

[Note, that to reproduce this, you'll probably have to
 install the shift-JIS locale.  These commands worked for
 me on Fedora-based systems:

  d=/usr/lib/locale/ja_JP.SHIFT_JIS
  mkdir $d && zcat /usr/share/i18n/charmaps/SHIFT_JIS.gz \
    | localedef -f - -i /usr/share/i18n/locales/ja_JP $d
]


  $ printf '\203\n' > in
  $ LC_ALL=ja_JP.SHIFT_JIS valgrind src/grep -E $'\203' in
  ==16670== Memcheck, a memory error detector
  ==16670== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
  ==16670== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
  ==16670== Command: src/.vg-tmp/grep -E  in
  ==16670==
  ==16670== Conditional jump or move depends on uninitialised value(s)
  ==16670==    at 0x398F6E2617: iswalnum (wcfuncs.c:41)
  ==16670==    by 0x4202ED: peek_token (regcomp.c:1928)
  ==16670==    by 0x41FD8C: fetch_token (regcomp.c:1766)
  ==16670==    by 0x42089B: parse (regcomp.c:2116)
  ==16670==    by 0x41D716: re_compile_internal (regcomp.c:816)
  ==16670==    by 0x41C702: rpl_re_compile_pattern (regcomp.c:237)
  ==16670==    by 0x402535: GEAcompile (dfasearch.c:156)
  ==16670==    by 0x402135: Ecompile (grep.c:16)
  ==16670==    by 0x40748B: main (main.c:2141)
  ==16670==
  ==16670== Use of uninitialised value of size 8
  ==16670==    at 0x398F6E2630: iswalnum (wcfuncs.c:41)
  ==16670==    by 0x4202ED: peek_token (regcomp.c:1928)
  ==16670==    by 0x41FD8C: fetch_token (regcomp.c:1766)
  ==16670==    by 0x42089B: parse (regcomp.c:2116)
  ==16670==    by 0x41D716: re_compile_internal (regcomp.c:816)
  ==16670==    by 0x41C702: rpl_re_compile_pattern (regcomp.c:237)
  ==16670==    by 0x402535: GEAcompile (dfasearch.c:156)
  ==16670==    by 0x402135: Ecompile (grep.c:16)
  ==16670==    by 0x40748B: main (main.c:2141)
  ==16670==
  ==16670== Conditional jump or move depends on uninitialised value(s)
  ==16670==    at 0x4202F6: peek_token (regcomp.c:1928)
  ==16670==    by 0x41FD8C: fetch_token (regcomp.c:1766)
  ==16670==    by 0x42089B: parse (regcomp.c:2116)
  ==16670==    by 0x41D716: re_compile_internal (regcomp.c:816)
  ==16670==    by 0x41C702: rpl_re_compile_pattern (regcomp.c:237)
  ==16670==    by 0x402535: GEAcompile (dfasearch.c:156)
  ==16670==    by 0x402135: Ecompile (grep.c:16)
  ==16670==    by 0x40748B: main (main.c:2141)
  ==16670==
  ==16670== Conditional jump or move depends on uninitialised value(s)
  ==16670==    at 0x40C598: addtok_mb (dfa.c:1127)
  ==16670==    by 0x40C6B7: addtok_wc (dfa.c:1178)
  ==16670==    by 0x40C743: atom (dfa.c:1228)
  ==16670==    by 0x40C9C3: closure (dfa.c:1308)
  ==16670==    by 0x40CAFE: branch (dfa.c:1341)
  ==16670==    by 0x40CB45: regexp (dfa.c:1352)
  ==16670==    by 0x40CC48: dfaparse (dfa.c:1393)
  ==16670==    by 0x411AB0: dfacomp (dfa.c:3043)
  ==16670==    by 0x4026E1: GEAcompile (dfasearch.c:195)
  ==16670==    by 0x402135: Ecompile (grep.c:16)
  ==16670==    by 0x40748B: main (main.c:2141)
  ==16670==
  ==16670== Conditional jump or move depends on uninitialised value(s)
  ==16670==    at 0x4125D9: dfamust (dfa.c:3421)
  ==16670==    by 0x411ABC: dfacomp (dfa.c:3044)
  ==16670==    by 0x4026E1: GEAcompile (dfasearch.c:195)
  ==16670==    by 0x402135: Ecompile (grep.c:16)
  ==16670==    by 0x40748B: main (main.c:2141)
  ==16670==
  ==16670== Conditional jump or move depends on uninitialised value(s)
  ==16670==    at 0x4125E4: dfamust (dfa.c:3421)
  ==16670==    by 0x411ABC: dfacomp (dfa.c:3044)
  ==16670==    by 0x4026E1: GEAcompile (dfasearch.c:195)
  ==16670==    by 0x402135: Ecompile (grep.c:16)
  ==16670==    by 0x40748B: main (main.c:2141)

  [... there are many more that I have elided...]

My first attempt at a noninvasive fix was to return upon wcrtomb failure,
thus avoiding any reference to buf[0] when cur_mb_len <= 0.

However, that makes it even worse (or maybe "better" as in more apparent ;-)
and results in heap corruption.

So I've done this instead, as a stop-gap measure:
[With it, we're back to the status-quo of "just" the
three re_compile_internal violations. ]

>From 48a4826e2f97d7d6b27b189f46746c0d8417089f Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Fri, 2 Apr 2010 09:51:18 +0200
Subject: [PATCH] grep: avoid used-undefined error with truncated multibyte input

* src/dfa.c (addtok_wc): Don't use buf[0] (it's undefined) when
wcrtomb returns <= 0.

MBS_SUPPORT-removal: * src/dfa.c (dfastate):
---
 src/dfa.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/src/dfa.c b/src/dfa.c
index c2ef18c..ba78b08 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -1175,6 +1175,13 @@ addtok_wc (wint_t wc)
   int i;
   memset (&s, 0, sizeof(s));
   cur_mb_len = wcrtomb ((char *) buf, wc, &s);
+
+  /* This is merely stop-gap.  When cur_mb_len is 0 or negative,
+     buf[0] is undefined, yet skipping the addtok_mb call altogether
+     can result in heap corruption.  */
+  if (cur_mb_len <= 0)
+    buf[0] = 0;
+
   addtok_mb(buf[0], cur_mb_len == 1 ? 3 : 1);
   for (i = 1; i < cur_mb_len; i++)
     {
--
1.7.0.3.513.gc8ed0


I'll push this soon, and then make the release.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]