bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate


From: arnold
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Date: Sun, 02 Apr 2023 04:03:56 -0600
User-agent: Heirloom mailx 12.5 7/5/10

OK, thanks!

Dimitry Andric <dimitry@andric.com> wrote:

> Ah sorry, I did indeed rebuild grep with --with-included-regex, and for
> debugging purposes added CFLAGS="-O0 -g".
>
> In any case, the problematic code is both in glibc and grep, as I
> believe these are originating from the same source.
>
> -Dimitry
>
> > On 2 Apr 2023, at 08:52, arnold@skeeve.com wrote:
> > 
> > Hi.
> > 
> > Dimitry Andric <dimitry@andric.com> wrote:
> > 
> >> Yes, it still reproduces when I configure the latest grep using
> >> --without-included-regex:
> > 
> > I assume you meant --with-included-regex?
> > 
> > If you really used --without-included-regex, that doesn't prove anything... 
> > :-)
> > 
> > It's interesting, as gawk uses the same regex, but with different flags.
> > It might be worth trying to understand which of the syntax bits
> > is causing this.
> > 
> > Thanks,
> > 
> > Arnold
> > 
> >> 
> >> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1: idx = 0
> >> (gdb)
> >> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, 
> >> nmatch);
> >> 1: idx = 0
> >> (gdb)
> >> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1: idx = 0
> >> (gdb)
> >> 1405              || (fs && re_node_set_contains (&eps_via_nodes, 
> >> cur_node)))
> >> 1: idx = 0
> >> (gdb)
> >> 1428          cur_node = proceed_next_node (mctx, nmatch, pmatch, 
> >> prev_idx_match,
> >> 1: idx = 0
> >> (gdb)
> >> 1432          if (__glibc_unlikely (cur_node < 0))
> >> 1: idx = 0
> >> (gdb)
> >> 1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >> 1: idx = 0
> >> (gdb)
> >> 1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, 
> >> nmatch);
> >> 1: idx = 0
> >> (gdb)
> >> 1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >> 1: idx = 0
> >> (gdb)
> >> 1405              || (fs && re_node_set_contains (&eps_via_nodes, 
> >> cur_node)))
> >> 1: idx = 0
> >> 
> >> The endless loop looks the same. grep's regexec.c is mostly the same as
> >> glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
> >> directives added.
> >> 
> >> -Dimitry
> >> 
> >>> On 28 Mar 2023, at 08:46, arnold@skeeve.com wrote:
> >>> 
> >>> This does not reproduce with gawk, even when I force use of the regex
> >>> matcher.
> >>> 
> >>> What happens if grep is built with the included regex files instead of
> >>> relying on the ones in the local glibc?
> >>> 
> >>> Arnold
> >>> 
> >>> Dimitry Andric <dimitry@andric.com> wrote:
> >>> 
> >>>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@chalmers.se> wrote:
> >>>>> 
> >>>>> Running the command:
> >>>>> 
> >>>>> echo a | grep -E -w '((()|a)|())*'
> >>>>> 
> >>>>> does not terminate, and uses a LOT of processor time, for all versions 
> >>>>> of
> >>>>> grep I have tried.
> >>>>> 
> >>>>> This is the smallest case that could be found; simplifying anything in 
> >>>>> the
> >>>>> input and/or expression leads to correct behavior.
> >>>> 
> >>>> Reproducible with GNU grep 3.7 on Ubuntu 22:
> >>>> 
> >>>>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ 
> >>>> COMMAND
> >>>> 93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 
> >>>> grep -E -w ((()|a)|())*
> >>>> 
> >>>> It seems that at least on Ubuntu, grep in this mode uses glibc's 
> >>>> regexec(3), and it is that implementation which ends up in an endless 
> >>>> loop.
> >>>> 
> >>>> It loops between lines 1415, 1417 and 1443, but idx and cur_node never 
> >>>> change from 0:
> >>>> 
> >>>> 1378  static reg_errcode_t
> >>>> 1379  __attribute_warn_unused_result__
> >>>> 1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, 
> >>>> size_t nmatch,
> >>>> 1381            regmatch_t *pmatch, bool fl_backtrack)
> >>>> 1382  {
> >>>> ...
> >>>> 1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
> >>>> 1416      {
> >>>> 1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, 
> >>>> nmatch);
> >>>> 1418
> >>>> 1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
> >>>> 1420            || (fs && re_node_set_contains (&eps_via_nodes, 
> >>>> cur_node)))
> >>>> ...
> >>>> 1442        /* Proceed to next node.  */
> >>>> 1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, 
> >>>> prev_idx_match,
> >>>> 1444                                      &idx, cur_node,
> >>>> 1445                                      &eps_via_nodes, fs);
> >>>> 1446
> >>>> 1447        if (__glibc_unlikely (cur_node < 0))
> >>>> ...
> >>>> 1465          }
> >>>> 1466      }
> >>>> 
> >>>> -Dimitry
> >>>> 
> >>>> P.S.: Interestingly this does not reproduce with BSD grep, which returns 
> >>>> immediately with "a".
> >>>> 
> >> 
>





reply via email to

[Prev in Thread] Current Thread [Next in Thread]