bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate


From: Dimitry Andric
Subject: bug#62483: echo a | grep -E -w '((()|a)|())*' # does not terminate
Date: Fri, 31 Mar 2023 17:25:21 +0200

Yes, it still reproduces when I configure the latest grep using
--without-included-regex:

1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
1: idx = 0
(gdb)
1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
1: idx = 0
(gdb)
1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
1: idx = 0
(gdb)
1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
1: idx = 0
(gdb)
1428          cur_node = proceed_next_node (mctx, nmatch, pmatch, 
prev_idx_match,
1: idx = 0
(gdb)
1432          if (__glibc_unlikely (cur_node < 0))
1: idx = 0
(gdb)
1400      for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
1: idx = 0
(gdb)
1402          update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, nmatch);
1: idx = 0
(gdb)
1404          if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
1: idx = 0
(gdb)
1405              || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
1: idx = 0

The endless loop looks the same. grep's regexec.c is mostly the same as
glibc's, except for the latter having a bunch of #if RE_ENABLE_I18N
directives added.

-Dimitry

> On 28 Mar 2023, at 08:46, arnold@skeeve.com wrote:
> 
> This does not reproduce with gawk, even when I force use of the regex
> matcher.
> 
> What happens if grep is built with the included regex files instead of
> relying on the ones in the local glibc?
> 
> Arnold
> 
> Dimitry Andric <dimitry@andric.com> wrote:
> 
>> On 27 Mar 2023, at 11:14, Koen Claessen <koen@chalmers.se> wrote:
>>> 
>>> Running the command:
>>> 
>>> echo a | grep -E -w '((()|a)|())*'
>>> 
>>> does not terminate, and uses a LOT of processor time, for all versions of
>>> grep I have tried.
>>> 
>>> This is the smallest case that could be found; simplifying anything in the
>>> input and/or expression leads to correct behavior.
>> 
>> Reproducible with GNU grep 3.7 on Ubuntu 22:
>> 
>>    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>  93938 dim       20   0    9.0m   2.1m   2.0m R 100.0   0.0   0:08.32 grep 
>> -E -w ((()|a)|())*
>> 
>> It seems that at least on Ubuntu, grep in this mode uses glibc's regexec(3), 
>> and it is that implementation which ends up in an endless loop.
>> 
>> It loops between lines 1415, 1417 and 1443, but idx and cur_node never 
>> change from 0:
>> 
>>  1378  static reg_errcode_t
>>  1379  __attribute_warn_unused_result__
>>  1380  set_regs (const regex_t *preg, const re_match_context_t *mctx, size_t 
>> nmatch,
>>  1381            regmatch_t *pmatch, bool fl_backtrack)
>>  1382  {
>> ...
>>  1415    for (idx = pmatch[0].rm_so; idx <= pmatch[0].rm_eo ;)
>>  1416      {
>>  1417        update_regs (dfa, pmatch, prev_idx_match, cur_node, idx, 
>> nmatch);
>>  1418
>>  1419        if ((idx == pmatch[0].rm_eo && cur_node == mctx->last_node)
>>  1420            || (fs && re_node_set_contains (&eps_via_nodes, cur_node)))
>> ...
>>  1442        /* Proceed to next node.  */
>>  1443        cur_node = proceed_next_node (mctx, nmatch, pmatch, 
>> prev_idx_match,
>>  1444                                      &idx, cur_node,
>>  1445                                      &eps_via_nodes, fs);
>>  1446
>>  1447        if (__glibc_unlikely (cur_node < 0))
>> ...
>>  1465          }
>>  1466      }
>> 
>> -Dimitry
>> 
>> P.S.: Interestingly this does not reproduce with BSD grep, which returns 
>> immediately with "a".
>> 

Attachment: signature.asc
Description: Message signed with OpenPGP


reply via email to

[Prev in Thread] Current Thread [Next in Thread]