bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#26864: Clarification on obscure regular expressions mentioned in kno


From: Sundeep Agarwal
Subject: bug#26864: Clarification on obscure regular expressions mentioned in known bugs
Date: Wed, 10 May 2017 14:18:37 +0530

Hello,

>From the man page, version 'grep (GNU grep) 2.25'

--------------------------------
   Known Bugs
       Large repetition counts in the {n,m} construct may cause grep to use
lots of  memory.   In  addition,  certain other obscure regular expressions
require exponential time and space, and may cause grep to run out of memory.
       Back-references are very slow, and may require exponential time.
--------------------------------

I was trying a regular expression to find words from dictionary that have
two different instances of repeated letters, for example the words:
misspellings, chilliness, woodcutter etc

$ # gives no output
$ grep -m5 -xiE '([a-z]*([a-z])\2[a-z]*){2}' /usr/share/dict/words

$ # works as expected with PCRE
$ grep -m5 -xiP '([a-z]*([a-z])\2[a-z]*){2}' /usr/share/dict/words
Abbott
Annabelle
Annette
Appaloosa
Appleseed


I asked regarding this on
https://stackoverflow.com/questions/43572924/ere-adding-quantifier-to-group-with-inner-group-and-back-reference
and other forums. It helped to identify some more cases like

$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]*){2}'
aazbbycc
$ # no output
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]*){3}'

and

$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]{0,3}){3}'
aazbbycc
$ # no output
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]{0,4}){3}'

$ # seems dependent on character class clashing with back reference
characters and quantifier count
$ # a, b and c are the characters matching back reference
$ echo 'aazbbycc' | grep -E '(([a-z])\2[abcyz]{0,4}){2}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])\2[abcyz]{0,4}){3}'
$ echo 'aazbbycc' | grep -E '(([a-z])\2[abyz]{0,4}){3}'
$ echo 'aazbbycc' | grep -E '(([a-z])\2[byz]{0,4}){3}'
$ echo 'aazbbycc' | grep -E '(([a-z])\2[acyz]{0,4}){3}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])\2[ayz]{0,4}){3}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])\2[cyz]{0,4}){3}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])\2[yz]{0,4}){3}'
aazbbycc

The same behavior is seen with 'sed (GNU sed) 4.2.2' as well. For ex:

$ echo 'aazbbycc' | sed -nE '/(([a-z])\2[a-z]{0,3}){3}/p'
aazbbycc
$ # no output
$ echo 'aazbbycc' | sed -nE '/(([a-z])\2[a-z]{0,4}){3}/p'


So, my question is whether these regular expression examples come under
'obscure regular expressions' mentioned in the man page. If so, I feel
there should be an error message displayed instead of no output

Regards,
Sundeep


reply via email to

[Prev in Thread] Current Thread [Next in Thread]