[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#26864: Clarification on obscure regular expressions mentioned in kno
From: |
Sundeep Agarwal |
Subject: |
bug#26864: Clarification on obscure regular expressions mentioned in known bugs |
Date: |
Wed, 10 May 2017 14:18:37 +0530 |
Hello,
>From the man page, version 'grep (GNU grep) 2.25'
--------------------------------
Known Bugs
Large repetition counts in the {n,m} construct may cause grep to use
lots of memory. In addition, certain other obscure regular expressions
require exponential time and space, and may cause grep to run out of memory.
Back-references are very slow, and may require exponential time.
--------------------------------
I was trying a regular expression to find words from dictionary that have
two different instances of repeated letters, for example the words:
misspellings, chilliness, woodcutter etc
$ # gives no output
$ grep -m5 -xiE '([a-z]*([a-z])\2[a-z]*){2}' /usr/share/dict/words
$ # works as expected with PCRE
$ grep -m5 -xiP '([a-z]*([a-z])\2[a-z]*){2}' /usr/share/dict/words
Abbott
Annabelle
Annette
Appaloosa
Appleseed
I asked regarding this on
https://stackoverflow.com/questions/43572924/ere-adding-quantifier-to-group-with-inner-group-and-back-reference
and other forums. It helped to identify some more cases like
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]*){2}'
aazbbycc
$ # no output
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]*){3}'
and
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]{0,3}){3}'
aazbbycc
$ # no output
$ echo 'aazbbycc' | grep -E '(([a-z])\2[a-z]{0,4}){3}'
$ # seems dependent on character class clashing with back reference
characters and quantifier count
$ # a, b and c are the characters matching back reference
$ echo 'aazbbycc' | grep -E '(([a-z])\2[abcyz]{0,4}){2}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])\2[abcyz]{0,4}){3}'
$ echo 'aazbbycc' | grep -E '(([a-z])\2[abyz]{0,4}){3}'
$ echo 'aazbbycc' | grep -E '(([a-z])\2[byz]{0,4}){3}'
$ echo 'aazbbycc' | grep -E '(([a-z])\2[acyz]{0,4}){3}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])\2[ayz]{0,4}){3}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])\2[cyz]{0,4}){3}'
aazbbycc
$ echo 'aazbbycc' | grep -E '(([a-z])\2[yz]{0,4}){3}'
aazbbycc
The same behavior is seen with 'sed (GNU sed) 4.2.2' as well. For ex:
$ echo 'aazbbycc' | sed -nE '/(([a-z])\2[a-z]{0,3}){3}/p'
aazbbycc
$ # no output
$ echo 'aazbbycc' | sed -nE '/(([a-z])\2[a-z]{0,4}){3}/p'
So, my question is whether these regular expression examples come under
'obscure regular expressions' mentioned in the man page. If so, I feel
there should be an error message displayed instead of no output
Regards,
Sundeep
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- bug#26864: Clarification on obscure regular expressions mentioned in known bugs,
Sundeep Agarwal <=