bug-grep

bug#18266: handling bytes not part of the charset, and other garbage


From: Paul Eggert
Subject: bug#18266: handling bytes not part of the charset, and other garbage
Date: Fri, 12 Sep 2014 14:39:35 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.0

On 09/12/2014 02:29 PM, Vincent Lefevre wrote:

> an option to control what happens on encoding errors would be better and sufficient.

It might suffice for your use cases, but it's more complicated and less flexible than being able to match bytes within the regular expression. (Plus, someone would have to implement it, which is perhaps the biggest objection to either approach ....) But I take your point that \C is best avoided. This whole area is pretty hairy, I'm afraid.

Speaking of hairy, why doesn't grep use PCRE_MULTILINE? Using PCRE_MULTILINE shouldn't be that hard, and should boost performance quite a bit in typical usage. Or am I being too optimistic here?
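
To make the question concrete, here is a rough sketch of the two strategies. It uses Python's re.MULTILINE, which gives ^ and $ the same line-boundary meaning as PCRE_MULTILINE; it is not grep's code. The function names and the sample buffer are made up for illustration, and the sketch assumes the pattern itself can never match a newline.

```python
import re

def grep_per_line(pattern, text):
    """Line-at-a-time strategy: one search call per input line."""
    rx = re.compile(pattern)
    return [line for line in text.split("\n") if rx.search(line)]

def grep_multiline(pattern, text):
    """Whole-buffer strategy: ^ and $ anchor at every line boundary,
    so the matcher is only re-entered once per matching line.
    Assumes the pattern cannot match a newline character."""
    rx = re.compile(pattern, re.MULTILINE)
    matched = []
    pos = 0
    while pos <= len(text):
        m = rx.search(text, pos)
        if m is None:
            break
        # Widen the match to the full line that contains it.
        start = text.rfind("\n", 0, m.start()) + 1
        end = text.find("\n", m.end())
        if end == -1:
            end = len(text)
        matched.append(text[start:end])
        # Resume scanning at the start of the next line.
        pos = end + 1
    return matched

# Hypothetical sample buffer (no trailing newline).
sample = "alpha 1\nbeta\ngamma 22\ndelta"
lines_a = grep_per_line(r"^\w+ \d+$", sample)
lines_b = grep_multiline(r"^\w+ \d+$", sample)
```

Both functions should select the same lines here; the second simply skips over non-matching lines inside a single search call. Whether that translates into a real speedup in grep is exactly the open question.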




