Re: [PATCH 2/4] maint.mk: expand the prohibit_doubled_word regex
From: Eric Blake
Subject: Re: [PATCH 2/4] maint.mk: expand the prohibit_doubled_word regex
Date: Fri, 29 Jul 2016 15:29:09 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0
On 07/26/2016 08:28 AM, Ján Tomko wrote:
> This check has a static list of words that are checked for repetitions.
> Expand it before running the perl script to avoid using expensive
> captures.
> ---
> ChangeLog | 9 +++++++++
> top/maint.mk | 7 ++++++-
> 2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/ChangeLog b/ChangeLog
> index 7dd78e3..b698a6c 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,5 +1,14 @@
> 2016-07-26 Ján Tomko <address@hidden>
>
> + maint.mk: expand the prohibit_doubled_word regex
> +
> + This check has a static list of words that are checked for
> + repetitions.
> + Expand it before running the perl script to avoid using expensive
> + captures.
gnulib is still stuck in the old ways of GNU-style changelog entries
where you call out the file and section touched, as in:
* maint.mk (prohibit_doubled_word): Pre-expand the regex to
avoid expensive perl regex backreferences.
That can be touched up on commit.
>
> +prohibit_doubled_words_ = \
> + the then in an on if is it but for or at and do to
> +# expand the regex before running the check to avoid using expensive captures
> +prohibit_doubled_word_expanded_ = \
> + $(shell echo $(prohibit_doubled_words_) | sed -r 's/\b(\S+)\b/\1\\s\+\1/g')
I bet GNU make has builtins that could do this operation without forking
to $(shell). This stage results in a variable containing:
the\s+the then\s+then ...
Maybe:
$(join $(prohibit_doubled_words_),$(addprefix \s+,$(prohibit_doubled_words_)))
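One way to sanity-check the builtin idea is to emulate both stages outside make. The Python sketch below is only a model: `sed_stage`, `make_addprefix` and `make_join` are hypothetical helpers written to mimic the sed substitution and make's `$(addprefix)`/`$(join)` functions. It assumes GNU sed drops the backslash from `\+` in the replacement text, so each word W becomes `W\s+W`.

```python
import re
from itertools import zip_longest

# Word list copied from the patch's prohibit_doubled_words_ variable.
WORDS = "the then in an on if is it but for or at and do to".split()


def sed_stage(words):
    # Models: echo WORDS | sed -r 's/\b(\S+)\b/\1\\s\+\1/g'
    # Assumption: GNU sed turns "\\" into one backslash and "\+" into a
    # plain "+", so every word W is rewritten as W\s+W.
    return re.sub(r"\b(\S+)\b", r"\1\\s+\1", " ".join(words))


def make_addprefix(prefix, words):
    # Models $(addprefix PREFIX,NAMES): prepend PREFIX to each word.
    return [prefix + w for w in words]


def make_join(list1, list2):
    # Models $(join LIST1,LIST2): concatenate the words pairwise; if one
    # list is longer, its extra words are copied through unchanged.
    return [a + b for a, b in zip_longest(list1, list2, fillvalue="")]


sed_result = sed_stage(WORDS)
builtin_result = " ".join(make_join(WORDS, make_addprefix(r"\s+", WORDS)))

# Both routes should produce the same space-separated word list.
assert builtin_result == sed_result
print(" ".join(builtin_result.split()[:2]))  # the\s+the then\s+then
```

Because `$(join)` pairs words by position, the two lists have to line up; here both are the same list, so each word is paired with its own `\s+`-prefixed copy and no `$(shell)` fork is needed.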
> prohibit_doubled_word_RE_ ?= \
> - /\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt]o)\s+\1\b/gims
> + /\b(?:$(subst $(space),|,$(prohibit_doubled_word_expanded_)))\b/gims
At any rate, you want to end up with the perl regex:
/\b(?:the\s+the|then\s+then|...)\b/gims
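To see that the expanded alternation flags the same text as the old backreference pattern, here is a small Python sketch. Python's `re` stands in for perl (an assumption that holds for the constructs used here: `\b`, `\s+`, `(?:...)`, `\1` and case-insensitive matching), `finditer` stands in for the `while (/.../g)` loop, and the sample text is made up for illustration.

```python
import re

# Word list copied from the patch's prohibit_doubled_words_ variable.
WORDS = "the then in an on if is it but for or at and do to".split()
FLAGS = re.IGNORECASE | re.MULTILINE | re.DOTALL  # perl's /ims

# Old pattern: one capture group, matched again through the \1 backreference.
BACKREF = re.compile(
    r"\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt]o)\s+\1\b", FLAGS)

# New pattern: explicit "WORD\s+WORD" alternatives inside a non-capturing
# (?:...) group, so the engine never has to remember a matched word.
EXPANDED = re.compile(
    r"\b(?:" + "|".join(w + r"\s+" + w for w in WORDS) + r")\b", FLAGS)

SAMPLE = ("This is is a test.\nWe went to to the\nthe store and and "
          "then then left.\nThen then it stopped; the then-current plan held.")


def hits(pattern, text):
    # Collect (start, end, matched text) for every non-overlapping match.
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


assert hits(BACKREF, SAMPLE) == hits(EXPANDED, SAMPLE)
print(BACKREF.groups, EXPANDED.groups)  # 1 0
print([h[2] for h in hits(EXPANDED, SAMPLE)])
# ['is is', 'to to', 'the\nthe', 'and and', 'then then', 'Then then']
```

The `groups` counts show the point of the patch: the old pattern has a capture group that must be recorded and re-matched, while the expanded one has none, yet both report the same doubled words (including a pair split across a line break and a mixed-case pair).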
> prohibit_doubled_word_ = \
> -e 'while ($(prohibit_doubled_word_RE_))' \
> $(perl_filename_lineno_text_)
>
Either way, I doubt my make fine-tuning matters, and you are definitely
correct that avoiding backreferences makes perl regexes more efficient.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org