bug-gnulib

Syntax check prohibit_doubled_word: false positives on non-English text


From: James Youngman
Subject: Syntax check prohibit_doubled_word: false positives on non-English text
Date: Sun, 5 Jun 2011 01:04:33 +0100

I had an interesting failure from the prohibit_doubled_word syntax check:

prohibit_doubled_word_RE_ ?= \
  /\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt]o)\s+\1\b/gims
prohibit_doubled_word_ =                                                \
    -e 'while ($(prohibit_doubled_word_RE_))'                           \
    $(perl_filename_lineno_text_)

sc_prohibit_doubled_word:
        @perl -n -0777 $(prohibit_doubled_word_) $$($(VC_LIST_EXCEPT))  \
          | grep -vE '$(ignore_doubled_word_match_RE_)'                 \
          | grep . && { echo '$(ME): doubled words' 1>&2; exit 1; } || :


This gave a false positive on findutils/po/ga.po (the Irish message file):
#: find/parser.c:2844
#, c-format
msgid "Arguments to -type should contain only one letter"
msgstr "Ní cheadaítear ach litir amháin in argóint i ndiaidh -type"


It looks to me like Perl's \b has matched after "á" here.

$ sed -ne '435 p' /home/james/source/GNU/findutils/git/gnu/findutils/po/ga.po | od -c
0000000   m   s   g   s   t   r       "   N 303 255       c   h   e   a
0000020   d   a 303 255   t   e   a   r       a   c   h       l   i   t
0000040   i   r       a   m   h 303 241   i   n       i   n       a   r
0000060   g 303 263   i   n   t       i       n   d   i   a   i   d   h
0000100       -   t   y   p   e   "  \n
0000110
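The od output shows that "á" is stored as the two bytes 303 241 (0xC3 0xA1). If the check's Perl sees the file as undecoded bytes, neither byte counts as a word character, so \b can match between 0xA1 and the "in" that ends "amháin", and that "in" followed by " in" satisfies the pattern. The sketch below illustrates this suspected mechanism using Python's re module as a stand-in for Perl (an assumption, not what maint.mk runs): a bytes pattern uses ASCII-only \w, much like an undecoded Perl string, while a str pattern treats "á" as a letter. The regex is copied from maint.mk; the helper names are hypothetical.

```python
import re

# Regex copied from prohibit_doubled_word_RE_ in maint.mk. Perl's /ims map to
# these flags; Perl's /g has no flag here, since finditer() returns every match.
PATTERN = r"\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt]o)\s+\1\b"
FLAGS = re.IGNORECASE | re.MULTILINE | re.DOTALL

line = 'msgstr "Ní cheadaítear ach litir amháin in argóint i ndiaidh -type"'


def doubled_words_bytes(text: str) -> list[str]:
    """Hypothetical helper: match against raw UTF-8 bytes.

    In a bytes pattern, \\w covers only ASCII [A-Za-z0-9_], so the bytes of
    'á' are non-word characters and \\b fires just before the final 'in'.
    """
    data = text.encode("utf-8")
    rx = re.compile(PATTERN.encode("ascii"), FLAGS)
    return [m.group(0).decode("utf-8") for m in rx.finditer(data)]


def doubled_words_text(text: str) -> list[str]:
    """Hypothetical helper: match against decoded text.

    In a str pattern, \\w is Unicode-aware, so 'á' is a word character and
    there is no word boundary inside 'amháin'.
    """
    rx = re.compile(PATTERN, FLAGS)
    return [m.group(0) for m in rx.finditer(text)]


print(doubled_words_bytes(line))  # ['in in']  (the false positive)
print(doubled_words_text(line))   # []
```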

This happens with my default locale (en_IE.UTF-8) but also with
ga_IE.UTF-8 (which matches the language and the encoding being used in
this file).


$ for lcall in en_IE.utf8 ga_IE.utf8 address@hidden ga_IE.iso88591; do echo $lcall; LC_ALL=${lcall} make sc_prohibit_doubled_word; echo Result: $?; done
en_IE.utf8
prohibit_doubled_word
/home/james/source/GNU/findutils/git/gnu/findutils/po/ga.po:435:in in
maint.mk: doubled words
make: *** [sc_prohibit_doubled_word] Error 1
Result: 2
ga_IE.utf8
prohibit_doubled_word
/home/james/source/GNU/findutils/git/gnu/findutils/po/ga.po:435:in in
maint.mk: doubled words
make: *** [sc_prohibit_doubled_word] Earráid 1
Result: 2
address@hidden
prohibit_doubled_word
/home/james/source/GNU/findutils/git/gnu/findutils/po/ga.po:435:in in
maint.mk: doubled words
make: *** [sc_prohibit_doubled_word] Earráid 1
Result: 2
ga_IE.iso88591
prohibit_doubled_word
/home/james/source/GNU/findutils/git/gnu/findutils/po/ga.po:435:in in
maint.mk: doubled words
make: *** [sc_prohibit_doubled_word] Earráid 1
Result: 2

I don't know enough about the Perl regex implementation to know how
smart \b is supposed to be here. But in any case, the syntax check
is clearly intended to check for common English problems, and this
file is certainly not in English (well, it contains English text, but
that text is copied from files elsewhere in the same package, and so
the English text will be checked at its point of origin).

James.


