bug-coreutils


From: vampyrebat
Subject: bug#34524: wc: word count incorrect when words separated only by no-break space
Date: Mon, 18 Feb 2019 02:12:15 -0600

$ wc --version
wc (GNU coreutils) 8.29
Packaged by Gentoo (8.29-r1 (p1.0))

The man page for wc states: "A word is a... sequence of characters delimited by 
white space."

But its concept of white space seems to include only ASCII white space.  U+00A0 
NO-BREAK SPACE, for instance, is not recognized.

If your terminal displays UTF-8 encoding:

printf 'how are\xC2\xA0you\n'

or if your terminal displays ISO 8859-1 encoding:

printf 'how are\xA0you\n'

the visible output of this printf is "how are you".  In either case, wc does 
not recognize the second space as white space, so it reports 2 words instead 
of the expected 3:

$ printf 'how are\xC2\xA0you\n' | LC_ALL=en_US.utf8 wc -w
2
$ printf 'how are\xA0you\n' | LC_ALL=en_US.iso88591 wc -w
2
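
Until wc itself handles this, one possible workaround is to turn the no-break
space into an ordinary space before counting.  The following is only a sketch
under stated assumptions: count_words_nbsp is a hypothetical helper name, not
part of coreutils, and it handles only the UTF-8 encoding of U+00A0 (bytes
0xC2 0xA0), not the single ISO 8859-1 byte 0xA0.

```shell
# Hypothetical helper (not part of coreutils): count words on stdin,
# treating UTF-8 encoded U+00A0 (bytes 0xC2 0xA0) as a word separator.
count_words_nbsp() {
    # Build the two NBSP bytes with octal escapes, which printf supports portably.
    nbsp=$(printf '\302\240')
    # Run sed in the C locale so the pattern is matched byte by byte,
    # then strip any padding some wc implementations put around the number.
    LC_ALL=C sed "s/$nbsp/ /g" | wc -w | tr -d ' '
}

printf 'how are\302\240you\n' | count_words_nbsp
# prints 3
```

The substitution happens before wc sees the data, so the result does not
depend on which characters wc considers white space in the current locale.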




