bug#20954: wc - linux
From: Stephane Chazelas
Subject: bug#20954: wc - linux
Date: Thu, 2 Jul 2015 14:23:30 +0100
User-agent: Mutt/1.5.21 (2010-09-15)
2015-07-01 19:41:00 -0600, Bob Proulx:
[...]
> > $ a="" ; echo $s | wc -l
> > 1
[...]
> No. Should be 1. You have forgotten about the newline at the end of
> the command. The echo will terminate with a newline.
[...]
Leaving a variable unquoted will also cause the shell to apply
the split+glob operator on it. echo will also do some
transformations on the string (backslash and option processing).
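As a rough illustration of both effects, here is a small sketch. The
variable names and sample values are made up for the demonstration,
and the byte counts in the comments assume the default $IFS and an
echo that honours -n (as the bash and dash builtins do):

```shell
# Hypothetical sample values, chosen only to trigger each effect.
spaced='a   b'   # three spaces between a and b
option='-n'      # looks like an echo option

# Unquoted: the shell splits on whitespace, echo rejoins the words with
# single spaces and appends a newline, so wc sees "a b\n" (4 bytes).
unquoted_bytes=$(( $(echo $spaced | wc -c) ))

# Quoted and passed through printf %s: the 5 bytes arrive untouched.
exact_bytes=$(( $(printf %s "$spaced" | wc -c) ))

# echo takes "-n" as an option here, so nothing at all is printed.
echo_option_bytes=$(( $(echo "$option" | wc -c) ))

# printf %s treats "-n" as plain data and prints its 2 bytes.
printf_option_bytes=$(( $(printf %s "$option" | wc -c) ))

echo "spaced: echo=$unquoted_bytes printf=$exact_bytes"
echo "option: echo=$echo_option_bytes printf=$printf_option_bytes"
```

The $(( ... )) wrapper is only there to strip any padding that some
wc implementations put around the number.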
To count the number of bytes in a variable, you can use:

  printf %s "$var" | wc -c

Use "${#var}" or

  printf %s "$var" | wc -m

for the number of characters.
GNU wc -m does not count bytes that are not part of a valid
character, while GNU bash's ${#var} counts each of them as one
character:
In a UTF-8 locale:

  $ var=$'\x80X\x80\u00e9'
  $ printf %s "$var" | hd
  00000000 80 58 80 c3 a9 |.X...|
  00000005
  $ echo "${#var}"
  4
  $ printf %s "$var" | wc -c
  5
  $ printf %s "$var" | wc -m
  2
Above, $var contains a 0x80 byte that doesn't form a valid
character, "X" (0x58), then another 0x80, then é (0xc3 0xa9).
wc -c counts the 5 bytes, wc -m counts X and é, while bash
${#var} counts those plus the two 0x80s.
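If you want to repeat the comparison from a script, something along
these lines should do. It is only a sketch: count_bytes and
count_chars are helper names invented here, and the string is built
with octal printf escapes rather than $'...' so that it does not
depend on bash. Only the byte count is the same everywhere; the two
character counts depend on the locale and on the shell.

```shell
# Build the same 5 bytes with portable octal escapes:
# 0x80, "X", 0x80, then é encoded as 0xc3 0xa9.
var=$(printf '\200X\200\303\251')

# Hypothetical helpers wrapping the two pipelines.
count_bytes() { printf %s "$1" | wc -c; }
count_chars() { printf %s "$1" | wc -m; }   # result depends on the locale

# $(( ... )) strips any padding wc may print around the number.
bytes=$(( $(count_bytes "$var") ))
chars=$(( $(count_chars "$var") ))

# ${#var} is the shell's own idea of the length, which also depends on
# the locale and on which shell is running the script.
echo "wc -c: $bytes"
echo "wc -m: $chars"
echo "\${#var}: ${#var}"
```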
--
Stephane