coreutils

Re: wc: expand help of '-L' (and a question)


From: Pádraig Brady
Subject: Re: wc: expand help of '-L' (and a question)
Date: Wed, 13 May 2015 03:00:48 +0100

On 25/04/15 03:38, Assaf Gordon wrote:
> Hello,
> 
> Would you be willing to add the following patch, mentioning tab-expansion
> and multibyte counting of '-L' in the "--help" screen, and the manual?
> Currently this is mentioned only in one sentence at the end of a long
> paragraph, and is easily missed.
> My wording could be improved, but I hope this will help prevent confusion
> with 'wc -L' output.

Wow that is confusing/ambiguous.
I'll apply the attached in your name.

> 
> Somewhat related:
> I seem to get an unexpected result with '-L' when forcing the C locale.
> Perhaps I'm doing something wrong, or are there more intricate details of '-L'?
> 
> # This is a Unicode Character 'BLACK HEART SUIT' (U+2665)
> $ printf "\xe2\x99\xa5\n"
> 
> # counting characters with UTF-8 locale is 1,
> # Counting bytes is 3,
> # longest line is 1 - as expected:
> $ printf "\xe2\x99\xa5" | LC_ALL=en_US.UTF-8 wc -cmL
>        1       3       1
> 
> 
> # using C locale, characters=bytes=3,
> # but longest line is 0 ?
> $ printf "\xe2\x99\xa5" | LC_ALL=C wc -cmL
>        3       3       0
> 
> This could be because of wc.c line 492, where "isprint" is called on each
> byte (e.g. isprint('\xe2') is false), and so these characters are not
> counted at all?

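Yes. In the C locale those bytes are not printable, so wc -L does not
count them. You could filter through sed first, replacing every character
with a printable '.', to adjust:

         sed 's/././g' | wc -L    # count chars
LC_ALL=C sed 's/././g' | wc -L    # count bytes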
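
As a minimal sketch of the sed workaround above, the two pipelines could be
wrapped in small shell functions. The names max_line_chars and max_line_bytes
are made up for this illustration and are not part of coreutils. The sketch
assumes a sed whose '.' matches any single byte in the C locale (as GNU sed
does) and a wc that supports -L.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of coreutils).

# Longest line in characters, as defined by the current locale.
# Every character becomes a printable '.', so wc -L counts one column each.
max_line_chars() {
    sed 's/././g' | wc -L | tr -d ' '
}

# Longest line in bytes: in the C locale sed treats each byte as one character.
max_line_bytes() {
    LC_ALL=C sed 's/././g' | wc -L | tr -d ' '
}
```

For example, printf '\342\231\245\n' | max_line_bytes should report 3 for the
three-byte U+2665 (written with octal escapes, since not every printf
understands \x). Note that '.' also matches a tab, so tabs count as one
column here rather than being expanded.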

cheers,
Pádraig.

Attachment: wc-L-clarify.patch