coreutils

Re: RFE: head,tail: -z, --zero-terminated


From: Pádraig Brady
Subject: Re: RFE: head,tail: -z, --zero-terminated
Date: Fri, 8 Jan 2016 22:07:06 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

On 08/01/16 19:04, Assaf Gordon wrote:
> Hello Pádraig and all,
> 
> On 01/08/2016 11:56 AM, Pádraig Brady wrote:
> [...]
>> Possible additions to this class:
>>
>>    nl (N/A as primarily text rather than record oriented)
>>    numfmt (ditto)
>>    expand (ditto)
>>    unexpand (ditto)
>>
> 
> Attached similarly structured patch adding -z to numfmt (it does not include 
> a NEWS entry, yet).

Cool. I was wondering a bit about numfmt, and thinking more this could be 
useful for:
  du -0 ... | numfmt -z

> an open question:
> With -z, do embedded newlines count as whitespace/field delimiters ?
> (not sure if this applies to other programs).
> 
> For example:
> 
>     $ printf "A B\tC\nD 1000\x00"
> 
> Should the newline count as whitespace/field delimiter (since numfmt defaults 
> to whitespace delimiters) ?
> If so, the "1000" should be the fifth field.
> If not, the "1000" should be in the fourth field (and "C\nD" counts as one 
> field).
> 
> Currently, because the numfmt code uses "isblank()", newlines DO NOT count as 
> whitespace:
> 
>      $ printf "A B\tC\nD 1000\x00" | ./src/numfmt -z --to=si --field=4 | od -a
>      0000000   A  sp   B  sp   C  nl   D  sp   1   .   0   K nul
>      0000015

A very good point.
This is not an issue for the utils in my current patch set I think,
but is for field processing utils like numfmt, sort, join, uniq
(cut delimits fields with a char rather than a class).
I.e., should these utils use isspace() rather than isblank()
when -z is specified? More conservatively, they probably
should use isblank(c) || c == '\n'.
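
To make the difference concrete, here is a rough sketch of the conservative
predicate. It is not taken from the numfmt source: is_field_sep() and
count_fields() are hypothetical names used only for illustration, and the
newline_is_sep flag stands in for "-z was specified".

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not from numfmt.c: true if C separates fields.
   Blanks (space, tab) always do; newline does only when NEWLINE_IS_SEP
   is set, which is the proposed behaviour under -z.  */
static bool
is_field_sep (unsigned char c, bool newline_is_sep)
{
  return isblank (c) || (newline_is_sep && c == '\n');
}

/* Hypothetical helper: count whitespace-delimited fields in the LEN
   bytes at REC, treating runs of separators as a single delimiter.  */
static size_t
count_fields (const char *rec, size_t len, bool newline_is_sep)
{
  size_t fields = 0;
  bool in_field = false;

  for (size_t i = 0; i < len; i++)
    {
      bool sep = is_field_sep ((unsigned char) rec[i], newline_is_sep);
      if (!sep && !in_field)
        fields++;               /* first byte of a new field */
      in_field = !sep;
    }
  return fields;
}
```

For the record "A B\tC\nD 1000" this counts four fields with the flag
unset (so "1000" is field 4, as numfmt does now) and five with it set.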

> Also,
> Two minor questions:
> 
> 1. If a null-terminated test fails due to incorrect output, the log will
> contain:
>      numfmt.pl: test z4: stdout mismatch, comparing z4.2 (expected) and z4.O (actual)
>      Binary files z4.2 and z4.O differ
> 
> This will make it hard for users to send us bug reports.
> Perhaps it's worth thinking about how to display a diff even for 
> null-terminated lines (not sure how best to approach this).

Maybe we should have something like bcompare
that diffs the base64 of two files?
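
As one possible shape for such a helper, here is a rough sketch that uses
escaping instead of base64 (a swapped-in technique, chosen so the diff
stays readable). Each NUL-terminated record becomes one text line, after
which plain diff works. The name escape_records() and the escape format
are assumptions for illustration; nothing like this exists in the test
suite today.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy the LEN bytes at IN to OUT in a line-oriented,
   printable form.  A NUL record terminator is written as "\0" followed by
   a real newline; backslash and nonprintable bytes are written as 3-digit
   octal escapes.  OUT is always NUL-terminated when OUTSZ > 0, and input
   that does not fit is silently dropped.  Returns the number of bytes
   written, not counting the final NUL.  */
static size_t
escape_records (const unsigned char *in, size_t len, char *out, size_t outsz)
{
  size_t n = 0;

  if (outsz == 0)
    return 0;

  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = in[i];

      if (n + 5 > outsz)        /* worst case: 4-byte escape + final NUL */
        break;

      if (c == '\0')
        {
          out[n++] = '\\';
          out[n++] = '0';
          out[n++] = '\n';
        }
      else if (c == '\\' || !isprint (c))
        n += (size_t) sprintf (out + n, "\\%03o", (unsigned int) c);
      else
        out[n++] = (char) c;
    }

  out[n] = '\0';
  return n;
}
```

Running both the expected and the actual file through something like this
before comparing would turn "Binary files differ" into an ordinary
line-by-line diff.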

> 2. In the patch for "wc", the long-form of the parameter (for getopt_long) is 
> "zero" instead of "zero-terminated" - is that intentional ?

Yes, to match other uses in that "class" of programs, like basename, etc.
Anyway -z may be moot for wc as discussed elsewhere in the thread.

thanks for the careful review!
Padraig.
