coreutils

Re: Question about uniq's treatment of spaces-only lines


From: Pádraig Brady
Subject: Re: Question about uniq's treatment of spaces-only lines
Date: Mon, 1 Aug 2022 22:21:14 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Thunderbird/98.0

On 01/08/2022 21:54, Pádraig Brady wrote:
On 31/07/2022 17:26, Sudarshan S Chawathe wrote:


On 2022-07-30T13:25:34+0100 (Saturday), Pádraig Brady writes:

More succinctly:

     $ printf '%s\n' first blah ' ' '  ' 'l ast' | uniq -f1
     first
     l ast

I.e. skipping one field will compare all but the 'l ast' line as equal.
This is operating as per the POSIX standard which states:

"Ignore the first fields fields on each input line when doing comparisons,
where fields is a positive decimal integer. A field is the maximal string
matched by the basic regular expression:

[[:blank:]]*[^[:blank:]]*

If the fields option-argument specifies more fields than appear on an input line,
a null string shall be used for comparison."
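
As a rough illustration (not taken from the standard or from uniq's source),
the part of each line that is left for comparison after skipping one field can
be approximated by stripping one match of the same pattern with sed. The
function name below is made up for this sketch, and the brackets are only
there to make empty remainders visible.

```shell
# Hypothetical helper: strip one POSIX "field" ([[:blank:]]*[^[:blank:]]*)
# from the start of each line, then wrap what remains in brackets.
show_key_after_one_field() {
  sed -e 's/^[[:blank:]]*[^[:blank:]]*//' -e 's/.*/[&]/'
}

printf '%s\n' first blah ' ' '  ' 'l ast' | show_key_after_one_field
# prints:
# []
# []
# []
# []
# [ ast]
```

The first four lines are left with an empty remainder, so they compare equal,
while 'l ast' leaves ' ast' (with its leading blank) and so differs.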

Thank you for the clarification.  For me, the key to resolving my
earlier confusion was the realization that the blanks are included in
the field as opposed to being interpreted as inter-field separators.
This is obvious now based on what you quote above from the POSIX docs
but escaped me earlier because I hadn't thought of checking those
docs. The GNU info docs for uniq do not seem to describe what exactly a
field is in this context.  Perhaps it would be useful to include the
above quote or an equivalent description (or pointer) there.

Yes, good point.
It's quite confusing actually.
Given:

$ cat in.txt
1 2
2  2
3 2
4  2

One might think, given the current definition, that `uniq -f1`
would compare only the '2's above. But the leading spaces
are part of the second field and so are significant to the comparison.

$ uniq -f1 in.txt
1 2
2  2
3 2
4  2

$ tr -s ' ' <in.txt | uniq -f1
1 2

This is quite awkward really in the presence of a variable number of blanks.

This made my "coreutils gotchas" list:
http://www.pixelbeat.org/docs/coreutils-gotchas.html#uniq
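
If squeezing the blanks with tr is not acceptable because the original spacing
must survive in the output, one possible workaround is to do the comparison in
awk instead. This is only a sketch under that assumption, not a uniq feature:
the function name is invented here, and the key is built from awk's default
whitespace-split fields 2..NF, so runs of blanks between fields are ignored.

```shell
# Hypothetical uniq -f1 substitute: compare adjacent lines on fields 2..NF
# as split by awk (blank runs collapsed), but print lines unmodified.
uniq_skip_first_field() {
  awk '{
    key = ""
    for (i = 2; i <= NF; i++) key = key " " $i   # rebuild key with single blanks
    if (NR == 1 || key != prev) print            # keep first line of each run
    prev = key
  }'
}

printf '%s\n' '1 2' '2  2' '3 2' '4  2' | uniq_skip_first_field
# prints:
# 1 2
```

Unlike the tr -s pipeline, lines that are printed keep whatever spacing they
had in the input.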


