emacs-bug-tracker

[debbugs-tracker] bug#10985: closed (sort -k behavior possible problem: field span across the boundaries)


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#10985: closed (sort -k behavior possible problem: field span across the boundaries)
Date: Fri, 09 Mar 2012 20:22:02 +0000

Your message dated Fri, 09 Mar 2012 13:20:48 -0700
with message-id <address@hidden>
and subject line Re: bug#10985: sort -k behavior possible problem: field span 
across the boundaries
has caused the debbugs.gnu.org bug report #10985,
regarding sort -k behavior possible problem: field span across the boundaries
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
10985: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=10985
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: sort -k behavior possible problem: field span across the boundaries
Date: Fri, 9 Mar 2012 11:46:45 -0800

Hi

While testing different GNU coreutils sort versions on different platforms (Linux and FreeBSD), I found some behavior that is probably not what a user of the utility expects.

Let's say we have to sort (numerically, stable) just two lines:

$ sort -t "|" -ns -k2.3,2.7 <<!
1|234
1|2|34
!

The GNU sort output is:

1|234
1|2|34

The correct output (from my point of view) must be:

1|2|34
1|234

My reasoning is that applying the key spec “-k2.3,2.7” to the string “1|234” yields the key “4”, and applying the same key to the string “1|2|34” must yield “” (the empty string), because the second field is just “2” and the characters from the 3rd to the 7th position form an empty string. And the empty string is numerically smaller than a number, according to “info sort”.
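
For comparison with the reply below, here is a minimal Python sketch of the field-bounded reading described above: the key is taken from characters 3 to 7 of the second field alone, so a short field yields an empty key. This models the reporter's expected semantics, not what POSIX specifies or GNU sort implements; `field_bounded_key` and `numeric_value` are hypothetical helpers, and treating an empty key as 0 under `-n` is an assumption.

```python
import re


def field_bounded_key(line: str, sep: str, field: int, first: int, last: int) -> str:
    """Hypothetical field-bounded reading of -t SEP -k FIELD.FIRST,FIELD.LAST:
    characters FIRST..LAST (1-based, inclusive) taken from that field alone."""
    fields = line.split(sep)
    if field > len(fields):
        return ""  # missing field: empty key
    return fields[field - 1][first - 1:last]  # slicing past the field end gives ""


def numeric_value(key: str) -> int:
    """Rough stand-in for -n: the leading integer, or 0 for an empty key (assumption)."""
    m = re.match(r"\s*(-?\d+)", key)
    return int(m.group(1)) if m else 0


lines = ["1|234", "1|2|34"]
print([field_bounded_key(l, "|", 2, 3, 7) for l in lines])  # ['4', '']

# sorted() is stable, like -s; this is the order the report expects
print(sorted(lines, key=lambda l: numeric_value(field_bounded_key(l, "|", 2, 3, 7))))
# ['1|2|34', '1|234']
```

Under this reading the empty key sorts first, which is exactly the output the report calls correct.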

On the other hand, GNU sort (I suppose) just takes an offset from the start of the field, without taking the real field length into account. That yields the key “34”, which is numerically larger than “4”.

I do not know whether this is intended behavior or a bug, but it is definitely non-intuitive and not what a reasonable user would expect.

Thanks a lot!

Oleg Moskalenko

--- End Message ---
--- Begin Message ---
Subject: Re: bug#10985: sort -k behavior possible problem: field span across the boundaries
Date: Fri, 09 Mar 2012 13:20:48 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.1) Gecko/20120216 Thunderbird/10.0.1

tag 10985 notabug
thanks

On 03/09/2012 12:46 PM, Oleg Moskalenko wrote:
> Hi
> 
> While testing different GNU coreutils sort versions on different platforms 
> (Linux and FreeBSD) I found that some behavior is probably not what a utility 
> user expects.

Thanks for the report.  However, you probably found behavior that is
required by POSIX.

> 
> Let's, say, we have to sort (numerically stable) just two lines:
> 
> $ sort -t "|" -ns -k2.3,2.7 <<!
> 1|234
> 1|2|34
> !

Let's use 'sort --debug' to see what really happened:

$ LC_ALL=C sort --debug -t\| -ns -k2.3,2.7 <<a
> 1|234
> 1|2|34
> a
sort: using simple byte comparison
1|234
    _
1|2|34
    __

So this sorted by locating the start of the second field ("234" on one
line and "2|34" on the other), then starting at the 3rd byte counted
from that location (even if that byte is in the next field).
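
A minimal Python sketch of the reading just described, assuming `-t` is in effect, `-b` is not, and the end position names an actual character (EC >= 1). The helpers `field_start`, `posix_key_span`, `numeric_value`, and `key_text` are hypothetical names, not GNU sort's code, and `numeric_value` is only a rough stand-in for `-n`. Each key position is an offset from the start of its own field, clamped only at the end of the line, which is what the `--debug` underlines above show for both lines.

```python
import re
from typing import Tuple


def field_start(line: str, sep: str, n: int) -> int:
    """Index where field n (1-based) begins, or len(line) if the line is too short."""
    pos = 0
    for _ in range(n - 1):
        nxt = line.find(sep, pos)
        if nxt == -1:
            return len(line)  # fewer than n fields: clamp to the end of the line
        pos = nxt + 1
    return pos


def posix_key_span(line: str, sep: str, sf: int, sc: int, ef: int, ec: int) -> Tuple[int, int]:
    """Half-open (start, end) of the key for -t SEP -k SF.SC,EF.EC, assuming EC >= 1.

    Both ends are character offsets from the start of their own field and are
    clamped only to the end of the line, never to the end of the field, so the
    key may run across later separators.
    """
    start = min(len(line), field_start(line, sep, sf) + sc - 1)
    end = min(len(line), field_start(line, sep, ef) + ec)
    return start, max(start, end)  # an end before the start yields an empty key


def numeric_value(key: str) -> int:
    """Rough stand-in for -n: the leading integer, or 0 when there is none (assumption)."""
    m = re.match(r"\s*(-?\d+)", key)
    return int(m.group(1)) if m else 0


def key_text(line: str) -> str:
    """Key text for the report's -t "|" -k2.3,2.7."""
    start, end = posix_key_span(line, "|", 2, 3, 2, 7)
    return line[start:end]


lines = ["1|234", "1|2|34"]
for line in lines:
    start, end = posix_key_span(line, "|", 2, 3, 2, 7)
    print(line)
    print(" " * start + "_" * (end - start))  # mimics the --debug underline

# sorted() is stable, like -s; the order matches the GNU sort output in the report
print(sorted(lines, key=lambda l: numeric_value(key_text(l))))  # ['1|234', '1|2|34']

# Only running off the end of the line empties the key: field 2 of "1|2" is "2",
# and its 3rd character would lie past the end of the line.
print(repr(key_text("1|2")))  # ''
```

Run as a script, it prints the two lines with the same underlines as the `--debug` session, then the order GNU sort produced, since "4" sorts numerically before "34".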

This behavior is required by POSIX:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html

> 
> The correct output (from my point of view) must be:
> 
> 1|2|34
> 1|234

Sorry, but that interpretation does not match POSIX.

> 
> My reasoning is that applying the key specs "-k2.3,2.7" to string "1|234" we 
> obtain the key "4", and applying the same key to the string "1|2|34" we must 
> obtain "" (empty string),

That's where you are wrong.  POSIX states:

>> The notation:
>> 
>> -k field_start[type][,field_end[type]]
>> 
>> shall define a key field that begins at field_start and ends at field_end 
>> inclusive, unless field_start falls beyond the end of the line or after 
>> field_end, in which case the key field is empty. A missing field_end shall 
>> mean the last character of the line.
>> 
>> A field comprises a maximal sequence of non-separating characters and, in 
>> the absence of option -t, any preceding field separator.
>> 
>> The field_start portion of the keydef option-argument shall have the form:
>> 
>> field_number[.first_character]
>> 
>> Fields and characters within fields shall be numbered starting with 1. The 
>> field_number and first_character pieces, interpreted as positive decimal 
>> integers, shall specify the first character to be used as part of a sort 
>> key. If .first_character is omitted, it shall refer to the first character 
>> of the field.

That is, the field_start 2.3 means to start at the third character
counted from the start of the second field, regardless of whether any
field separators lie in between, and that _only_ the end of a line (and
not another field separator) can result in an empty key field.

> 
> I do not know whether this is an intended behavior or a bug,

Intended and mandated by the standards.

> but this is definitely non-intuitive and not what a reasonable user would 
> expect.

Perhaps so, but if you want it changed, you need to file a bug report
against POSIX.  As such, I'm going to close out this coreutils bug.

-- 
Eric Blake   address@hidden    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


--- End Message ---
