Re: Bug in uniq?
From: David Eisner
Subject: Re: Bug in uniq?
Date: Fri, 11 Mar 2005 16:38:17 -0500
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)
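It looks like many of the lines end with a carriage return followed by a
newline (\r\n), while the others end with only a newline (\n). uniq
compares lines byte for byte, so two lines that differ only in the
trailing \r are not treated as duplicates. Is it possible the other
tools are ignoring line-ending differences?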
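
Here is a rough sketch of how one might check this. It builds a tiny
made-up sample (crlf_sample.txt is a hypothetical name, not the attached
file) with two visually identical lines, one ending in \r\n and one in
\n, and compares what uniq does before and after the carriage returns
are stripped with tr. The variable names are only for illustration.

```shell
# Hypothetical sample: two visually identical tab-delimited lines,
# the first ending in \r\n and the second ending in \n only.
printf 'a\tb\r\na\tb\n' > crlf_sample.txt

# Count the carriage-return characters in the file.
cr_count=$(tr -cd '\r' < crlf_sample.txt | wc -c)

# uniq compares bytes, so the trailing \r keeps the two lines distinct.
before=$(uniq crlf_sample.txt | wc -l)

# Strip every carriage return first, then let uniq collapse the duplicates.
after=$(tr -d '\r' < crlf_sample.txt | uniq | wc -l)

echo "carriage returns: $((cr_count))"
echo "lines after uniq, CRs kept: $((before))"
echo "lines after uniq, CRs stripped: $((after))"
```

If the same pattern shows up in the real file, something along the lines
of `tr -d '\r' < test_file_for_uniq | uniq > foo` should remove the
duplicates, assuming no carriage returns are wanted anywhere in the data.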
-David
Ian Sue Wing wrote:
>Greetings,
>
>Yesterday I downloaded and installed a copy of CYGWIN. I am using the
>uniq utility to purge duplicate line entries from a large, tab-delimited
>file with several columns of data. (The file, which I have already run
>through sort, is included as a .bz2 attachment. It has about 60,000 lines.)
>
>I have examined the file visually in a text editor, and confirmed that
>it has duplicate lines. I have loaded the file into excel and calculated
>that there are about 8700 duplicate lines. However, in the CYGWIN Bash
>shell, typing
>
>uniq test_file_for_uniq > foo; diff test_file_for_uniq foo
>
>shows no changes between the files. Examining the uniquified file 'foo'
>in excel reveals it to be identical to the original.
>
>I then fired up my trusty old MKS Toolkit and ran its implementation of
>uniq. Running MKS visual diff on the original and uniquified files
>identified about 8700 line differences, consistent with my earlier
>calculations.
>
>Is this a bug in CYGWIN's implementation of uniq or a silly error
>on my part? Last I checked, uniq was simple, straightforward to use, and
>had nuclear-hardened reliability.
>
>-i
--
---------------------------------------------------------
D a v i d E i s n e r c r a d l e @ u m d . e d u
CALCE EPSC University of Maryland