Bug in uniq?
From: Ian Sue Wing
Subject: Bug in uniq?
Date: Fri, 11 Mar 2005 15:05:55 -0500
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
Greetings,
Yesterday I downloaded and installed a copy of CYGWIN. I am using the
uniq utility to purge duplicate line entries from a large, tab-delimited
file with several columns of data. (The file, which I have already run
through sort, is included as a .bz2 attachment. It has about 60,000 lines.)
I have examined the file visually in a text editor, and confirmed that
it has duplicate lines. I have loaded the file into Excel and calculated
that there are about 8700 duplicate lines. However, in the CYGWIN Bash
shell, typing
    uniq test_file_for_uniq > foo; diff test_file_for_uniq foo
shows no changes between the files. Examining the uniquified file 'foo'
in Excel reveals it to be identical to the original.
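
For readers who want to reproduce the check without Excel, here is a minimal sketch of counting what uniq removes. It assumes a small made-up file, sample_for_uniq, standing in for the attached data; the file name and its contents are hypothetical and are not taken from the attachment. It relies only on the standard behaviour that uniq collapses adjacent identical lines and that `uniq -d` prints one copy of each repeated line.

```shell
# Hypothetical sample data standing in for test_file_for_uniq
# (tab-delimited and already sorted, so duplicates are adjacent).
printf 'a\t1\na\t1\nb\t2\nc\t3\nc\t3\nc\t3\n' > sample_for_uniq

# Line count of the input, and of the output after uniq.
total=$(wc -l < sample_for_uniq)
kept=$(uniq sample_for_uniq | wc -l)

# Number of lines uniq dropped as adjacent duplicates.
removed=$((total - kept))
echo "removed: $removed"      # prints "removed: 3"

# Number of distinct lines that occur more than once
# (uniq -d prints a single copy of each repeated line).
repeated=$(( $(uniq -d sample_for_uniq | wc -l) ))
echo "repeated: $repeated"    # prints "repeated: 2"
```

If the same commands report 0 removed lines on a file that visibly contains duplicates, uniq is not seeing those lines as byte-for-byte identical.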
I then fired up my trusty old MKS Toolkit and ran its implementation of
uniq. Running MKS visual diff on the original and uniquified files
identified about 8700 line differences, consistent with my earlier
calculations.
Is this a bug in CYGWIN's implementation of uniq or a silly error
on my part? Last I checked, uniq was simple, straightforward to use, and
had nuclear-hardened reliability.
-i
--
Ian Sue Wing 675 Commonwealth Ave.
Assistant Professor Rm. 141, Boston MA 02215
Center for Energy & Environmental Studies Tel: (617) 353-5741
Department of Geography & Environment Fax: (617) 353-5986
Boston University Web: http://people.bu.edu/isw
test_file_for_uniq.bz2
Description: Binary data