bug-gnu-utils


From: George Zarkadas
Subject: Ver. 3.1.4 & 3.1.3 Windows ports: Chopped record count at large files
Date: Sun, 27 Aug 2006 13:19:08 +0300

gawk reports a record count much smaller than the actual one when processing
a large (~275 MB) text file.

 

This behavior exists in:

Version 3.1.4, xmlgawk Windows port

    (downloaded from
http://lml.ls.fi.upm.es/~mcollado/xmlgawk/xmlgawk-3.1.4_20040920_mingw.zip)

Version 3.1.3, GnuWin32 Windows port

    (downloaded from http://sourceforge.net/projects/gnuwin32/ )

 

but not in version 3.0.4 (MinGW Windows port), which gives the correct
results (as verified by independent checks).
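For anyone who wants to repeat an independent check without going through gawk, the following is a minimal Python sketch (it is not one of the attached scripts). It assumes each BibTeX record starts with a line whose first character is '@', which may not match the rule used in count_dblp_bib.awk. The file is read in binary mode, and the sketch also reports the byte offset of the first 0x1A (Ctrl-Z) byte: the Microsoft C runtime treats Ctrl-Z as end-of-file for files opened in text mode, so a stray one is one possible, unconfirmed explanation for a count that stops short on Windows.

```python
import sys


def scan(path, chunk_size=1 << 20):
    """Count lines and '@'-prefixed entries in a (possibly huge) file.

    The file is opened in binary mode, so no C-runtime text-mode
    translation (CRLF folding, or Ctrl-Z treated as end-of-file) can
    shorten what is read. ASSUMPTION: each BibTeX record begins with a
    line whose first character is '@'.
    """
    lines = 0             # number of '\n'-terminated lines
    entries = 0           # lines that start with '@'
    size = 0              # total bytes read
    first_ctrl_z = None   # byte offset of the first 0x1A byte, if any
    at_line_start = True  # True if the next byte read begins a new line
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            if first_ctrl_z is None:
                pos = chunk.find(b"\x1a")
                if pos != -1:
                    first_ctrl_z = size + pos
            # An '@' at the very start of this chunk begins an entry only if
            # the previous chunk ended with a newline (or this is the start
            # of the file); occurrences of '\n@' inside the chunk cover the rest.
            if at_line_start and chunk[:1] == b"@":
                entries += 1
            entries += chunk.count(b"\n@")
            lines += chunk.count(b"\n")
            at_line_start = chunk.endswith(b"\n")
            size += len(chunk)
    # awk also counts a final line that lacks a trailing newline.
    if size and not at_line_start:
        lines += 1
    return {"bytes": size, "lines": lines, "entries": entries,
            "first_ctrl_z": first_ctrl_z}


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(name, scan(name))
```

Usage (script and data file names are hypothetical): `python count_check.py dblp.bib`. The "lines" figure is comparable to gawk's NR with the default record separator; "entries" is only meaningful under the '@' assumption above.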

 

Consistent with this, gawk also fails to extract a subset of records located
near the end of the file.

 

Attached are:

1. Results (copied and pasted from the command line) of (a) running the
count scripts and (b) extracting the subset [files: count_results.txt and
subset_results.txt]

2. The awk scripts in question

 

Additional information

-- The file the scripts operated on contains bibliographic records in BibTeX
format (converted from the XML file supplied by the DBLP project, as
downloaded from www.vldb.org <http://www.vldb.org/>)

-- The scripts were run on two machines, with identical results.
Configurations (machine 1 | machine 2):

OS: Windows XP SP2 (EL) on both

CPU: Pentium M 1.7 GHz  |  Pentium 4 HT 3.0 GHz

RAM: 1 GB  |  2 GB

HDD: 80 GB  |  400 GB

-- A bug report has also been submitted to the GnuWin32 project (no contact
information was found for the 3.1.4 port). However, I suspect this behavior
is not specific to the Windows ports; hence this bug report.

 

Kind Regards

George Zarkadas

 

PS: The original file the scripts acted on is not included because of its
size (~55 MB zipped), but I will gladly supply it on request.

 

Attachment: count_results.txt
Description: Text document

Attachment: count_dblp_bib2.awk
Description: Binary data

Attachment: count_dblp_bib.awk
Description: Binary data

Attachment: subset_results.txt
Description: Text document

Attachment: get_vldb_subset.awk
Description: Binary data

Attachment: get_vldb_subset2.awk
Description: Binary data

