bug-gawk

Re: [bug-gawk] An gawk problem.


From: Andrew J. Schorr
Subject: Re: [bug-gawk] An gawk problem.
Date: Fri, 18 Oct 2013 10:18:12 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Hi,

> > Date: Fri, 18 Oct 2013 17:41:54 +0800
> > Subject: An gawk problem.
> > From: ?????? <address@hidden>
> > To: address@hidden
> >
> > Hi, Dear Arnold:
> > Thank you for your work on gawk, and thanks for this useful tool that lets
> > me do some things easily.
> > But recently I have had a problem with gawk. I am not sure whether it can be
> > called a bug, because I can't determine whether it is related to my machine's
> > performance. The following is my problem:
> >
> > awk '{NAME[$1]++}; END {for (name in NAME) print NAME[name], name}'
> > ip_address.txt | sort -nr | head -n1000 | awk '{print $2}'
> >
> > I want to use the command above to get the IP addresses of the top 1000
> > visitors. When ip_address.txt is small, everything is OK. But when the
> > size of ip_address.txt reaches 1 Gbyte, the command above produces no
> > results and is still running after 3 days. (Meanwhile, the memory of my
> > computer has been swallowed up.)

Is it gawk or sort that is taking so long?  If you are on Linux, you can use
"top" or "ps" to figure that out.  Or change your command to tell you when
the initial awk command has completed:

awk '{NAME[$1]++}
     END {
         for (name in NAME) print NAME[name], name
         print "Debug: awk first pass finished" > "/dev/stderr"
     }' ip_address.txt | sort -nr | head -n1000 | awk '{print $2}'
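
The same idea can be taken one step further by splitting the pipeline into two stages that can be run, and timed, separately. The following is only a sketch: the function names count_ips and top_ips are made up for illustration and are not part of the original command, and the awk and sort invocations are the ones from the command above.

```shell
# Stage 1 (hypothetical helper): count occurrences of the first field
# and print "count ip" pairs. The debug line goes to stderr, so it does
# not end up in the data that is piped to the next stage.
count_ips() {
    awk '{NAME[$1]++}
         END {
             for (name in NAME) print NAME[name], name
             print "Debug: awk first pass finished" > "/dev/stderr"
         }' "$1"
}

# Stage 2 (hypothetical helper): read "count ip" pairs on stdin, sort by
# count in descending order, and print the IPs of the top N (default 1000).
top_ips() {
    sort -nr | head -n "${1:-1000}" | awk '{print $2}'
}
```

For example, "count_ips ip_address.txt > counts.txt" followed by "top_ips 1000 < counts.txt" runs the stages one after the other. If the first command never finishes, awk is the slow part; if it finishes but the second one hangs, look at sort.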

Regards,
Andy


