help-gnu-emacs

Re: Most used words in current buffer


From: Udyant Wig
Subject: Re: Most used words in current buffer
Date: Thu, 19 Jul 2018 11:03:45 +0530
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 07/19/2018 06:15 AM, Bob Proulx wrote:
> Not wanting to be too annoying but I see no hashing in the awk
> solution.  It is using an awk associative array to store the words.
> Perl and Python call those "hashes" but they are just associative
> arrays.

I understand that associative arrays in awk are built upon hashing.
Kernighan and Pike say

  The implementation of associative memory uses a hashing scheme to
  ensure that access to any element takes about the same time as to any
  other, and that (at least for moderate array sizes) the time doesn't
  depend on how many elements are in the array.

However, on the previous page, where they introduce the language
construct, they do use the name _associative array_.
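
To make the construct concrete, here is a minimal sketch of a word
counter built on an awk associative array.  The function name
wordfreq_sketch and the lowercasing of words are my own assumptions
for illustration; this is not the script from the book.

```shell
# Hypothetical helper: count the words on stdin with an awk associative array.
wordfreq_sketch() {
  awk '
    {
      for (i = 1; i <= NF; i++)
        count[tolower($i)]++      # the array subscript is the word itself
    }
    END {
      for (w in count)
        print w, count[w]         # iteration order is unspecified
    }'
}

# Most frequent word first; ties broken alphabetically.
printf 'the cat the dog\nThe end\n' | wordfreq_sketch | sort -k2,2nr -k1,1 | head -n 1
```

The script only ever subscripts the array by a string; whether a hash
table sits underneath is up to the implementation, which is why both
names get used for the same thing.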

> I will continue to be contrary here and say that awk does a much
> better job of cutting by whitespace separated fields than does cut.
> Both are standard and should be available everywhere.  And here
> because awk is already in use I expect it to be somewhat more
> efficient to use awk again in the pipeline than to use a different
> program.
>
> I also wish to improve the command line somewhat.  Using $* by itself
> does not sufficiently quote program arguments with whitespace.  One
> should use "$@" for that purpose.  Also, the old forms of sort and head
> are better left behind in favor of the new portable option syntax.
> Let me suggest:
>
>   ' "$@" | sort -k2,2nr | head -n10 | awk '{ print $1 }'
>
> Bob
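
The quoting point is easy to check with a small sketch.  The helper
names count_args, unquoted, and quoted are hypothetical, and the
default IFS is assumed:

```shell
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

unquoted() { count_args $*; }     # split again on whitespace: "my file.txt" becomes two words
quoted()   { count_args "$@"; }   # each original argument is passed through intact

unquoted "my file.txt" other.txt  # prints 3
quoted   "my file.txt" other.txt  # prints 2
```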

Thank you for the portable pipeline.  It is interesting to compare it
with the pipeline given in the book:

  $ wordfreq ch4.* | sort +1 -nr | sed 20q | 4

where

  wordfreq is the awk script proper,
  4 is a shell script that prints its input in 4 columns,
  and sed 20q does the equivalent of head -20.

On the last point, they say that given the ease of typing a sed command,
they felt no need to write the program head.
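
A small sketch of the two spellings side by side, using made-up input
and the portable option forms from Bob's pipeline rather than the
book's obsolete `sort +1` syntax:

```shell
# sed Nq prints lines up to and including line N, then quits,
# which is the same output head -n N gives.
printf '%s\n' a b c d e | sed 3q
printf '%s\n' a b c d e | head -n 3

# Sort numerically, in reverse, on the second field only.
printf 'a 1\nb 3\nc 2\n' | sort -k2,2nr
```

Both of the first two commands print the lines a, b, and c; the sort
puts "b 3" first and "a 1" last.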

Udyant Wig
-- 
We make our discoveries through our mistakes: we watch one another's
success: and where there is freedom to experiment there is hope to
improve.
                                -- Arthur Quiller-Couch


