bug#32073: Improvements in Grep (Bug#32073)


From: arnold
Subject: bug#32073: Improvements in Grep (Bug#32073)
Date: Wed, 01 Jan 2020 13:24:26 -0700
User-agent: Heirloom mailx 12.5 7/5/10

Hi.

Sergiu Hlihor <address@hidden> wrote:

> Arnold, there is no need to write user code, it is already done in
> benchmarks. One of the standard benchmarks when testing HDDs and SSDs is
> read throughput vs block size and at different queue depths.

I think you're misunderstanding me, or I am misunderstanding you.

As the gawk maintainer, I can choose the buffer size to use every time
I issue a read(2) system call for any given input file.  Gawk currently
uses the smaller of (a) the file's size and (b) the st_blksize member
of the file's struct stat.
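To make concrete what I mean by "choose the buffer size," here is a
rough sketch in C. It is not gawk's actual code, just the idea; the
choose_bufsize helper name and the 4096-byte fallback are only for
illustration:

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <stddef.h>

    /* Sketch only: pick a read buffer size for an open file descriptor,
       using the smaller of the file's size and st_blksize, with a
       fallback when either value is unusable. */
    size_t
    choose_bufsize(int fd)
    {
        struct stat sbuf;
        size_t size = 4096;    /* fallback if fstat fails */

        if (fstat(fd, &sbuf) == 0) {
            size_t blksize =
                (sbuf.st_blksize > 0) ? (size_t) sbuf.st_blksize : 4096;

            if (S_ISREG(sbuf.st_mode) && sbuf.st_size > 0
                && (size_t) sbuf.st_size < blksize)
                size = (size_t) sbuf.st_size;
            else
                size = blksize;
        }
        return size;
    }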

If I understand you correctly, this is "not enough"; gawk (grep,
cp, etc.) should all use an optimal buffer size that depends upon the
underlying storage hardware where the file is located.

So far, so good, except for one thing: how do I determine what that
number is?
I cannot run a benchmark before opening each and every file. I don't
know of a system call that will give me that number. (If there is,
please point me to it.)

Do you just want a command line option or environment variable
that you, as the application user, can set?

If the latter, it happens that gawk will let you set AWKBUFSIZE and
it will use whatever number you supply for doing reads. (This is
even documented.)
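
For example, something like this (the 1 MiB value, program.awk, and
bigfile are only illustrative):

    AWKBUFSIZE=1048576 gawk -f program.awk bigfile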

HTH,

Arnold




