bug#32073: Improvements in Grep (Bug#32073)

From: arnold
Subject: bug#32073: Improvements in Grep (Bug#32073)
Date: Wed, 01 Jan 2020 04:19:22 -0700
User-agent: Heirloom mailx 12.5 7/5/10

As a quite serious question, how is someone writing user-level code
supposed to be able to figure out the right buffer size for a particular
file, and to do so portably? ("Show me the code.")

Gawk bases its reads on the st_blksize member in struct stat.  That will
typically be something like 4K, which is not nearly enough given your
description.
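
For concreteness, here is a minimal sketch of the st_blksize approach,
combined with a caller-supplied override of the kind requested in the quoted
message below. It is not gawk's or grep's actual code: the function names,
the override parameter, and the MIN_BUF_SIZE/MAX_BUF_SIZE bounds are all
hypothetical and chosen only for illustration. Only fstat() and the
st_blksize member are standard POSIX.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical bounds for illustration; not taken from gawk or grep. */
#define MIN_BUF_SIZE ((size_t) 32 * 1024)
#define MAX_BUF_SIZE ((size_t) 8 * 1024 * 1024)

/* Clamp a size hint into [MIN_BUF_SIZE, MAX_BUF_SIZE]. */
size_t clamp_buffer_size(long long hint)
{
    if (hint < (long long) MIN_BUF_SIZE)
        return MIN_BUF_SIZE;
    if (hint > (long long) MAX_BUF_SIZE)
        return MAX_BUF_SIZE;
    return (size_t) hint;
}

/*
 * Pick a read-buffer size for an open descriptor.
 * If override > 0 (e.g. parsed from a hypothetical option or environment
 * variable), it wins.  Otherwise fall back to the file system's preferred
 * I/O size as reported in st_blksize.  Both paths are clamped.
 */
size_t choose_buffer_size(int fd, long long override)
{
    struct stat sb;

    if (override > 0)
        return clamp_buffer_size(override);
    if (fstat(fd, &sb) != 0)
        return MIN_BUF_SIZE;    /* fstat failed; use the floor */
    return clamp_buffer_size((long long) sb.st_blksize);
}
```

With these assumed bounds, a 4K st_blksize is simply raised to the floor, so
the hint from stat only matters when the file system reports something
larger. That is the portability problem in a nutshell: st_blksize says
nothing about stripe width or device queue depth.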


Sergiu Hlihor <address@hidden> wrote:

> This topic is getting more and more frustrating. If you rely on OS, then
> you are at the mercy of whatever read ahead configuration you have. And
> read ahead is typically 128KB, so it does not help that much. A HDD RAID 10
> array with 12 disks and a stripe size of 128KB reaches the maximum read
> throughput if the read block size is 6 * 128 = 768KB. When issuing read
> requests of 128KB, you only hit one HDD, getting 1/6 of the read
> throughput. The same holds for flash. A state of the art SSD that can do
> 5GB/s reads can actually do around 1GB/s or less at 128KB block size. Why
> is it so hard to understand how hardware works and the fact that you need
> huge block sizes to actually read at full speed? Why not just expose the
> read buffer size as a configurable parameter, so anyone can tune it as
> needed? 96KB
> is purely retarded.
>
> On Wed, 1 Jan 2020 at 08:52, Paul Eggert <address@hidden> wrote:
> > > This makes me think we should follow Coreutils' lead[0] and increase
> > > grep's initial buffer size from 32KiB, probably to 128KiB.
> >
> > I see that Jim later installed a patch increasing it to 96 KiB.
> >
> > Whatever number is chosen, it's "wrong" for some configuration. And I
> > suppose the particular configuration that Sergiu Hlihor mentioned could
> > be tweaked so that it worked better with grep (and with other programs).
> >
> > I'm inclined to mark this bug report as a wishlist item, in the sense
> > that it'd be nice if grep and/or the OS could pick buffer sizes more
> > intelligently (though it's not clear how grep and/or the OS could go
> > about this).
> >
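
The 6 * 128 = 768KB figure in the quoted message follows from the array
geometry: a 12-disk RAID 10 with two-way mirroring stripes data across 6
mirror pairs, so one full stripe is 6 stripe units. A small sketch of that
arithmetic, with a hypothetical function name and assuming every mirror set
holds the same number of copies:

```c
#include <stddef.h>

/*
 * Bytes in one full stripe of a RAID 10 array.
 *   disks       total number of drives in the array
 *   copies      mirror copies per stripe member (2 for plain RAID 10)
 *   stripe_unit bytes written to one member before moving to the next
 * Returns 0 if the geometry is invalid.
 */
size_t raid10_full_stripe_bytes(unsigned disks, unsigned copies,
                                size_t stripe_unit)
{
    if (disks == 0 || copies == 0 || disks % copies != 0)
        return 0;
    return (size_t) (disks / copies) * stripe_unit;
}
```

For the configuration described above,
raid10_full_stripe_bytes(12, 2, 128 * 1024) yields 786432 bytes, i.e. 768KB.
A program has no portable way to learn these three numbers, which is why
the value cannot simply be computed at run time.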

