I'd rather avoid the additional accounting overhead of a pool. If 4MB
is a reasonable limit, let's make that the new max. I can do some
testing to see where the performance improvements drop off. We'd
have a default buffer size (smaller than the previous 64k and the
current 128k buf sizes) that is used when we allocate scsi requests;
scanning through send_command() gives a good idea of other scsi
command buf usage; and on reads and writes, we'd keep the capping
logic we've had all along, but bump the max size up to something like
4MB -- or whatever the test results show as being ideal.
In all, it seems silly to worry about this sort of thing, since the
entire process could be contained with process ulimits if this is really
a concern. Are we any more concerned that, by splitting the requests
into many smaller requests, we're wasting cpu, pegging the
processor at 100% in some cases?