Re: Degraded performance in cat + patch
From: Jim Meyering
Subject: Re: Degraded performance in cat + patch
Date: Fri, 06 Mar 2009 11:46:55 +0100
Tzvi Rotshtein wrote:
> I've been using "cat" to feed large files into some data cruncher
> application using something like this:
> cat my_data | data_cruncher
>
> However, cat was reading/writing the file at sub-optimal speeds (not even
> half as fast as the disk & OS can provide it). I traced this to the buffer
> size selection algorithm in "cat": while it generally provides a good balance
> with a low memory footprint, it constrains cat from reaching the disk's (or
> OS cache's) peak performance.
...
> The ability to specify an explicit (and larger) buffer size has improved
> performance by a factor of 5 on my test system, which is quite a noticeable
> gain, especially when dealing with files at least 50GB in size.
>
> Let me know what you think of it. The patch I used is available below.
Thanks, but I don't want to add buffer-size options to cat.
If you really need to specify buffer sizes, you can already
use dd to do that.
However, thanks to your prod, I see that there is room
for improved performance when read and write syscall overhead
(as opposed to the data transfer itself) makes up a significant
fraction of cat's execution time.
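To illustrate where that overhead comes from, here is a minimal sketch of
a read/write copy loop with a caller-chosen buffer size. It is not cat's
actual code: copy_counting_reads is a hypothetical name, and the returned
read count is there only to make the per-call cost visible. Each pass
through the loop costs one read and at least one write, so a larger
buffer means fewer passes for the same amount of data.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not from cat.c): copy everything from in_fd to
   out_fd using a buffer of bufsize bytes.  Returns the number of read()
   calls that returned data, or -1 on error.  */
long
copy_counting_reads (int in_fd, int out_fd, size_t bufsize)
{
  char *buf = malloc (bufsize);
  if (buf == NULL)
    return -1;

  long n_reads = 0;
  for (;;)
    {
      ssize_t n = read (in_fd, buf, bufsize);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* interrupted before any data: retry */
          free (buf);
          return -1;
        }
      if (n == 0)
        break;                  /* end of input */
      n_reads++;

      /* write() may accept fewer bytes than requested, so loop until
         the whole chunk has been written.  */
      size_t off = 0;
      while (off < (size_t) n)
        {
          ssize_t w = write (out_fd, buf + off, (size_t) n - off);
          if (w < 0)
            {
              if (errno == EINTR)
                continue;
              free (buf);
              return -1;
            }
          off += (size_t) w;
        }
    }

  free (buf);
  return n_reads;
}
```

Calling it twice on the same input, once with 4096 and once with 32768,
shows how the number of read calls drops as the buffer grows.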
So I'm considering the patch below.
I measured on systems with >=4GB RAM and fast CPUs, using an input
file created with "truncate -s 2G in". (I also used
dd if=/dev/zero of=in... to create one of the same apparent size,
but that took a lot more disk space and made no difference to cat, not
even in the page fault counts.)
This is on an Intel Core2 Quad Q9450 @ 2.66GHz running Fedora 10:
4KiB buffer (old/orig size):
$ /usr/bin/time src/cat in > /dev/null; \
/usr/bin/time src/cat in > /dev/null; \
/usr/bin/time src/cat in > /dev/null
0.06user 0.80system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+169minor)pagefaults 0swaps
0.06user 0.80system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+170minor)pagefaults 0swaps
0.06user 0.80system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+170minor)pagefaults 0swaps
32KiB buffer, i.e., patched: 33% less elapsed time (0.87s -> 0.58s):
0.00user 0.58system 0:00.58elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+176minor)pagefaults 0swaps
0.01user 0.57system 0:00.58elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+176minor)pagefaults 0swaps
0.00user 0.57system 0:00.58elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+177minor)pagefaults 0swaps
=============================================
Repeating on an Athlon64 X2 5200+ at 2.6GHz running Fedora rawhide
4KiB buffer (old/orig size):
0.09user 2.08system 0:02.32elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+165minor)pagefaults 0swaps
0.08user 2.06system 0:02.17elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+165minor)pagefaults 0swaps
0.10user 2.14system 0:02.36elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+165minor)pagefaults 0swaps
32KiB buffer, i.e., patched: over 50% less elapsed time (about 2.3s -> 1.07s):
0.01user 1.00system 0:01.06elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+172minor)pagefaults 0swaps
0.01user 1.01system 0:01.07elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+171minor)pagefaults 0swaps
0.02user 1.00system 0:01.08elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+171minor)pagefaults 0swaps
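
As a rough cross-check of the timings above, the arithmetic below
estimates how many read calls each buffer size needs for the 2GiB input,
assuming every read fills the buffer. It also works out the percentage of
elapsed time saved from two timings given in hundredths of a second.
reads_needed and percent_time_saved are names made up for this
illustration; they are not part of cat.

```c
#include <stdint.h>

/* Number of read() calls needed to consume file_size bytes with a
   buffer of buf_size bytes, assuming each read fills the buffer
   (the last one may be short).  */
uint64_t
reads_needed (uint64_t file_size, uint64_t buf_size)
{
  return (file_size + buf_size - 1) / buf_size;
}

/* Percentage of elapsed time saved, rounded down.  Both arguments are
   elapsed times in hundredths of a second; old_cs must be nonzero and
   not smaller than new_cs.  */
unsigned int
percent_time_saved (unsigned int old_cs, unsigned int new_cs)
{
  return (old_cs - new_cs) * 100 / old_cs;
}
```

For the 2GiB file, reads_needed gives 524288 calls with a 4KiB buffer and
65536 with a 32KiB buffer, and since every read is paired with a write,
the total syscall count shrinks by the same factor of 8.
percent_time_saved (87, 58) gives 33 for the Core2 elapsed times.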
From 6dd9c564a0cba6eec95102f091c6692a5ab48876 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Fri, 6 Mar 2009 10:27:43 +0100
Subject: [PATCH] cat: use larger buffer sizes to reduce read/write-syscall overhead
* src/cat.c (max): Remove definition. Use MAX from system.h instead.
(compute_buffer_size): New function.
(main): Use it, to compute larger input and output buffer sizes
derived from st_blksize, now typically 32KiB rather than 4KiB.
Suggestion from Tzvi Rotshtein.
---
THANKS | 1 +
src/cat.c | 18 ++++++++++--------
2 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/THANKS b/THANKS
index e8c7b5c..c4e900b 100644
--- a/THANKS
+++ b/THANKS
@@ -553,6 +553,7 @@ Torbjorn Granlund address@hidden
Torbjorn Lindgren address@hidden
Torsten Landschoff address@hidden
Tristan Miller address@hidden
+Tzvi Rotshtein address@hidden
Ulrich Drepper address@hidden
Ulrich Hermisson address@hidden
Urs Thuermann address@hidden
diff --git a/src/cat.c b/src/cat.c
index 543e5cf..04eb204 100644
--- a/src/cat.c
+++ b/src/cat.c
@@ -1,5 +1,5 @@
/* cat -- concatenate files and print on the standard output.
- Copyright (C) 88, 90, 91, 1995-2008 Free Software Foundation, Inc.
+ Copyright (C) 88, 90, 91, 1995-2009 Free Software Foundation, Inc.
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
@@ -48,10 +48,6 @@
proper_name_utf8 ("Torbjorn Granlund", "Torbj\303\266rn Granlund"), \
proper_name ("Richard M. Stallman")
-/* Undefine, to avoid warning about redefinition on some systems. */
-#undef max
-#define max(h,i) ((h) > (i) ? (h) : (i))
-
/* Name of input file. May be "-". */
static char const *infile;
@@ -82,6 +78,12 @@ static char *line_num_end = line_buf + LINE_COUNTER_BUF_LEN - 3;
/* Preserves the `cat' function's local `newlines' between invocations. */
static int newlines2 = 0;
+static inline size_t
+compute_buffer_size (struct stat st)
+{
+ return MIN (8 * ST_BLKSIZE (st), 32 * 1024);
+}
+
void
usage (int status)
{
@@ -640,7 +642,7 @@ main (int argc, char **argv)
if (fstat (STDOUT_FILENO, &stat_buf) < 0)
error (EXIT_FAILURE, errno, _("standard output"));
- outsize = ST_BLKSIZE (stat_buf);
+ outsize = compute_buffer_size (stat_buf);
/* Input file can be output file for non-regular files.
fstat on pipes returns S_IFSOCK on some systems, S_IFIFO
on others, so the checking should not be done for those types,
@@ -704,7 +706,7 @@ main (int argc, char **argv)
ok = false;
goto contin;
}
- insize = ST_BLKSIZE (stat_buf);
+ insize = compute_buffer_size (stat_buf);
/* Compare the device and i-node numbers of this input file with
the corresponding values of the (output file associated with)
@@ -726,7 +728,7 @@ main (int argc, char **argv)
if (! (number | show_ends | show_nonprinting
| show_tabs | squeeze_blank))
{
- insize = max (insize, outsize);
+ insize = MAX (insize, outsize);
inbuf = xmalloc (insize + page_size - 1);
ok &= simple_cat (ptr_align (inbuf, page_size), insize);
--
1.6.2.rc1.285.gc5f54
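
To see what sizes the formula in compute_buffer_size yields, here is a
standalone sketch of the same expression. It is an illustration only:
buffer_size_for_blksize is a hypothetical stand-in that takes the block
size directly instead of a struct stat, so that it compiles without
coreutils' system.h (which supplies ST_BLKSIZE and MIN in the real code).

```c
#include <stddef.h>

/* Local definition for this sketch; the patch relies on system.h.  */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Same expression as the patch's compute_buffer_size, but applied to a
   plain block size rather than to ST_BLKSIZE (st).  */
size_t
buffer_size_for_blksize (size_t blksize)
{
  return MIN (8 * blksize, 32 * 1024);
}
```

With a 4096-byte block size this gives 32768, matching the "typically
32KiB" in the log message; a 512-byte block size gives 4096. Because of
the MIN, a block size above 32KiB (for example 65536) would also give
32768, which is smaller than what the old ST_BLKSIZE-only code used.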