bug-coreutils

Re: Degraded performance in cat + patch


From: Jim Meyering
Subject: Re: Degraded performance in cat + patch
Date: Fri, 06 Mar 2009 11:46:55 +0100

Tzvi Rotshtein wrote:
> I've been using "cat" to feed large files into some data cruncher
> application using something like this:
>    cat my_data | data_cruncher
>
> However, cat was reading/writing the file at sub-optimal speeds (not even
> half as fast as the disk & OS can provide). I traced this to the buffer
> size selection algorithm in "cat": while it generally provides a good balance
> with a low memory footprint, it constrains cat from reaching the disk's (or
> OS cache's) peak performance.
...
> The ability to specify an explicit (and larger) buffer size has improved the
> performance by a factor of 5 on my test system, which is quite a noticeable
> gain, especially when dealing with files at least 50GB in size.
>
> Let me know what you think of it. The patch I used is available below.

Thanks, but I don't want to add buffer-size options to cat.
If you really need to specify buffer sizes, you can already
use dd to do that.

However, thanks to your prod, I see that there is room
for improved performance when read and write syscall overhead
(as opposed to the data transfer itself) makes up a significant
fraction of cat's execution time.

So I'm considering the patch below.
I measured on systems with >=4GB RAM, fast CPUs,
and an input file created with "truncate -s 2G in".  (I also used
dd if=/dev/zero of=in... to create one of the same apparent size,
but that took a lot more space and made no difference to cat, not
even in the page fault counts.)
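
For a rough sense of scale, the number of read calls needed for an input of a given size can be estimated as follows. This is only an illustrative sketch, not code from cat: the helper name reads_needed is hypothetical, and it assumes every read returns a full buffer, which real reads need not do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of cat.c.
   Estimate how many read() calls are needed to consume FILE_SIZE bytes
   when each call fills a buffer of BUF_SIZE bytes.  This is an
   idealization: short reads are ignored, and the final read that
   returns 0 at end of file is not counted.  */
static uint64_t
reads_needed (uint64_t file_size, uint64_t buf_size)
{
  assert (buf_size > 0);
  /* Round up so that a trailing partial buffer counts as one read.  */
  return file_size / buf_size + (file_size % buf_size != 0);
}
```

Under those assumptions, a 2 GiB input takes 524288 reads with a 4 KiB buffer and 65536 reads with a 32 KiB buffer, and a similar reduction would apply to the matching writes.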

This is on an Intel Core2 Quad Q9450 @ 2.66GHz running Fedora 10:

  4KiB buffer (old/orig size):

    $ /usr/bin/time src/cat in > /dev/null; \
    /usr/bin/time src/cat in > /dev/null; \
    /usr/bin/time src/cat in > /dev/null
    0.06user 0.80system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+169minor)pagefaults 0swaps
    0.06user 0.80system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+170minor)pagefaults 0swaps
    0.06user 0.80system 0:00.87elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+170minor)pagefaults 0swaps

  32KiB buffer, i.e., patched: 33% speed-up:

    0.00user 0.58system 0:00.58elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+176minor)pagefaults 0swaps
    0.01user 0.57system 0:00.58elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+176minor)pagefaults 0swaps
    0.00user 0.57system 0:00.58elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+177minor)pagefaults 0swaps

=============================================
Repeating on an Athlon64 X2 5200+ at 2.6GHz running Fedora rawhide

  4KiB buffer (old/orig size):

    0.09user 2.08system 0:02.32elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+165minor)pagefaults 0swaps
    0.08user 2.06system 0:02.17elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+165minor)pagefaults 0swaps
    0.10user 2.14system 0:02.36elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+165minor)pagefaults 0swaps

  32KiB buffer, i.e., patched: 50% speed-up:

    0.01user 1.00system 0:01.06elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+172minor)pagefaults 0swaps
    0.01user 1.01system 0:01.07elapsed 95%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+171minor)pagefaults 0swaps
    0.02user 1.00system 0:01.08elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+171minor)pagefaults 0swaps
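
The buffer-size rule used by compute_buffer_size in the patch below can be tried in isolation with a sketch like this one. It is a simplified stand-in, not the patch itself: the name buffer_size_for_blksize is hypothetical, it takes the block size as a plain number instead of a struct stat, and it ignores any adjustment the ST_BLKSIZE macro itself may apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-alone variant of the patch's compute_buffer_size.
   Return eight times BLKSIZE, capped at 32 KiB.  */
static size_t
buffer_size_for_blksize (size_t blksize)
{
  /* Guard against overflow in the multiplication below.  */
  assert (blksize <= SIZE_MAX / 8);

  size_t scaled = 8 * blksize;
  size_t cap = 32 * 1024;
  return scaled < cap ? scaled : cap;
}
```

With this rule, a 512-byte block size gives a 4 KiB buffer and a 4 KiB block size gives 32 KiB. Any block size above 4 KiB is also capped at 32 KiB, so a reported block size larger than 32 KiB would yield a buffer smaller than one block.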


From 6dd9c564a0cba6eec95102f091c6692a5ab48876 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Fri, 6 Mar 2009 10:27:43 +0100
Subject: [PATCH] cat: use larger buffer sizes to reduce read/write-syscall overhead

* src/cat.c (max): Remove definition.  Use MAX from system.h instead.
(compute_buffer_size): New function.
(main): Use it, to compute larger input and output buffer sizes
derived from st_blksize, now typically 32KiB rather than 4KiB.
Suggestion from Tzvi Rotshtein.
---
 THANKS    |    1 +
 src/cat.c |   18 ++++++++++--------
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/THANKS b/THANKS
index e8c7b5c..c4e900b 100644
--- a/THANKS
+++ b/THANKS
@@ -553,6 +553,7 @@ Torbjorn Granlund                   address@hidden
 Torbjorn Lindgren                   address@hidden
 Torsten Landschoff                  address@hidden
 Tristan Miller                      address@hidden
+Tzvi Rotshtein                      address@hidden
 Ulrich Drepper                      address@hidden
 Ulrich Hermisson                    address@hidden
 Urs Thuermann                       address@hidden
diff --git a/src/cat.c b/src/cat.c
index 543e5cf..04eb204 100644
--- a/src/cat.c
+++ b/src/cat.c
@@ -1,5 +1,5 @@
 /* cat -- concatenate files and print on the standard output.
-   Copyright (C) 88, 90, 91, 1995-2008 Free Software Foundation, Inc.
+   Copyright (C) 88, 90, 91, 1995-2009 Free Software Foundation, Inc.

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
@@ -48,10 +48,6 @@
   proper_name_utf8 ("Torbjorn Granlund", "Torbj\303\266rn Granlund"), \
   proper_name ("Richard M. Stallman")

-/* Undefine, to avoid warning about redefinition on some systems.  */
-#undef max
-#define max(h,i) ((h) > (i) ? (h) : (i))
-
 /* Name of input file.  May be "-".  */
 static char const *infile;

@@ -82,6 +78,12 @@ static char *line_num_end = line_buf + LINE_COUNTER_BUF_LEN - 3;
 /* Preserves the `cat' function's local `newlines' between invocations.  */
 static int newlines2 = 0;

+static inline size_t
+compute_buffer_size (struct stat st)
+{
+  return MIN (8 * ST_BLKSIZE (st), 32 * 1024);
+}
+
 void
 usage (int status)
 {
@@ -640,7 +642,7 @@ main (int argc, char **argv)
   if (fstat (STDOUT_FILENO, &stat_buf) < 0)
     error (EXIT_FAILURE, errno, _("standard output"));

-  outsize = ST_BLKSIZE (stat_buf);
+  outsize = compute_buffer_size (stat_buf);
   /* Input file can be output file for non-regular files.
      fstat on pipes returns S_IFSOCK on some systems, S_IFIFO
      on others, so the checking should not be done for those types,
@@ -704,7 +706,7 @@ main (int argc, char **argv)
          ok = false;
          goto contin;
        }
-      insize = ST_BLKSIZE (stat_buf);
+      insize = compute_buffer_size (stat_buf);

       /* Compare the device and i-node numbers of this input file with
         the corresponding values of the (output file associated with)
@@ -726,7 +728,7 @@ main (int argc, char **argv)
       if (! (number | show_ends | show_nonprinting
             | show_tabs | squeeze_blank))
        {
-         insize = max (insize, outsize);
+         insize = MAX (insize, outsize);
          inbuf = xmalloc (insize + page_size - 1);

          ok &= simple_cat (ptr_align (inbuf, page_size), insize);
--
1.6.2.rc1.285.gc5f54



