bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: feature request: gzip/bzip support for sort


From: Jim Meyering
Subject: Re: feature request: gzip/bzip support for sort
Date: Sun, 21 Jan 2007 19:14:03 +0100

Dan Hipschman <address@hidden> wrote:
> I think this patch addresses everything Paul mentioned in his critique
> of my last attempt.  I did look at gnulib pipe module, but there were
> some problems with using it "out of the box".  First, it takes a

Hi Dan,

Thanks for doing all that.
Not to look the gift horse in the mouth, but it'd be nice
if you wrote ChangeLog entries, too.  And even (gasp! :-)
a test case or two.  Of course, we'd expect such a test case
(probably named tests/misc/sort-compress, and based on
tests/sample-test) to have this line in it:

  . $srcdir/../very-expensive

If you don't have time for that, I'll take care of it, eventually.

Here are a few more comments.

Default to just "gzip", not /bin/gzip.  The latter may not exist;
your patch already handles that, but /bin/gzip may not be the first
gzip in PATH.  Also, don't bother with the access-XOK check.
There's no point in incurring even such a small overhead in the
general case, when no temporary is used.

may-be-used-uninit:
  sort.c: In function 'pipe_fork':
  sort.c:696: warning: 'pid' may be used uninitialized in this function

Coreutils policy -- via "make distcheck" -- dictates that file-global
variables be declared static.

In pipe_fork callers, use named constants, not "2" and "8".
i.e.
enum {
  /* Explain why this number is smaller than... */
  MAX_FORK_RETRIES_COMPRESS = 2,
  /* ...this one.  */
  MAX_FORK_RETRIES_DECOMPRESS = 8
};
--------------
I've done the above in the patch below.
But please address the FIXME I've added.
The rest I've left for you.

Have you considered using the gnulib hash module rather than
rolling your own?  There are examples in many of coreutils/src/*.c.

For this,
> +#define PROC_THRESH 2
Use an "enum", as above.  It's more debugger-friendly.
Use a more descriptive name.

reap and register_proc need brief comments describing what
they do and the meaning/function of their parameters.

===================
Here's a start.  The patch is relative to your latest posted patch:

        * src/sort.c (MAX_FORK_RETRIES_COMPRESS, MAX_FORK_RETRIES_DECOMPRESS):
        In pipe_fork callers, use these named constants, not "2" and "8".
        (proctab, nprocs): Declare to be "static".
        (pipe_fork) [lint]: Initialize local, pid,
        to avoid unwarranted may-be-used-uninitialized warning.
        (create_temp): Use the active voice.  Describe parameters, too.
        (main): Default to "gzip", not "/bin/gzip".
        Don't bother with the access-X_OK check.

diff --git a/src/sort.c b/src/sort.c
index 9604faa..da10759 100644
--- a/src/sort.c
+++ b/src/sort.c
@@ -93,6 +93,15 @@ enum
     SORT_FAILURE = 2
   };

+enum
+  {
+    /* FIXME: explain why this number is smaller than... */
+    MAX_FORK_RETRIES_COMPRESS = 2,
+
+    /* ...this one.  */
+    MAX_FORK_RETRIES_DECOMPRESS = 8
+  };
+
 /* The representation of the decimal point in the current locale.  */
 static int decimal_point;

@@ -440,11 +449,11 @@ struct procnode
    processes in a timely manner so as not to exhaust system resources,
    so we store the info on whether the process is still running, or has
    been reaped here.  */
-struct procnode *proctab[PROCTABSZ];
+static struct procnode *proctab[PROCTABSZ];

 /* The total number of forked processes (compressors and decompressors)
    that have not been reaped yet. */
-size_t nprocs;
+static size_t nprocs;

 /* The number of child processes we'll allow before we try to reap them. */
 #define PROC_THRESH 2
@@ -693,7 +702,7 @@ pipe_fork (int pipefds[2], size_t tries)
   sigset_t oldset;
   int saved_errno;
   unsigned int wait_retry = 1;
-  pid_t pid;
+  pid_t pid IF_LINT (= 0);

   if (pipe (pipefds) < 0)
     return -1;
@@ -744,7 +753,9 @@ pipe_fork (int pipefds[2], size_t tries)
 #endif
 }

-/* Creates a temp file and compression program to filter output to it.  */
+/* Create a temporary file and start a compression program to filter output
+   to that file.  Set *PFP to the file handle and if *PPID is non-NULL,
+   set it to the PID of the newly-created process.  */

 static char *
 create_temp (FILE **pfp, pid_t *ppid)
@@ -757,7 +768,7 @@ create_temp (FILE **pfp, pid_t *ppid)
     {
       int pipefds[2];

-      node->pid = pipe_fork (pipefds, 2);
+      node->pid = pipe_fork (pipefds, MAX_FORK_RETRIES_COMPRESS);
       if (0 < node->pid)
        {
          close (tempfd);
@@ -810,7 +821,7 @@ open_temp (const char *name, pid_t pid)
   if (compress_program)
     {
       int pipefds[2];
-      pid_t child_pid = pipe_fork (pipefds, 8);
+      pid_t child_pid = pipe_fork (pipefds, MAX_FORK_RETRIES_DECOMPRESS);

       if (0 < child_pid)
        {
@@ -3030,20 +3041,9 @@ main (int argc, char **argv)
     }

   compress_program = getenv ("GNUSORT_COMPRESSOR");
-  if (! compress_program)
-    {
-      static const char *compressors[] = { "/bin/gzip" };
-      enum { ncompressors = sizeof compressors / sizeof compressors[0] };
-      size_t i;
-
-      for (i = 0; i < ncompressors; ++i)
-       if (access (compressors[i], X_OK) == 0)
-         {
-           compress_program = compressors[i];
-           break;
-         }
-    }
-  else if (compress_program[0] == '\0')
+  if (compress_program == NULL)
+    compress_program = "gzip";
+  else if (*compress_program == '\0')
     compress_program = NULL;

   if (checkonly)




reply via email to

[Prev in Thread] Current Thread [Next in Thread]