bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: sort -R: a more-conservative approach


From: Frederik Eaton
Subject: Re: sort -R: a more-conservative approach
Date: Tue, 13 Dec 2005 00:12:36 +0000
User-agent: mutt-ng/devel-r472 (Debian)

Well, I guess it's only a factor of two, but there's a difference
between bugs that cause a segfault in a tool that many people use, and
bugs that might cause a new feature with 0 existing users to not be
100% cryptographically secure. That's just my two cents though

Frederik

On Mon, Dec 12, 2005 at 02:52:39PM -0800, Paul Eggert wrote:
> After looking through the sort -R patch and noting the segfault it
> caused on 64-bit systems
> <http://lists.gnu.org/archive/html/bug-coreutils/2005-12/msg00137.html>
> I thought I'd back off to a more-conservative approach for now.
> 
> 2005-12-12  Paul Eggert  <address@hidden>
> 
>       * Version 6.0-cvs.
> 
>       Install a more-conservative approach for sort -R.  It's the
>       same basic idea as the existing code, except it uses the full ISAAC
>       approach (called the "more kosher" approach in the existing comments).
>       This makes "sort -R" quite a bit slower (about a factor of 2 on my
>       little tests involving 10000 lines on a 2.4 GHz P4), but I think it's
>       better to be conservative here at first, and review any performance
>       improvements carefully.
>       * .x-sc_require_config_h: Add src/rand-isaac.c.
>       * src/rand-isaac.h: Remove.  All uses now simply include rand-isaac.c.
>       * src/Makefile.am (noinst_HEADERS): Remove rand-isaac.h.
>       (shred_SOURCES, sort_SOURCES): Remove.
>       (EXTRA_DIST): Add rand-isaac.c.
>       * src/rand-isaac.c: Revert to what used to be in shred.c, without
>       changing it to allow for varying numbers of words in the state.
>       Alter so that we include rand-isaac.c directly rather than
>       compiling it and linking to it.  Don't include config.h or
>       system.h; that's the includer's responsibility.
>       Omit functions that are specific to shred.
>       (ISAAC_LOG, ISAAC_WORDS, ISAAC_BYTES, struct isaac_state, ind):
>       (isaac_step, struct irand_state):
>       Resurrect these, with the same defns that used to be in shred.c.
>       (ISAAC_SIZE, isaac_new, isaac_copy): Remove.
>       (isaac_refill, isaac_seed_start, isaac_seed_data, irand_init, irand32):
>       static again.
>       (struct isaac_state, isaac_refill, isaac_mix, isaac_init):
>       (isaac_seed_start, isaac_seed_data, isaac_seed_finish, isaac_seed):
>       (irand_init, irand32, irand_mod):
>       Number of words is constant again.
>       (struct irand_state, irand_init, irand32, irand_mod): Move to shred.c.
>       * src/shred.c: Include rand-isaac.c rather than rand-isaac.h.
>       * src/sort.c: Likewise.
>       * src/shred.c (fillrand, dopass, main): Undo previous change.
>       (struct irand_state, irand_init, irand32, irand_mod): Moved back here,
>       from rand-isaac.c.
>       * src/sort.c: Don't include md5.h; it wasn't needed.
>       (struct keyfield): Rename random_hash to random, for consistency
>       with the other member names.  All uses changed.
>       (usage): Tweak wording to mention STRING for --seed option.
>       (short_options): Rorder for consistency with other programs.
>       (rand_state): Now a struct, not a pointer to one.  All uses changed.
>       (HASH_WORDS, HASH_SIZE): Remove.
>       (get_hash): Remove comments around resbuf size, since we can assume C89.
>       Use a "more-kosher" (but slower) approach of invoking isaac_refill.
>       (keycompare): Adjust to the new get_hash.
>       Add a FIXME.
>       (badfieldspec): Omit recently-introduced comment; it isn't needed.
>       (main): Don't set need_random simply because gkey has it set; that
>       doesn't necessarily mean we'll need random numbers.
>       Redo seeding to match new get_hash approach.
> 
> Index: .x-sc_require_config_h
> ===================================================================
> RCS file: /fetish/cu/.x-sc_require_config_h,v
> retrieving revision 1.2
> retrieving revision 1.3
> diff -p -u -r1.2 -r1.3
> --- .x-sc_require_config_h    24 Nov 2005 09:04:34 -0000      1.2
> +++ .x-sc_require_config_h    12 Dec 2005 22:41:42 -0000      1.3
> @@ -22,4 +22,5 @@
>  ^src/ls-dir\.c$
>  ^src/ls-ls\.c$
>  ^src/ls-vdir\.c$
> +^src/rand-isaac\.c$
>  ^src/tac-pipe\.c$
> Index: src/Makefile.am
> ===================================================================
> RCS file: /fetish/cu/src/Makefile.am,v
> retrieving revision 1.61
> retrieving revision 1.63
> diff -p -u -r1.61 -r1.63
> --- src/Makefile.am   10 Dec 2005 22:54:31 -0000      1.61
> +++ src/Makefile.am   12 Dec 2005 22:42:37 -0000      1.63
> @@ -40,13 +40,12 @@ noinst_HEADERS = \
>    dircolors.h \
>    fs.h \
>    ls.h \
> -  rand-isaac.h \
>    remove.h \
>    system.h \
>    wheel-size.h \
>    wheel.h
>  
> -EXTRA_DIST = dcgen dircolors.hin tac-pipe.c \
> +EXTRA_DIST = dcgen dircolors.hin rand-isaac.c tac-pipe.c \
>    groups.sh wheel-gen.pl extract-magic
>  CLEANFILES = $(SCRIPTS) su
>  
> @@ -175,8 +174,6 @@ vdir_SOURCES = ls.c ls-vdir.c
>  ls_SOURCES = ls.c ls-ls.c
>  chown_SOURCES = chown.c chown-core.c
>  chgrp_SOURCES = chgrp.c chown-core.c
> -shred_SOURCES = shred.c rand-isaac.c
> -sort_SOURCES = sort.c rand-isaac.c
>  
>  mv_SOURCES = mv.c copy.c cp-hash.c remove.c
>  rm_SOURCES = rm.c remove.c
> Index: src/rand-isaac.c
> ===================================================================
> RCS file: /fetish/cu/src/rand-isaac.c,v
> retrieving revision 1.4
> retrieving revision 1.6
> diff -p -u -r1.4 -r1.6
> --- src/rand-isaac.c  10 Dec 2005 22:10:53 -0000      1.4
> +++ src/rand-isaac.c  12 Dec 2005 22:42:58 -0000      1.6
> @@ -1,4 +1,4 @@
> -/* rand-isaac.c - ISAAC random number generator
> +/* Bob Jenkins's cryptographic random number generator, ISAAC.
>  
>     Copyright (C) 1999-2005 Free Software Foundation, Inc.
>     Copyright (C) 1997, 1998, 1999 Colin Plumb.
> @@ -21,13 +21,9 @@
>  
>  /*
>   * --------------------------------------------------------------------
> - *     Bob Jenkins' cryptographic random number generator, ISAAC.
> - *     Hacked by Colin Plumb.
> - *
> - * We need a source of random numbers for some of the overwrite data
> - * for shred. Cryptographically secure is desirable, but it's not
> - * life-or-death so I can be a little bit experimental in the choice
> - * of RNGs here.
> + * We need a source of random numbers for some data.
> + * Cryptographically secure is desirable, but it's not life-or-death
> + * so I can be a little bit experimental in the choice of RNGs here.
>   *
>   * This generator is based somewhat on RC4, but has analysis
>   * <http://burtleburtle.net/bob/rand/isaacafa.html>
> @@ -36,86 +32,72 @@
>   * --------------------------------------------------------------------
>   */
>  
> -
> -#ifdef HAVE_CONFIG_H
> -# include <config.h>
> -#endif
> -
> -#include "system.h"
>  #include "gethrxtime.h"
>  
> -#include "rand-isaac.h"
> -
> -#define ISAAC_SIZE(words) \
> -    (sizeof (struct isaac_state) - \
> -    (ISAAC_MAX_WORDS - words) * sizeof (uint32_t))
> +/* Size of the state tables to use.  ISAAC_LOG should be at least 3,
> +   and smaller values give less security.  */
> +#define ISAAC_LOG 8
> +#define ISAAC_WORDS (1 << ISAAC_LOG)
> +#define ISAAC_BYTES (ISAAC_WORDS * sizeof (uint32_t))
>  
> -/*
> - * Create a new state, optionally at the location specified. This
> - * should be called to create/initialize each new isaac_state. 'words'
> - * must be a power of 2, and should be at least 8. Smaller values give
> - * less security.
> - */
> -extern struct isaac_state *
> -isaac_new (struct isaac_state *s, int words)
> -{
> -  size_t w;
> -  size_t ss = ISAAC_SIZE (words);
> -  if (!s)
> -    {
> -      s = xmalloc (ss);
> -    }
> -  memset (s, 0, ss);
> -  s->words = words;
> -  s->log = 0;
> -  for (w=1; w<words; w<<=1, s->log++) {}
> -  return s;
> -}
> +/* RNG state variables */
> +struct isaac_state
> +  {
> +    uint32_t mm[ISAAC_WORDS];        /* Main state array */
> +    uint32_t iv[8];          /* Seeding initial vector */
> +    uint32_t a, b, c;                /* Extra index variables */
> +  };
> +
> +/* This index operation is more efficient on many processors */
> +#define ind(mm, x) \
> +  (* (uint32_t *) ((char *) (mm) \
> +                + ((x) & (ISAAC_WORDS - 1) * sizeof (uint32_t))))
>  
>  /*
> - * Copy a state. Destination block must be at least as large as
> - * source.
> + * The central step.  This uses two temporaries, x and y.  mm is the
> + * whole state array, while m is a pointer to the current word.  off is
> + * the offset from m to the word ISAAC_WORDS/2 words away in the mm array,
> + * i.e. +/- ISAAC_WORDS/2.
>   */
> -extern void
> -isaac_copy (struct isaac_state *dst, struct isaac_state *src)
> -{
> -  memcpy(dst, src, ISAAC_SIZE (src->words));
> -}
> +#define isaac_step(mix, a, b, mm, m, off, r) \
> +( \
> +  a = ((a) ^ (mix)) + (m)[off], \
> +  x = *(m), \
> +  *(m) = y = ind (mm, x) + (a) + (b), \
> +  *(r) = b = ind (mm, (y) >> ISAAC_LOG) + x \
> +)
>  
>  /*
>   * Refill the entire R array, and update S.
>   */
> -extern void
> -isaac_refill (struct isaac_state *s, uint32_t r[/* s>-words */])
> +static void
> +isaac_refill (struct isaac_state *s, uint32_t r[/* ISAAC_WORDS */])
>  {
>    uint32_t a, b;             /* Caches of a and b */
>    uint32_t x, y;             /* Temps needed by isaac_step macro */
>    uint32_t *m = s->mm;       /* Pointer into state array */
> -  int w = s->words;
>  
>    a = s->a;
>    b = s->b + (++s->c);
>  
>    do
>      {
> -      isaac_step (s, a << 13, a, b, m, w / 2, r);
> -      isaac_step (s, a >> 6, a, b, m + 1, w / 2, r + 1);
> -      isaac_step (s, a << 2, a, b, m + 2, w / 2, r + 2);
> -      isaac_step (s, a >> 16, a, b, m + 3, w / 2, r + 3);
> +      isaac_step (a << 13, a, b, s->mm, m, ISAAC_WORDS / 2, r);
> +      isaac_step (a >> 6, a, b, s->mm, m + 1, ISAAC_WORDS / 2, r + 1);
> +      isaac_step (a << 2, a, b, s->mm, m + 2, ISAAC_WORDS / 2, r + 2);
> +      isaac_step (a >> 16, a, b, s->mm, m + 3, ISAAC_WORDS / 2, r + 3);
>        r += 4;
>      }
> -  while ((m += 4) < s->mm + w / 2);
> -
> +  while ((m += 4) < s->mm + ISAAC_WORDS / 2);
>    do
>      {
> -      isaac_step (s, a << 13, a, b, m, -w / 2, r);
> -      isaac_step (s, a >> 6, a, b, m + 1, -w / 2, r + 1);
> -      isaac_step (s, a << 2, a, b, m + 2, -w / 2, r + 2);
> -      isaac_step (s, a >> 16, a, b, m + 3, -w / 2, r + 3);
> +      isaac_step (a << 13, a, b, s->mm, m, -ISAAC_WORDS / 2, r);
> +      isaac_step (a >> 6, a, b, s->mm, m + 1, -ISAAC_WORDS / 2, r + 1);
> +      isaac_step (a << 2, a, b, s->mm, m + 2, -ISAAC_WORDS / 2, r + 2);
> +      isaac_step (a >> 16, a, b, s->mm, m + 3, -ISAAC_WORDS / 2, r + 3);
>        r += 4;
>      }
> -  while ((m += 4) < s->mm + w);
> -
> +  while ((m += 4) < s->mm + ISAAC_WORDS);
>    s->a = a;
>    s->b = b;
>  }
> @@ -137,7 +119,7 @@ isaac_refill (struct isaac_state *s, uin
>  
>  /* The basic ISAAC initialization pass.  */
>  static void
> -isaac_mix (struct isaac_state *s, uint32_t const seed[/* s->words */])
> +isaac_mix (struct isaac_state *s, uint32_t const seed[/* ISAAC_WORDS */])
>  {
>    int i;
>    uint32_t a = s->iv[0];
> @@ -148,9 +130,8 @@ isaac_mix (struct isaac_state *s, uint32
>    uint32_t f = s->iv[5];
>    uint32_t g = s->iv[6];
>    uint32_t h = s->iv[7];
> -  uint32_t w = s->words;
>  
> -  for (i = 0; i < w; i += 8)
> +  for (i = 0; i < ISAAC_WORDS; i += 8)
>      {
>        a += seed[i];
>        b += seed[i + 1];
> @@ -185,15 +166,15 @@ isaac_mix (struct isaac_state *s, uint32
>  
>  #if 0 /* Provided for reference only; not used in this code */
>  /*
> - * Initialize the ISAAC RNG with the given seed material. Its size
> - * MUST be a multiple of s->words * sizeof (uint32_t), and may be
> + * Initialize the ISAAC RNG with the given seed material.
> + * Its size MUST be a multiple of ISAAC_BYTES, and may be
>   * stored in the s->mm array.
>   *
>   * This is a generalization of the original ISAAC initialization code
> - * to support larger seed sizes. For seed sizes of 0 and s->words *
> - * sizeof (uint32_t), it is identical.
> + * to support larger seed sizes.  For seed sizes of 0 and ISAAC_BYTES,
> + * it is identical.
>   */
> -extern void
> +static void
>  isaac_init (struct isaac_state *s, uint32_t const *seed, size_t seedsize)
>  {
>    static uint32_t const iv[8] =
> @@ -219,10 +200,10 @@ isaac_init (struct isaac_state *s, uint3
>        /* First pass (as in reference ISAAC code) */
>        isaac_mix (s, seed);
>        /* Second and subsequent passes (extension to ISAAC) */
> -      while (seedsize -= s->words * sizeof (uint32_t))
> +      while (seedsize -= ISAAC_BYTES)
>       {
> -       seed += s->words;
> -       for (i = 0; i < s->words; i++)
> +       seed += ISAAC_WORDS;
> +       for (i = 0; i < ISAAC_WORDS; i++)
>           s->mm[i] += seed[i];
>         isaac_mix (s, s->mm);
>       }
> @@ -230,7 +211,7 @@ isaac_init (struct isaac_state *s, uint3
>    else
>      {
>        /* The no seed case (as in reference ISAAC code) */
> -      for (i = 0; i < s->words; i++)
> +      for (i = 0; i < ISAAC_WORDS; i++)
>       s->mm[i] = 0;
>      }
>  
> @@ -240,7 +221,7 @@ isaac_init (struct isaac_state *s, uint3
>  #endif
>  
>  /* Start seeding an ISAAC structire */
> -extern void
> +static void
>  isaac_seed_start (struct isaac_state *s)
>  {
>    static uint32_t const iv[8] =
> @@ -260,15 +241,14 @@ isaac_seed_start (struct isaac_state *s)
>    for (i = 0; i < 8; i++)
>      s->iv[i] = iv[i];
>  
> -  /* used to do memset here to clear the state array, but now it's
> -     done in isaac_new */
> +  memset (s->mm, 0, sizeof s->mm);
>  
>    /* s->c gets used for a data pointer during the seeding phase */
>    s->a = s->b = s->c = 0;
>  }
>  
>  /* Add a buffer of seed material */
> -extern void
> +static void
>  isaac_seed_data (struct isaac_state *s, void const *buffer, size_t size)
>  {
>    unsigned char const *buf = buffer;
> @@ -276,7 +256,7 @@ isaac_seed_data (struct isaac_state *s, 
>    size_t avail;
>    size_t i;
>  
> -  avail = s->words - (size_t) s->c;  /* s->c is used as a write pointer */
> +  avail = sizeof s->mm - (size_t) s->c;      /* s->c is used as a write 
> pointer */
>  
>    /* Do any full buffers that are necessary */
>    while (size > avail)
> @@ -288,7 +268,7 @@ isaac_seed_data (struct isaac_state *s, 
>        size -= avail;
>        isaac_mix (s, s->mm);
>        s->c = 0;
> -      avail = s->words;
> +      avail = sizeof s->mm;
>      }
>  
>    /* And the final partial block */
> @@ -300,7 +280,7 @@ isaac_seed_data (struct isaac_state *s, 
>  
>  
>  /* End of seeding phase; get everything ready to produce output. */
> -extern void
> +static void
>  isaac_seed_finish (struct isaac_state *s)
>  {
>    isaac_mix (s, s->mm);
> @@ -314,7 +294,7 @@ isaac_seed_finish (struct isaac_state *s
>   * Get seed material.  16 bytes (128 bits) is plenty, but if we have
>   * /dev/urandom, we get 32 bytes = 256 bits for complete overkill.
>   */
> -extern void
> +static void
>  isaac_seed (struct isaac_state *s)
>  {
>    isaac_seed_start (s);
> @@ -353,56 +333,3 @@ isaac_seed (struct isaac_state *s)
>  
>    isaac_seed_finish (s);
>  }
> -
> -extern void
> -irand_init (struct irand_state *r, struct isaac_state *s)
> -{
> -  r->numleft = 0;
> -  r->s = s;
> -  memset (r->r, 0, s->words * sizeof (uint32_t));
> -}
> -
> -/*
> - * We take from the end of the block deliberately, so if we need
> - * only a small number of values, we choose the final ones which are
> - * marginally better mixed than the initial ones.
> - */
> -extern uint32_t
> -irand32 (struct irand_state *r)
> -{
> -  if (!r->numleft)
> -    {
> -      isaac_refill (r->s, r->r);
> -      r->numleft = r->s->words;
> -    }
> -  return r->r[--r->numleft];
> -}
> -
> -/*
> - * Return a uniformly distributed random number between 0 and n,
> - * inclusive.  Thus, the result is modulo n+1.
> - *
> - * Theory of operation: as x steps through every possible 32-bit number,
> - * x % n takes each value at least 2^32 / n times (rounded down), but
> - * the values less than 2^32 % n are taken one additional time.  Thus,
> - * x % n is not perfectly uniform.  To fix this, the values of x less
> - * than 2^32 % n are disallowed, and if the RNG produces one, we ask
> - * for a new value.
> - */
> -extern uint32_t
> -irand_mod (struct irand_state *r, uint32_t n)
> -{
> -  uint32_t x;
> -  uint32_t lim;
> -
> -  if (!++n)
> -    return irand32 (r);
> -
> -  lim = -n % n;                      /* == (2**32-n) % n == 2**32 % n */
> -  do
> -    {
> -      x = irand32 (r);
> -    }
> -  while (x < lim);
> -  return x % n;
> -}
> Index: src/shred.c
> ===================================================================
> RCS file: /fetish/cu/src/shred.c,v
> retrieving revision 1.119
> retrieving revision 1.121
> diff -p -u -r1.119 -r1.121
> --- src/shred.c       10 Dec 2005 10:02:24 -0000      1.119
> +++ src/shred.c       12 Dec 2005 22:43:16 -0000      1.121
> @@ -108,7 +108,8 @@
>  #include "inttostr.h"
>  #include "quotearg.h"                /* For quotearg_colon */
>  #include "quote.h"           /* For quotearg_colon */
> -#include "rand-isaac.h"              /* Random number stuff */
> +
> +#include "rand-isaac.c"
>  
>  #define DEFAULT_PASSES 25    /* Default */
>  
> @@ -226,6 +227,66 @@ to be recovered later.\n\
>    exit (status);
>  }
>  
> +/* Single-word RNG built on top of ISAAC */
> +struct irand_state
> +{
> +  uint32_t r[ISAAC_WORDS];
> +  unsigned int numleft;
> +  struct isaac_state *s;
> +};
> +
> +static void
> +irand_init (struct irand_state *r, struct isaac_state *s)
> +{
> +  r->numleft = 0;
> +  r->s = s;
> +}
> +
> +/*
> + * We take from the end of the block deliberately, so if we need
> + * only a small number of values, we choose the final ones which are
> + * marginally better mixed than the initial ones.
> + */
> +static uint32_t
> +irand32 (struct irand_state *r)
> +{
> +  if (!r->numleft)
> +    {
> +      isaac_refill (r->s, r->r);
> +      r->numleft = ISAAC_WORDS;
> +    }
> +  return r->r[--r->numleft];
> +}
> +
> +/*
> + * Return a uniformly distributed random number between 0 and n,
> + * inclusive.  Thus, the result is modulo n+1.
> + *
> + * Theory of operation: as x steps through every possible 32-bit number,
> + * x % n takes each value at least 2^32 / n times (rounded down), but
> + * the values less than 2^32 % n are taken one additional time.  Thus,
> + * x % n is not perfectly uniform.  To fix this, the values of x less
> + * than 2^32 % n are disallowed, and if the RNG produces one, we ask
> + * for a new value.
> + */
> +static uint32_t
> +irand_mod (struct irand_state *r, uint32_t n)
> +{
> +  uint32_t x;
> +  uint32_t lim;
> +
> +  if (!++n)
> +    return irand32 (r);
> +
> +  lim = -n % n;                      /* == (2**32-n) % n == 2**32 % n */
> +  do
> +    {
> +      x = irand32 (r);
> +    }
> +  while (x < lim);
> +  return x % n;
> +}
> +
>  /*
>   * Fill a buffer with a fixed pattern.
>   *
> @@ -255,19 +316,18 @@ fillpattern (int type, unsigned char *r,
>  
>  /*
>   * Fill a buffer, R (of size SIZE_MAX), with random data.
> - * SIZE is rounded UP to a multiple of s->words * sizeof (uint32_t).
> + * SIZE is rounded UP to a multiple of ISAAC_BYTES.
>   */
>  static void
>  fillrand (struct isaac_state *s, uint32_t *r, size_t size_max, size_t size)
>  {
> -  size_t bytes = s->words * sizeof (uint32_t);
> -  size = (size + bytes - 1) / bytes;
> +  size = (size + ISAAC_BYTES - 1) / ISAAC_BYTES;
>    assert (size <= size_max);
>  
>    while (size--)
>      {
>        isaac_refill (s, r);
> -      r += s->words;
> +      r += ISAAC_WORDS;
>      }
>  }
>  
> @@ -369,7 +429,7 @@ dopass (int fd, char const *qname, off_t
>    size_t soff;                       /* Offset into buffer for next write */
>    ssize_t ssize;             /* Return value from write */
>    uint32_t *r;                       /* Fill pattern.  */
> -  size_t rsize = 3 * MAX (s->words, 1024) * sizeof *r; /* Fill size.  */
> +  size_t rsize = 3 * MAX (ISAAC_WORDS, 1024) * sizeof *r; /* Fill size.  */
>    size_t ralign = lcm (getpagesize (), sizeof *r); /* Fill alignment.  */
>    char pass_string[PASS_NAME_SIZE];  /* Name of current pass */
>    bool write_error = false;
> @@ -1103,7 +1163,6 @@ main (int argc, char **argv)
>  
>    atexit (close_stdout);
>  
> -  isaac_new (&s, ISAAC_MAX_WORDS);
>    isaac_seed (&s);
>  
>    memset (&flags, 0, sizeof flags);
> Index: src/sort.c
> ===================================================================
> RCS file: /fetish/cu/src/sort.c,v
> retrieving revision 1.332
> retrieving revision 1.333
> diff -p -u -r1.332 -r1.333
> --- src/sort.c        10 Dec 2005 10:04:12 -0000      1.332
> +++ src/sort.c        12 Dec 2005 22:09:27 -0000      1.333
> @@ -30,10 +30,8 @@
>  #include "error.h"
>  #include "hard-locale.h"
>  #include "inttostr.h"
> -#include "md5.h"
>  #include "physmem.h"
>  #include "posixver.h"
> -#include "rand-isaac.h"
>  #include "quote.h"
>  #include "stdlib--.h"
>  #include "stdio--.h"
> @@ -79,7 +77,7 @@ double strtod ();
>  # define DEFAULT_TMPDIR "/tmp"
>  #endif
>  
> -#define SORT_ISAAC_WORDS 8
> +#include "rand-isaac.c"
>  
>  /* Exit statuses.  */
>  enum
> @@ -151,7 +149,7 @@ struct keyfield
>    bool numeric;                      /* Flag for numeric comparison.  Handle
>                                  strings of digits with optional decimal
>                                  point, but no exponential notation. */
> -  bool random_hash;          /* Shuffle by sorting on random hash of key */
> +  bool random;                       /* Sort by random hash of key.  */
>    bool general_numeric;              /* Flag for general, numeric comparison.
>                                  Handle numbers in exponential notation. */
>    bool month;                        /* Flag for comparison by month name. */
> @@ -319,7 +317,7 @@ Other options:\n\
>    -k, --key=POS1[,POS2]     start a key at POS1, end it at POS2 (origin 1)\n\
>    -m, --merge               merge already sorted files; do not sort\n\
>    -o, --output=FILE         write result to FILE instead of standard 
> output\n\
> -      --seed=STRING         use specified seed for random number generator\n\
> +      --seed=STRING         seed random hash function with STRING\n\
>    -s, --stable              stabilize sort by disabling last-resort 
> comparison\n\
>    -S, --buffer-size=SIZE    use SIZE for main memory buffer\n\
>  "), stdout);
> @@ -361,12 +359,15 @@ native byte values.\n\
>    exit (status);
>  }
>  
> -static char const short_options[] = "-bcdfgik:mMno:rRsS:t:T:uy:z";
> +/* For long options that have no equivalent short option, use a
> +   non-character as a pseudo short option, starting with CHAR_MAX + 1.  */
>  enum
>  {
>    SEED_OPTION = CHAR_MAX + 1
>  };
>  
> +static char const short_options[] = "-bcdfgik:mMno:rRsS:t:T:uy:z";
> +
>  static struct option const long_options[] =
>  {
>    {"ignore-leading-blanks", no_argument, NULL, 'b'},
> @@ -1166,27 +1167,19 @@ getmonth (char const *month, size_t len)
>    return 0;
>  }
>  
> -static struct isaac_state *rand_state;
> -
> -#define HASH_WORDS 1
> -#define HASH_SIZE (HASH_WORDS * sizeof (uint32_t))
> +/* The ISAAC state resulting from the user-specified seed, or from
> +   random data taken from the environment.  */
> +static struct isaac_state rand_state;
>  
> +/* Set *RESULT to the ISAAC state resulting from applying TEXT (with
> +   length LEN) to rand_state.  */
>  static void
> -get_hash (char const *text, size_t len, uint32_t resbuf[/*HASH_WORDS*/])
> +get_hash (char const *text, size_t len, uint32_t resbuf[ISAAC_WORDS])
>  {
> -  struct isaac_state s;
> -  int i;
> -  isaac_copy (&s, rand_state);
> +  struct isaac_state s = rand_state;
>    isaac_seed_data (&s, text, len);
> -  /* we can call isaac_seed_finish multiple times, but should only
> -     call isaac_seed_start once */
>    isaac_seed_finish (&s);
> -
> -  /* getting an irand_state and drawing random numbers would be more
> -     kosher here, but also slower. so we just peek at the ISAAC state
> -     array instead */
> -  for (i = 0; i < HASH_WORDS; i++)
> -    resbuf[i] = s.mm[s.words - 1 - i];
> +  isaac_refill (&s, resbuf);
>  }
>  
>  /* Compare two lines A and B trying every key in sequence until there
> @@ -1217,15 +1210,27 @@ keycompare (const struct line *a, const 
>  
>        /* Actually compare the fields. */
>  
> -      if (key->random_hash)
> +      if (key->random)
>          {
> -          uint32_t diga[HASH_WORDS];
> -          uint32_t digb[HASH_WORDS];
> +       int i;
> +       uint32_t diga[ISAAC_WORDS];
> +       uint32_t digb[ISAAC_WORDS];
>            get_hash (texta, lena, diga);
>            get_hash (textb, lenb, digb);
> -          diff = memcmp (diga, digb, sizeof (diga));
> -       if (diff)
> -         goto not_equal;
> +
> +       /* Go backwards, since the last words are marginally better
> +          mixed.  */
> +       for (i = ISAAC_WORDS - 1; 0 <= i; i--)
> +         if (diga[i] != digb[i])
> +           {
> +             diff = (diga[i] < digb[i] ? -1 : 1);
> +             goto not_equal;
> +           }
> +
> +       /* FIXME: Should break ties by rehashing with a different
> +          random hashing function (and repeat until the tie is
> +          broken) unless --seed was specified.  The probability of
> +          this being needed should be infinitesimal.  */
>          }
>  
>        if (key->numeric | key->general_numeric)
> @@ -2038,7 +2043,6 @@ badfieldspec (char const *spec, char con
>  {
>    error (SORT_FAILURE, 0, _("%s: invalid field specification %s"),
>        _(msgid), quote (spec));
> -  /* added to satisfy compiler for NORETURN: */
>    abort ();
>  }
>  
> @@ -2132,7 +2136,7 @@ set_ordering (const char *s, struct keyf
>         key->numeric = true;
>         break;
>       case 'R':
> -       key->random_hash = true;
> +       key->random = true;
>         break;
>       case 'r':
>         key->reverse = true;
> @@ -2162,8 +2166,8 @@ main (int argc, char **argv)
>    int c = 0;
>    bool checkonly = false;
>    bool mergeonly = false;
> -  char const *randseed = NULL;
> -  bool need_rand = false;
> +  char const *seed = NULL;
> +  bool need_random = false;
>    size_t nfiles = 0;
>    bool posixly_correct = (getenv ("POSIXLY_CORRECT") != NULL);
>    bool obsolete_usage = (posix2_version () < 200112);
> @@ -2242,7 +2246,7 @@ main (int argc, char **argv)
>    gkey.sword = gkey.eword = SIZE_MAX;
>    gkey.ignore = NULL;
>    gkey.translate = NULL;
> -  gkey.numeric = gkey.general_numeric = gkey.random_hash = false;
> +  gkey.numeric = gkey.general_numeric = gkey.random = false;
>    gkey.month = gkey.reverse = false;
>    gkey.skipsblanks = gkey.skipeblanks = false;
>  
> @@ -2397,7 +2401,7 @@ main (int argc, char **argv)
>         break;
>  
>          case SEED_OPTION:
> -          randseed = optarg;
> +       seed = optarg;
>            break;
>  
>       case 's':
> @@ -2474,16 +2478,14 @@ main (int argc, char **argv)
>       }
>      }
>  
> -  need_rand |= gkey.random_hash;
>    /* Inheritance of global options to individual keys. */
>    for (key = keylist; key; key = key->next)
>      {
> -      need_rand |= key->random_hash;
>        if (! (key->ignore || key->translate
>               || (key->skipsblanks | key->reverse
>                   | key->skipeblanks | key->month | key->numeric
>                   | key->general_numeric
> -                 | key->random_hash)))
> +                 | key->random)))
>          {
>            key->ignore = gkey.ignore;
>            key->translate = gkey.translate;
> @@ -2492,31 +2494,35 @@ main (int argc, char **argv)
>            key->month = gkey.month;
>            key->numeric = gkey.numeric;
>            key->general_numeric = gkey.general_numeric;
> -          key->random_hash = gkey.random_hash;
> +          key->random = gkey.random;
>            key->reverse = gkey.reverse;
>          }
> -    }
>  
> -  if (need_rand)
> -    {
> -      rand_state = isaac_new (NULL, SORT_ISAAC_WORDS);
> -      if (randseed)
> -        {
> -          isaac_seed_start (rand_state);
> -          isaac_seed_data (rand_state, randseed, strlen (randseed));
> -          isaac_seed_finish (rand_state);
> -        }
> -      else
> -        isaac_seed (rand_state);
> +      need_random |= key->random;
>      }
>  
>    if (!keylist && (gkey.ignore || gkey.translate
>                  || (gkey.skipsblanks | gkey.skipeblanks | gkey.month
>                      | gkey.numeric | gkey.general_numeric
> -                       | gkey.random_hash)))
> -    insertkey (&gkey);
> +                       | gkey.random)))
> +    {
> +      insertkey (&gkey);
> +      need_random |= gkey.random;
> +    }
> +
>    reverse = gkey.reverse;
>  
> +  if (need_random)
> +    {
> +      if (seed)
> +     {
> +       isaac_seed_start (&rand_state);
> +       isaac_seed_data (&rand_state, seed, strlen (seed));
> +     }
> +      else
> +     isaac_seed (&rand_state);
> +    }
> +
>    if (temp_dir_count == 0)
>      {
>        char const *tmp_dir = getenv ("TMPDIR");
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]