bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#6524: du now uses less than half as much memory, sometimes


From: Jim Meyering
Subject: bug#6524: du now uses less than half as much memory, sometimes
Date: Mon, 28 Jun 2010 00:09:04 +0200

FYI,
This is just a preliminary heads-up.  At least one detail must be
addressed before I push: I haven't written justification for the
change to hash.c (I confess that I don't remember -- parts of this
change date back to Dec 2008 ;-)

Feedback welcome, as usual.
---------------------------

Quick summary, here's the NEWS addition:

   du now uses less than half as much memory when operating on trees
   with many hard-linked files.  With --count-links (-l), or when
   operating on trees with no hard-linked files, there is no change.

This started about two years ago, when someone reported that du
would use what seemed like an inordinate amount of RAM when
operating on many hard-linked files.

As you can see (read the comments and logs below), du must use *some*
space for each and every file with a link count of 2 or greater.
However, there was plenty of room for improvement.

>From the last two change sets, (included mostly for reference -- I
doubt I'll push either), you can see that the improved du now consumes
far less memory than the old one.  Somewhat less than half as much.
Note the vertical scale change and that the total instruction count
(x axis) is nearly identical; the new one is a tiny bit faster.

[Notice the peaks; they occur each time hash.c rehashes into a larger table.
 Also notice that with the old version, memory usage grows between rehash
 calls, while in the new one, it is flat. ]

+On an x86_64 system:
+
+--------------------------------------------------------------------------------
+Command:            ./src/du.old -a /t/z
+Massif arguments:   --massif-out-file=m.old
+ms_print arguments: m.old
+--------------------------------------------------------------------------------
+
+
+    MB
+5.618^                                                              #
+     |                                                              #
+     |                                                              #
+     |                                                              #
+     |                                                              #
+     |                                                              #     :::
+     |                                            @                 #::::::::
+     |                                            @                 #::::::::
+     |                                            @               ::#::::::::
+     |                                            @       :@:::::@::#:::::::@
+     |                               @            @:::@::::@:::::@::#:::::::@
+     |                               @            @:::@::::@:::::@::#:::::::@
+     |                               @   ::::@::::@:::@::::@:::::@::#:::::::@
+     |                      @@       @:::: ::@::::@:::@::::@:::::@::#:::::::@
+     |                      @     :::@: :: ::@::::@:::@::::@:::::@::#:::::::@
+     |               @      @ :::::::@: :: ::@::::@:::@::::@:::::@::#:::::::@
+     |               @::::::@ :: ::::@: :: ::@::::@:::@::::@:::::@::#:::::::@
+     |             ::@:: :: @ :: ::::@: :: ::@::::@:::@::::@:::::@::#:::::::@
+     |     @  :@:::::@:: :: @ :: ::::@: :: ::@::::@:::@::::@:::::@::#:::::::@
+     |  ::@@:::@:: ::@:: :: @ :: ::::@: :: ::@::::@:::@::::@:::::@::#:::::::@
+   0 
+----------------------------------------------------------------------->Mi
+     0                                                                   189.6
+
+Number of snapshots: 69
+ Detailed snapshots: [4, 5, 9, 14, 19, 26, 32, 37, 42, 47, 54, 57 (peak), 67]
+
+--------------------------------------------------------------------------------
+  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
+--------------------------------------------------------------------------------
+  0              0                0                0             0            0
+  1      2,515,906          185,176          170,412        14,764            0
+  2      5,834,589          258,000          231,344        26,656            0
+--------------------------------------------------------------------------------
+Command:            ./src/du.new -a /t/z
+Massif arguments:   --massif-out-file=m.new
+ms_print arguments: m.new
+--------------------------------------------------------------------------------
+
+
+    MB
+3.145^                                                     #
+     |                                                     #
+     |                                                     #
+     |                                                     #
+     |                                                     #
+     |                                                     #
+     |                                     @@              #
+     |                                     @               #
+     |                                     @               # ::
+     |                                     @               #:::::@::::@:::::@
+     |                          @          @               #:::::@::::@:::::@
+     |                          @          @               #:::::@::::@:::::@
+     |                          @          @ ::::::::@:::::#:::::@::::@:::::@
+     |                  @@      @          @ ::::: ::@:::::#:::::@::::@:::::@
+     |                  @       @::::::::::@ ::::: ::@:::::#:::::@::::@:::::@
+     |                  @       @::::::::::@ ::::: ::@:::::#:::::@::::@:::::@
+     |                  @ ::::::@::::::::::@ ::::: ::@:::::#:::::@::::@:::::@
+     |             :::@@@ ::::::@::::::::::@ ::::: ::@:::::#:::::@::::@:::::@
+     |       @ :::@:: @@@ ::::::@::::::::::@ ::::: ::@:::::#:::::@::::@:::::@
+     |  :@::@@@:::@:: @@@ ::::::@::::::::::@ ::::: ::@:::::#:::::@::::@:::::@
+   0 
+----------------------------------------------------------------------->Mi
+     0                                                                   188.8
+
+Number of snapshots: 93
+ Detailed snapshots: [4, 7, 8, 9, 14, 18, 19, 21, 29, 40, 51, 60 (peak), 70, 
80, 90]
+
+--------------------------------------------------------------------------------
+  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
+--------------------------------------------------------------------------------
+  0              0                0                0             0            0
+  1      3,286,259          134,120          129,254         4,866            0
+  2      5,525,860          226,432          218,599         7,833            0
--
1.7.1.764.g84d391




>From bc84b17d6436e9efb8d8946bd4b6e28ef40621b8 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Fri, 11 Jun 2010 19:43:49 +0200
Subject: [PATCH 1/7] include patched hash module

* gl/lib/hash.c.diff: New file.
* gl/lib/hash.h.diff: New file.
---
 gl/lib/hash.c.diff |   96 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 gl/lib/hash.h.diff |   12 ++++++
 2 files changed, 108 insertions(+), 0 deletions(-)
 create mode 100644 gl/lib/hash.c.diff
 create mode 100644 gl/lib/hash.h.diff

diff --git a/gl/lib/hash.c.diff b/gl/lib/hash.c.diff
new file mode 100644
index 0000000..72b1cb4
--- /dev/null
+++ b/gl/lib/hash.c.diff
@@ -0,0 +1,96 @@
+diff --git a/lib/hash.c b/lib/hash.c
+index f9abb9f..97dbd40 100644
+--- a/lib/hash.c
++++ b/lib/hash.c
+@@ -1020,14 +1020,13 @@ hash_rehash (Hash_table *table, size_t candidate)
+   return false;
+ }
+
+-/* If ENTRY matches an entry already in the hash table, return the pointer
+-   to the entry from the table.  Otherwise, insert ENTRY and return ENTRY.
+-   Return NULL if the storage required for insertion cannot be allocated.
+-   This implementation does not support duplicate entries or insertion of
+-   NULL.  */
+-
+-void *
+-hash_insert (Hash_table *table, const void *entry)
++/* Return -1 upon memory allocation failure.
++   Return 1 if insertion succeeded.
++   Return 0 if there is already a matching entry in the table,
++   and in that case, if MATCHED_ENT is non-NULL, set *MATCHED_ENT
++   to that entry. */
++int
++hash_insert0 (Hash_table *table, void const *entry, void const **matched_ent)
+ {
+   void *data;
+   struct hash_entry *bucket;
+@@ -1038,7 +1037,11 @@ hash_insert (Hash_table *table, const void *entry)
+
+   /* If there's a matching entry already in the table, return that.  */
+   if ((data = hash_find_entry (table, entry, &bucket, false)) != NULL)
+-    return data;
++    {
++      if (matched_ent)
++        *matched_ent = data;
++      return 0;
++    }
+
+   /* If the growth threshold of the buckets in use has been reached, increase
+      the table size and rehash.  There's no point in checking the number of
+@@ -1062,11 +1065,11 @@ hash_insert (Hash_table *table, const void *entry)
+                 * tuning->growth_threshold));
+
+           if (SIZE_MAX <= candidate)
+-            return NULL;
++            return -1;
+
+           /* If the rehash fails, arrange to return NULL.  */
+           if (!hash_rehash (table, candidate))
+-            return NULL;
++            return -1;
+
+           /* Update the bucket we are interested in.  */
+           if (hash_find_entry (table, entry, &bucket, false) != NULL)
+@@ -1081,7 +1084,7 @@ hash_insert (Hash_table *table, const void *entry)
+       struct hash_entry *new_entry = allocate_entry (table);
+
+       if (new_entry == NULL)
+-        return NULL;
++        return -1;
+
+       /* Add ENTRY in the overflow of the bucket.  */
+
+@@ -1089,7 +1092,7 @@ hash_insert (Hash_table *table, const void *entry)
+       new_entry->next = bucket->next;
+       bucket->next = new_entry;
+       table->n_entries++;
+-      return (void *) entry;
++      return 1;
+     }
+
+   /* Add ENTRY right in the bucket head.  */
+@@ -1098,7 +1101,23 @@ hash_insert (Hash_table *table, const void *entry)
+   table->n_entries++;
+   table->n_buckets_used++;
+
+-  return (void *) entry;
++  return 1;
++}
++
++/* If ENTRY matches an entry already in the hash table, return the pointer
++   to the entry from the table.  Otherwise, insert ENTRY and return ENTRY.
++   Return NULL if the storage required for insertion cannot be allocated.
++   This implementation does not support duplicate entries or insertion of
++   NULL.  */
++
++void *
++hash_insert (Hash_table *table, void const *entry)
++{
++  void const *matched_ent;
++  int err = hash_insert0 (table, entry, &matched_ent);
++  return (err == -1
++          ? NULL
++          : (void *) (err == 0 ? matched_ent : entry));
+ }
+
+ /* If ENTRY is already in the table, remove it and return the just-deleted
diff --git a/gl/lib/hash.h.diff b/gl/lib/hash.h.diff
new file mode 100644
index 0000000..8219911
--- /dev/null
+++ b/gl/lib/hash.h.diff
@@ -0,0 +1,12 @@
+diff --git a/lib/hash.h b/lib/hash.h
+index e795cdc..0ae3fe5 100644
+--- a/lib/hash.h
++++ b/lib/hash.h
+@@ -88,6 +88,7 @@ void hash_free (Hash_table *);
+ /* Insertion and deletion.  */
+ bool hash_rehash (Hash_table *, size_t) ATTRIBUTE_WUR;
+ void *hash_insert (Hash_table *, const void *) ATTRIBUTE_WUR;
++int hash_insert0 (Hash_table *table, const void *entry, const void 
**matched_ent);
+ void *hash_delete (Hash_table *, const void *);
+
+ #endif
--
1.7.1.764.g84d391


>From 563f6db676ff8c6e72345f0e446a707c534cdb58 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Sun, 27 Jun 2010 23:26:46 +0200
Subject: [PATCH 2/7] dev-map: map device number to small non-negative

* gl/lib/dev-map.c: New file.
* gl/lib/dev-map.h: Declarations.
* gl/modules/dev-map: Define primary modules.
* gl/modules/dev-map-tests: Define test module.
* gl/tests/test-dev-map.c: Test it.
---
 gl/lib/dev-map.c         |  110 ++++++++++++++++++++++++++++++++++++++++++++++
 gl/lib/dev-map.h         |   23 ++++++++++
 gl/modules/dev-map       |   23 ++++++++++
 gl/modules/dev-map-tests |   10 ++++
 gl/tests/test-dev-map.c  |   63 ++++++++++++++++++++++++++
 5 files changed, 229 insertions(+), 0 deletions(-)
 create mode 100644 gl/lib/dev-map.c
 create mode 100644 gl/lib/dev-map.h
 create mode 100644 gl/modules/dev-map
 create mode 100644 gl/modules/dev-map-tests
 create mode 100644 gl/tests/test-dev-map.c

diff --git a/gl/lib/dev-map.c b/gl/lib/dev-map.c
new file mode 100644
index 0000000..363f5af
--- /dev/null
+++ b/gl/lib/dev-map.c
@@ -0,0 +1,110 @@
+/* Map a dev_t device number to a small integer.
+
+   Copyright 2009-2010 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+/* written by Jim Meyering */
+
+#include <config.h>
+#include "dev-map.h"
+
+#include <stdio.h>
+#include <assert.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+struct dev_map_ent
+{
+  dev_t dev;
+  uint32_t mapped_dev;
+};
+
+enum { INITIAL_DEV_MAP_TABLE_SIZE = 31 };
+
+static size_t
+dev_hash (void const *x, size_t table_size)
+{
+  struct dev_map_ent const *p = x;
+  return (uintmax_t) p->dev % table_size;
+}
+
+/* Compare two dev_map_ent structs on "dev".
+   Return true if they are the same.  */
+static bool
+dev_compare (void const *x, void const *y)
+{
+  struct dev_map_ent const *a = x;
+  struct dev_map_ent const *b = y;
+  return a->dev == b->dev ? true : false;
+}
+
+/* Initialize state.  */
+int
+dev_map_init (struct dev_map *dev_map)
+{
+  assert (dev_map != NULL);
+  dev_map->n_device = 0;
+  dev_map->dev_map = hash_initialize (INITIAL_DEV_MAP_TABLE_SIZE, NULL,
+                                            dev_hash, dev_compare, free);
+  return dev_map->dev_map ? 0 : -1;
+}
+
+void
+dev_map_free (struct dev_map *dev_map)
+{
+  hash_free (dev_map->dev_map);
+}
+
+/* Insert device number, DEV, into the map, DEV_MAP if it's not there already,
+   and return the nonnegative, mapped and usually smaller, number.
+   If DEV is already in DEV_MAP, return the existing mapped value.
+   Upon memory allocation failure, return -1.  */
+int
+dev_map_insert (struct dev_map *dev_map, dev_t dev)
+{
+  /* Attempt to insert <DEV,n_device> in the map.
+     Possible outcomes:
+     - DEV was already there, with a different necessarily smaller value
+     - DEV was not there, (thus was just inserted)
+     - insert failed: out of memory
+     Return -1 on out of memory.
+  */
+
+  struct dev_map_ent *ent_from_table;
+  struct dev_map_ent *ent = malloc (sizeof *ent);
+  if (!ent)
+    return -1;
+
+  ent->dev = dev;
+  ent->mapped_dev = dev_map->n_device;
+
+  ent_from_table = hash_insert (dev_map->dev_map, ent);
+  if (ent_from_table == NULL)
+    {
+      /* Insertion failed due to lack of memory.  */
+      return -1;
+    }
+
+  if (ent_from_table == ent)
+    {
+      /* Insertion succeeded: new device.  */
+      return dev_map->n_device++;
+    }
+
+  /* That DEV value is already in the table, so ENT was not inserted.
+     Free it and return the already-associated value.  */
+  free (ent);
+  return ent_from_table->mapped_dev;
+}
diff --git a/gl/lib/dev-map.h b/gl/lib/dev-map.h
new file mode 100644
index 0000000..f093d90
--- /dev/null
+++ b/gl/lib/dev-map.h
@@ -0,0 +1,23 @@
+#include <stddef.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include "hash.h"
+
+struct dev_map
+{
+  /* KEY,VAL pair, where KEY is the raw st_dev value
+     and VAL is the small number that maps to.  */
+  struct hash_table *dev_map;
+  size_t n_device;
+};
+
+#undef _ATTRIBUTE_NONNULL_
+#if __GNUC__ == 3 && __GNUC_MINOR__ >= 3 || 3 < __GNUC__
+# define _ATTRIBUTE_NONNULL_(m) __attribute__ ((__nonnull__ (m)))
+#else
+# define _ATTRIBUTE_NONNULL_(m)
+#endif
+
+int dev_map_init (struct dev_map *) _ATTRIBUTE_NONNULL_ (1);
+void dev_map_free (struct dev_map *) _ATTRIBUTE_NONNULL_ (1);
+int dev_map_insert (struct dev_map *, dev_t) _ATTRIBUTE_NONNULL_ (1);
diff --git a/gl/modules/dev-map b/gl/modules/dev-map
new file mode 100644
index 0000000..91f437b
--- /dev/null
+++ b/gl/modules/dev-map
@@ -0,0 +1,23 @@
+Description:
+maintain a mapping of dev_t numbers to small integers
+
+Files:
+lib/dev-map.c
+lib/dev-map.h
+
+Depends-on:
+hash
+
+configure.ac:
+
+Makefile.am:
+lib_SOURCES += dev-map.c dev-map.h
+
+Include:
+"dev-map.h"
+
+License
+GPL
+
+Maintainer:
+Jim Meyering
diff --git a/gl/modules/dev-map-tests b/gl/modules/dev-map-tests
new file mode 100644
index 0000000..4bec2e6
--- /dev/null
+++ b/gl/modules/dev-map-tests
@@ -0,0 +1,10 @@
+Files:
+tests/test-dev-map.c
+
+Depends-on:
+
+configure.ac:
+
+Makefile.am:
+TESTS += test-dev-map
+check_PROGRAMS += test-dev-map
diff --git a/gl/tests/test-dev-map.c b/gl/tests/test-dev-map.c
new file mode 100644
index 0000000..98ba630
--- /dev/null
+++ b/gl/tests/test-dev-map.c
@@ -0,0 +1,63 @@
+/* Test the dev-map module.
+   Copyright (C) 2010 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+/* Written by Jim Meyering.  */
+
+#include <config.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+/* FIXME: once/if in gnulib, use #include "macros.h" in place of this */
+#define ASSERT(expr) \
+  do                                                                         \
+    {                                                                        \
+      if (!(expr))                                                           \
+        {                                                                    \
+          fprintf (stderr, "%s:%d: assertion failed\n", __FILE__, __LINE__); \
+          fflush (stderr);                                                   \
+          abort ();                                                          \
+        }                                                                    \
+    }                                                                        \
+  while (0)
+
+#include "dev-map.h"
+
+/* Risky: this is also defined in di-set.c; they should match.  */
+#define N_DEV_BITS_4 5
+
+int
+main ()
+{
+  /* set_program_name (argv[0]); placate overzealous "syntax-check" test.  */
+  struct dev_map dev_map;
+  ASSERT (dev_map_init (&dev_map) == 0);
+
+  ASSERT (dev_map_insert (&dev_map, 42) == 0);
+  ASSERT (dev_map_insert (&dev_map, 42) == 0);
+  ASSERT (dev_map_insert (&dev_map, 398) == 1);
+  ASSERT (dev_map_insert (&dev_map, 398) == 1);
+
+  int32_t i;
+  for (i = 0; i < (1 << N_DEV_BITS_4); i++)
+    {
+      ASSERT (dev_map_insert (&dev_map, 10000+i) == 2+i);
+    }
+
+  dev_map_free (&dev_map);
+
+  return 0;
+}
--
1.7.1.764.g84d391


>From 4fe5a595909b4f8d7b0f96d4d0a18d1e0f8906d2 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Sun, 27 Jun 2010 23:29:07 +0200
Subject: [PATCH 3/7] di-set: manipulate sets of dev/inode pairs efficiently

* gl/lib/di-set.c: Implementation.
* gl/lib/di-set.h: Declarations.
* gl/modules/di-set: Define module.
* gl/modules/di-set-tests: Define test module.
* gl/tests/test-di-set.c: Likewise.
---
 gl/lib/di-set.c         |  276 +++++++++++++++++++++++++++++++++++++++++++++++
 gl/lib/di-set.h         |   28 +++++
 gl/modules/di-set       |   25 +++++
 gl/modules/di-set-tests |   10 ++
 gl/tests/test-di-set.c  |   85 +++++++++++++++
 5 files changed, 424 insertions(+), 0 deletions(-)
 create mode 100644 gl/lib/di-set.c
 create mode 100644 gl/lib/di-set.h
 create mode 100644 gl/modules/di-set
 create mode 100644 gl/modules/di-set-tests
 create mode 100644 gl/tests/test-di-set.c

diff --git a/gl/lib/di-set.c b/gl/lib/di-set.c
new file mode 100644
index 0000000..3c4717b
--- /dev/null
+++ b/gl/lib/di-set.c
@@ -0,0 +1,276 @@
+/* Set operations for device-inode pairs stored in a space-efficient manner.
+
+   Copyright 2009-2010 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+/* written by Jim Meyering */
+
+#include <config.h>
+#include "di-set.h"
+
+#include <stdio.h>
+#include <assert.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+
+#include "verify.h"
+
+/* Set operations for device-inode pairs stored in a space-efficient manner.
+   A naive mapping uses 16 bytes to save a single st_dev, st_ino pair.
+   However, in many applications, the vast majority of actual device,inode
+   number pairs can be efficiently compressed to fit in 8 or even 4 bytes,
+   by using a separate table to map a relatively small number of devices
+   to small integers.  */
+
+#define N_DEV_BITS_4 5
+#define N_INO_BITS_4 (32 - N_DEV_BITS_4 - 2 - 1)
+
+#define N_DEV_BITS_8 8
+#define N_INO_BITS_8 (64 - N_DEV_BITS_8 - 2 - 1)
+
+/* Note how the last bit is always set.
+   This is required, in order to be able to distinguish
+   an encoded di_ent value from a malloc-returned pointer,
+   which must be 4-byte-aligned or better.  */
+struct dev_ino_4
+{
+  uint32_t mode:2; /* must be first */
+  uint32_t short_ino:N_INO_BITS_4;
+  uint32_t mapped_dev:N_DEV_BITS_4;
+  uint32_t always_set:1;
+};
+verify (N_DEV_BITS_4 <= 8 * sizeof (int));
+verify (sizeof (struct dev_ino_4) == 4);
+
+struct dev_ino_8
+{
+  uint32_t mode:2; /* must be first */
+  uint64_t short_ino:N_INO_BITS_8;
+  uint32_t mapped_dev:N_DEV_BITS_8;
+  uint32_t always_set:1;
+};
+verify (sizeof (struct dev_ino_8) == 8);
+
+struct dev_ino_full
+{
+  uint32_t mode:2; /* must be first */
+  dev_t dev;
+  ino_t ino;
+};
+
+enum di_mode
+{
+  DI_MODE_4 = 1,
+  DI_MODE_8 = 2,
+  DI_MODE_FULL = 3
+};
+
+/*
+     di_mode     raw_inode      mapped dev    always_set
+          \____________|_______________\_____/
+  4-byte | 2|          25            |  5  |1|          mapped_dev
+         `----------------------------------------------------|-----.
+  8-byte | 2|        53                                  |    8   |1|
+         `----------------------------------------------------------'
+*/
+struct di_ent
+{
+  union
+  {
+    struct dev_ino_4 di4;
+    struct dev_ino_8 di8;
+    struct dev_ino_full full;
+    uint32_t u32;
+    uint64_t u64;
+    void *ptr;
+  } u;
+};
+
+struct dev_map_ent
+{
+  dev_t dev;
+  uint32_t mapped_dev;
+};
+
+static inline bool
+is_encoded_ptr (struct di_ent const *v)
+{
+  return (size_t) v % 4;
+}
+
+static struct di_ent
+decode_ptr (struct di_ent const *v)
+{
+  if (!is_encoded_ptr (v))
+    return *v;
+
+  struct di_ent di;
+  di.u.ptr = (void *) v;
+  return di;
+}
+
+static size_t
+di_ent_hash (void const *x, size_t table_size)
+{
+  struct di_ent e = decode_ptr (x);
+  return (e.u.di4.mode == DI_MODE_4
+          ? e.u.u32
+          : (e.u.di4.mode == DI_MODE_8
+             ? e.u.u64
+             : e.u.full.ino)) % table_size;
+}
+
+/* Compare two di_ent structs.
+   Return true if they are the same.  */
+static bool
+di_ent_compare (void const *x, void const *y)
+{
+  struct di_ent a = decode_ptr (x);
+  struct di_ent b = decode_ptr (y);
+  if (a.u.di4.mode != b.u.di4.mode)
+    return false;
+
+  if (a.u.di4.mode == DI_MODE_4)
+    return (a.u.di4.short_ino == b.u.di4.short_ino
+            && a.u.di4.mapped_dev == b.u.di4.mapped_dev);
+
+  if (a.u.di8.mode == DI_MODE_8)
+    return (a.u.di8.short_ino == b.u.di8.short_ino
+            && a.u.di8.mapped_dev == b.u.di8.mapped_dev);
+
+  return (a.u.full.ino == b.u.full.ino
+          && a.u.full.dev == b.u.full.dev);
+}
+
+static void
+di_ent_free (void *v)
+{
+  if ( ! is_encoded_ptr (v))
+    free (v);
+}
+
+int
+di_set_init (struct di_set_state *dis, size_t initial_size)
+{
+  if (dev_map_init (&dis->dev_map) < 0)
+    return -1;
+
+  dis->di_set = hash_initialize (initial_size, NULL,
+                                 di_ent_hash, di_ent_compare, di_ent_free);
+  return dis->di_set ? 0 : -1;
+}
+
+void
+di_set_free (struct di_set_state *dis)
+{
+  dev_map_free (&dis->dev_map);
+  hash_free (dis->di_set);
+}
+
+/* Given a device-inode set, DIS, create an entry for the DEV,INO
+   pair, and store it in *V.  If possible, encode DEV,INO into the pointer
+   itself, but if not, allocate space for a full "struct di_ent" and set *V
+   to that pointer.  Upon memory allocation failure, return -1.
+   Otherwise return 0.  */
+int
+di_ent_create (struct di_set_state *dis,
+               dev_t dev, ino_t ino,
+               struct di_ent **v)
+{
+  static int prev_m = -1;
+  static dev_t prev_dev = -1;
+  struct di_ent di_ent;
+  int mapped_dev;
+
+  if (dev == prev_dev)
+    mapped_dev = prev_m;
+  else
+    {
+      mapped_dev = dev_map_insert (&dis->dev_map, dev);
+      if (mapped_dev < 0)
+        return -1;
+      prev_dev = dev;
+      prev_m = mapped_dev;
+    }
+
+  if (mapped_dev < (1 << N_DEV_BITS_4)
+      && ino < (1 << N_INO_BITS_4))
+    {
+#if lint
+      /* When this struct is smaller than a pointer, initialize
+         the pointer so tools like valgrind don't complain about
+         the uninitialized bytes.  */
+      if (sizeof di_ent.u.di4 < sizeof di_ent.u.ptr)
+        di_ent.u.ptr = NULL;
+#endif
+      di_ent.u.di4.mode = DI_MODE_4;
+      di_ent.u.di4.short_ino = ino;
+      di_ent.u.di4.mapped_dev = mapped_dev;
+      di_ent.u.di4.always_set = 1;
+      *v = di_ent.u.ptr;
+    }
+  else if (mapped_dev < (1 << N_DEV_BITS_8)
+           && ino < ((uint64_t) 1 << N_INO_BITS_8))
+    {
+      di_ent.u.di8.mode = DI_MODE_8;
+      di_ent.u.di8.short_ino = ino;
+      di_ent.u.di8.mapped_dev = mapped_dev;
+      di_ent.u.di8.always_set = 1;
+      *v = di_ent.u.ptr;
+    }
+  else
+    {
+      /* Handle the case in which INO is too large or in which (far less
+         likely) we encounter hard-linked files on 2^N_DEV_BITS_8
+         different devices.  */
+      struct di_ent *p = malloc (sizeof *p);
+      if (!p)
+        return -1;
+      assert ((size_t) p % 4 == 0);
+      p->u.full.mode = DI_MODE_FULL;
+      p->u.full.ino = ino;
+      p->u.full.dev = dev;
+      *v = p;
+    }
+
+  return 0;
+}
+
+/* Attempt to insert the DEV,INO pair into the set, DIS.
+   If it matches a pair already in DIS, don't modify DIS and return 0.
+   Otherwise, if insertion is successful, return 1.
+   Upon any failure return -1.  */
+int
+di_set_insert (struct di_set_state *dis, dev_t dev, ino_t ino)
+{
+  struct di_ent *v;
+  if (di_ent_create (dis, dev, ino, &v) < 0)
+    return -1;
+
+  int err = hash_insert0 (dis->di_set, v, NULL);
+  if (err == -1) /* Insertion failed due to lack of memory.  */
+    return -1;
+
+  if (err == 1) /* Insertion succeeded.  */
+    return 1;
+
+  /* That pair is already in the table, so ENT was not inserted.  Free it.  */
+  if (! is_encoded_ptr (v))
+    free (v);
+
+  return 0;
+}
diff --git a/gl/lib/di-set.h b/gl/lib/di-set.h
new file mode 100644
index 0000000..f90d0dd
--- /dev/null
+++ b/gl/lib/di-set.h
@@ -0,0 +1,28 @@
+#include "dev-map.h"
+
+struct di_set_state
+{
+  /* A map to help compact device numbers.  */
+  struct dev_map dev_map;
+
+  /* A set of compact dev,inode pairs.  */
+  struct hash_table *di_set;
+};
+
+#undef _ATTRIBUTE_NONNULL_
+#if __GNUC__ == 3 && __GNUC_MINOR__ >= 3 || 3 < __GNUC__
+# define _ATTRIBUTE_NONNULL_(m,...) __attribute__ ((__nonnull__ (m)))
+#else
+# define _ATTRIBUTE_NONNULL_(m,...)
+#endif
+
+int di_set_init (struct di_set_state *, size_t) _ATTRIBUTE_NONNULL_ (1);
+void di_set_free (struct di_set_state *) _ATTRIBUTE_NONNULL_ (1);
+int di_set_insert (struct di_set_state *, dev_t, ino_t)
+  _ATTRIBUTE_NONNULL_ (1);
+
+struct di_ent;
+int di_ent_create (struct di_set_state *di_set_state,
+                   dev_t dev, ino_t ino,
+                   struct di_ent **di_ent)
+  _ATTRIBUTE_NONNULL_ (1,4);
diff --git a/gl/modules/di-set b/gl/modules/di-set
new file mode 100644
index 0000000..fe52778
--- /dev/null
+++ b/gl/modules/di-set
@@ -0,0 +1,25 @@
+Description:
+manipulate sets of device-inode pairs efficiently
+
+Files:
+lib/di-set.c
+lib/di-set.h
+
+Depends-on:
+dev-map
+hash
+verify
+
+configure.ac:
+
+Makefile.am:
+lib_SOURCES += di-set.c di-set.h
+
+Include:
+"di-set.h"
+
+License
+GPL
+
+Maintainer:
+Jim Meyering
diff --git a/gl/modules/di-set-tests b/gl/modules/di-set-tests
new file mode 100644
index 0000000..d60f7fd
--- /dev/null
+++ b/gl/modules/di-set-tests
@@ -0,0 +1,10 @@
+Files:
+tests/test-di-set.c
+
+Depends-on:
+
+configure.ac:
+
+Makefile.am:
+TESTS += test-di-set
+check_PROGRAMS += test-di-set
diff --git a/gl/tests/test-di-set.c b/gl/tests/test-di-set.c
new file mode 100644
index 0000000..4d5823e
--- /dev/null
+++ b/gl/tests/test-di-set.c
@@ -0,0 +1,85 @@
+/* Test the dev-map module.
+   Copyright (C) 2010 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+/* Written by Jim Meyering.  */
+
+#include <config.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdint.h>
+
+#define ASSERT(expr) \
+  do                                                                         \
+    {                                                                        \
+      if (!(expr))                                                           \
+        {                                                                    \
+          fprintf (stderr, "%s:%d: assertion failed\n", __FILE__, __LINE__); \
+          fflush (stderr);                                                   \
+          abort ();                                                          \
+        }                                                                    \
+    }                                                                        \
+  while (0)
+
+#include "di-set.h"
+
+/* FIXME: ugly duplication of code from di-set.c */
+#define N_DEV_BITS_4 5
+#define N_INO_BITS_4 (32 - N_DEV_BITS_4 - 2 - 1)
+
+#define N_DEV_BITS_8 8
+#define N_INO_BITS_8 (64 - N_DEV_BITS_8 - 2 - 1)
+
+int
+main (void)
+{
+  /* set_program_name (argv[0]); placate overzealous "syntax-check" test.  */
+  size_t initial_size = 61;
+  /* "real" code might prefer to avoid the allocation here, simply
+     declaring "struct di_set_state dis;", do a global substitution,
+     s/\<dis\>/\&dis/, and remove the final free.  */
+  struct di_set_state *dis = malloc (sizeof *dis);
+  ASSERT (dis);
+  ASSERT (di_set_init (dis, initial_size) == 0);
+
+  struct di_ent *di_ent;
+  ASSERT (di_ent_create (dis, 1, 1, &di_ent) == 0);
+  ASSERT (di_ent_create (dis, 1 << N_DEV_BITS_4, 1, &di_ent) == 0);
+  ASSERT (di_ent_create (dis, 1, 1 << N_INO_BITS_4, &di_ent) == 0);
+  ASSERT (di_ent_create (dis, 1,
+                         (uint64_t) 1 << N_INO_BITS_8, &di_ent) == 0);
+  free (di_ent);
+
+  ASSERT (di_set_insert (dis, 2, 5) == 1); /* first insertion succeeds */
+  ASSERT (di_set_insert (dis, 2, 5) == 0); /* duplicate fails */
+  ASSERT (di_set_insert (dis, 3, 5) == 1); /* diff dev, duplicate inode is ok 
*/
+  ASSERT (di_set_insert (dis, 2, 8) == 1); /* same dev, different inode is ok 
*/
+
+  /* very large inode number */
+  ASSERT (di_set_insert (dis, 5, (uint64_t) 1 << 63) == 1);
+  ASSERT (di_set_insert (dis, 5, (uint64_t) 1 << 63) == 0); /* dup */
+
+  unsigned int i;
+  for (i = 0; i < 3000; i++)
+    ASSERT (di_set_insert (dis, 9, i) == 1);
+  for (i = 0; i < 3000; i++)
+    ASSERT (di_set_insert (dis, 9, i) == 0); /* duplicate fails */
+
+  di_set_free (dis);
+  free (dis);
+
+  return 0;
+}
--
1.7.1.764.g84d391


>From ffe335589aea4429c0f9b3bb49e4fbe7bce509db Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Tue, 9 Dec 2008 08:49:41 +0100
Subject: [PATCH 4/7] du: use less than half as much memory when tracking hard 
links

When processing a hard-linked file, du must keep track of the file's
device and inode numbers in order to avoid counting its storage
more than once.  When du would process many hard linked files --
as are created by some backup tools -- the amount of memory required
for the supporting data structure could become prohibitively large.
This patch takes advantage of the fact that the amount of information
in the numbers of the typical dev,inode pair is far less than even
32 bits, and hence usually fits in the space of a pointer, be it
32 or 64 bits wide.  A typical du traversal examines files on no
more than a handful of distinct devices, so the device number can
be encoded in just a few bits.  Similarly, few inode numbers use
all of the high bits in an ino_t.  Before, we would represent the
dev,inode pair using a naive struct, and allocate space for each.
Thus, an entry in the hash table consisted of a pointer (to that
struct) and a "next" pointer.  With this change, we encode the
dev,inode information and put those bits in place of the pointer,
and thus do away with the need to allocate additional space for
each dev,inode pair.

* src/du.c: Include "di-set.h".
Don't include "hash.h"; it's no longer used.
(INITIAL_DI_SET_SIZE): Define.
(di_set): New global, to replace "htab".
(entry_hash, entry_compare, hash_init): Remove functions.
(hash_ins): Use di-set functions, rather than ones from the hash module.
(main): Likewise.
* bootstrap.conf (gnulib_modules): Add the new di-set module.
* NEWS (New features): Mention it.
---
 NEWS           |    4 +++
 bootstrap.conf |    1 +
 src/du.c       |   73 +++++++------------------------------------------------
 3 files changed, 15 insertions(+), 63 deletions(-)

diff --git a/NEWS b/NEWS
index 3e170c5..e22969b 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,10 @@ GNU coreutils NEWS                                    -*- 
outline -*-
   du recognizes -d N as equivalent to --max-depth=N, for compatibility
   with FreeBSD.

+  du now uses less than half as much memory when operating on trees
+  with many hard-linked files.  With --count-links (-l), or when
+  operating on trees with no hard-linked files, there is no change.
+
   sort now accepts the --debug option, to highlight the part of the
   line significant in the sort, and warn about questionable options.

diff --git a/bootstrap.conf b/bootstrap.conf
index 6e85c9a..644c18b 100644
--- a/bootstrap.conf
+++ b/bootstrap.conf
@@ -69,6 +69,7 @@ gnulib_modules="
   cycle-check
   d-ino
   d-type
+  di-set
   diacrit
   dirfd
   dirname
diff --git a/src/du.c b/src/du.c
index 2704f0f..5838085 100644
--- a/src/du.c
+++ b/src/du.c
@@ -30,10 +30,10 @@
 #include "system.h"
 #include "argmatch.h"
 #include "argv-iter.h"
+#include "di-set.h"
 #include "error.h"
 #include "exclude.h"
 #include "fprintftime.h"
-#include "hash.h"
 #include "human.h"
 #include "quote.h"
 #include "quotearg.h"
@@ -61,18 +61,10 @@ extern bool fts_debug;
 #endif

 /* Initial size of the hash table.  */
-#define INITIAL_TABLE_SIZE 103
-
-/* Hash structure for inode and device numbers.  The separate entry
-   structure makes it easier to rehash "in place".  */
-struct entry
-{
-  ino_t st_ino;
-  dev_t st_dev;
-};
+enum { INITIAL_DI_SET_SIZE = 103 };

 /* A set of dev/ino pairs.  */
-static Hash_table *htab;
+static struct di_set_state di_set;

 /* Define a class for collecting directory information. */

@@ -334,66 +326,20 @@ Mandatory arguments to long options are mandatory for 
short options too.\n\
   exit (status);
 }

-static size_t
-entry_hash (void const *x, size_t table_size)
-{
-  struct entry const *p = x;
-
-  /* Ignoring the device number here should be fine.  */
-  /* The cast to uintmax_t prevents negative remainders
-     if st_ino is negative.  */
-  return (uintmax_t) p->st_ino % table_size;
-}
-
-/* Compare two dev/ino pairs.  Return true if they are the same.  */
-static bool
-entry_compare (void const *x, void const *y)
-{
-  struct entry const *a = x;
-  struct entry const *b = y;
-  return SAME_INODE (*a, *b) ? true : false;
-}
-
 /* Try to insert the INO/DEV pair into the global table, HTAB.
    Return true if the pair is successfully inserted,
    false if the pair is already in the table.  */
 static bool
 hash_ins (ino_t ino, dev_t dev)
 {
-  struct entry *ent;
-  struct entry *ent_from_table;
-
-  ent = xmalloc (sizeof *ent);
-  ent->st_ino = ino;
-  ent->st_dev = dev;
-
-  ent_from_table = hash_insert (htab, ent);
-  if (ent_from_table == NULL)
+  int inserted = di_set_insert (&di_set, dev, ino);
+  if (inserted < 0)
     {
       /* Insertion failed due to lack of memory.  */
       xalloc_die ();
     }

-  if (ent_from_table == ent)
-    {
-      /* Insertion succeeded.  */
-      return true;
-    }
-
-  /* That pair is already in the table, so ENT was not inserted.  Free it.  */
-  free (ent);
-
-  return false;
-}
-
-/* Initialize the hash table.  */
-static void
-hash_init (void)
-{
-  htab = hash_initialize (INITIAL_TABLE_SIZE, NULL,
-                          entry_hash, entry_compare, free);
-  if (htab == NULL)
-    xalloc_die ();
+  return inserted ? true : false;
 }

 /* FIXME: this code is nearly identical to code in date.c  */
@@ -945,8 +891,9 @@ main (int argc, char **argv)
   if (!ai)
     xalloc_die ();

-  /* Initialize the hash structure for inode numbers.  */
-  hash_init ();
+  /* Initialize the set of dev,inode pairs.  */
+  if (di_set_init (&di_set, INITIAL_DI_SET_SIZE))
+    xalloc_die ();

   bit_flags |= symlink_deref_bits;
   static char *temp_argv[] = { NULL, NULL };
@@ -1024,7 +971,7 @@ main (int argc, char **argv)
   if (print_grand_total)
     print_size (&tot_dui, _("total"));

-  hash_free (htab);
+  di_set_free (&di_set);

   exit (ok ? EXIT_SUCCESS : EXIT_FAILURE);
 }
--
1.7.1.764.g84d391


>From 83e06bbe9764aacb99f946cd1a2cda65340eaf14 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Fri, 4 Jun 2010 15:30:47 +0200
Subject: [PATCH 5/7] du: increase the initial dev-inode set size

* src/du.c (INITIAL_DI_SET_SIZE): Increase to the prime just under
1024.  This gives a speed-up of about 2% when processing a tree
containing 100,000 files, each with a link count greater than 1,
all pointing to files in some other tree.

Show how/when this change is useful:

  $ z-mktree --br=300 --root=/tmp/z --depth=2
  $ cp -lr /tmp/z /tmp/.j
  $ ./mem-usage-compare src/du -a /tmp/z

---------------------------------------------------------------------
Command:            ./src/du.old -a /t/z
Massif arguments:   --massif-out-file=m.old
ms_print arguments: m.old
---------------------------------------------------------------------

  MB
 5.618^...                                                #
   |   ...                                                #
   |   ...                                                #
   |   ...                                                #
   |   ...                                                #
   |   ...                                                #     ::
   |   ...                              @                 #:::::::
   |   ...                              @                 #:::::::@
   |   ...                              @               ::#:::::::@
   |   ...                              @      :@:::::::::#:::::::@
   |   ...                 @            @:::::::@:::::::::#:::::::@:
   |   ...                 @            @::: :::@:::::::::#:::::::@:
   |   ...                 @      ::@:::@::: :::@:::::::::#:::::::@:
   |   ...                 @::::::::@:::@::: :::@:::::::::#:::::::@:
   |   ...               :@@::::: ::@:::@::: :::@:::::::::#:::::::@:
   |   ...         :@:::::@@::::: ::@:::@::: :::@:::::::::#:::::::@:
   |   ...   :::::::@::: :@@::::: ::@:::@::: :::@:::::::::#:::::::@:
   |   ...:::: ::: :@::: :@@::::: ::@:::@::: :::@:::::::::#:::::::@:
   |   ...:: : ::: :@::: :@@::::: ::@:::@::: :::@:::::::::#:::::::@:
   |  :...:: : ::: :@::: :@@::::: ::@:::@::: :::@:::::::::#:::::::@:
 0 +---...--------------------------------------------------------->Mi
   0   ...                                                     188.4

Number of snapshots: 68
 Detailed snapshots: [1, 3, 11, 19, 24, 25, 33, 37, 44, 55 (peak), 65]

---------------------------------------------------------------------
Command:            ./src/du.new -a /t/z
Massif arguments:   --massif-out-file=m.new
ms_print arguments: m.new
---------------------------------------------------------------------

    MB
   3.090^...                                      #
     |   ...                                      #
     |   ...                                      #
     |   ...                                      #
     |   ...                                      #
     |   ...                                      #
     |   ...                       @              #
     |   ...                       @              #
     |   ...                       @              #:: :: ::
     |   ...                       @              #:::::@::::@::::@::
     |   ...            @          @              #:::::@::::@::::@::
     |   ...            @          @              #:::::@::::@::::@::
     |   ...            @          @:::::::::::@::#:::::@::::@::::@::
     |   ...    @       @          @:::::::::::@::#:::::@::::@::::@::
     |   ...    @       @:::::::::@@:::::::::::@::#:::::@::::@::::@::
     |   ...    @       @::::: :::@@:::::::::::@::#:::::@::::@::::@::
     |   ...    @:@::@@:@::::: :::@@:::::::::::@::#:::::@::::@::::@::
     |   ...::@:@:@: @ :@::::: :::@@:::::::::::@::#:::::@::::@::::@::
     | ::...::@:@:@: @ :@::::: :::@@:::::::::::@::#:::::@::::@::::@::
     | ::...::@:@:@: @ :@::::: :::@@:::::::::::@::#:::::@::::@::::@::
   0 +---...--------------------------------------------------------->Mi
     0   ...                                                     184.8

Number of snapshots: 96
 Detailed snapshots: [16, 18, 20, 24, 26, 38, 39, 56, 61 (peak), 71,
                      81, 91]

du: INITIAL_DI_SET_SIZE = 1021
---
 src/du.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/src/du.c b/src/du.c
index 5838085..d88a4f3 100644
--- a/src/du.c
+++ b/src/du.c
@@ -61,7 +61,7 @@ extern bool fts_debug;
 #endif

 /* Initial size of the hash table.  */
-enum { INITIAL_DI_SET_SIZE = 103 };
+enum { INITIAL_DI_SET_SIZE = 1021 };

 /* A set of dev/ino pairs.  */
 static struct di_set_state di_set;
--
1.7.1.764.g84d391


>From 9fdda7b3dfccbac42b1edd91124e2346322d445a Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Sat, 13 Dec 2008 16:33:15 +0100
Subject: [PATCH 6/7] mem-usage-compare: new script

---
 mem-usage-compare |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)
 create mode 100755 mem-usage-compare

diff --git a/mem-usage-compare b/mem-usage-compare
new file mode 100755
index 0000000..c7d31e7
--- /dev/null
+++ b/mem-usage-compare
@@ -0,0 +1,18 @@
+#!/bin/sh
+prog=$1
+shift
+args=$@
+
+old=./$prog.old
+new=./$prog.new
+
+$old --version || exit 1
+$new --version || exit 1
+
+valgrind --tool=massif --massif-out-file=m.old -- $old $args > _m.out.old
+valgrind --tool=massif --massif-out-file=m.new -- $new $args > _m.out.new
+
+diff -u _m.out.old _m.out.new || exit 1
+
+ms_print m.old | head -40
+ms_print m.new | head -40
--
1.7.1.764.g84d391


>From 267f06ab788634a3a6a9bf2b1f467f9e974fd292 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Sat, 12 Jun 2010 19:08:38 +0200
Subject: [PATCH 7/7] FIXME: show how new du uses less than half the memory

---
 README-di-set |   82 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 82 insertions(+), 0 deletions(-)
 create mode 100644 README-di-set

diff --git a/README-di-set b/README-di-set
new file mode 100644
index 0000000..bcf46a0
--- /dev/null
+++ b/README-di-set
@@ -0,0 +1,82 @@
+On an x86_64 system:
+
+--------------------------------------------------------------------------------
+Command:            ./src/du.old -a /t/z
+Massif arguments:   --massif-out-file=m.old
+ms_print arguments: m.old
+--------------------------------------------------------------------------------
+
+
+    MB
+5.618^                                                              #
+     |                                                              #
+     |                                                              #
+     |                                                              #
+     |                                                              #
+     |                                                              #     :::
+     |                                            @                 #::::::::
+     |                                            @                 #::::::::
+     |                                            @               ::#::::::::
+     |                                            @       :@:::::@::#:::::::@
+     |                               @            @:::@::::@:::::@::#:::::::@
+     |                               @            @:::@::::@:::::@::#:::::::@
+     |                               @   ::::@::::@:::@::::@:::::@::#:::::::@
+     |                      @@       @:::: ::@::::@:::@::::@:::::@::#:::::::@
+     |                      @     :::@: :: ::@::::@:::@::::@:::::@::#:::::::@
+     |               @      @ :::::::@: :: ::@::::@:::@::::@:::::@::#:::::::@
+     |               @::::::@ :: ::::@: :: ::@::::@:::@::::@:::::@::#:::::::@
+     |             ::@:: :: @ :: ::::@: :: ::@::::@:::@::::@:::::@::#:::::::@
+     |     @  :@:::::@:: :: @ :: ::::@: :: ::@::::@:::@::::@:::::@::#:::::::@
+     |  ::@@:::@:: ::@:: :: @ :: ::::@: :: ::@::::@:::@::::@:::::@::#:::::::@
+   0 
+----------------------------------------------------------------------->Mi
+     0                                                                   189.6
+
+Number of snapshots: 69
+ Detailed snapshots: [4, 5, 9, 14, 19, 26, 32, 37, 42, 47, 54, 57 (peak), 67]
+
+--------------------------------------------------------------------------------
+  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
+--------------------------------------------------------------------------------
+  0              0                0                0             0            0
+  1      2,515,906          185,176          170,412        14,764            0
+  2      5,834,589          258,000          231,344        26,656            0
+--------------------------------------------------------------------------------
+Command:            ./src/du.new -a /t/z
+Massif arguments:   --massif-out-file=m.new
+ms_print arguments: m.new
+--------------------------------------------------------------------------------
+
+
+    MB
+3.145^                                                     #
+     |                                                     #
+     |                                                     #
+     |                                                     #
+     |                                                     #
+     |                                                     #
+     |                                     @@              #
+     |                                     @               #
+     |                                     @               # ::
+     |                                     @               #:::::@::::@:::::@
+     |                          @          @               #:::::@::::@:::::@
+     |                          @          @               #:::::@::::@:::::@
+     |                          @          @ ::::::::@:::::#:::::@::::@:::::@
+     |                  @@      @          @ ::::: ::@:::::#:::::@::::@:::::@
+     |                  @       @::::::::::@ ::::: ::@:::::#:::::@::::@:::::@
+     |                  @       @::::::::::@ ::::: ::@:::::#:::::@::::@:::::@
+     |                  @ ::::::@::::::::::@ ::::: ::@:::::#:::::@::::@:::::@
+     |             :::@@@ ::::::@::::::::::@ ::::: ::@:::::#:::::@::::@:::::@
+     |       @ :::@:: @@@ ::::::@::::::::::@ ::::: ::@:::::#:::::@::::@:::::@
+     |  :@::@@@:::@:: @@@ ::::::@::::::::::@ ::::: ::@:::::#:::::@::::@:::::@
+   0 
+----------------------------------------------------------------------->Mi
+     0                                                                   188.8
+
+Number of snapshots: 93
+ Detailed snapshots: [4, 7, 8, 9, 14, 18, 19, 21, 29, 40, 51, 60 (peak), 70, 
80, 90]
+
+--------------------------------------------------------------------------------
+  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
+--------------------------------------------------------------------------------
+  0              0                0                0             0            0
+  1      3,286,259          134,120          129,254         4,866            0
+  2      5,525,860          226,432          218,599         7,833            0
--
1.7.1.764.g84d391





reply via email to

[Prev in Thread] Current Thread [Next in Thread]