coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Path-Set with Rehashing and Without Fixed-Size Structures


From: Stefan Vargyas
Subject: Path-Set with Rehashing and Without Fixed-Size Structures
Date: Mon, 24 Oct 2016 10:24:27 -0700

Hello Pádraig!

Since my last email, Path-Set evolved a bit the path prescribed therein.
Now it has no more fixed-size data structures and the linear hash table
implementation isn't as simplistic as it was: it is by now equipped with
the 'rehash' operation.

As previously did, I ran a couple of relevant test series on my machine
(a GNU/Linux 64-bit Intel x86 CPU using GCC 4.3.4).

The results obtained from the freshened up dictionary structures are as
follows: for the same two sets of path names used in the previous series
of tests, the gain in *total* memory consumption achieved by the linear
hash path trie data structure relative to (1) the size of the input and
to (2) the total memory usage of Gnulib's generic hash table were:

  ----- ----------------- ----- -----
   set       platform      (1)   (2)
  ----- ----------------- ----- -----
    1    32-bit pointers   32%   48%
    2    32-bit pointers   23%   42%
  ----- ----------------- ----- -----
    1    64-bit pointers  -20%   24%
    2    64-bit pointers  -37%   17%
  ----- ----------------- ----- -----
    1    32-bit offsets     4%   38%
    2    32-bit offsets    -7%   34%
  ----- ----------------- ----- -----

Please recall that the first of the path name sets has cardinality 396K
and size 26M, while the second one -- cardinality 811K and size 47M.

Also recall that the '32-bit offset' platform is to be understood as a
64-bit platform where Path-Set is using 32-bit offsets instead of 64-bit
pointers for referencing the node structures within the node structure
themselves. 

One of the statements I made about the linear hash tables of Path-Set
reads as: 

  >> However, rehashing a linear hash table is (far?) more costly
  >> than to rehash a chained hash table as Gnulib's is. 

This prediction proved to be false. Both dictionary structures based on
linear hash tables -- 'plain-set' and 'path-trie' -- are in fact faster
(only a few exceptions) compared to the corresponding dictionary type
based on Gnulib's hash tables. Nevertheless, these results need to be
taken cum grano salis, since they aren't conclusive yet.

  ----- ----------------- ------------------- -------------------
   set       platform          plain set           path trie
  ----- ----------------- ------------------- -------------------
    1    32-bit pointers   g2 < l2 < l1 < g1   l2 < l1 < g2 < g1 
    1    64-bit pointers   l2 < g2 < l1 < g1   l2 < l1 < g2 < g1 
    1    32-bit offsets    l2 < g2 < l1 < g1   l2 < l1 < g2 < g1 
  ----- ----------------- ------------------- -------------------
    2    32-bit pointers   g2 < l2 < l1 < g1   l2 < l1 < g2 < g1 
    2    64-bit pointers   l2 < g2 < l1 < g1   l2 < l1 < g2 < g1 
    2    32-bit offsets    l2 < g2 < l1 < g1   l2 < l1 < g2 < g1 
  ----- ----------------- ------------------- -------------------

The first two columns have the same meaning as the previous notations.
Further, each 'l' stands for 'linear-hash', each 'g' for 'gnulib-hash',
each '1' for 'rehash-size' being 1.4142 and each '2' for 'rehash-size'
being 2.

It would be interesting and also beneficial to run the same series of
tests of Path-Set on contrasting to and/or less modern platforms than
the present-day x86-64: this to see how the relations between the two
structures change.

The story is presented to its full extent in 'doc/path-set-rehash.txt'.

Next to be done is to have a fresh chained hash table implementation
close to Gnulib's one -- which takes into account the things outlined
in section 3 of 'doc/path-set.txt' and other optimizations which were
applied to the linear hash table implementation.

Sincerely,

Stefan V.


PS: the home page of Path-Set (http://nongnu.org/path-set/) is by
now available thanks to Savannah's hackers approval. Path-Set's git
repository (git://git.sv.nongnu.org/path-set) likewise.






reply via email to

[Prev in Thread] Current Thread [Next in Thread]