gnunet-svn
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[lsd0003] branch master updated: worked up to 3.2.3


From: gnunet
Subject: [lsd0003] branch master updated: worked up to 3.2.3
Date: Thu, 14 Jan 2021 00:17:20 +0100

This is an automated email from the git hooks/post-receive script.

grothoff pushed a commit to branch master
in repository lsd0003.

The following commit(s) were added to refs/heads/master by this push:
     new bf80fc8  worked up to 3.2.3
bf80fc8 is described below

commit bf80fc8ddd88d96144457f40a3ba452c9d7c2433
Author: Christian Grothoff <christian@grothoff.org>
AuthorDate: Thu Jan 14 00:16:34 2021 +0100

    worked up to 3.2.3
---
 draft-summermatter-set-union.xml | 159 ++++++++++++++++++++++++++++++---------
 1 file changed, 122 insertions(+), 37 deletions(-)

diff --git a/draft-summermatter-set-union.xml b/draft-summermatter-set-union.xml
index 74fc17d..f19bb8f 100644
--- a/draft-summermatter-set-union.xml
+++ b/draft-summermatter-set-union.xml
@@ -314,47 +314,61 @@
                 decode and set difference. This two extra operations are 
useful to efficiently extract
                 small differences between large sets.
             </t>
-            <section anchor="ibf_format" numbered="true" toc="default">
-                <name>Format</name>
+            <section anchor="ibf_structure" numbered="true" toc="default">
+                <name>Structure</name>
                 <t>
-                    The storage format of an IBF consists of n buckets that 
store a hash value
-                    and a signed counter. The decision how many bits are used 
for the counter and for the map filed
-                    is a tradeoff and has to be adjusted to the CPU 
architecture and setsize for optimal performance.
+                    An IBF consists of a mapping function M and
+                    L buckets that each store a signed
+                    counter and an XHASH.  An XHASH is the XOR of various
+                    hash values.  As before, the
+                    values used for k, L and the number of bits used
+                    for the signed counter and the XHASH depend
+                    on the set size and various other trade-offs,
+                    including the CPU architecture.
                 </t>
                 <t>
-                    If the IBF size is to small or the mapping function does 
not spread out the elements uniformly a counter can overflow,
-                    in this case the highest and the lowest number the signed 
counter can accept means infinite.
-                    If a counter of one bucket is set to infinite the IBF can 
not be decoded anymore and size of the IBF
-                    or the size of the counter has to be adjusted.
-                    For a IBF with 4-bit counters this means if the counter of 
one bucket is set to 7 or -8 the IBF is invalid.
-                    The implementation described in this document the size of 
the counter and the HASH Value is 32-bit. This means
-                    the counter of this implementation can reach a minimum of 
-2147483647 and a maximum of 2147483646, before the
-                    counter overflows and the IBF cant be decoded anymore.
+                    If the IBF size is to small or the mapping
+                    function does not spread out the elements
+                    uniformly, the signed counter can overflow or
+                    underflow. As with the CBF, the "maximum" value is
+                    thus used to represent "infinite".  As there is no
+                    need to distinguish between overflow and
+                    underflow, the most canonical representation of
+                    "infinite" would be the minimum value of the
+                    counter in the canonical 2-complement
+                    interpretation.  For example, given a 4-bit
+                    counter a value of -8 would be used to represent
+                    "infinity".
                 </t>
-                    <figure anchor="figure_ibf_format">
+                    <figure anchor="figure_ibf_structure">
                         <artwork name="" type="" align="left" alt=""><![CDATA[
             bucket-0     bucket-1       bucket-2      bucket-3
         +-------------+-------------+-------------+-------------+-------
   count |   COUNTER   |   COUNTER   |   COUNTER   |   COUNTER   |  C...
         +-------------+-------------+-------------+-------------+------
-  value | HASH VALUE  | HASH VALUE  | HASH VALUE  | HASH VALUE  |  H...
+  value |    XHASH    |    XHASH    |    XHASH    |     XHASH   |  X...
         +-------------+-------------+-------------+-------------+-------
                  ]]></artwork>
                     </figure>
 
             </section>
             <section anchor="ibf_operations" numbered="true" toc="default">
-                <name>Operations</name>
+              <name>Operations</name>
+              <t>
+                When an IBF is created, all counters and XHASH values of
+                all buckets are initialized to zero.
+              </t>
                 <section anchor="ibv_operations_insert" numbered="true" 
toc="default">
                     <name>Insert Element</name>
                     <t>
-                        To add an element to a IBF the element is mapped as 
described in the <xref target="cbf" format="title" /> section on
-                        the IBF. Where the map function hits the counter is 
increased by one and the value
-                        field is updated with the XOR product of the new hash 
and the previously stored value.
+                      To add an element to a IBF, the element is mapped to a 
subset of k buckets using
+                      the mapping function M as described in the <xref 
target="bf" format="title" /> section introducing
+                      BFs. For the buckets selected by the mapping function, 
the counter is increased by one and the value
+                      field is set to the XOR of the hash of the element and 
the previously stored XHASH.
                     </t>
                     <t>
-                        In the following example the insert operation for the 
element [0101] with the hash 4242 and
-                        the element [1100] with the hash 0101 is demonstrated.
+                        In the following example the insert operation for the 
element [0101] with the hash 0x4242 and
+                        the element [1100] with the hash 0x0101 is 
demonstrated.
                     </t>
                     <t>Empty IBF:</t>
                     <figure anchor="figure_ibf_insert_0">
@@ -367,14 +381,14 @@
         +-------------+-------------+-------------+-------------+
                  ]]></artwork>
                     </figure>
-                    <t>Insert first element: [0101] with hash 4242:</t>
+                    <t>Insert first element: [0101] with hash 0x4242:</t>
                     <figure anchor="figure_ibf_insert_1">
                         <artwork name="" type="" align="left" alt=""><![CDATA[
             bucket-0     bucket-1       bucket-2      bucket-3
         +-------------+-------------+-------------+-------------+
   count |      0      |      1      |      0      |      1      |
         +-------------+-------------+-------------+-------------+
-  value |     0000    |     4242    |     0000    |     4242    |
+  value |     0000    |   0x4242    |     0000    |   0x4242    |
         +-------------+-------------+-------------+-------------+
                  ]]></artwork>
                     </figure>
@@ -385,7 +399,7 @@
         +-------------+-------------+-------------+-------------+
   count |      1      |      2      |      0      |      1      |
         +-------------+-------------+-------------+-------------+
-  value |     0101    |     4343    |     0000    |     4242    |
+  value |     0101    |   0x4343    |     0000    |   0x4242    |
         +-------------+-------------+-------------+-------------+
                  ]]></artwork>
                     </figure>
@@ -393,12 +407,12 @@
                 <section anchor="ibf_operations_remove" numbered="true" 
toc="default">
                     <name>Remove Element</name>
                     <t>
-                        To remove an element from the IBF the element is 
mapped with the same map functions that was used to previously insert
-                        the Element. Then all the bucket that got hit by the 
map function are reduced by one and the value field of the buckets is
-                        replaced by a new value containing the old value xored 
with the hash of the removed Element.
+                        To remove an element from the IBF the element is again 
mapped to a subset of the buckets using M.
+                        Then all the counters of the buckets selected by M are 
reduced by one, and the XHASH is
+                        replaced by the XOR of the old XHASH and the hash of 
the element being removed.
                     </t>
                     <t>
-                        In the following example the remove operation for the 
element [1100] with the hash 0101 is demonstrated.
+                        In the following example the remove operation for the 
element [1100] with the hash 0x0101 is demonstrated.
                     </t>
                 <t>IBF with encoded elements:</t>
                 <figure anchor="figure_ibf_remove_0">
@@ -407,11 +421,11 @@
         +-------------+-------------+-------------+-------------+
   count |      1      |      2      |      0      |      1      |
         +-------------+-------------+-------------+-------------+
-  value |     0101    |     4343    |     0000    |     4242    |
+  value |   0x0101    |   0x4343    |     0000    |   0x4242    |
         +-------------+-------------+-------------+-------------+
                  ]]></artwork>
                 </figure>
-                    <t>Remove element [1100] with hash 0101 from the IBF:</t>
+                    <t>Remove element [1100] with hash 0x0101 from the IBF:</t>
                     <figure anchor="figure_ibf_remove_1">
                         <artwork name="" type="" align="left" alt=""><![CDATA[
             bucket-0     bucket-1       bucket-2      bucket-3
@@ -422,7 +436,30 @@
         +-------------+-------------+-------------+-------------+
                  ]]></artwork>
                     </figure>
+                    <t>
+                      Note that it is possible to "remove" elements from an 
IBF that were never present
+                      in the IBF in the first place. A negative counter value 
is thus indicative of
+                      elements that were removed without having been added.  
Note that an IBF bucket
+                      counter of zero no longer warrants that an element 
mapped to that bucket is not
+                      present in the set: a bucket with a counter of zero can 
be the result of one
+                      element being added and a different element (mapped to 
the same bucket) being removed.
+                      To check that an element is not present requires a 
counter of zero and an
+                      XHASH of zero --- and some assurance that there was no 
hash collision.  Thus,
+                      IBFs are not suitable to replace BFs or IBFs.
+                    </t>
+                    <t>
+                      Buckets in an IBF with a counter of 1 or -1 are crucial 
for decoding an IBF, as
+                      they might represent only a single element, with the 
XHASH being the hash of that element.
+                      Following Eppstein (CITE), we will call buckets that 
only represent a single
+                      element pure buckets.
+                      Note that due to the possibility of multiple insertion 
and removal operations
+                      affecting the same bucket, not all buckets with a 
counter of 1 or -1 are
+                      actually pure buckets.  Sometimes a counter can be 1 or 
-1 because N elements
+                      mapped to that bucket were added while N-1 or N+1 
different elements also
+                      mapped to that bucket were removed.
+                    </t>
                 </section>
+
                 <section anchor="ibf_operations_decode" numbered="true" 
toc="default">
                     <name>Decode IBF</name>
                     <t>
@@ -482,21 +519,31 @@
                  ]]></artwork>
                     </figure>
                 </section>
+
                 <section anchor="ibv_operations_setdiff" numbered="true" 
toc="default">
                     <name>Set Difference</name>
                     <t>
-                        One of the most interesting capability of IBFs is the 
possibility to easily calculate the difference
-                        between two IBFs.
-                        To calculate the difference between two IBFs its only 
necessary to XOR the value of both IBFs bucket by bucket and
-                        subtracting the counts which results in a new IBF with 
the same amount of buckets.
+                        Given addition and removal as defined above, it is 
possible to define an operation on IBFs
+                        that computes an IBF representing the set difference.  
Suppose IBF1 represents set A, and
+                        IBF2 represents set B.  Then this set difference 
operation will compute IBF3 which
+                        represents the set A - B --- without needing elements 
from set A or B.
+
+                        To calculate the IBF representing this set difference, 
both IBFs must have the same
+                        length L, the same number of buckets per element k and 
use the same map M.  Given this,
+                        one can compute the IBF representing the set 
difference by taking the XOR of the XHASH values
+                        of the respective buckets and subtracting the 
respective counters.  Care should be taken
+                        to handle overflows and underflows by setting the 
counter to "infinity" as necessary.
+                        The result is a new IBF with the same number of 
buckets representing the set difference.
+                    </t>
+                    <t>
                         This new IBF can be decoded as described in section 
<xref target="ibf_operations_decode" format="counter" />.
                         The new IBF can have two types of pure buckets with 
counter set to 1 or -1. If the counter is set to 1
                         the element is missing in the secondary set and if the 
counter is set to -1 the element is missing in
                         the primary set.
                     </t>
                     <t>
-                        Through this addition/subtraction its possible that a 
bucket seams pure but in reality isn't pure,
-                        a falsely pure bucket. This case its easy to detect by 
checking if an element exist in either set, if
+                        Through this addition/subtraction its possible that a 
counter is 1 or -1, but is not pure.
+                        This case its easy to detect by checking if an element 
exist in either set, if
                         the element does not exist its a falsely pure bucket 
in this case continue test the
                         next pure bucket until one is found that is truly 
pure. If no truly pure buckets are found the IBF
                         fails to decode.
@@ -546,7 +593,45 @@
             </section>
 
             </section>
-        </section>
+
+            <section anchor="ibf_format" numbered="true" toc="default">
+                <name>Wire format</name>
+                <t>
+                    To facilitate a reasonably CPU-efficient
+                    implementation, this specification requires the
+                    IBF counter to always use 8 bits.  Fewer bits
+                    would result in a paritcularly inefficient
+                    implementation, while more bits are rarely useful
+                    as sets with so many elements should likely be
+                    represented using a larger number of buckets. This
+                    means the counter of this design can reach a
+                    minimum of -127 and a maximum of 127 before the
+                    counter reaches "infinity" (-128).
+                </t>
+                <t>
+                    For the "HASH VALUE", we always use a 32-bit
+                    representation.  Here, it is mostly important to
+                    avoid collisions, where different elements are
+                    mapped to the same hash.  The protocol is designed
+                    to handle occasional collisions, so while with
+                    32-bits there remains a significant chance of
+                    accidental collisions, at 32-bits the chance is
+                    generally believed to be sufficiently small enough
+                    for the protocol to handle those cases efficiently
+                    for a wide range of use-cases.  Smaller hash
+                    values would safe bandwidth, but drastically
+                    increase the chance of collisions even for
+                    moderately-sized sets.  Larger hash values would
+                    reduce the chance of collisions for very large
+                    sets, alas with very large sets the bandwidth used
+                    by the protocol is also typically larger and thus
+                    additional round-trips to address a few collisions
+                    are unlikely to have a major impact.  32-bit is
+                    also again a reasonable size for many CPU
+                    architectures.
+                </t>
+            </section>
+          </section>
 
         <section anchor="se" numbered="true" toc="default">
             <name>Strata Estimator</name>

-- 
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]