[lsd0003] branch master updated: minor points
From: gnunet
Subject: [lsd0003] branch master updated: minor points
Date: Tue, 08 Jun 2021 10:14:38 +0200
This is an automated email from the git hooks/post-receive script.
grothoff pushed a commit to branch master
in repository lsd0003.
The following commit(s) were added to refs/heads/master by this push:
new 645fdcc minor points
645fdcc is described below
commit 645fdcc94eceb374a55878bdf03c128ce935eb32
Author: Christian Grothoff <christian@grothoff.org>
AuthorDate: Tue Jun 8 10:11:58 2021 +0200
minor points
---
draft-summermatter-set-union.xml | 77 +++++++++++++++++++++-------------------
1 file changed, 40 insertions(+), 37 deletions(-)
diff --git a/draft-summermatter-set-union.xml b/draft-summermatter-set-union.xml
index cef2baa..8ed3591 100644
--- a/draft-summermatter-set-union.xml
+++ b/draft-summermatter-set-union.xml
@@ -62,7 +62,7 @@
<name>Introduction</name>
<t>
This document describes a Byzantine fault-tolerant set
reconciliation protocol used to efficiently and securely
- synchronize two sets of elements between two peers.
+ compute the union of two sets across a network.
</t>
<t>
This Byzantine fault-tolerant set reconciliation
@@ -97,7 +97,7 @@
<t>
The protocol described in this report is generic and
suitable for a wide range of applications. As a result,
- the internal structure of the elements in the sets must
+ the internal structure of the elements in the sets MUST
be defined and verified by the application using the
protocol. This document thus does not cover the element
structure, except for imposing a limit on the maximum
@@ -108,7 +108,7 @@
the number of network round-trips and the number of bytes
sent over the network. Thus, for the protocol to choose
the right parameters for a given situation, applications
- using the protocol must provide a parameter that specifies
+ using the protocol SHOULD provide a parameter that specifies
the cost-ratio of round-trips vs. bandwidth usage. Given
this trade-off factor, the protocol will then choose parameters
that minimize the total execution costs. In particular, there
@@ -241,7 +241,7 @@
]]></artwork>
</figure>
<t>
- The parameters L and k depend on the set size and must be
+ The parameters L and k depend on the set size and MUST be
chosen carefully to ensure that the BF does not return too
many false-positives.
</t>
@@ -296,7 +296,7 @@
order of a bucket reaches "infinity", it is no longer
incremented or decremented.
</t>
<t>
- The parameters L and k and the number of bits allocated to the counters should depend on the set size.
+ The parameters L and k and the number of bits allocated to the counters depend on the set size.
An IBF will degenerate when subjected to insert and remove
iterations of different elements, and eventually all
buckets will reach "infinity". The speed of the degradation
will depend on the choice of L and k in
relation to the number of elements stored in the IBF.
@@ -487,12 +487,12 @@ hashSum | 0x0000 | 0x4242 | 0x0000 | 0x4242 |
decoder returns an IDSUM value that is actually the
XOR of several IDSUMs.
This is primarily detected by checking that the
HASHSUM is the hash of the IDSUM.
Only if the HASHSUM also matches, the bucket could be
pure. Additionally,
- one should check that the IDSUM value actually would be mapped by M to
+ one MUST check that the IDSUM value actually would be mapped by M to
the respective bucket. If not, there was a hash
collision.
</t>
<t>
The very rare case that after all these checks a
bucket is still
- falsely identified as pure must be detected (say by determining that
+ falsely identified as pure MUST be detected (say by determining that
extracted element IDs do not match any actual
elements), and addressed
at a higher level in the protocol. As these failures
are probabilistic
and depend on element IDs and the IBF construction,
they can typically
@@ -501,7 +501,7 @@ hashSum | 0x0000 | 0x4242 | 0x0000 | 0x4242 |
a different mapping function M.
A more common scenario (especially if L was too small)
is that
IBF decoding fails because there is no pure bucket. In
this case, the
- higher-level protocol should also retry using different parameters.
+ higher-level protocol SHOULD also retry using different parameters.
</t>
<t>
Suppose the IBF contains a pure bucket. In this case,
the IDSUM in the
@@ -509,7 +509,7 @@ hashSum | 0x0000 | 0x4242 | 0x0000 | 0x4242 |
to remove that element from the IBF (by inserting it
if the counter
was negative, and by removing it if the counter was
positive). This
is likely to cause other buckets to become pure,
allowing further
- elements to be decoded. Eventually, decoding should succeed with
+ elements to be decoded. Eventually, decoding ought to succeed with
all counters and IDSUM and HASHSUM values reaching zero.
However, it is also
possible that an IBF only partly decodes and then
decoding fails after
obtaining some elements.
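The peeling loop discussed in this hunk (find a pure bucket, extract its element, remove it from the IBF, repeat) can be sketched in a few lines. This is an illustrative toy, not the draft's wire format: `h` is a stand-in HASHSUM, `m` a deliberately simple mapping M, and element IDs are small integers.

```python
import hashlib

L, K = 16, 3  # number of buckets, buckets per element

def h(x):
    # stand-in HASHSUM: 32-bit hash of an element ID
    return int.from_bytes(hashlib.sha256(x.to_bytes(8, "big")).digest()[:4], "big")

def m(x):
    # toy mapping M: K distinct bucket indices per ID (a real
    # implementation derives these positions from a hash of the ID)
    return [(x + 5 * j) % L for j in range(K)]

def make_ibf():
    return [[0, 0, 0] for _ in range(L)]  # [COUNTER, IDSUM, HASHSUM]

def insert(ibf, x, sign=1):
    for b in m(x):
        ibf[b][0] += sign
        ibf[b][1] ^= x
        ibf[b][2] ^= h(x)

def decode(ibf):
    """Peel pure buckets until the IBF is empty or no pure bucket is left."""
    got = []
    while True:
        pure = None
        for i, b in enumerate(ibf):
            # candidate pure bucket: counter +-1, HASHSUM matches the
            # IDSUM, and M actually maps the ID to this bucket
            if b[0] in (1, -1) and b[1] != 0 and h(b[1]) == b[2] and i in m(b[1]):
                pure = b
                break
        if pure is None:
            break
        sign, x = pure[0], pure[1]
        got.append((sign, x))
        insert(ibf, x, -sign)  # remove the decoded element; may free other buckets
    fully_decoded = all(b == [0, 0, 0] for b in ibf)
    return got, fully_decoded
```

A partial decode shows up as `fully_decoded == False` with some elements already extracted, which is exactly the failure case the text says the higher-level protocol must handle.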
@@ -576,10 +576,10 @@ hashSum | 0x0000 | 0x0000 | 0x0000 | 0x0000 |
IBF2 represents set B. Then this set difference
operation will compute IBF3 which
represents the set A - B --- without having to
transfer the elements from set A or B.
- To calculate the IBF representing this set difference, both IBFs must have the same
+ To calculate the IBF representing this set difference, both IBFs MUST have the same
length L, the same number of buckets per element k and
use the same map M. Given this,
one can compute the IBF representing the set
difference by taking the XOR of the IDSUM and HASHSUM values
- of the respective buckets and subtracting the respective counters. Care should be taken
+ of the respective buckets and subtracting the respective counters. Care MUST be taken
to handle overflows and underflows by setting the
counter to "infinity" as necessary.
The result is a new IBF with the same number of
buckets representing the set difference.
</t>
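Bucket-wise, the subtraction just described is only a handful of operations. A sketch under assumed names (one `(counter, idsum, hashsum)` tuple per bucket, with the 8-bit counter saturating at an assumed "infinity" of 127):

```python
INFINITY = 127  # saturation value for the signed 8-bit counter

def ibf_subtract(ibf_a, ibf_b):
    """Compute the IBF for A - B; both IBFs must share L, k and the map M."""
    assert len(ibf_a) == len(ibf_b), "IBFs must have the same length L"
    out = []
    for (c1, id1, h1), (c2, id2, h2) in zip(ibf_a, ibf_b):
        c = c1 - c2
        # saturate on over-/underflow; "infinity" is sticky
        if abs(c) > INFINITY or INFINITY in (abs(c1), abs(c2)):
            c = INFINITY
        out.append((c, id1 ^ id2, h1 ^ h2))
    return out
```

Note how a shared element cancels to an all-zero bucket, while an element present only on one side survives with counter +1 or -1.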
@@ -645,10 +645,12 @@ hashSum | 0x0101 | 0x0101 | 0x5050 | 0x0000 |
<t>
To facilitate a reasonably CPU-efficient
implementation, this specification requires the
- IBF counter always to use 8 bits. Fewer bits
+ IBF counter always to use 8 bits.
+ FIXME: I thought we lifted this constraint!?
+ Fewer bits
would result in a particularly inefficient
implementation, while more bits are rarely useful
- as sets with so many elements should likely be
+ as sets with so many elements should be
represented using a larger number of buckets. This
means the counter of this design can reach a
minimum of -127 and a maximum of 127 before the
@@ -656,7 +658,7 @@ hashSum | 0x0101 | 0x0101 | 0x5050 | 0x0000 |
</t>
<t>
For the "IDSUM", we always use a 64-bit representation.
- The IDSUM value must have sufficient entropy for the
+ The IDSUM value MUST have sufficient entropy for the
mapping function M to yield reasonably random buckets
even for very large values of L. With a 32 bit
value the chance that multiple elements may be mapped
@@ -679,7 +681,7 @@ hashSum | 0x0101 | 0x0101 | 0x5050 | 0x0000 |
to handle occasional collisions, so while with
32-bits there remains a chance of
accidental collisions, at 32 bit the chance is
- generally believed to be sufficiently small
+ generally believed to be sufficiently small
for the protocol to handle those cases efficiently
for a wide range of use-cases. Smaller hash
values would save bandwidth, but also drastically
@@ -812,7 +814,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
Basically a Strata Estimator (SE) is a series of IBFs
(with a rather small value of L)
in which increasingly large subsets of the full set
of elements are added to each IBF. For the n-th IBF, the
function selecting the
- subset of elements should sample to select (probabilistically) 1/(2^n) of all
+ subset of elements MUST sample to select (probabilistically) 1/(2^n) of all
elements. This can be done by counting the number of
trailing bits set to "1"
in an element ID, and then inserting the element into the
IBF identified by that
counter. As a result, all elements will be mapped to one
IBF, with the n-th
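The trailing-bit sampling can be illustrated as follows. The 0-based stratum index convention here (stratum n receives roughly 1/2^(n+1) of all IDs) is an assumption of this sketch, not the draft's exact indexing.

```python
def stratum(element_id, num_strata=32):
    """Pick the SE's IBF for an element: count the trailing bits set
    to 1 in its ID, capped at the last stratum."""
    n = 0
    while element_id & 1 and n < num_strata - 1:
        n += 1
        element_id >>= 1
    return n
```

Half of all IDs end in a 0 bit and land in stratum 0, a quarter end in 01 and land in stratum 1, and so on, giving the geometrically shrinking subsets the text describes.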
@@ -852,7 +854,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
<t>
The second possibility is that the difference of the sets is
small compared to the set size.
In this case, a "delta" synchronization mode is
more efficient. Given these two possibilities,
- the first steps of the protocol are used to determine which mode should be used.
+ the first steps of the protocol are used to determine which mode MUST be used.
</t>
<t>
@@ -979,7 +981,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
<dt><em><xref target="messages_ibf" format="title"
/></em> message:</dt>
<dd>
If an <em><xref target="messages_ibf"
format="title" /></em> message is received, this
- indicates that decoding of the IBF on the active site has failed and roles should be swapped.
+ indicates that decoding of the IBF on the active side has failed and roles will be swapped.
The receiving passive peer transitions into
the <strong>Expecting IBF Last</strong> state,
and waits for more <em><xref
target="messages_ibf" format="title" /></em> messages
or the final <em><xref
target="messages_ibf_last" format="title" /></em> message to be received.
@@ -1050,8 +1052,8 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
set there was probably a bucket decoded that
was not pure, so potentially all <em><xref target="messages_offer"
format="title" /></em>
and <em><xref target="messages_demand"
format="title" /></em> messages sent later are invalid;
in this case a role change active -> passive
with a new IBF is easiest.
+
If a demand for the same element is received
multiple times, the demands should be discarded.
+ FIXME: I thought demanding the same element multiple times is now verboten?
</dd>
<dt><em><xref target="messages_elements"
format="title" /></em> message:</dt>
<dd>
@@ -1098,7 +1102,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
differences or if the byte-size of the elements is large.
If the set difference is estimated to be large
the <xref target="modeofoperation_full-sync"
format="title" /> is
more efficient. The exact heuristics and parameters on
which the protocol decides which mode
- should be used are described in the <xref target="performance" format="title" /> section of this document.
+ MUST be used are described in the <xref target="performance" format="title" /> section of this document.
</t>
<t>
There are two main cases when a <xref
target="modeofoperation_full-sync" format="title" />
@@ -1106,7 +1110,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
The first case is when one of the peers announces having
an empty set. This is announced by setting
the SETSIZE field in the <em><xref target="messages_se"
format="title" /></em> to 0.
The second case is if the application requests full
synchronization explicitly.
- This is useful for testing and should not be used in production.
+ This is useful for testing and MUST NOT be used in production.
</t>
</section>
</section>
@@ -1211,7 +1215,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
<dd>
the type of SETU_P2P_REQUEST_IBF as registered in
<xref target="gana" format="title" /> in network byte order.
</dd>
-
+
<dt>SIZE</dt>
<dd>
is a 32-bit unsigned integer which signals the
number of buckets in the IBF.
@@ -1223,7 +1227,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
in the <xref
target="performance_counter_variable_size" format="title" /> section.
</dd>
-
+
<dt>OFFSET</dt>
<dd>
is a 32-bit unsigned integer which signals the
offset to the following ibf slices in the original.
@@ -1235,7 +1239,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
</dd>
<dt>IBF-SLICE</dt>
<dd>
-
+
<t>
are variable numbers of slices in an array. A
single slice contains multiple 64-bit IDSUMS,
32-bit HASHSUMS and 1-64bit COUNTERS of
variable size. In the network order the array of IDSUMS is first, followed
@@ -1243,7 +1247,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
by MIN( SIZE - OFFSET,
MAX_BUCKETS_PER_MESSAGE). MAX_BUCKETS_PER_MESSAGE is defined as
32768 divided by the BUCKET_SIZE which is
13-byte (104-bit).
</t>
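The arithmetic behind MAX_BUCKETS_PER_MESSAGE can be checked directly. The field widths used here (64-bit IDSUM, 32-bit HASHSUM, the full 8-bit counter) follow the sizes given elsewhere in the draft:

```python
BUCKET_SIZE = 8 + 4 + 1  # bytes: 64-bit IDSUM + 32-bit HASHSUM + 8-bit counter
MAX_BUCKETS_PER_MESSAGE = 32768 // BUCKET_SIZE  # integer division; the remainder is unused
```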
-
+
<t>
To get the IDSUM field, all IDs hitting a
bucket are added up with a binary XOR operation.
See <xref target="ibf_format_id_generation"
format="title" /> details about ID generation.
@@ -1257,7 +1261,7 @@ FUNCTION get_bucket_id (key, number_of_buckets_per_element, ibf_size)
The algorithm to find the correct bucket in
which the ID and the HASH have to be added
is described in detail in section <xref
target="ibf_format_bucket_identification" format="title" />.
</t>
-
+
<t>
Test vectors for an implementation can be
found in the <xref target="test_vectors" format="title" /> section.
</t>
@@ -1976,8 +1980,8 @@ FUNCTION ibf_get_max_counter(ibf)
# INPUTS:
# ibf: The IBF
-# offset: The offset which defines the starting point from which bucket the compress operation should start
-# count: The number of buckets to in the array that should be compressed
+# offset: The offset which defines the starting point from which bucket the compress operation starts
+# count: The number of buckets in the array that will be compressed
# OUTPUTS:
# returns: A byte array of compressed counters to send over the network
@@ -2031,8 +2035,8 @@ FUNCTION pack_counter(ibf, offset, count)
# INPUTS:
# ibf: The IBF
-# offset: The offset which defines the starting point from which bucket the compress operation should start
-# count: The number of buckets to in the array that should be compressed
+# offset: The offset which defines the starting point from which bucket the compress operation starts
+# count: The number of buckets in the array that will be compressed
# counter_bit_length: The bit length of the counter; can be found in the ibf message in the ibf_counter_bit_length field
# packed_data: A byte array which contains the data packed with the pack_counter function
# OUTPUTS:
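A much-simplified roundtrip of the packing these two functions describe: each counter is written MSB-first into a contiguous bit string using `counter_bit_length` bits, padded to whole bytes. Sign handling and the draft's exact layout are left out; this sketch only shows the bit-width mechanics.

```python
def pack_counter(counters, counter_bit_length):
    # truncate each counter to counter_bit_length bits (two's complement
    # for negative values) and concatenate them MSB-first
    mask = (1 << counter_bit_length) - 1
    bits = "".join(format(c & mask, "0{}b".format(counter_bit_length))
                   for c in counters)
    bits += "0" * (-len(bits) % 8)  # pad the tail to a whole byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def unpack_counter(packed_data, count, counter_bit_length):
    # re-read count fixed-width counters from the bit string
    bits = "".join(format(byte, "08b") for byte in packed_data)
    return [int(bits[i * counter_bit_length:(i + 1) * counter_bit_length], 2)
            for i in range(count)]
```

For example, packing the counters 3, 1, 7, 2 at 4 bits each yields the two bytes 0x31 0x72, and unpacking restores the original list.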
@@ -2106,7 +2110,7 @@ FUNCTION unpack_counter(ibf, offset, count, counter_bit_length, packed_data)
</t>
<t>
The formal format of all messages needs to be properly validated.
This is important to prevent many
- attacks on the code. The application data should be validated by the application using
+ attacks on the code. The application data MUST be validated by the application using
the protocol, not by the implementation of the protocol.
In case the format validation fails, the set operation MUST be terminated.
</t>
@@ -2539,7 +2543,7 @@ FUNCTION validate_messages_full_done(full_done_msg, full_element_msg_store,
In the Active Decoding state it is important to prevent an
attacker from
generating and passing an unlimited number of IBFs that
do not decode or,
even worse, an IBF constructed to send the peers
into an endless loop.
- To prevent an endless loop in decoding, a loop detection should be implemented.
+ To prevent an endless loop in decoding, a loop detection MUST be implemented.
The simplest solution would be to prevent decoding of more
than a given number of elements.
A more robust solution is to implement an algorithm that
detects a loop by
analyzing past partially decoded IBFs. This can be achieved
@@ -2749,7 +2753,7 @@ FUNCTION validate_messages_done(done_msg, offer_msg_store,
to ensure this, multiple factors need to be
considered: The absolute plausible maximum of
elements in a set, which has to be predefined
according
to the use case and the maximal plausible element
increase since the last
- successful set reconciliation, which should be either predefined or can be calculated
+ successful set reconciliation, which SHOULD be either predefined or can be calculated
with the Gaussian distribution function over all
past set reconciliations.
</t>
<t>
@@ -2811,7 +2815,7 @@ FUNCTION validate_messages_se(se_msg, remote_setsize, local_setsize)
When the full done message is received from the
remote peer, all
elements that the remote peer has committed to
need to have been received;
otherwise the operation MUST be terminated. After
receiving the
- full done message no future elements should be accepted.
+ full done message, future elements MUST NOT be accepted.
The 512-bit hash of the complete reconciled set
contained in
the full done message is required to ensure that
both sets are truly identical. If
the sets differ, a resynchronisation is required.
The number of possible
@@ -2831,10 +2835,10 @@ FUNCTION validate_messages_se(se_msg, remote_setsize, local_setsize)
<dd>
<t>
In case an IBF message is received by a peer, an
active/passive role switch
- is initiated by the active decoding remote peer. In this moment the peer should
+ is initiated by the active decoding remote peer. In this moment the peer MUST
wait for all open offers and demands to be
fulfilled, to prevent
retransmission before switching into active
decoding operation mode.
- A switch into active decoding mode should only be permitted for
+ A switch into active decoding mode MUST only be permitted for
a predefined number of times as described in the
top section
of the security section.
</t>
@@ -2885,7 +2889,7 @@ FUNCTION validate_messages_ibf(offer_msg_store, demand_msg_store, element_msg_st
predefined absolute maximum of elements in the set,
which is defined
by real world limitations.
To implement these restrictions, a list with all
received inquiries
- should be stored and new inquiries should be checked against.
+ MUST be stored and new inquiries MUST be checked against it.
</t>
<figure
anchor="security_states_passive_decoding_inquiry_code">
<artwork name="" type="" align="left"
alt=""><![CDATA[
@@ -3179,4 +3183,3 @@ counter serie 3: 0x440B
</section>
</back>
</rfc>
-
--
To stop receiving notification emails like this one, please contact
gnunet@gnunet.org.