[GNUnet-SVN] r1501 - GNUnet-docs/WWW
From: grothoff
Subject: [GNUnet-SVN] r1501 - GNUnet-docs/WWW
Date: Wed, 13 Jul 2005 20:44:06 -0700 (PDT)
Author: grothoff
Date: 2005-07-13 20:44:03 -0700 (Wed, 13 Jul 2005)
New Revision: 1501
Modified:
GNUnet-docs/WWW/documentation.php3
GNUnet-docs/WWW/encoding.php3
Log:
update
Modified: GNUnet-docs/WWW/documentation.php3
===================================================================
--- GNUnet-docs/WWW/documentation.php3 2005-07-14 03:23:49 UTC (rev 1500)
+++ GNUnet-docs/WWW/documentation.php3 2005-07-14 03:44:03 UTC (rev 1501)
@@ -51,8 +51,7 @@
LI("Special topics");
echo "<ol>\n";
LILI("smtp.php3","Transport performance and SMTP setup");
-LILI("namespace.php3","Namespaces and Directories (outdated)");
-LILI("encoding.php3","Content encoding (outdated)");
+LILI("encoding.php3","Content encoding");
echo "</ol>\n";
echo "</ol>\n";
include("html_footer.php3");
Modified: GNUnet-docs/WWW/encoding.php3
===================================================================
--- GNUnet-docs/WWW/encoding.php3 2005-07-14 03:23:49 UTC (rev 1500)
+++ GNUnet-docs/WWW/encoding.php3 2005-07-14 03:44:03 UTC (rev 1501)
@@ -1,196 +1,44 @@
<?php
include("scripts.php3");
-$title = "Encoding content for anonymous file-sharing in GNUnet";
-$description = "ESED II, a new encoding for anonymous file sharing in GNUnet";
+$title = "The Encoding for Censorship-Resistant Sharing";
+$description = "ECRS, the censorship-resistant content encoding for GNUnet
file sharing";
include("html_header.php3");
-H2("The GNUnet AFS Encoding");
-// FIXME: update for 0.7.0
+H2("The Encoding for Censorship-Resistant Sharing");
BP();
-W("This page is intended to describe the revised GNUnet content encoding which
was implemented in version 0.5.0.");
-W("GNUnet version 0.7.x again modifies the encoding.");
-W("This new scheme is called ECRS (Encoding for Censorship Resistant Sharing)
and is described in %s research paper.",
+W("When GNUnet shares files, it uses a content encoding that is called ECRS,
the Encoding for Censorship-Resistant Sharing.");
+W("Most of ECRS is described in %s research paper.",
extlink_("download/ecrs.ps", "this"));
-W("This page is preserved primarily for historical purposes.");
+W("ECRS obsoletes the previous ESED and ESED II encodings, which were used in
GNUnet before version 0.7.0.");
+W("The rest of this page assumes that the reader is familiar with the %s.",
+ extlink_("download/ecrs.ps", "ECRS paper"));
+W("What follows is a description of some minor extensions that GNUnet makes
over what is described in the paper.");
+W("These extensions are not in the paper because we felt that they were
obvious or trivial additions to the original scheme and thus did not warrant
space in the research report.");
EP();
-H3("What is the GNUnet AFS Encoding");
+H3("NBlocks");
BP();
-W("The GNUnet AFS Encoding is the way GNUnet encodes files for anonymous
sharing.");
-W("The first version of the encoding was presented in %s.",
- extlink_("download/esed.ps","Efficient Sharing of Encrypted Data"));
-W("We refer to this original scheme (which is used in GNUnet 0.4.7 and before)
as the ESED scheme.");
-W("This page is discussing the problems with the ESED encoding and tries to
come up with solutions.");
-W("The resulting scheme is called ESED II.");
-
+W("An <tt>NBlock</tt> is, simply speaking, an advertisement for a namespace.");
+W("The <tt>NBlock</tt> is supposed to contain metadata describing the content
of the namespace, in particular the root of the namespace.");
+W("The format of an <tt>NBlock</tt> is almost the same as that of an
<tt>SBlock</tt>.");
+W("In particular, the <tt>NBlock</tt> is signed using the pseudonym.");
+W("By convention, the identifier of the <tt>NBlock</tt> must be chosen to be
all-zeros.");
+W("This allows peers to search for <tt>NBlock</tt>s in order to find out more
about a namespace.");
EP();
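The value of the all-zeros convention is that every peer can derive a namespace's NBlock query without any out-of-band information. A minimal sketch of that idea follows; the identifier length, the use of SHA-512, and the way the identifier is combined with the namespace key hash are all illustrative assumptions, not the actual ECRS formula:

```python
import hashlib

ID_BITS = 512 // 8  # hypothetical identifier length, not the ECRS-defined size

def nblock_query(namespace_key_hash: bytes) -> bytes:
    """Hypothetical lookup-key derivation for a namespace's NBlock.

    By convention the NBlock identifier is all zeros, so every peer can
    compute the same query for a given namespace.  The combination below
    (hash of zero-identifier plus namespace key hash) is illustrative only.
    """
    identifier = bytes(ID_BITS)  # the all-zeros identifier
    return hashlib.sha512(identifier + namespace_key_hash).digest()

ns_a = hashlib.sha512(b"namespace-A-public-key").digest()
ns_b = hashlib.sha512(b"namespace-B-public-key").digest()

# Any two peers agree on the query for the same namespace...
assert nblock_query(ns_a) == nblock_query(ns_a)
# ...while different namespaces yield different queries.
assert nblock_query(ns_a) != nblock_query(ns_b)
```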
-H3("Problems with ESED");
-echo "<dl>\n";
-DT("Content Migration");
-echo "<dd>";
-W("1k blocks in ESED can only migrate when they are requested.");
-W("If a node pushed ESED blocks out into the network, the triple-hash would
not protect against malicious nodes constructing bad pairs.");
-BR();BR();
-W("There are two attacks, and only one would work.");
-W("We describe both attacks and elaborate why they work (or why not).");
-BR();
-echo "<ol>";
-echo "<li>";
-W("The first is an adversary that sends out garbage into the world, hoping to
push useful content out of the network.");
-W("Since the content message in the ESED scheme consists of a double-hash and
encrypted data that are not related in any obvious way, nodes may think that
garbage messages are perfectly valid and store them, even if the blocks would
not properly decrypt.");
-BR();
-W("This attack will not work since the blocks will never be requested (they
are highly unlikely to match any real request) and thus only nodes with excess
capacity will store them.");
-W("Even those nodes will time-out and replace the content quickly with blocks
that are actually requested.");
-echo "</li>\n";
-echo "<li>";
-W("The second attack involves an adversary that wants to censor a keyword or a
specific piece of content.");
-W("If the adversary knows the respective double-hash of the key or content, he
can push content messages into the network that map the good double-hash to bad
(random) data.");
-W("Nodes that try to receive the data would then get garbage.");
-BR();
-W("The attack only really works if nodes make copies of content even when no
matching query is spreading.");
-W("In that case, the adversary would be able to massively spread bad content
in the network (the good guys are unable to spread the good blocks to the same
extent), and the content would stay since the queries are popular.");
-W("In the current implementation, this attack is not a problem since nodes
only keep content if there was a matching query, and the chance that an
adversary can spread bad content compared to the good content depends entirely
on its proportional presence in the network compared to nodes offering the good
content.");
-W("Thus the adversary cannot make the bad replies dominate the network unless
the adversary dominates the network overall.");
-BR();
-W("In GNUnet 0.5.0 we want to allow active spreading of content in order to
make better use of available disk-space.");
-W("This requires changing the encoding to make the attack described above
impossible.");
-echo "</li>\n";
-echo "</ol></dd>\n";
-DT("Random Disk Accesses");
-echo "<dd>";
-W("The current encoding with 1k blocks requires GNUnet to perform an excessive
number of disk accesses.");
-W("In particular, every query is a random hash, and 8 GB of content require a
256 MB lookup database just to resolve if the reply is available.");
-W("256 MB can typically not be kept in memory.");
-W("Since the queries are random hashes, every one of them requires a random
disk access.");
-W("This is <em>the</em> major limiting factor for GNUnet AFS scalability,
since modern drives can do at most 10-20 random accesses per second, while the
network bandwidth would allow GNUnet to route more than 100 queries per second
on a slow modem line.");
-BR();
-W("There are two main approaches to solve this problem.");
-echo "<ol>\n";
-echo "<li>";
-W("By reducing the number of different queries that are required to index a GB
of data to a more reasonable value (say 1 MB per GB instead of the current 16
MB), we would be able to cache the index in memory, giving us a practically
unlimited number of random accesses.");
-W("The problem here is that a larger block size (currently 1k) would require
fragmentation for some transport protocols.");
-W("An implementation of fragmentation as in IP would require retransmission of
the entire block if a single fragment was lost, which could make us horribly
inefficient on some lines.");
-W("Also, fragmentation would require nodes to keep more state (the fragments),
increasing memory requirements.");
-W("Possible attacks (like sending only the first fragment) would complicate
the implementation greatly.");
-echo "</li>\n";
-echo "<li>";
-W("If the node were able to do a cheaper test to gauge the chances that the
lookup succeeds, the cost of the random disk accesses could be reduced to a
level where, say, 90 percent of the queries are filtered up-front.");
-echo "</ol></dd>";
-DT("Resynchronizing the stream");
-echo "<dd>";
-W("In ESED, inserting a single character at the beginning of a file results in
a completely different file, reducing the chances of saving space by storing
equal data only once.");
-W("It is still not clear how bad this really is, since large files (mp3,
ogg, avi) are hardly ever going to differ in just a few places.");
-W("A different encoder is rather likely to produce a stream that has no
similarity, but that's just my intuition.");
-W("It would be nice to see some proof that there can actually be a significant
benefit in practice before even trying to develop some technique to solve
this.");
-echo "</dd>\n";
-DT("Resuming downloads");
-echo "<dd>";
-W("Resuming an aborted download in GNUnet 0.4.7 and earlier requires
re-requesting all the IBlocks of the file.");
-W("While DBlocks are not requested again, this can still be a significant
overhead.");
-W("The major reason for this is an implementation that did not quite allow
fixing this problem easily.");
-W("In particular, we may want to compute IBlocks from the DBlocks that are
available in the tree below.");
-W("This requires us to integrate the code for insertion and downloading.");
-W("This is mostly a code design question, though.");
-echo "</dd></dl>\n";
-BR();BR();
-HR();
-BR();BR();
-H3("Ideas");
+
+H3("KNBlocks");
BP();
-W("Here is the initial set of ideas:");
+W("GNUnet implements <tt>KNBlocks</tt>, which are <tt>KBlocks</tt> that
contain an <tt>NBlock</tt> in plaintext instead of metadata and a file
identifier.");
+W("In other words, <tt>KNBlocks</tt> enable GNUnet to find <tt>NBlocks</tt>
using the global keyword search.");
+W("The rationale behind <tt>KNBlock</tt>s and <tt>NBlock</tt>s is to enable
peers to automatically discover namespaces and to associate useful information
with the namespaces themselves.");
+W("When GNUnet finds <tt>KNBlocks</tt> during a normal keyword search, it adds
the information to an internal list of discovered namespaces.");
+W("Users looking for interesting namespaces can then inspect this list,
reducing the need for out-of-band discovery of namespaces.");
+W("Naturally, namespaces can also be referenced from directories, but
<tt>KNBlock</tt>s should make it easier to advertise namespaces for the owner
of the pseudonym since they eliminate the need to first create a directory.");
+P();
+W("Collections are also advertised using <tt>KNBlock</tt>s.");
EP();
-echo "<ul><li>";
-W("Use a %s (%s) to avoid disk accesses for data that is not locally
available.",
- ARRAY(extlink_("http://www.nist.gov/dads/HTML/bloomfilt.html","bloom
filter"),
- extlink_("papers/p422-bloom.pdf","paper")));
-echo "</li><li>";
-W("Use CHK for the IBlocks and DBlocks instead of the ESED scheme.");
-W("This would require us to store 2 hashes in the IBlocks, reducing the degree
of the B-tree from 51 to 25, but it would fix the content migration problem.");
-W("For searches, we would still need the triple-hash scheme, but search
results migrate much faster anyway, so push-migration for those should not be
needed anyway.");
-echo "</li><li>";
-W("Instead of larger blocks, use a summary (say a single hash of the
CHK-encoded blocks as part of the IBlock) as a \"super\" query to
summarize the blocks underneath.");
-W("This works fine for on-demand encoded and locally inserted blocks, but
nodes that want to make copies of content that is flying by on the return-path
would need to capture all 25 blocks in order to be sure that the summary is
accurate.");
-W("But maybe the push-migration possible due to CHK would make fly-by content
migration less important such that we can afford making fly-by migration harder
(or less likely).");
-W("A 25-node super-query would reduce the lookup database for 8 GB from 256 MB
to a \"super-lookup-cache\" of only 10 MB.");
-W("25 blocks per super-summary-hash would have the advantage that the IBlock
format would work out exactly with the 1k scheme (25 CHKs at 40 bytes each, 1
super-hash at 20 bytes, 1 CRC at 4 bytes; 1024 bytes total).");
-echo "</li></ul>";
-BR();BR();
-HR();
-BR();BR();
-H3("The new implementation");
-BP();
-W("The good news is that, except for re-synchronizing the stream, the new
implementation addresses all problems noted above.");
-W("The new encoding encrypts every 1k block using the hash of the block,
called the key-hash (RIPEMD-160, Blowfish), and stores that block under the
hash of the encrypted block, called the query-hash (straightforward
Freenet-style CHK).");
-W("The IBlock above then contains both the key-hash and the query-hash.");
-W("With 1k bytes per IBlock, we can store up to 25 CHK-hashes (CHK is the
key-hash plus the query-hash) in one IBlock.");
-W("The IBlock also contains a CRC32 (CRC32 of the concatenation of the CRC32s
of the plaintext blocks of all Blocks below the IBlock) and a so-called
<em>super-hash</em>.");
-W("The super-hash is the hash of the concatenation of all the query-hashes in
the Blocks below the IBlock.");
-W("Note that in principle, the super-hash is redundant in the IBlock, but it
fills up the block nicely to 1024 bytes.");
-P();
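The CHK derivation and the super-hash described above can be sketched roughly as follows. This is not GNUnet's implementation: SHA-1 stands in for RIPEMD-160 (which is not guaranteed to be available in Python's hashlib), and a toy XOR keystream stands in for Blowfish. Only the structure follows the text: key-hash from the plaintext, query-hash from the ciphertext, super-hash over the query-hashes of the 25 children.

```python
import hashlib

BLOCK_SIZE = 1024  # 1k blocks, as in the text

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-512 counter-mode keystream).

    Stand-in for Blowfish; the point is only that encryption is
    deterministic given the key.  XOR makes it its own inverse.
    """
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha512(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(x ^ k for x, k in zip(data, out))

def chk_encode(block: bytes):
    """Derive (key-hash, query-hash, ciphertext) for one 1k block."""
    key = hashlib.sha1(block).digest()         # key-hash: hash of plaintext
    ciphertext = keystream_xor(key, block)     # encrypt under the key-hash
    query = hashlib.sha1(ciphertext).digest()  # query-hash: hash of ciphertext
    return key, query, ciphertext

blocks = [bytes([i]) * BLOCK_SIZE for i in range(25)]
chks = [chk_encode(b) for b in blocks]

# The super-hash summarizes all 25 query-hashes of the children.
super_hash = hashlib.sha1(b"".join(q for _, q, _ in chks)).digest()

# Anyone holding the key-hash and the ciphertext can recover the block.
key, query, ciphertext = chks[3]
assert keystream_xor(key, ciphertext) == blocks[3]
```

Note that identical plaintext blocks always yield identical queries, so duplicate content is stored only once.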
-W("The RBlock is still encrypted with the 3HASH scheme from ESED, using the
hash of the block as the key, the hash of the hash as the proof and the hash of
the hash of the hash as the query.");
-W("In queries, the peers do not differentiate between the two types of
encodings (CHK or 3HASH); only the replies are different, since the 3HASH reply
contains the proof-hash as part of the reply message, whereas the CHK reply
only contains the encrypted 1k block and no hash code.");
-P();
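The 3HASH scheme for RBlocks reduces to three nested hashes. In this sketch sha1 again stands in for the actual hash function:

```python
import hashlib

H = lambda b: hashlib.sha1(b).digest()  # stand-in for the real hash

def threehash(block: bytes):
    """Derive the 3HASH triple for a search result (RBlock)."""
    key = H(block)    # encryption key: hash of the plaintext
    proof = H(key)    # proof: hash of the hash
    query = H(proof)  # query: hash of the hash of the hash
    return key, proof, query

key, proof, query = threehash(b"some keyword block")

# An intermediary holding only the reply's proof-hash can check that it
# matches the query without ever learning the decryption key.
assert H(proof) == query
assert H(key) == proof
```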
-
-W("This new encoding allows peers to differentiate between downloads and
searches, but the security implications of this should be absolutely minor
since anonymity and deniability are not affected.");
-W("The most significant attack that this allows is a limited form of
censorship where peers would route only searches (3HASH) if they can guess the
query by checking against a whitelist of <em>good</em> keys.");
-W("Since they would still route content, these peers would still earn trust in
the system.");
-W("Note that the impact of this attack is limited since there is a high
probability that GNUnet would eventually route the searches around these
peers.");
-
-P();
-
-W("In order to reduce the number of disc accesses, the new encoding uses
<em>super-queries</em>.");
-W("Super-queries are queries that contain the super-query from the IBlock
(concatenation of the queries of the Blocks below) and a subset (!) of the
queries from the blocks below.");
-W("Note that in the super-query message, the super-query itself is often not
redundant, since it may not be possible to reconstruct it from the subset of
the queries (the exception is when the subset is the full set of queries).");
-W("Grouping up to 25 queries in one larger message obviously saves some
bandwidth: the query-hash code itself is only 20 bytes, while each query also
carries 4 bytes of query-header, a 20-byte return-to address and 4 bytes of
priority overhead, so grouping 25 queries in one message can save up to 24*28
bytes per super-query.");
-W("However, bandwidth is not the major reason why super-queries were
introduced.");
-W("The real reason is to save on disc accesses.");
-W("In order to understand this part, we first need to describe the new content
insertion mechanism.");
-
-P();
-
-W("When content is inserted into GNUnet, the local node is told by
gnunet-insert which super-hashes were inserted.");
-W("Since (for inserted content at least initially, but for indexed files
forever) all 25 queries that correspond to the super-query are going to be
available locally, the node will remember the super-query and mark it as
available in the super-bloomfilter.");
-W("When the node receives a super-query message, it will check with the
super-bloomfilter if the super-query has been marked as available.");
-W("If it has been marked as available, the node can be fairly certain that the
data will be locally available and proceed with the (costly) disc accesses to
retrieve the content.");
-W("Note that the node does not have to mark all 25 hash codes in the
super-bloomfilter but just the one super-hash, reducing the size of the
super-bloomfilter by a factor of 25.");
-W("Since the super-bloomfilter resides in memory, this is very important.");
-
-P();
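The super-bloomfilter can be sketched with a minimal bloom filter; this is an illustrative toy, not GNUnet's actual data structure, and the sizes and hash choices are assumptions:

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter sketch.

    Marking only the one super-hash per 25 blocks keeps the filter 25
    times smaller than marking every individual query-hash, which is what
    lets it stay resident in memory.
    """
    def __init__(self, bits: int = 1 << 16, k: int = 4):
        self.bits = bits
        self.k = k
        self.array = bytearray(bits // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions from independent hashes of the item.
        for i in range(self.k):
            d = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(d[:8], "big") % self.bits

    def add(self, item: bytes):
        for p in self._positions(item):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: bytes) -> bool:
        # False positives are possible, false negatives are not: a "no"
        # answer safely skips the costly disk access.
        return all(self.array[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

bf = BloomFilter()
super_hash = hashlib.sha1(b"queries of 25 local blocks").digest()
bf.add(super_hash)

assert bf.might_contain(super_hash)  # marked content is always found
```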
-
-W("Note that the super-query message is not dropped instantly if the
super-bloomfilter says that the query is not available.");
-W("The reason for this is that GNUnet also wants to be able to migrate
content.");
-W("Since content in GNUnet does not migrate in 25k blocks (but instead in 1k
blocks), super-query hash codes cannot be constructed in a secure manner for
migrated content.");
-W("Also, there is no super-query for the Root-Blocks and the top IBlocks.");
-W("Thus, GNUnet has a second bloom-filter for migrated content and 3HASH
queries.");
-W("If the super-query fails (or if the query is an ordinary query with just
one hash code), GNUnet consults this second bloom filter to test if the content
might be available locally.");
-W("In the case of a super-query message, the second bloom-filter is tested for
each of the individual queries of the query message, but not for the
super-query.");
-W("If the second bloom filter test succeeds for one of the queries, GNUnet
will try to locate the content for that query and send a response.");
-W("The other queries are then routed to other peers.");
-W("Note that the second bloom-filter needs to have space for every hash code
that is locally available and not covered by a super-query.");
-W("This makes it significantly more expensive (a factor of 25 in terms of
memory for the bloom-filter) to store migrated content compared to locally
inserted or indexed content.");
-W("The old implementation was limited to 8 GB of locally available content,
assuming that about 6 GB were indexed or inserted locally and only 2 GB had
migrated from other peers.");
-
-
-P();
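The two-filter lookup described above can be condensed into a small decision sketch. The function name and structure are assumptions, not taken from the GNUnet sources, and Python sets stand in for the two bloom filters (so, unlike a real filter, this sketch has no false positives):

```python
def handle_super_query(super_hash, queries, super_index, migrated_index, lookup):
    """Illustrative two-filter lookup decision.

    super_index covers locally inserted/indexed content (one entry per 25
    blocks); migrated_index covers migrated content and 3HASH queries.
    Returns the pair (replies, queries_to_forward).
    """
    if super_hash in super_index:
        # Super-query was marked at insertion time: all blocks below
        # should be locally available, so do the (costly) disk lookups.
        return [lookup(q) for q in queries], []
    replies, forward = [], []
    for q in queries:
        # The second filter is tested per individual query, never for
        # the super-hash itself.
        if q in migrated_index:
            replies.append(lookup(q))
        else:
            forward.append(q)  # route unanswered queries to other peers
    return replies, forward

store = {"q1": "block1", "q2": "block2", "q3": "block3"}

# Locally inserted content: the super-hash was marked, everything is local.
replies, forward = handle_super_query(
    "sh", ["q1", "q2", "q3"], {"sh"}, set(), store.get)
assert replies == ["block1", "block2", "block3"] and forward == []

# Migrated content: only q2 was marked in the second filter.
replies, forward = handle_super_query(
    "sh", ["q1", "q2", "q3"], set(), {"q2"}, store.get)
assert replies == ["block2"] and forward == ["q1", "q3"]
```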
-
-W("When routing super-queries, GNUnet may only be able to keep track of a
subset of the queries of the query-message since the routing table may be
full.");
-W("This is one of the cases where the super-query message is reduced to only
contain a subset of the original 25 queries.");
-W("The other case occurs if the originating gnunet-download process already
has a subset of the blocks locally available.");
-
-P();
-
-W("The new gnunet-download implementation has also been improved.");
-W("It stores IBlocks received from the network in FILENAME.X files until the
download has been completed.");
-W("If the download is resumed, the .X files are used to avoid downloading the
IBlocks again.");
-W("Furthermore, gnunet-download attempts to reconstruct IBlocks from the
Blocks that are lower in the tree.");
-W("For this, the <tt>insert</tt> method that is also used by gnunet-insert is
invoked on the Block.");
-W("<tt>insert</tt> is passed <tt>NULL</tt> for the GNUnet socket to ensure
that it does not actually insert the data found on the drive.");
-W("Only if no matching block was found in the .X files and the reconstruction
attempt failed does gnunet-download actually start downloading the blocks.");
-W("Note that this slows down gnunet-download a bit at the beginning, since the
insert-check may be costly, especially if the file is large (we may want to
make it optional).");
-
-EP();
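The resume logic for IBlocks described above boils down to a three-step preference order. A minimal sketch, where the function names are assumptions and not the actual gnunet-download API:

```python
def obtain_iblock(query, xfile_blocks, reconstruct, download):
    """Illustrative resume logic for one IBlock.

    Preference order follows the text: a block saved in the FILENAME.X
    file by an earlier run, then reconstruction from the blocks lower in
    the tree (the insert-check with a NULL socket), and only as a last
    resort a network download.
    """
    block = xfile_blocks.get(query)  # 1. previously downloaded IBlock?
    if block is not None:
        return block
    block = reconstruct(query)       # 2. rebuild from local blocks?
    if block is not None:
        return block
    return download(query)           # 3. fetch from the network

# Usage sketch: only the third query actually hits the network.
downloads = []
fetch = lambda q: downloads.append(q) or f"net:{q}"
got = [obtain_iblock(q, {"a": "x"}, {"b": "y"}.get, fetch) for q in "abc"]
assert got == ["x", "y", "net:c"] and downloads == ["c"]
```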
-
include("html_footer.php3");
?>