
[GNUnet-SVN] r1501 - GNUnet-docs/WWW


From: grothoff
Subject: [GNUnet-SVN] r1501 - GNUnet-docs/WWW
Date: Wed, 13 Jul 2005 20:44:06 -0700 (PDT)

Author: grothoff
Date: 2005-07-13 20:44:03 -0700 (Wed, 13 Jul 2005)
New Revision: 1501

Modified:
   GNUnet-docs/WWW/documentation.php3
   GNUnet-docs/WWW/encoding.php3
Log:
update

Modified: GNUnet-docs/WWW/documentation.php3
===================================================================
--- GNUnet-docs/WWW/documentation.php3  2005-07-14 03:23:49 UTC (rev 1500)
+++ GNUnet-docs/WWW/documentation.php3  2005-07-14 03:44:03 UTC (rev 1501)
@@ -51,8 +51,7 @@
 LI("Special topics"); 
 echo "<ol>\n";
 LILI("smtp.php3","Transport performance and SMTP setup"); 
-LILI("namespace.php3","Namespaces and Directories (outdated)");
-LILI("encoding.php3","Content encoding (outdated)"); 
+LILI("encoding.php3","Content encoding"); 
 echo "</ol>\n";
 echo "</ol>\n";
 include("html_footer.php3");

Modified: GNUnet-docs/WWW/encoding.php3
===================================================================
--- GNUnet-docs/WWW/encoding.php3       2005-07-14 03:23:49 UTC (rev 1500)
+++ GNUnet-docs/WWW/encoding.php3       2005-07-14 03:44:03 UTC (rev 1501)
@@ -1,196 +1,44 @@
 <?php
 include("scripts.php3");
-$title = "Encoding content for anonymous file-sharing in GNUnet";
-$description = "ESED II, a new encoding for anonymous file sharing in GNUnet";
+$title = "The Encoding for Censorship-Resistant Sharing";
+$description = "ECRS, the censorship-resistant content encoding for GNUnet 
file sharing";
 include("html_header.php3");
 
-H2("The GNUnet AFS Encoding");
-// FIXME: update for 0.7.0
+H2("The Encoding for Censorship-Resistant Sharing");
 BP();
-W("This page is intended to describe the revised GNUnet content encoding which 
was implemented in version 0.5.0.");
-W("GNUnet version 0.7.x again modifies the encoding.");
-W("This new scheme is called ECRS (Encoding for Censorship Resistant Sharing) 
and is described in %s research paper.",
+W("When GNUnet shares files, it uses a content encoding called ECRS, the Encoding for Censorship-Resistant Sharing.");
+W("Most of ECRS is described in %s research paper.",
   extlink_("download/ecrs.ps", "this"));
-W("This page is preserved primarily for historical purposes.");
+W("ECRS obsoletes the previous ESED and ESED II encodings, which were used in GNUnet before version 0.7.0.");
+W("The rest of this page assumes that the reader is familiar with the %s.",
+  extlink_("download/ecrs.ps", "ECRS paper"));
+W("What follows is a description of some minor extensions that GNUnet makes 
over what is described in the paper.");
+W("These extensions are not in the paper because we felt they were obvious or trivial extensions of the original scheme and thus did not warrant space in the research report.");
 EP();
 
-H3("What is the GNUnet AFS Encoding");
+H3("NBlocks");
 BP();
-W("The GNUnet AFS Encoding is the way GNUnet encodes files for anonymous 
sharing.");
-W("The first version of the encoding was presented in %s.",
-  extlink_("download/esed.ps","Efficient Sharing of Encrypted Data"));
-W("We refer to this original scheme (which is used in GNUnet 0.4.7 and before) 
as the ESED scheme.");
-W("This page is discussing the problems with the ESED encoding and tries to 
come up with solutions.");
-W("The resulting scheme is called ESED II.");
-
+W("An <tt>NBlock</tt> is, simply speaking, an advertisement for a namespace.");
+W("The <tt>NBlock</tt> is supposed to contain metadata describing the content 
of the namespace, in particular the root of the namespace.");
+W("The format of an <tt>NBlock</tt> is almost the same as that of an 
<tt>SBlock</tt>.");
+W("In particular, the <tt>NBlock</tt> is signed using the pseudonym.");
+W("By convention, the identifier of the <tt>NBlock</tt> must be chosen to be 
all-zeros.");
+W("This allows peers to search for <tt>NBlock</tt>s in order to find out more about a namespace.");
 EP();
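The all-zeros convention described above can be sketched as follows. This is an illustrative Python sketch only: the field names, sizes, and the stand-in signature are hypothetical, not GNUnet's actual wire format; it shows just the one convention the text states, namely that an NBlock is like an SBlock whose identifier is fixed to all zeros.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class SBlockLike:
    namespace_key: bytes   # public key of the pseudonym (hypothetical field)
    identifier: bytes      # 160-bit identifier within the namespace
    metadata: bytes        # description of the namespace content
    signature: bytes       # signed with the pseudonym's private key

def make_nblock(namespace_key: bytes, metadata: bytes, sign) -> SBlockLike:
    """Build an NBlock: an SBlock whose identifier is all zeros by convention."""
    identifier = b"\x00" * 20          # all-zeros identifier (160-bit hash size)
    body = namespace_key + identifier + metadata
    return SBlockLike(namespace_key, identifier, metadata, sign(body))

# Because the identifier is fixed, any peer can derive the query for a
# namespace's NBlock in advance and thereby discover its metadata.
toy_sign = lambda body: hashlib.sha1(body).digest()  # stand-in, not a real signature
nb = make_nblock(b"example-pseudonym-key", b"demo namespace", toy_sign)
assert nb.identifier == b"\x00" * 20
```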
-H3("Problems with ESED");
-echo "<dl>\n";
-DT("Content Migration");
-echo "<dd>";
-W("1k blocks in ESED can only migrate when they are requested.");
-W("If a node would push ESED blocks out into the network, the triple-hash does 
not protect against malicious nodes constructing bad pairs.");
-BR();BR();
-W("There are two attacks, and only one would work.");
-W("We describe both attacks and elaborate why they work (or why not).");
-BR();
-echo "<ol>";
-echo "<li>";
-W("The first is an adversary that sends out garbage into the world, hoping to 
push useful content out of the network.");
-W("Since the content message in the ESED scheme consists of a double-hash and 
encrypted data that are not related in any obvious way, nodes may think that 
garbage messages are perfectly valid and store them, even if the blocks would 
not properly decrypt.");
-BR();
-W("This attack will not work since the blocks will never be requested (they 
are highly unlikely to match any real request) and thus only nodes with excess 
capacity will store them.");
-W("Even those nodes will time-out and replace the content quickly with blocks 
that are actually requested.");
-echo "</li>\n";
-echo "<li>";
-W("The second attack involves an adversary that wants to censor a keyword or a 
specific piece of content.");
-W("If the adversary knows the respective double-hash of the key or content, he 
can push content messages into the network that map the good double-hash to bad 
(random) data.");
-W("Nodes that try to receive the data would then get garbage.");
-BR();
-W("The attack only really works if nodes make copies of content even when they do not have a matching query in circulation.");
-W("In that case, the adversary would be able to massively spread bad content 
in the network (the good guys are unable to spread the good blocks to the same 
extent), and the content would stay since the queries are popular.");
-W("In the current implementation, this attack is not a problem since nodes 
only keep content if there was a matching query, and the chance that an 
adversary can spread bad content compared to the good content depends entirely 
on its proportional presence in the network compared to nodes offering the good 
content.");
-W("Thus the adversary cannot make the bad replies dominate the network unless the adversary dominates the network overall.");
-BR();
-W("In GNUnet 0.5.0 we want to allow active spreading of content in order to 
make better use of available disk-space.");
-W("This requires changing the encoding to make the attack described above 
impossible.");
-echo "</li>\n";
-echo "</ol></dd>\n";
-DT("Random Disk Accesses");
-echo "<dd>";
-W("The current encoding with 1k blocks requires GNUnet to perform an excessive 
amount of disc accesses.");
-W("In particular, every query is a random hash, and 8 GB of content require a 
256 MB lookup database just to resolve if the reply is available.");
-W("256 MB can typically not be kept in memory.");
-W("Since the queries are random hashes, every one of them requires a random 
disc access.");
-W("This is <em>the</em> major limiting factor for GNUnet AFS scalability, 
since modern drives can do at most 10-20 random accesses per second, while the 
network bandwidth would allow GNUnet to route more than 100 queries per second 
on a slow modem line.");
-BR();
-W("There are two main approaches to solve this problem.");
-echo "<ol>\n";
-echo "<li>";
-W("By reducing the number of different queries that are required to index a GB of data to a more reasonable value (say 1 MB per GB instead of the current 16 MB), we would be able to cache the index in memory, giving us a practically unlimited number of random accesses.");
-W("The problem here is that a larger block size (currently 1k) would require fragmentation for some transport protocols.");
-W("An implementation of fragmentation as in IP would require retransmission of the entire block if a single fragment was lost, which on some lines might make us horribly inefficient.");
-W("Also, fragmentation would require nodes to keep more state (the fragments), increasing memory requirements.");
-W("Possible attacks (like sending only the first fragment) would complicate the implementation greatly.");
-echo "</li>\n";
-echo "<li>";
-W("If the node were able to do a cheaper test to gauge the chances that the lookup will succeed, the cost of random disk accesses could be reduced to a level where, say, 90 percent of the queries are filtered up-front.");
-echo "</ol></dd>";
-DT("Resynchronizing the stream");
-echo "<dd>";
-W("In ESED, inserting a single character at the beginning of a file results in 
a completely different file, reducing the chances of saving space by storing 
equal data only once.");
-W("It is still not clear how bad this really is since for large files (mp3, 
ogg, avi), I would suspect that they are hardly ever going to be just a bit 
different in a few places.");
-W("A different encoder is rather likely to produce a stream that has no 
similarity, but that&rsquo;s just my intuition.");
-W("It would be nice to see some proof that there can actually be a significant 
benefit in practice before even trying to develop some technique to solve 
this.");
-echo "</dd>\n";
-DT("Resuming downloads");
-echo "<dd>";
-W("Resuming an aborted download in GNUnet 0.4.7 and earlier requires 
re-requesting all the IBlocks of the file.");
-W("While DBlocks are not requested again, this can still be a significant 
overhead.");
-W("The major reason for this is an implementation that did not quite allow 
fixing this problem easily.");
-W("In particular, we may want to compute IBlocks from the DBlocks that are 
available in the tree below.");
-W("This requires us to integrate the code for insertion and downloading.");
-W("This is mostly a code design question, though.");
-echo "</dd></dl>\n";
-BR();BR();
-HR();
-BR();BR();
-H3("Ideas");
+
+H3("KNBlocks");
 BP();
-W("Here is the initial set of ideas:");
+W("GNUnet implements <tt>KNBlocks</tt>, which are <tt>KBlocks</tt> that contain an <tt>NBlock</tt> in plaintext instead of metadata and a file identifier.");
+W("In other words, <tt>KNBlocks</tt> enable GNUnet to find <tt>NBlocks</tt> using the global keyword search.");
+W("The rationale behind <tt>KNBlock</tt>s and <tt>NBlock</tt>s is to enable 
peers to automatically discover namespaces and to associate useful information 
with the namespaces themselves.");
+W("When GNUnet finds <tt>KNBlocks</tt> during a normal keyword search, it adds 
the information to an internal list of discovered namespaces.");
+W("Users looking for interesting namespaces can then inspect this list, 
reducing the need for out-of-band discovery of namespaces.");
+W("Naturally, namespaces can also be referenced from directories, but 
<tt>KNBlock</tt>s should make it easier to advertise namespaces for the owner 
of the pseudonym since they eliminate the need to first create a directory.");
+P();
+W("Collections are also advertised using <tt>KNBlock</tt>s.");
 EP();
-echo "<ul><li>";
-W("Use a %s (%s) to avoid disk accesses for data that is not locally 
available.",
-  ARRAY(extlink_("http://www.nist.gov/dads/HTML/bloomfilt.html","bloom 
filter"),
-       extlink_("papers/p422-bloom.pdf","paper")));
-echo "</li><li>";
-W("Use CHK for the IBlocks and DBlocks instead of the ESED scheme.");
-W("This would require us to store 2 hashes in the IBlocks, reducing the degree 
of the B-tree from 51 to 25, but it would fix the content migration problem.");
-W("For searches, we would still need the triple-hash scheme, but search results migrate much faster, so push-migration for those should not be needed anyway.");
-echo "</li><li>";
-W("Instead of larger blocks, use a summary (say, a single hash of the CHK-encoded blocks as part of the IBlock) as a &quot;super&quot; query to summarize the blocks underneath.");
-W("This works fine for on-demand encoded and locally inserted blocks, but 
nodes that want to make copies of content that is flying by on the return-path 
would need to capture all 25 blocks in order to be sure that the summary is 
accurate.");
-W("But maybe the push-migration possible due to CHK would make fly-by content 
migration less important such that we can afford making fly-by migration harder 
(or less likely).");
-W("A 25-node super-query would reduce the lookup database for 8 GB from 256 MB 
to a &quot;super-lookup-cache&quot; of only 10 MB.");
-W("25 blocks per super-summary-hash would have the advantage that the IBlock format would work out exactly with the 1k scheme (25 CHKs at 40 bytes each, 1 super-hash at 20 bytes, 1 CRC at 4 bytes).");
-echo "</li></ul>";
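The Bloom filter proposed in the first bullet above can be sketched as follows. This is a minimal, generic Python sketch with illustrative parameters, not GNUnet's implementation; the point is simply that it answers "definitely not stored locally" without any disk access.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: no false negatives, rare false positives."""
    def __init__(self, bits: int = 8192, hashes: int = 4):
        self.bits = bits
        self.hashes = hashes
        self.bitmap = bytearray(bits // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions from k salted hashes of the item.
        for i in range(self.hashes):
            h = hashlib.sha1(bytes([i]) + item).digest()
            yield int.from_bytes(h[:4], "big") % self.bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bitmap[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False means certainly absent (skip the disk); True means "check the disk".
        return all(self.bitmap[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = BloomFilter()
bf.add(b"query-hash-1")
assert bf.might_contain(b"query-hash-1")       # no false negatives
assert not bf.might_contain(b"query-hash-2")   # most absent keys are filtered cheaply
```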
-BR();BR();
-HR();
-BR();BR();
-H3("The new implementation");
-BP();
-W("The good news is that, except for re-synchronizing the stream, the new implementation addresses all of the problems noted above.");
-W("The new encoding encrypts every 1k block using the hash of the block, called the key-hash (RIPEMD-160, Blowfish), and stores that block under the hash of the encrypted block, called the query-hash (straightforward Freenet-style CHK).");
-W("The IBlock above then contains both the key-hash and the query-hash.");
-W("With 1k bytes per IBlock, we can store up to 25 CHK-hashes (CHK is the 
key-hash plus the query-hash) in one IBlock.");
-W("The IBlock also contains a CRC32 (CRC32 of the concatenation of the CRC32s 
of the plaintext blocks of all Blocks below the IBlock) and a so-called 
<em>super-hash</em>.");
-W("The super-hash is the hash of the concatenation of all the query-hashes in 
the Blocks below the IBlock.");
-W("Note that in principle, the super-hash is redundant in the IBlock, but it 
fills up the block nicely to 1024 bytes.");
 
-P();
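The CHK step described above can be sketched as follows. This is an illustrative Python sketch, not GNUnet's code: SHA-1 stands in for RIPEMD-160 and a hash-based XOR keystream stands in for Blowfish, so only the shape of the scheme is shown (key-hash from the plaintext, query-hash from the ciphertext, super-hash over the query-hashes below an IBlock).

```python
import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    # Hash-counter keystream as a stand-in cipher (the real scheme uses Blowfish).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha1(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def chk_encode(block: bytes):
    key_hash = hashlib.sha1(block).digest()            # key-hash: H(plaintext)
    enc = bytes(a ^ b for a, b in zip(block, _keystream(key_hash, len(block))))
    query_hash = hashlib.sha1(enc).digest()            # query-hash: H(ciphertext)
    return key_hash, query_hash, enc

def chk_decode(key_hash: bytes, enc: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(enc, _keystream(key_hash, len(enc))))

def super_hash(query_hashes):
    # super-hash: hash of the concatenation of the query-hashes below an IBlock
    return hashlib.sha1(b"".join(query_hashes)).digest()

block = b"x" * 1024                       # 1k block as in the scheme
k, q, enc = chk_encode(block)
assert chk_decode(k, enc) == block        # anyone holding the key-hash can decrypt
assert q == hashlib.sha1(enc).digest()    # the query identifies only the ciphertext
```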
 
-W("The RBlock is still encrypted with the 3HASH scheme from ESED, using the 
hash of the block as the key, the hash of the hash as the proof and the hash of 
the hash of the hash as the query.");
-W("In queries, the peers do not differentiate between the two types of encodings (CHK or 3HASH); only the replies are different, since the 3HASH reply contains the proof-hash as part of the reply message, whereas the CHK reply only contains the encrypted 1k block and no hash code.");
 
-P();
-
-W("This new encoding allows peers to differentiate between downloads and searches, but the security implications of this should be absolutely minor since anonymity and deniability are not affected.");
-W("The most significant attack that this allows is a limited form of 
censorship where peers would route only searches (3HASH) if they can guess the 
query by checking against a whitelist of <em>good</em> keys.");
-W("Since they would still route content, these peers would still earn trust in 
the system.");
-W("Note that the impact of this attack is limited since there is a high 
probability that GNUnet would eventually route the searches around these 
peers.");
-
-P();
-
-W("In order to reduce the number of disc accesses, the new encoding uses 
<em>super-queries</em>.");
-W("Super-queries are queries that contain the super-hash from the IBlock (the hash of the concatenation of the queries of the Blocks below) together with a subset (!) of the queries from the blocks below.");
-W("Note that in the super-query message the super-hash itself is usually not redundant, since it may not be possible to reconstruct it from the subset of the queries (the exception is if the subset is the full set of queries).");
-W("Grouping up to 25 queries in one larger message obviously saves some bandwidth: the query-hash code is 20 bytes, and there are 4 bytes of query header, 20 bytes of return-to address, and 4 bytes of priority overhead per query, so grouping 25 queries in one larger message can save up to 24*28 bytes per super-query. However, this is not the major reason why super-queries were introduced.");
-W("The real reason is to save on disc accesses.");
-W("In order to understand this part, we first need to describe the new content 
insertion mechanism.");
-
-P();
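The bandwidth figure above can be checked quickly from the numbers in the text (this is just the text's own accounting, not a protocol specification):

```python
# Per-query overhead as stated in the text: 4 bytes of query header,
# a 20-byte return-to address, and 4 bytes of priority.
per_query_overhead = 4 + 20 + 4
# Grouping 25 queries shares one copy of that overhead, so 24 copies are saved.
saved = 24 * per_query_overhead
assert per_query_overhead == 28
assert saved == 672   # up to 672 bytes saved per 25-query super-query
```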
-
-W("When content is inserted into GNUnet, the local node is told by 
gnunet-insert which super-hashes were inserted.");
-W("Since (for inserted content at least initially, but for indexed files 
forever) all 25 queries that correspond to the super-query are going to be 
available locally, the node will remember the super-query and mark it as 
available in the super-bloomfilter.");
-W("When the node receives a super-query message, it will check with the 
super-bloomfilter if the super-query has been marked as available.");
-W("If it has been marked as available, the node can be fairly certain that the 
data will be locally available and proceed with the (costly) disc accesses to 
retrieve the content.");
-W("Note that the node does not have to mark all 25 hash codes in the 
super-bloomfilter but just the one super-hash, reducing the size of the 
super-bloomfilter by a factor of 25.");
-W("Since the super-bloomfilter resides in memory, this is very important.");
-
-P();
-
-W("Note that the super-query message is not dropped instantly if the super-bloomfilter says that the query is not available.");
-W("The reason for this is that GNUnet also wants to be able to migrate content.");
-W("Since content in GNUnet does not migrate in 25k blocks (but instead in 1k blocks), super-query hash codes cannot be constructed in a secure manner for migrated content.");
-W("Also, there is no super-query for the Root-Blocks and the top IBlocks.");
-W("Thus, GNUnet has a second bloom-filter for migrated content and 3HASH 
queries.");
-W("If the super-query fails (or if the query is an ordinary query with just 
one hash code), GNUnet consults this second bloom filter to test if the content 
might be available locally.");
-W("In the case of a super-query message, the second bloom-filter is tested for 
each of the individual queries of the query message, but not for the 
super-query.");
-W("If the second bloom filter test succeeds for one of the queries, GNUnet 
will try to locate the content for that query and send a response.");
-W("The other queries are then routed to other peers.");
-W("Note that the second bloom-filter needs to have space for every hash code 
that is locally available and not covered by a super-query.");
-W("This makes it significantly more expensive (a factor of 25 in terms of 
memory for the bloom-filter) to store migrated content compared to locally 
inserted or indexed content.");
-W("The old implementation was limited to 8 GB of locally available content, assuming that about 6 GB were indexed or inserted locally and only 2 GB had migrated from other peers.");
-
-
-P();
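The two-filter lookup path described in the preceding paragraphs can be sketched as follows. The names are hypothetical, not GNUnet's actual API, and plain Python sets stand in for the two in-memory bloom filters; the point is the dispatch order: super-bloomfilter first, then the per-query second filter for migrated content.

```python
def handle_super_query(super_hash, queries, super_bf, second_bf, disk_lookup):
    """Return (local replies, queries to route onward)."""
    replies, to_route = [], []
    if super_hash in super_bf:
        # Locally inserted/indexed content: all blocks below the IBlock
        # should be on disk, so the (costly) disk lookups are worthwhile.
        replies.extend(disk_lookup(q) for q in queries)
        return replies, to_route
    # Otherwise fall back to the second filter for migrated content; it is
    # tested per individual query, never for the super-hash itself.
    for q in queries:
        if q in second_bf:
            replies.append(disk_lookup(q))
        else:
            to_route.append(q)   # unmatched queries are routed to other peers
    return replies, to_route

store = {b"q1": b"block1"}
replies, rest = handle_super_query(b"sh", [b"q1", b"q2"], set(), {b"q1"}, store.get)
assert replies == [b"block1"] and rest == [b"q2"]
```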
-
-W("When routing super-queries, GNUnet may only be able to keep track of a 
subset of the queries of the query-message since the routing table may be 
full.");
-W("This is one of the cases where the super-query message is reduced to only 
contain a subset of the original 25 queries.");
-W("The other case occurs if the originating gnunet-download process has 
already a subset of the blocks locally available.");
-
-P();
-
-W("The gnunet-download implementation has also been improved.");
-W("It stores IBlocks received from the network in FILENAME.X files until the 
download has been completed.");
-W("If the download is resumed, the .X files are used to avoid downloading the 
IBlocks again.");
-W("Furthermore, gnunet-download attempts to reconstruct IBlocks from the 
Blocks that are lower in the tree.");
-W("For this, the <tt>insert</tt> method that is also used by gnunet-insert is 
invoked on the Block.");
-W("<tt>insert</tt> is passed <tt>NULL</tt> for the GNUnet socket to ensure 
that it does not actually insert the data found on the drive.");
-W("Only if no matching block was found in the .X files and the reconstruction attempt failed does gnunet-download actually start downloading the blocks.");
-W("Note that this slows down gnunet-download at the beginning a bit since the 
insert-check may be costly, especially if the file is large (we may want to 
make it optional).");
-
-EP();
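The resume order described above can be sketched as follows. The function names are hypothetical, not the actual gnunet-download code: first consult the .X file of cached IBlocks, then try to reconstruct the IBlock from blocks lower in the tree, and only request it from the network as a last resort.

```python
def fetch_iblock(query, dot_x_cache, reconstruct, network_download):
    block = dot_x_cache.get(query)          # 1. resume from the FILENAME.X cache
    if block is None:
        block = reconstruct(query)          # 2. rebuild from lower blocks on disk
    if block is None:
        block = network_download(query)     # 3. actually download as last resort
    return block

cached = {b"q-top": b"iblock-bytes"}
assert fetch_iblock(b"q-top", cached, lambda q: None, lambda q: b"net") == b"iblock-bytes"
assert fetch_iblock(b"q-new", cached, lambda q: None, lambda q: b"net") == b"net"
```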
-
 include("html_footer.php3");
 ?>




