gzz-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc...


From: Hermanni Hyytiälä
Subject: [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc...
Date: Wed, 18 Dec 2002 06:08:14 -0500

CVSROOT:        /cvsroot/gzz
Module name:    gzz
Changes by:     Hermanni Hyytiälä <address@hidden>      02/12/18 06:08:14

Modified files:
        Documentation/misc/hemppah-progradu: research_problems 

Log message:
        Minor updates

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/research_problems.diff?tr1=1.6&tr2=1.7&r1=text&r2=text

Patches:
Index: gzz/Documentation/misc/hemppah-progradu/research_problems
diff -u gzz/Documentation/misc/hemppah-progradu/research_problems:1.6 
gzz/Documentation/misc/hemppah-progradu/research_problems:1.7
--- gzz/Documentation/misc/hemppah-progradu/research_problems:1.6       Tue Dec 
17 08:57:40 2002
+++ gzz/Documentation/misc/hemppah-progradu/research_problems   Wed Dec 18 
06:08:14 2002
@@ -1,5 +1,3 @@
-
-
 1. Approaches
 
 -there are five approaches when performing searches in p2p networks.
@@ -40,8 +38,26 @@
 -SWNs require O(log^2 n) hops to reach arbitrary destinations, assuming (*only 
and only if* !!!) that 
 links between nodes are constructed in the way that they are uniformly 
distributed over all distances 
 in the network (Kleinberg)
+
 -Example systems: SWAN, Freenet
 
+Req. 1:
+The short-range links must be such that for all nodes n and m, where n != m, n
+has a short-range link to a node l, so that distance(l, m) < distance(n, m).
+
+Notice: The length of the link from node n to node m is distance(n, m)
+
+Req. 2:
+The long-range links must be such that for all Long-range links are nearly
+uniformly distributed over all 'distance scales'.
+
+Notice: There *is* a more formal specification about req. 2, but I won't give 
it
+right here, because it quite mathematical (and therefore it makes no sense to
+present it here with these characters).
+
+And, if these requirements are met, SWN network can locate any data in O(log^2
+n) hops (Kleinberg and e.g. simulations in SWAN) 
+
 1.3. Flooding Broadcast Networks (FBN)
 +own resources are not mapped into the network
 +keyword/fuzzy search possible
@@ -89,7 +105,7 @@
 
 ** = Please see http://www.darkridge.com/~jpr5/doc/gnutella.html for details
 
-** = From p2p-hackers mailinglist:
+*** = From p2p-hackers mailinglist:
 - - -
 
 > Btw, how well Alpine can scale (e.g. number of users) ? Do you have any 
 > "real-
@@ -205,6 +221,23 @@
        -metadata ?
        
 2.1.5. Answers to research problem
+-there are different kinds of uses of DHTs, e.g. DHT actually stores the data, 
or DHT only stores 
+the values, which are used for locating the data
+-current systems mainly uses to latter method
+-both have pros and cons: 
+       -in the value use, several hostile nodes might say that they hosts the 
value, but refuse to serve any host
+       -in the storage use, hostile node might say that someone must save data 
block size of 1TB for her computer
+       -however, there are methods for preventing these kinds of actions
+-DHTs and SWNs are very robust to failures: e.g. 20% of the nodes 
simultaneously fail, less than 0.1% of the data retrievals are failed
+-the basic difference between DHTs/SWNs is that in DHT/SWN approach, systems 
"knows", where the desired data block resides
+       -based on data block ID's, nodes gives "hints" (routes more close to ID 
in the key space) constantly to a query lookup until 
+       the specific node is found which hosts the block
+       -no extra network traffic
+-so in a way, DHT/SWNs are structured entities
+-instead, in other approaches, system doesn't "know" where the data block 
resides
+       -we have to ask from any participating node, if they have the data block
+       -very much extra network traffic
+
 -the most efficient algorithm: O(log n)
        -DHTs requires O(log n) hops, log n neighbors, except Viceroy (however, 
robustness suffers)
        -SWNs requires O(log^2 n) hops, much less neighbors (constant, is not 
depedent of n)
@@ -214,7 +247,8 @@
 -in this case, FBNs, Hybrids, Social Discovery require much bandwidth, since 
there is no benefit from the block's ID at all (they have to ask constantly)
 -SWNs require less memory than DHTs, since there are less connections to other 
nodes
 -SWNs relies less to neighbor nodes than DHTs
--proposol: most efficient approaches for this research problem are DHTs and 
SWNs, since they are based on key-value pairs (Storm has a key as block ID) 
+-proposol: most efficient approaches for this research problem are DHTs and 
SWNs, since they are based on key-value pairs (Storm has a key as block ID)
+ 
 
 
 
@@ -243,7 +277,7 @@
 -In this case (urn-5), there is no benefit from the block's ID at all (DHT, 
SWN)
 
 2.2.4. Open questions:
--in this case, is it sensible to maintain two different key-value mappings 
key-value based systems (DHT, SWN) ?
+-in this case, is it sensible to maintain two different key-value mappings 
key-value based systems (DHT, SWN) ?l
        -one fore block IDs
        -one for urn-5 names, which are associated with block IDs
        -is this approach too difficult to maintain ?
@@ -253,25 +287,41 @@
 -How CAs' data should be saved ?
 
 2.2.5. Answers to research problem
--compared to previous research problem, the "first seen" benefits of DHTs and 
SWNs in this research problem are smaller
-       -we don't know which block is the most recent associated with a specfic 
urn-5 name
--in theory, though, the most efficient algorithm is O(log n), *assuming* that 
we already know the most recent block's ID (see previos research problem)
--currently, there is no "out-of-the-box" answer to this research problem
--the question is, how we can efficiently associate urn-5 names with the most 
recent block / to all blocks which are associated with it
-       For DHTs and SWNs:
-       -adaption of Benja's 1st idea: each node maintains a local hash table 
(urn-5 name -> most recent local block ID) for every urn-5 names
-       which node hosts --> we don't have to check all blocks and their urn-5 
associations
-       -adaption of Benja's 2nd idea: All urn-5 name mappings are stored as 
<key, value[ ]>, where the key is urn-5 name's hash and value is a record 
containing
+-compared to previous research problem, the "obvious" benefits of DHTs and 
SWNs in this research problem are smaller
+-we don't beforehand know which block is the most recent associated with a 
specfic urn-5 name
+-in theory, though, the most efficient algorithm is O(log n), *assuming* that 
we already know the most recent block's ID (see previous research problem)
+-the most important question is, how we can efficiently associate urn-5 names 
with the most recent block / to all blocks which are associated with it
+-for DHTs and SWNs:
+
+Please notice: In this approach, DHT doesn't store the actual block, only the 
values for locating the data from the system
+
+       Req. 1:
+       -each node maintains a local stack based data structure (urn-5 name -> 
most recent local block ID) for every urn-5 names
+       which node hosts. The most recent block is topmost --> we don't have to 
check all blocks and their urn-5 associations to get the most recent
+       Req. 2:
+       -all urn-5 name mappings are stored as <key, value[ ]>, where the key 
is urn-5 name's hash and value is a record containing
        block ID and timestamp of that block. So, when we store a block in our 
system first time, we have to create a new key-value: 
        <hash_of_urn_5_name, [block id, block timestamp]> and route this 
mapping to node which is "closest" to a hash value. Now when we want to find 
the most 
-       recent block associated with a specific urn-5 name, we do:      
+       recent block associated with a specific urn-5 name, we do:
+       
                1. locally compute a hash for given urn-5 string
                2. route the query to a node which hosts the given hash of 
urn-5 ("closest" node)
                3. get most recent block's ID using the idea from previuos idea
+               4. use ID to get the specific block using normal DHT/SWN 
operation
                
        In this approach, we don't have to perform additional searching and 
sorting of mappings. And of course, we know that for given urn-5, only one node
        hosts *all* the block information ("block history") for the urn-5, 
since mappings are mapped to a single node, closest to urn-5 hash value.
-       Again, this should work fine under existing DHTs (and SWTs ?).  
+       Again, this should work fine under existing DHTs (and SWTs ?).
+       
+       Some simple analysis: 
+       -there are more key-value pairs in the system for additional urn-5 --> 
block associations
+       -however, I don't think this is an issue, since data's size is small
+       -efficiency: find node which hosts urn-5 names + find node which hosts 
blocks associated with urn-5 name: logn + logn = 2logn (logarithmical)
+       
+-for FBS and others:
+       -there is no very efficient (simple) methods for finding urn-5 name 
associated with the most recent block
+       -one simple proposal is that when we visit to each node (first idea 
above), get only the most recent one and compare them (or greedy approach: 
dismiss currently 
+       most recent block as we visit to nodes, if newer block have been found)
 
 2.3. "Searching for Storm blocks associated with specific urn-5 name, where 
      specific date has been defined (or date range), and where Storm block 
@@ -301,6 +351,35 @@
 tree based structure for all blocks for specific urn-5 name) ?
 -How CAs' data should be saved ?
 
+2.3.5. Answers to research problem
+This is quite similar to previous research problem's answer. There are slight 
differences, though.
+
+-for DHTs and SWNs:
+
+Please notice: In this approach, DHT doesn't store the actual block, only the 
values for locating the data from the system
+
+       Req. 1:
+       -each node maintains a local stack based data structure (urn-5 name -> 
most recent local block ID) for every urn-5 names
+       which node hosts. The most recent block is topmost --> we don't have to 
check all blocks and their urn-5 associations to get the most recent
+       Req. 2:
+       -all urn-5 name mappings are stored as <key, value[ ]>, where the key 
is urn-5 name's hash and value is a record containing
+       block ID and timestamp of that block. So, when we store a block in our 
system first time, we have to create a new key-value: 
+       <hash_of_urn_5_name, [block id, block timestamp]> and route this 
mapping to node which is "closest" to a hash value. Now when we want to find 
the most 
+       recent block associated with a specific urn-5 name, we do:      
+       
+               1. locally compute a hash for given urn-5 string
+               2. route the query to a node which hosts the given hash of 
urn-5 ("closest" node)
+               3. get the specific block(s) data (block ID) from the stack 
which matches to given date&time properties (we compare to block header data)
+               4. use IDs to get the specific blocks using normal DHT/SWN 
operation            
+       
+       Some simple analysis: 
+       -there are more key-value pairs in the system for additional urn-5 --> 
block associations
+       -however, I don't think this is an issue, since data's size is small
+       -efficiency: find node which hosts urn-5 names + find node which hosts 
blocks associated with urn-5 name: logn + logn = 2logn (logarithmical)
+       
+-for FBS and others:
+       -there is no very efficient (simple) methods for finding urn-5 name 
associated with the most recent block
+       -one simple proposal is that when we be that we visit to each node 
(first idea above), get all blocks which matches to given properties
       
 
 2.4. "How does search engine should work?"



reply via email to

[Prev in Thread] Current Thread [Next in Thread]