gzz-commits

[Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc...


From: Hermanni Hyytiälä
Subject: [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc...
Date: Tue, 17 Dec 2002 05:36:21 -0500

CVSROOT:        /cvsroot/gzz
Module name:    gzz
Changes by:     Hermanni Hyytiälä <address@hidden>      02/12/17 05:36:21

Modified files:
        Documentation/misc/hemppah-progradu: research_problems 

Log message:
        Some initial answers to research problems

CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/research_problems.diff?tr1=1.4&tr2=1.5&r1=text&r2=text

Patches:
Index: gzz/Documentation/misc/hemppah-progradu/research_problems
diff -u gzz/Documentation/misc/hemppah-progradu/research_problems:1.4 gzz/Documentation/misc/hemppah-progradu/research_problems:1.5
--- gzz/Documentation/misc/hemppah-progradu/research_problems:1.4      Tue Dec 17 03:27:50 2002
+++ gzz/Documentation/misc/hemppah-progradu/research_problems  Tue Dec 17 05:36:21 2002
@@ -47,8 +47,7 @@
 +keyword/fuzzy search possible
 -not scalable
 -huge network traffic
--not fast routing (in fact, there is no upper limit, because there are multiple simultaneous
-breadth-first searches in the network)
+-not fast routing 
 -no guarantee that all data will be located
 
 -Example systems: Gnutella, FastTrack family (Kazaa, Morpheus), JXTA Search, Gnutella2
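
To make the flooding properties above concrete, here is a minimal sketch of
TTL-limited breadth-first flooding in Python (illustrative only: the Node
class and its fields are invented for this sketch, not taken from Gnutella
or any of the systems named above):

    # Sketch of Gnutella-style flooding: every node re-broadcasts the query
    # to all neighbors, so message count grows roughly with fanout^TTL and
    # there is no guarantee of finding data before the TTL expires.
    class Node:
        def __init__(self, node_id, data=()):
            self.id = node_id
            self.data = set(data)   # keys stored locally
            self.neighbors = []     # direct connections

    def flood_query(start, key, ttl):
        seen = {start.id}
        frontier = [start]
        messages = 0
        for _ in range(ttl):            # one breadth-first ring per hop
            nxt = []
            for node in frontier:
                if key in node.data:
                    return node, messages
                for nb in node.neighbors:
                    messages += 1       # every forwarded query is traffic
                    if nb.id not in seen:
                        seen.add(nb.id)
                        nxt.append(nb)
            frontier = nxt
        return None, messages           # TTL expired: data may still exist

    # e.g. a -- b, a -- c; "block-X" lives on b and is found in 2 messages
    a, b, c = Node("a"), Node("b", {"block-X"}), Node("c")
    a.neighbors, b.neighbors, c.neighbors = [b, c], [a], [a]
    hit, msgs = flood_query(a, "block-X", ttl=2)

The message count, not the hop count, is what blows up: each ring multiplies
traffic by the average fanout, which is the "huge network traffic" item above.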
@@ -82,12 +81,69 @@
 Kademlia:      O(log n)*       O(log n)        O(log n)
 Viceroy:       O(log n)        O(1)            O(log n)
 Small Worlds:  O(1)            O(1)            O(log^2 n)
-Flooding:      N/A             O(1)            "high"
+Flooding:      N/A**           N/A**           N/A**
 Hybrid:        N/A             N/A             N/A
-Social:        N/A             O(1)            N/A
+Social:        N/A***          N/A***          N/A***
 
 * = In Kademlia, there is no action required when a node leaves the system
 
+** = Please see http://www.darkridge.com/~jpr5/doc/gnutella.html for details
+
+*** = From the p2p-hackers mailing list:
+- - -
+
+> Btw, how well can Alpine scale (e.g. number of users)? Do you have any
+> "real-life" experiences from Alpine (how the social-connection paradigm
+> really works, etc.)? What about search performance in Alpine (any big
+> O's)? Security (PKIs)?
+
+Scalability is the big question, as it is hard to gauge real-world
+scalability without an actual real-world network :-)
+
+I have done some preliminary scalability testing on the OSDL systems
+( http://www.osdl.org/ ) using a pair of 4-way SMP systems on a gigabit
+network link.  I was able to establish and communicate with over 4 million
+concurrent DTCP/Alpine connections between them, using about a gigabyte of
+RAM.
+
+On lower end hardware and bandwidth (5 - 10M of memory, DSL/cable), peer
+groups would be much smaller, 10,000 to 25,000 concurrent.
+
+The main feature of Alpine that allows it to scale with less effort is the
+fact that all communication is direct, using a lightweight UDP-based
+transport.  The number of connections is limited only by the memory and
+bandwidth you have available to use.
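
As a rough illustration of that point (generic sketch only: DTCP's actual
wire format and API are not shown in this message, and the port and field
names below are invented), a UDP "connection" can be nothing more than a
small per-peer state record in a table, so the practical ceiling is memory
rather than kernel socket state. 4 million connections in about a gigabyte
of RAM works out to roughly 250 bytes of state per peer:

    import socket, time

    peers = {}  # (host, port) -> small per-peer state record

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 4000))        # one socket serves every peer

    def handle_datagram():
        data, addr = sock.recvfrom(2048)
        # a new "connection" costs one dict entry, not a kernel socket
        state = peers.setdefault(addr, {"last_seen": 0.0, "recv": 0})
        state["last_seen"] = time.time()   # keep-alive bookkeeping
        state["recv"] += 1
        # ... dispatch `data` to the application layer here ...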
+
+Search performance is also difficult to gauge because of the social
+discovery mechanism employed.  Each peer is continually tuning peer groups
+to increase the use of high quality peers with similar interests and
+removing peers that do not contribute or share interests.
+
+When you first join the network your query effectiveness is going to be
+low.  As you use the network, query effectiveness increases given the
+feedback received from previous queries and how they affect the composition
+of the peer groups you use for resource discovery.
+
+This makes it nice for people who share interests in unpopular or obscure
+resources; they can eventually obtain a fast, effective peer group for
+finding these resources.  This is in direct contrast to most other search
+mechanisms where obscure or unpopular resources are always more difficult
+to locate.
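
A sketch of that tuning loop (the scoring rule here is invented for
illustration; Alpine's actual policy is not described in this message):
peers that answer queries usefully gain score, non-contributors decay and
are pruned, so the group drifts toward nodes sharing your interests:

    class PeerGroup:
        def __init__(self, max_size=100):
            self.scores = {}        # peer address -> usefulness score
            self.max_size = max_size

        def feedback(self, peer, useful):
            """Apply per-peer feedback after each query round."""
            delta = 1.0 if useful else -0.25
            self.scores[peer] = self.scores.get(peer, 0.0) + delta

        def prune(self):
            """Keep only the highest-scoring peers; drop freeloaders."""
            ranked = sorted(self.scores, key=self.scores.get, reverse=True)
            self.scores = {p: self.scores[p] for p in ranked[:self.max_size]}

    pg = PeerGroup(max_size=2)
    pg.feedback("peer-a", useful=True)   # answered a query
    pg.feedback("peer-b", useful=False)  # contributed nothing
    pg.feedback("peer-c", useful=True)
    pg.prune()                           # "peer-b" is dropped first

This also explains the cold-start effect described above: a fresh node's
scores are all zero, so its first peer group is effectively random.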
+
+I am focusing on usability for the next and future devel snapshots of
+Alpine, so hopefully some real world use on a wider scale will allow
+me to answer your questions with much more detail in the future.
+Right now all I can provide you is rough guesstimates and intuitions.
+
+Last, regarding security, I would like to integrate PGP/GPG style
+asymmetric ciphers and digital signatures, however this is a low
+priority.  I am also considering an implementation of peer frogs
+( http://cubicmetercrystal.com/peerfrog/ ) for avoiding man-in-the-middle
+attacks in large decentralized peer networks (although this is geared
+more for large wireless peer networks).
+
+- - - 
+
 Insert/Delete: 
 Number of messages when a node joins or leaves the network.
 
@@ -137,19 +193,30 @@
 -obviously a lot of extra traffic arises in the network
 -approach-specific pseudo thinking: "repeatedly ask each node whether it has the block whose ID value is X"
 
-d) HS
-(-text missing)
-
-e) SDS
-(-text missing)
 
 2.1.4. Open questions
 (-This case is quite straightforward, since it closely resembles the ordinary "locate file and get it")
--Really, how much better are the second generation FBN systems (Gnutella2) compared to the first generation FBN systems (Gnutella)
+-really, how much better are the second generation FBN systems (Gnutella2) compared to the first generation FBN systems (Gnutella) ?
        -efficiency ?
        -network traffic ?
        -will the data be located, if it exists in the network ?
        -scalability ?
+-when searching, do we have to know the exact hash of the block ID, or are
+there other alternatives ?
+       -metadata ?
+
+2.1.5. Answers to research problem
+-the most efficient algorithm: O(log n)
+       -DHTs require O(log n) hops and log n neighbors, except Viceroy
+       (however, robustness suffers)
+       -SWNs require O(log^2 n) hops and far fewer neighbors (a constant,
+       not dependent on n)
+-in DHTs and SWNs, it is possible to locate data in constant time, BUT it
+requires knowing all n neighbors!
+-in the basic scenario, DHTs and SWNs assume that we know the value's key
+beforehand
+-in this case, DHTs and SWNs require little bandwidth, because "the network
+knows where the block resides"
+-in this case, FBNs, Hybrids and Social Discovery require much bandwidth,
+since the block's ID gives no benefit at all (they have to ask constantly)
+-SWNs require less memory than DHTs, since there are fewer connections to
+other nodes
+-SWNs rely less on neighbor nodes than DHTs do
+-proposal: the most efficient approaches for this research problem are DHTs
+and SWNs, since they are based on key-value pairs (Storm uses the block ID
+as a key); see the lookup sketch below
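
To illustrate the proposal above, a minimal Chord-style lookup sketch
(Chord is one concrete DHT; Kademlia, Pastry etc. differ in detail, and
the classes below are invented for illustration). Each node keeps about
log n fingers at doubling distances, and every hop moves strictly closer
to the key in ID space, which is where the O(log n) hop count comes from;
a Storm block ID would play the role of the key:

    HASH_BITS = 16
    RING = 2 ** HASH_BITS           # circular 16-bit ID space

    def distance(a, b):
        return (b - a) % RING       # clockwise distance on the ring

    class ChordNode:
        def __init__(self, node_id):
            self.id = node_id
            self.fingers = []       # ~log n known nodes at doubling distances
            self.store = {}         # block ID -> block data

        def closest_preceding(self, key):
            """Pick the known node closest to the key (without passing it)."""
            best = self
            for f in self.fingers:
                if distance(f.id, key) < distance(best.id, key):
                    best = f
            return best

    def lookup(start, key, hops=0):
        node = start
        while key not in node.store:
            nxt = node.closest_preceding(key)
            if nxt is node:         # no closer node known: key not stored
                return None, hops
            node, hops = nxt, hops + 1   # each hop shrinks the distance
        return node.store[key], hops

Locating data "in constant time by knowing all n neighbors", as noted
above, is the degenerate case where fingers contains every node: the
first hop then lands directly on the target.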
+
+
 
 2.2. "Searching for most recent Storm block associated with specific urn-5 
name, where
      the block has been signed with a given key"
@@ -180,12 +247,31 @@
        -one for block IDs
        -one for urn-5 names, which are associated with block IDs
        -is this approach too difficult to maintain ?
-       
-       
+
 -is there a possibility, for a specific urn-5 name, to maintain information about the most recent block's ID for better search performance (or moreover, a
-tree based structure for all blocks for the specific urn-5 name) ?
+tree/list based structure for all blocks for the specific urn-5 name) ?
 -how should CAs' data be saved ?
 
+2.2.5. Answers to research problem
+-compared to the previous research problem, the benefits of DHTs and SWNs
+here are smaller
+       -we don't know which block is the most recent one associated with a
+       specific urn-5 name
+-in theory, though, the most efficient algorithm is O(log n), *assuming*
+that we already know the most recent block's ID (see the previous research
+problem)
+-currently, there is no "out-of-the-box" answer to this research problem
+-the question is how we can efficiently associate urn-5 names with the most
+recent block / with all blocks associated with it
+       For DHTs and SWNs:
+       -adaptation of Benja's 1st idea: each node maintains a local hash
+       table (urn-5 name -> most recent local block ID) for every urn-5
+       name the node hosts --> we don't have to check all blocks and their
+       urn-5 associations
+       -adaptation of Benja's 2nd idea: all urn-5 name mappings are stored
+       as <key, value[ ]>, where the key is the urn-5 name's hash and the
+       value is a record containing the block ID and the timestamp of that
+       block. So, when we store a block in our system for the first time,
+       we have to create a new key-value pair:
+       <hash_of_urn_5_name, [block id, block timestamp]> and route this
+       mapping to the node which is "closest" to the hash value. Now, when
+       we want to find the most recent block associated with a specific
+       urn-5 name, we:
+               1. locally compute a hash for the given urn-5 string
+               2. route the query to the node which hosts the given urn-5
+               hash (the "closest" node)
+               3. get the most recent block's ID using the previous idea
+
+       In this approach, we don't have to perform additional searching and
+       sorting of mappings. And of course, we know that for a given urn-5
+       name, only one node hosts *all* the block information ("block
+       history") for that urn-5 name, since the mappings are mapped to a
+       single node, the one closest to the urn-5 hash value. Again, this
+       should work fine under existing DHTs (and SWNs ?); see the sketch
+       below.
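
A sketch of this <hash_of_urn_5_name, [block id, block timestamp]> scheme
(the dht.route_to_closest() primitive and the node's mappings field are
hypothetical stand-ins for whatever the underlying DHT provides, e.g. the
Chord sketch in 2.1.5): since every mapping for one urn-5 name lands on
the single closest node, "most recent block" is a local max over
timestamps, with no network-wide searching or sorting:

    import hashlib

    def urn5_key(urn5_name):
        """Step 1: locally compute a hash for the given urn-5 string."""
        digest = hashlib.sha1(urn5_name.encode()).digest()
        return int.from_bytes(digest[:2], "big")    # 16-bit ring, as above

    def publish_block(dht, urn5_name, block_id, timestamp):
        key = urn5_key(urn5_name)
        node = dht.route_to_closest(key)            # step 2: "closest" node
        node.mappings.setdefault(key, []).append((block_id, timestamp))

    def most_recent_block(dht, urn5_name):
        key = urn5_key(urn5_name)
        node = dht.route_to_closest(key)    # same node hosts the whole history
        history = node.mappings.get(key, [])
        if not history:
            return None
        # step 3: newest timestamp wins; the full "block history" is local
        return max(history, key=lambda rec: rec[1])[0]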
 
 2.3. "Searching for Storm blocks associated with specific urn-5 name, where 
      specific date has been defined (or date range), and where Storm block 


