[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc...
From: |
Hermanni Hyytiälä |
Subject: |
[Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc... |
Date: |
Wed, 18 Dec 2002 06:08:14 -0500 |
CVSROOT: /cvsroot/gzz
Module name: gzz
Changes by: Hermanni Hyytiälä <address@hidden> 02/12/18 06:08:14
Modified files:
Documentation/misc/hemppah-progradu: research_problems
Log message:
Minor updates
CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/gzz/Documentation/misc/hemppah-progradu/research_problems.diff?tr1=1.6&tr2=1.7&r1=text&r2=text
Patches:
Index: gzz/Documentation/misc/hemppah-progradu/research_problems
diff -u gzz/Documentation/misc/hemppah-progradu/research_problems:1.6
gzz/Documentation/misc/hemppah-progradu/research_problems:1.7
--- gzz/Documentation/misc/hemppah-progradu/research_problems:1.6 Tue Dec
17 08:57:40 2002
+++ gzz/Documentation/misc/hemppah-progradu/research_problems Wed Dec 18
06:08:14 2002
@@ -1,5 +1,3 @@
-
-
1. Approaches
-there are five approaches when performing searches in p2p networks.
@@ -40,8 +38,26 @@
-SWNs require O(log^2 n) hops to reach arbitrary destinations, assuming (*only
and only if* !!!) that
links between nodes are constructed in the way that they are uniformly
distributed over all distances
in the network (Kleinberg)
+
-Example systems: SWAN, Freenet
+Req. 1:
+The short-range links must be such that for all nodes n and m, where n != m, n
+has a short-range link to a node l, so that distance(l, m) < distance(n, m).
+
+Notice: The length of the link from node n to node m is distance(n, m)
+
+Req. 2:
+The long-range links must be such that for all Long-range links are nearly
+uniformly distributed over all 'distance scales'.
+
+Notice: There *is* a more formal specification about req. 2, but I won't give
it
+right here, because it quite mathematical (and therefore it makes no sense to
+present it here with these characters).
+
+And, if these requirements are met, SWN network can locate any data in O(log^2
+n) hops (Kleinberg and e.g. simulations in SWAN)
+
1.3. Flooding Broadcast Networks (FBN)
+own resources are not mapped into the network
+keyword/fuzzy search possible
@@ -89,7 +105,7 @@
** = Please see http://www.darkridge.com/~jpr5/doc/gnutella.html for details
-** = From p2p-hackers mailinglist:
+*** = From p2p-hackers mailinglist:
- - -
> Btw, how well Alpine can scale (e.g. number of users) ? Do you have any
> "real-
@@ -205,6 +221,23 @@
-metadata ?
2.1.5. Answers to research problem
+-there are different kinds of uses of DHTs, e.g. DHT actually stores the data,
or DHT only stores
+the values, which are used for locating the data
+-current systems mainly uses to latter method
+-both have pros and cons:
+ -in the value use, several hostile nodes might say that they hosts the
value, but refuse to serve any host
+ -in the storage use, hostile node might say that someone must save data
block size of 1TB for her computer
+ -however, there are methods for preventing these kinds of actions
+-DHTs and SWNs are very robust to failures: e.g. 20% of the nodes
simultaneously fail, less than 0.1% of the data retrievals are failed
+-the basic difference between DHTs/SWNs is that in DHT/SWN approach, systems
"knows", where the desired data block resides
+ -based on data block ID's, nodes gives "hints" (routes more close to ID
in the key space) constantly to a query lookup until
+ the specific node is found which hosts the block
+ -no extra network traffic
+-so in a way, DHT/SWNs are structured entities
+-instead, in other approaches, system doesn't "know" where the data block
resides
+ -we have to ask from any participating node, if they have the data block
+ -very much extra network traffic
+
-the most efficient algorithm: O(log n)
-DHTs requires O(log n) hops, log n neighbors, except Viceroy (however,
robustness suffers)
-SWNs requires O(log^2 n) hops, much less neighbors (constant, is not
depedent of n)
@@ -214,7 +247,8 @@
-in this case, FBNs, Hybrids, Social Discovery require much bandwidth, since
there is no benefit from the block's ID at all (they have to ask constantly)
-SWNs require less memory than DHTs, since there are less connections to other
nodes
-SWNs relies less to neighbor nodes than DHTs
--proposol: most efficient approaches for this research problem are DHTs and
SWNs, since they are based on key-value pairs (Storm has a key as block ID)
+-proposol: most efficient approaches for this research problem are DHTs and
SWNs, since they are based on key-value pairs (Storm has a key as block ID)
+
@@ -243,7 +277,7 @@
-In this case (urn-5), there is no benefit from the block's ID at all (DHT,
SWN)
2.2.4. Open questions:
--in this case, is it sensible to maintain two different key-value mappings
key-value based systems (DHT, SWN) ?
+-in this case, is it sensible to maintain two different key-value mappings
key-value based systems (DHT, SWN) ?l
-one fore block IDs
-one for urn-5 names, which are associated with block IDs
-is this approach too difficult to maintain ?
@@ -253,25 +287,41 @@
-How CAs' data should be saved ?
2.2.5. Answers to research problem
--compared to previous research problem, the "first seen" benefits of DHTs and
SWNs in this research problem are smaller
- -we don't know which block is the most recent associated with a specfic
urn-5 name
--in theory, though, the most efficient algorithm is O(log n), *assuming* that
we already know the most recent block's ID (see previos research problem)
--currently, there is no "out-of-the-box" answer to this research problem
--the question is, how we can efficiently associate urn-5 names with the most
recent block / to all blocks which are associated with it
- For DHTs and SWNs:
- -adaption of Benja's 1st idea: each node maintains a local hash table
(urn-5 name -> most recent local block ID) for every urn-5 names
- which node hosts --> we don't have to check all blocks and their urn-5
associations
- -adaption of Benja's 2nd idea: All urn-5 name mappings are stored as
<key, value[ ]>, where the key is urn-5 name's hash and value is a record
containing
+-compared to previous research problem, the "obvious" benefits of DHTs and
SWNs in this research problem are smaller
+-we don't beforehand know which block is the most recent associated with a
specfic urn-5 name
+-in theory, though, the most efficient algorithm is O(log n), *assuming* that
we already know the most recent block's ID (see previous research problem)
+-the most important question is, how we can efficiently associate urn-5 names
with the most recent block / to all blocks which are associated with it
+-for DHTs and SWNs:
+
+Please notice: In this approach, DHT doesn't store the actual block, only the
values for locating the data from the system
+
+ Req. 1:
+ -each node maintains a local stack based data structure (urn-5 name ->
most recent local block ID) for every urn-5 names
+ which node hosts. The most recent block is topmost --> we don't have to
check all blocks and their urn-5 associations to get the most recent
+ Req. 2:
+ -all urn-5 name mappings are stored as <key, value[ ]>, where the key
is urn-5 name's hash and value is a record containing
block ID and timestamp of that block. So, when we store a block in our
system first time, we have to create a new key-value:
<hash_of_urn_5_name, [block id, block timestamp]> and route this
mapping to node which is "closest" to a hash value. Now when we want to find
the most
- recent block associated with a specific urn-5 name, we do:
+ recent block associated with a specific urn-5 name, we do:
+
1. locally compute a hash for given urn-5 string
2. route the query to a node which hosts the given hash of
urn-5 ("closest" node)
3. get most recent block's ID using the idea from previuos idea
+ 4. use ID to get the specific block using normal DHT/SWN
operation
In this approach, we don't have to perform additional searching and
sorting of mappings. And of course, we know that for given urn-5, only one node
hosts *all* the block information ("block history") for the urn-5,
since mappings are mapped to a single node, closest to urn-5 hash value.
- Again, this should work fine under existing DHTs (and SWTs ?).
+ Again, this should work fine under existing DHTs (and SWTs ?).
+
+ Some simple analysis:
+ -there are more key-value pairs in the system for additional urn-5 -->
block associations
+ -however, I don't think this is an issue, since data's size is small
+ -efficiency: find node which hosts urn-5 names + find node which hosts
blocks associated with urn-5 name: logn + logn = 2logn (logarithmical)
+
+-for FBS and others:
+ -there is no very efficient (simple) methods for finding urn-5 name
associated with the most recent block
+ -one simple proposal is that when we visit to each node (first idea
above), get only the most recent one and compare them (or greedy approach:
dismiss currently
+ most recent block as we visit to nodes, if newer block have been found)
2.3. "Searching for Storm blocks associated with specific urn-5 name, where
specific date has been defined (or date range), and where Storm block
@@ -301,6 +351,35 @@
tree based structure for all blocks for specific urn-5 name) ?
-How CAs' data should be saved ?
+2.3.5. Answers to research problem
+This is quite similar to previous research problem's answer. There are slight
differences, though.
+
+-for DHTs and SWNs:
+
+Please notice: In this approach, DHT doesn't store the actual block, only the
values for locating the data from the system
+
+ Req. 1:
+ -each node maintains a local stack based data structure (urn-5 name ->
most recent local block ID) for every urn-5 names
+ which node hosts. The most recent block is topmost --> we don't have to
check all blocks and their urn-5 associations to get the most recent
+ Req. 2:
+ -all urn-5 name mappings are stored as <key, value[ ]>, where the key
is urn-5 name's hash and value is a record containing
+ block ID and timestamp of that block. So, when we store a block in our
system first time, we have to create a new key-value:
+ <hash_of_urn_5_name, [block id, block timestamp]> and route this
mapping to node which is "closest" to a hash value. Now when we want to find
the most
+ recent block associated with a specific urn-5 name, we do:
+
+ 1. locally compute a hash for given urn-5 string
+ 2. route the query to a node which hosts the given hash of
urn-5 ("closest" node)
+ 3. get the specific block(s) data (block ID) from the stack
which matches to given date&time properties (we compare to block header data)
+ 4. use IDs to get the specific blocks using normal DHT/SWN
operation
+
+ Some simple analysis:
+ -there are more key-value pairs in the system for additional urn-5 -->
block associations
+ -however, I don't think this is an issue, since data's size is small
+ -efficiency: find node which hosts urn-5 names + find node which hosts
blocks associated with urn-5 name: logn + logn = 2logn (logarithmical)
+
+-for FBS and others:
+ -there is no very efficient (simple) methods for finding urn-5 name
associated with the most recent block
+ -one simple proposal is that when we be that we visit to each node
(first idea above), get all blocks which matches to given properties
2.4. "How does search engine should work?"
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc..., Hermanni Hyytiälä, 2002/12/16
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc..., Hermanni Hyytiälä, 2002/12/17
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc..., Hermanni Hyytiälä, 2002/12/17
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc..., Hermanni Hyytiälä, 2002/12/17
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc...,
Hermanni Hyytiälä <=
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc..., Hermanni Hyytiälä, 2002/12/18
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc..., Hermanni Hyytiälä, 2002/12/18
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc..., Hermanni Hyytiälä, 2002/12/18
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc..., Hermanni Hyytiälä, 2002/12/19
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc..., Hermanni Hyytiälä, 2002/12/19
- [Gzz-commits] gzz/Documentation/misc/hemppah-progradu researc..., Hermanni Hyytiälä, 2002/12/19