gzz-commits
[Top][All Lists]

 From: Hermanni Hyytiälä Subject: [Gzz-commits] gzz/Documentation/misc/hemppah-progradu mastert... Date: Tue, 25 Mar 2003 08:37:05 -0500

CVSROOT:        /cvsroot/gzz
Module name:    gzz
Changes by:     Hermanni Hyytiälä <address@hidden>      03/03/25 08:37:05

Modified files:

Log message:
Is it done !?

CVSWeb URLs:

Patches:
25 07:10:22 2003
08:37:05 2003
@@ -1181,34 +1181,34 @@
Directed BFS \cite{yang02improvingsearch} optimizes the original
BFS in a way that a peer selects the neighbors which have provided many
quality results in the past,
thereby maintaining the quality of costs and decreasing the amount
-of messages sent to network. Alpine \cite{alpineurl} and NeuroGrid
\cite{joseph02neurogrid}
-are Peer-to-Peer systems which use somewhat similar method when performing
data lookups.
+of messages sent to network.

-Local indices \cite{yang02improvingsearch} is a variation of active caching.
-In this scheme, each peer maintains an index over the data of all peers within
+In the local indices techique \cite{yang02improvingsearch}, each peer
maintains an index over the data of all peers within
$h$ hops of itself, where $h$ is a system-wide variable, called radius of the
index\footnote{In the normal BFS case, the value of $h$ is 0, as a peer only
has index
-over its local content.}. Mutual index caching architecture, as proposed in
-\cite{osokine02distnetworks}, is a variation of local indices technique.
+over its local content.}. Thus, when a peer receives a data lookup request, it
can
+process the request on behalf of every node within $r$ hops. Compared to
original BFS,
+the local indices technique keeps data lookup cost low while maintaining the
same
+number of search results.

In the random walk approach \cite{lv02searchreplication}, a peer forwards
query to a
randomly selected neighbor. The basic random walk approach
has a poor response time but it doesn't generate as much network traffic as
the original BFS. As suggested in \cite{lv02searchreplication}, the
random walk approach can be made more effective by introducing
-multiple ''walkers''.
-
-Freenet \cite{clarke00freenet} uses random walk searches in data lookups.
Freenet's data lookup model resembles
+multiple simultaneously working ''walkers''. Freenet \cite{clarke00freenet}
uses
+random walk searches in data lookups. Freenet's data lookup model resembles
Depth-First-Search (DFS) and peers' routing tables are dynamically built
using caching. This is an outcome of Freenet's main design principles,
anonymity.
Another property of the Freenet's data lookup model is that
-it adapts well with varying usage patterns. Improvements to Freenet's data
lookup using
-the ''small-world phenomenon'' have been proposed by Zhang et al.
\cite{zhang02using}.
+it adapts well with varying usage patterns (e.g., searching for popular data
items in the overlay).
+Improvements to Freenet's data lookup using
+the ''small-world'' techniques have been proposed by Zhang et al.
\cite{zhang02using}.

Since tightly structured systems have an efficient data lookup at the
application level overlay,
current research efforts are focused on the proximity-based data lookup. In
the proximity-based data lookup,
peers try to choose entries of routing-tables referring to other peers that
are \emph{nearby} in the
-underlying network. In this way, tightly structured systems are able to
decrease actual
+underlying network. In this way, tightly structured systems are able to
decrease the actual
Pastry \cite{rowston01pastry} and Tapestry \cite{zhao01tapestry} have advanced
heuristics for
proximity-based
@@ -1216,109 +1216,111 @@
uses a combination of proximity and application level overlay routing when
performing data
lookups. Authors call this feature as a \emph{constrained load balancing}.

-Additional research related to proximity-based routing include
\cite{karger02findingnearest, hildrum02distributedobject,
-brinkmann02compactplacement, rhea02probabilistic, castro02networkproximity,
ng02predicting, pias03lighthouse}.
-
\subsection{Fast and usable search}

To make Peer-to-Peer systems even more popular (and usable), these systems
have to support flexible, efficient
-and easy search methods. For instance, Internet's perhaps the most important
feature
Currently, only loosely
-structured systems are able to carry out this requirement. Unfortunately, as
discussed in this text,
-the data lookup model of the loosely structured approach doesn't scale. Thus,
research efforts have
-been focused towards tightly structured systems. The main problem with tightly
structured systems is the
-fact that tightly structured algorithms perform data lookups based on a
globally unique identifier (key).
+and simple search methods \cite{li03feasibility}. Currently, loosely
structured systems are able to carry out the flexibility and
+simplicity requirements and tightly structured systems are able to fulfill the
efficiency requirement.
+Research efforts have been focused to make security methods in tightly
structured systems more usable since
+the data lookup model of the loosely structured approach doesn't scale.
However, studies
+show that combining flexible search methods and tightly structured systems may
not be trivial \cite{harren02complex,
+ansaryefficientbroadcast03}.  The main problem with tightly structured systems
is the
+fact that tightly structured algorithms perform data lookups based on a
globally unique identifier thereby
+making efficient keyword searches hard to implement.
+
+Some studies have been concentrated on SQL-like queries \cite{harren02complex}
+in tightly structured overlays. It is unknown, however, if this approach is
realizable to implement, since
+initial analysis have shown that this approach is rather complex. Other
approaches include adaption of the data lookup model of the loosely
+structured approach into tightly structured systems
+Work in \cite{ansaryefficientbroadcast03} seems quite promising. Authors' work
is based on insight that
+performing data lookup in the overlay resembles regular tree-like search,
where trees' data structure
+is distributed throughout the overlay. Some studies suggest additional layer
upon overlay network \cite{kronfol02fasdsearch,
+joseph02p2players}, which use metadata to implement search methods. The
+search layer on top of the network layer is questionable, especially if the
search layer and the network
+layer have different assumptions about the participating peers (e.g., the
network layer supports heterogeneity
+of peers, but the search layer doesn't). Andrzejak et al. propose range
queries \cite{andrzejak02rangequeries}
+to be used with tightly structured overlays. In this technique, it is feasible
to perform data lookups
+using ranges of keys thereby covering larger amount of possible data items.
Currently their prototype
+is designed for the CAN system \cite{ratnasamy01can}.

Recent study has been focused on the feasibility of Peer-to-Peer Web-like
indexing and searching
on top of tightly structured overlays \cite{li03feasibility} . Authors argue,
that it is possible to implement
Peer-to-Peer Web-like search with certain compromises. First, Peer-to-Peer
search engine may need to
decrease the result quality in order to make searching more efficient. Second,
Peer-to-Peer systems must
-consult the properties of underlying network for better performance.
-
-Some studies have been concentrated on SQL-like queries \cite{harren02complex}
-in tightly structured overlays. Other approaches include adaption of the data
lookup model of the loosely
-structured approach into tightly structured systems
-Some studies suggest additional layer upon overlay network
\cite{kronfol02fasdsearch, joseph02p2players}
-and range queries \cite{andrzejak02rangequeries}.
+consult the properties of underlying network for better performance.

Many techniques have been developed in order to provide more efficient search
indexing. As
several studies show, the popularity of queries in the Internet follow
Zipf-like
distributions\footnote{Zipf-distribution is a variant of power-law function.
Zipf-distribution can be used in observation of frequency of occurrence event
$E$, as a function of the rank
$i$ when the rank is determined by the frequency of occurrence, is a power-law
function $E_i \sim \frac{1}{i^{a}}$,
-where the exponent $a$ is close to unity.} (e.g.,
\cite{breslau98implications}). Therefore, caching and pre-computation
-can be done for optimizing search indices \cite{li03feasibility}. Regular
compression algorithms,
-Bloom filters \cite{362692}, vector space models
\cite{CuencaAcuna2002DSIWorkshop} and view
-trees \cite{Bhattacharjee03resultcache} can be used for even better
optimizations. Authors
-in \cite{li03feasibility} use Gap compression \cite{wittengigabytes}, Adaptive
Set Intersection \cite{338634}
-and clustering with their search optimizations.
+where the exponent $a$ is close to unity.} (e.g.,
\cite{breslau98implications}).
+Therefore, according to \cite{li03feasibility}, caching and pre-computation
can be done for optimizing search indices.
+Authors in \cite{li03feasibility} use Gap compression \cite{wittengigabytes},
+and clustering with their search optimizations. Regular compression
algorithms, Bloom filters \cite{362692}, vector
+space models \cite{CuencaAcuna2002DSIWorkshop} and view trees
\cite{Bhattacharjee03resultcache} can be used for even
+better optimizations.

While it is expected that web-like searches can be layered on a top of tightly
structured overlay, much
more research is required to make indexing and searching more efficient.

-
\subsection{System management}

Adaptive system management and self-organization are essential properties
of any Peer-to-Peer system, since centralized control over the system is
missing. Loosely structured
-systems require less system management properties than tightly structured
systems; in a loosely
-structured system, peers join and leave the system constantly without any
restrictions. On the
-other hand, however, peers in tightly structured system join and leave the
system but have less freedom,
+systems require less system management properties than tightly structured
systems: in a loosely
+structured system, peers join and leave the system constantly without any
restrictions \cite{saroiu02measurementstudyp2p}.
+On the other hand, however, peers in tightly structured system join and leave
the system but have less freedom,
i.e., overlay chooses peer's neighbors on behalf of the peer itself and maps
data items randomly
-throughout the overlay network.
-
-All presented algorithms of the tightly structured approach have been analyzed
under static simulation
-environments. Furthermore, proposed tightly structured overlays are configured
statically to achieve
+throughout the overlay network \cite{balakrishanarticle03lookupp2p}. Almost
all presented algorithms
+for the tightly structured systems have been analyzed under static simulation
+environments \cite{libennowell01observations}. Furthermore, proposed tightly
structured overlays are configured statically to achieve
the desired reliability even in a uncommon and adverse environment
\cite{rowston03controlloingreliability}.
-The most important factor for future research is to get real-life experiences
from tightly structured
-systems, when there are frequent joins and leaves in the system.
+Thus, one of the most important factors for future research is to get
real-life experiences from tightly structured
+systems, when there are frequent joins and leaves of peers in the system.

-The concept of ''half-life'' was introduced by Liben-Nowell
\cite{libennowell01observations} since Peer-to-Peer
-system is \emph{never} in the ''ideal'' state as Peer-to-Peer system is
continiously evolving system. Half-life is defined
-as follows: let there be $N$ live peers at time $t$. The doubling from time
$t$ is the time that pass before
-$N$ new additional peers arrive into the system. The halving time from time
$t$ is the time
-required for half of the living peers at time $t$ to leave the system. The
half-life from
-time $t$ is smaller of the properties stated above. The half-life of the
entire system is the
-minimum half-life over all times $t$. Concept of half-time can be used as a
basis for developing
-more powerful analytical tools for modelling complex Peer-to-Peer systems.
+As mentioned before, an implicit assumption of almost every tightly structured
system is that there is a random, uniform
+distribution of peer and key identifiers. Even if participating peers are
extremely heterogeneous, e.g., in
+computing power or network bandwidth, all data items are distributed
uniformly. Clearly, this is
+a serious problem of tightly structured overlays in face of performance and
+Measurement study by Saroiu et al. show that there is a extreme heterogeneity
among participating peers in already deployed Peer-to-Peer
+systems \cite{saroiu02measurementstudyp2p}. Symphony \cite{gurmeet03symphony}
seems to be the first tightly structured overlay system
+which supports heterogeneity. Zhao et al. have proposed a secondary layer on
top of a structured overlay

Some research has been done with regard to load balancing properties of
tightly structured
overlays. Byers et al. suggest an idea of ''power of two choices'' whereby
data item is stored at the less loaded
of two (or more) random peer alternatives \cite{byers03dhtbalancing}. Rao et
al. use virtual servers
to control the load balance in a Peer-to-Peer system
\cite{rao03loadbalancing}. Their work rests on the
idea which was originally introduced by Chord \cite{stoica01chord} system.
+Ledlie et al. propose techniques for forming and maintaining groups in a
highly dynamic environment
+\cite{ledlie02selfp2p}. Their work relies on the idea that
+participating peers would create multiple hierarchical groups. It is not clear
whether this approach
+is scalable or fault tolerant and suitable for Peer-to-Peer environment. More
promising work has been done by Rowston et al.
+in \cite{rowston03controlloingreliability}. Authors propose techniques for
self-tuning, dealing with
+uncommon conditions (e.g., network partition and high failure rates).
Moreover, authors argue that
+with these techniques, the concerns over tightly structured overlay
maintenance costs are no more
+an open issue.

Also, query and routing hot spots may be an issue in tightly structured
overlays \cite{ratnasamy02routing}.
Hot spots happen, when a specific key is being requested extremely often in
tightly structured overlays. Recent study
by Freedman et al. tries to reduce hot spots in the system by performing
\emph{sloppy} hashing
\cite{sloppy:iptps03}. Authors' technique is especially suitable for the DOLR
abstraction of tightly structured overlays.
-With Sloppy hashing, we are able to reduce the generation of query hot spots.
Sloppy hashing enables to
+They arque that with Sloppy hashing, the generation of query hot spots can be
reduces and peers are able
locate nearby data without looking up data from distant peers. Moreover,
authors'
proposal for self-organizing clusters using network diameters may be useful,
-especially within small groups of working people. Thus, with Sloppy hashing
-we can provide locality properties the system.
-
-
-
-As mentioned before, an implicit assumption of almost every tightly structured
system is that there is a random, uniform
-distribution of peer and key identifiers. Even if participating peers are
extremely heterogeneous, e.g., in
-computing power or network bandwidth, all data items are distributed
uniformly. Clearly, this is
-a serious problem of tightly structured overlays in face of performance and
-by Saroiu et al. show that there is a extreme heterogeneity among
participating peers in already deployed Peer-to-Peer
-systems \cite{saroiu02measurementstudyp2p}. Symphony \cite{gurmeet03symphony}
seems to be the first tightly structured overlay system
-which supports heterogeneity. Zhao et al. have proposed a secondary layer on
top of a structured overlay
+especially within small groups of working people.

-Research has been done on self-organization. Ledlie et al. propose techniques
for forming and maintaining
-groups in a highly dynamic environment \cite{ledlie02selfp2p}. Unfortunately
their work relies on the idea that
-participating peers would create multiple hierarchical groups. It is not clear
whether this approach
-is fault tolerant and suitable for Peer-to-Peer environment. More promising
work has been done by Rowston et al.
-in \cite{rowston03controlloingreliability}. Authors propose techniques for
self-tuning, dealing with
-uncommon conditions (e.g., network partition and high failure rates).
Moreover, authors argue that
-with these techniques, the concerns over tightly structured overlay
maintenance costs are no more
-an open issue.
+The concept of ''half-life'' was introduced by Liben-Nowell
\cite{libennowell01observations} since Peer-to-Peer
+system is \emph{never} in the ''ideal'' state as Peer-to-Peer system is
continiously evolving system. Half-life is defined
+as follows: let there be $N$ live peers at time $t$. The doubling from time
$t$ is the time that pass before
+$N$ new additional peers arrive into the system. The halving time from time
$t$ is the time
+required for half of the living peers at time $t$ to leave the system. The
half-life from
+time $t$ is smaller of the properties stated above. The half-life of the
entire system is the
+minimum half-life over all times $t$. Concept of half-time can be used as a
basis for developing
+more powerful analytical tools for modelling complex Peer-to-Peer systems.

-Finally, little research has been done regarding self-monitoring and data
availability. Zhang et al.
+Finally, little research has been done regarding self-monitoring. Zhang et al.
describe an arbitrary data structure on top of a tightly structured overlay
\cite{zhang03somo}. Authors
call their technique as a \emph{data overlay}, since it supports several
fundamental data structures.
Authors have used this data overlay when building a Self-Organized Meta data
Overlay (SOMO), which can be used