[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Gzz-commits] manuscripts/storm article.rst
From: |
Hermanni Hyytiälä |
Subject: |
[Gzz-commits] manuscripts/storm article.rst |
Date: |
Wed, 12 Feb 2003 08:24:41 -0500 |
CVSROOT: /cvsroot/gzz
Module name: manuscripts
Changes by: Hermanni Hyytiälä <address@hidden> 03/02/12 08:24:41
Modified files:
storm : article.rst
Log message:
Few radical reorgs for more logical roadmap
CVSWeb URLs:
http://savannah.gnu.org/cgi-bin/viewcvs/gzz/manuscripts/storm/article.rst.diff?tr1=1.133&tr2=1.134&r1=text&r2=text
Patches:
Index: manuscripts/storm/article.rst
diff -u manuscripts/storm/article.rst:1.133 manuscripts/storm/article.rst:1.134
--- manuscripts/storm/article.rst:1.133 Wed Feb 12 07:39:34 2003
+++ manuscripts/storm/article.rst Wed Feb 12 08:24:41 2003
@@ -148,14 +148,13 @@
usable yet.
This paper is structured as follows. In the next section, we describe
-related work. In section 3, we introduce the basic storage unit of our
-system, file-like blocks identified by cryptographic hashes.
-In section 4, we discuss our implementation of Xanalogical storage
-on top of the block system. In section 5, we discuss application-specific
-reverse indexing of blocks by their content, essential for many applications.
-In section 6, we present techiques for efficient versioned storage
-of mutable data on top of blocks. In section 7,
-we report on implementation experience and future directions.
+related work. In section 3, we give an overview of xanalogical model.
+In section 4, we introduce the basic storage unit of our
+system, i.e. file-like blocks identified by cryptographic hashes. In section
5,
+we discuss application-specific reverse indexing of blocks by their
+content, essential for many applications. In section 6, we present
+techiques for efficient versioned storage of mutable data on top of blocks.
+In section 7, we report on implementation experience and future directions.
Section 8 concludes the paper.
.. uml:: storm_layers
@@ -419,9 +418,77 @@
which are permanently on-line, but act as ordinary peers
in the indexing overlay network.
+
+3. Overview of Xanalogical storage
+==================================
+
+In the xanalogical storage model [ref],
+pioneered by the unfinished Project Xanadu [ref],
+links are not between documents, but individual characters.
+When a character is first typed in, it acquires a permanent id
+("the character 'D' typed by Janne Kujala on 10/8/97 8:37:18"),
+which it retains when copied to a different document, distinguishing
+it from all similar characters typed in independently [#]_.
+A link is shown between any two documents containing the characters
+that the link connects. Xanalogical links are external and bidirectional.
+
+.. [#] Xanalogical storage is not limited to text. We speak about
+ *characters* because it simplifies the explanation; picture's pixels
+ or frames of video could be substituted.
-3. Block storage
-================
+In addition to content links, xanalogical storage keeps an index of
+transclusions: identical characters copied into different documents.
+Through this mechanism, the system can show to the user all documents
+that share text with the current document.
+
+To keep track of links and transclusions, the system keeps a global index
+of documents by the characters they contain, and of links by the characters
+they refer to. Thus, for each character in the document, the system
+queries the index for other documents containing this character,
+and shows them as transclusions. Resolving links is a multi-step process.
+Each link is modeled as two collections of characters: the two
+endpoints of the link. To show links to a specific document,
+the system firstly uses the link index to find links
+to each character in the document. Secondly, for each link,
+it looks at the *other* set of characters in the link-- the target
+of the link, if the original character was the source, and vice versa.
+Thirdly, it looks for documents containing these target characters.
+This way, even if both the source and target characters of the link
+are moved to a different document, the link stays connected to them.
+
+Of course, doing any expensive operation for *every* character
+in a document does not scale very well. In practice,
+characters typed in consecutively are given consecutive ids,
+such as ``...:4``, ``...:5``, ``...:6`` and so on, and
+operations are on *spans*, i.e. consecutive ranges of characters
+(``...:4-6``) in a document. In Storm, in each editor session we
+create a block with all characters entered in this session (the content type
+being ``text/plain``). To designate a span of characters
+from that session, we use the block's id, the offset of the first
+character, and the number of characters in the span.
+This technique was first introduced in [ref ht02 paper].
+
+In Xanadu, characters are stored to append-only *scrolls*
+when they are typed [ref]. Because of this, in Storm, we call the
+blocks containing the actual characters *scroll blocks*. The documents
+do not actually contain the characters; instead, they are
+*virtual files* containing span references as described above.
+To show a document, the scroll blocks it references are loaded
+and the characters retrieved from there [#]_.
+
+.. [#] It is unclear whether this approach is efficient for text
+ in the Storm framework; in the future, we may try storing
+ the characters in the documents themselves, along with their
+ permanent identifiers. For images or video, on the other hand,
+ it is clearly beneficial if content appearing in different
+ documents-- or different versions of a document-- is only
+ stored once, in a block only referred to wherever
+ the data is transcluded.
+
+
+
+4. Storm block storage
+======================
In Storm, all data is stored
as *blocks*, byte sequences identified by a SHA-1
@@ -571,7 +638,7 @@
to overcome the limitations of traditional file-based applications.
-3.1. Implementation
+4.1. Implementation
-------------------
Storm blocks are MIME messages [ref MIME], i.e., objects with
@@ -641,114 +708,6 @@
the software (mutable documents are described in section 6.1).
-4. Xanalogical storage
-======================
-
-In the xanalogical storage model [ref],
-pioneered by the unfinished Project Xanadu [ref],
-links are not between documents, but individual characters.
-When a character is first typed in, it acquires a permanent id
-("the character 'D' typed by Janne Kujala on 10/8/97 8:37:18"),
-which it retains when copied to a different document, distinguishing
-it from all similar characters typed in independently [#]_.
-A link is shown between any two documents containing the characters
-that the link connects. Xanalogical links are external and bidirectional.
-
-.. [#] Xanalogical storage is not limited to text. We speak about
- *characters* because it simplifies the explanation; picture's pixels
- or frames of video could be substituted.
-
-In addition to content links, xanalogical storage keeps an index of
-transclusions: identical characters copied into different documents.
-Through this mechanism, the system can show to the user all documents
-that share text with the current document.
-
-To keep track of links and transclusions, the system keeps a global index
-of documents by the characters they contain, and of links by the characters
-they refer to. Thus, for each character in the document, the system
-queries the index for other documents containing this character,
-and shows them as transclusions. Resolving links is a multi-step process.
-Each link is modeled as two collections of characters: the two
-endpoints of the link. To show links to a specific document,
-the system firstly uses the link index to find links
-to each character in the document. Secondly, for each link,
-it looks at the *other* set of characters in the link-- the target
-of the link, if the original character was the source, and vice versa.
-Thirdly, it looks for documents containing these target characters.
-This way, even if both the source and target characters of the link
-are moved to a different document, the link stays connected to them.
-
-Of course, doing any expensive operation for *every* character
-in a document does not scale very well. In practice,
-characters typed in consecutively are given consecutive ids,
-such as ``...:4``, ``...:5``, ``...:6`` and so on, and
-operations are on *spans*, i.e. consecutive ranges of characters
-(``...:4-6``) in a document. In Storm, in each editor session we
-create a block with all characters entered in this session (the content type
-being ``text/plain``). To designate a span of characters
-from that session, we use the block's id, the offset of the first
-character, and the number of characters in the span.
-This technique was first introduced in [ref ht02 paper].
-
-In Xanadu, characters are stored to append-only *scrolls*
-when they are typed [ref]. Because of this, in Storm, we call the
-blocks containing the actual characters *scroll blocks*. The documents
-do not actually contain the characters; instead, they are
-*virtual files* containing span references as described above.
-To show a document, the scroll blocks it references are loaded
-and the characters retrieved from there [#]_.
-
-.. [#] It is unclear whether this approach is efficient for text
- in the Storm framework; in the future, we may try storing
- the characters in the documents themselves, along with their
- permanent identifiers. For images or video, on the other hand,
- it is clearly beneficial if content appearing in different
- documents-- or different versions of a document-- is only
- stored once, in a block only referred to wherever
- the data is transcluded.
-
-Our current implementation shows only links between documents
-that are in memory at the same time [screenshot of xupdf, perhaps ref too
-(submitted) antont: was thinking the same, it would illustrate this well].
-In the future, we will implement a global distributed index at top of
-a distributed hashtable, with the scroll blocks' ids as the keys.
-To find the transclusions of a span, the system will retrieve
-all transclusions of any span in the scroll block, then
-sort out those that do not overlap the span in question.
-
-Since the problem is to search for overlapping ranges,
-the spans cannot be used as hashtable keys. However, as the blocks
-will be relatively small (limited by the amount of text
-the user enters between two saves of a document), we hope
-that this will not be a major scalability problem. Otherwise,
-systems that allow range queries, such as skip graphs [ref]
-and skipnet [ref], may prove useful.
-
-.. [how does video work here, i.e. are they huge blocks or collections of many
- small, frames? or a sequence between two keyframes?]
-
- Benja says: Probably large, but the number of links to them
- still won't be that big, I guess. Not sure how to discuss this
- in the text; feel free to propose something :-)
-
- Toni: will try.. wondering also about who to consult :o
-
-.. [This might be relevant also:
- http://www.hpl.hp.com/techreports/2002/HPL-2002-209.pdf
- -Hermanni]
-
-One question raised by xanalogical storage is which links to show
-for a popular document that has been linked to by many users.
-We hope to address this problem by collaborative filtering
-of links [explain, ref (grouplens.org pubs?)]. There has been research on
-collaborative filtering in peer-to-peer systems
-without compromising participants' privacy [ref John Canny].
-For some purposes simple rules based on e.g. belonging to a group may be
-applicable as well: e.g. when working on a project with a project group, it
-may be beneficial for the members of the group to see other members'
-comments of articles etc.
-
-
5. Application-specific reverse indexing
========================================
@@ -1166,6 +1125,49 @@
Comments may be new entities(?) linking to it
+
+Our current implementation shows only links between documents
+that are in memory at the same time [screenshot of xupdf, perhaps ref too
+(submitted) antont: was thinking the same, it would illustrate this well].
+In the future, we will implement a global distributed index at top of
+a distributed hashtable, with the scroll blocks' ids as the keys.
+To find the transclusions of a span, the system will retrieve
+all transclusions of any span in the scroll block, then
+sort out those that do not overlap the span in question.
+
+Since the problem is to search for overlapping ranges,
+the spans cannot be used as hashtable keys. However, as the blocks
+will be relatively small (limited by the amount of text
+the user enters between two saves of a document), we hope
+that this will not be a major scalability problem. Otherwise,
+systems that allow range queries, such as skip graphs [ref]
+and skipnet [ref], may prove useful.
+
+.. [how does video work here, i.e. are they huge blocks or collections of many
+ small, frames? or a sequence between two keyframes?]
+
+ Benja says: Probably large, but the number of links to them
+ still won't be that big, I guess. Not sure how to discuss this
+ in the text; feel free to propose something :-)
+
+ Toni: will try.. wondering also about who to consult :o
+
+.. [This might be relevant also:
+ http://www.hpl.hp.com/techreports/2002/HPL-2002-209.pdf
+ -Hermanni]
+
+One question raised by xanalogical storage is which links to show
+for a popular document that has been linked to by many users.
+We hope to address this problem by collaborative filtering
+of links [explain, ref (grouplens.org pubs?)]. There has been research on
+collaborative filtering in peer-to-peer systems
+without compromising participants' privacy [ref John Canny].
+For some purposes simple rules based on e.g. belonging to a group may be
+applicable as well: e.g. when working on a project with a project group, it
+may be beneficial for the members of the group to see other members'
+comments of articles etc.
+
+
Open issue: Tree hashing, also possibly needed for (OceanStore-like?)
distribution of shares
- [Gzz-commits] manuscripts/storm article.rst, (continued)
- [Gzz-commits] manuscripts/storm article.rst, Tuomas J. Lukka, 2003/02/10
- [Gzz-commits] manuscripts/storm article.rst, Tuomas J. Lukka, 2003/02/10
- [Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/10
- [Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/10
- [Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/10
- [Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/11
- [Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/12
- [Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/12
- [Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/12
- [Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/12
- [Gzz-commits] manuscripts/storm article.rst,
Hermanni Hyytiälä <=
- [Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/12
- [Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/12
- [Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/12
- [Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/13
- [Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/13
- [Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/13
- [Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/13
- [Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/13
- [Gzz-commits] manuscripts/storm article.rst, Toni Alatalo, 2003/02/13
- [Gzz-commits] manuscripts/storm article.rst, Hermanni Hyytiälä, 2003/02/14