From: Paul Sander
Subject: Re: Idea for reducing disk IO on tagging operations
Date: Sun, 20 Mar 2005 17:00:54 -0800
On Mar 20, 2005, at 3:54 PM, address@hidden wrote:
> * Mark D. Baushke (address@hidden) wrote:
>> Dr. David Alan Gilbert <address@hidden> writes:
>>> OK, if I create a dummy ",foo.c," before modifying (or create a
>>> hardlink with that name to foo.c,v ?) would that be sufficient?
>>
>> I would say that it is likely necessary, but may not be sufficient.
>
> Hmm ok.
>
>>> Or perhaps create the ,foo.c, as I normally would - but if I can use
>>> this overwrite trick on the original then I just delete the ,foo.c,
>>> file.
>>
>> I am unclear how this lets you perform a speedup.
>
> I only create the ,foo.c, file - I don't write anything into it; the
> existence of the file is enough to act as the RCS lock. If I can do my
> in-place modification then I delete this file after doing it; if not,
> then I proceed as normal and just write the ,foo.c, file and do the
> rename as you normally would.
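For readers following along, the existing RCS update protocol under discussion can be sketched roughly as below. This is a simplified illustration, not RCS's actual code; the function name and one-hour details are mine, but the comma-file naming (",foo.c," for "foo.c,v"), the exclusive-create lock, and the final rename are what the thread is describing.

```python
import os

def rcs_update(rcs_path, new_contents):
    """Sketch of the standard RCS update protocol: the comma file
    (",foo.c," for "foo.c,v") doubles as the lock and as the staging
    area for the rewritten archive."""
    d, base = os.path.split(rcs_path)
    comma = os.path.join(d, "," + base.removesuffix(",v") + ",")
    # O_EXCL makes creation atomic: if another process already holds
    # the lock, this raises FileExistsError instead of clobbering it.
    fd = os.open(comma, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        os.write(fd, new_contents)
        os.fsync(fd)  # flush the complete new archive to disk first
    finally:
        os.close(fd)
    # rename() is atomic on POSIX filesystems, so readers see either
    # the old RCS file or the new one, never a half-written mixture.
    os.rename(comma, rcs_path)
```

David's proposed speedup is to create the comma file only as a lock (empty), modify foo.c,v in place, then delete the comma file; the rest of the thread is about why that loses the atomicity the rename provides.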
You're forgetting something: The RCS commands will complete read-only operations on RCS files even in the presence of the comma files owned by other processes. Your update protocol introduces race conditions in which the RCS file is not self-consistent at all times.
There's also the interrupt issue: Killing an update before it completes leaves the RCS file corrupt. You'd have to build in some kind of crash recovery. But RCS already has that by way of the comma file, which can simply be deleted. Other crash recovery algorithms usually involve transaction logs that can be reversed and replayed, or the creation of backup copies. None of these are more efficient than the existing RCS update protocol.
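The crash-recovery property Paul describes can be sketched as follows. The staleness threshold and function name are my own illustrative assumptions (RCS itself leaves lock-breaking to the user); the point is only that recovery is a single unlink, because the real RCS file was never touched mid-update.

```python
import os, time

def recover_stale_lock(rcs_path, max_age_seconds=3600):
    """Sketch of comma-file crash recovery: an interrupted update
    leaves only a partial staging copy, so recovery is deleting it.
    The one-hour staleness threshold is an assumption, not RCS's rule."""
    d, base = os.path.split(rcs_path)
    comma = os.path.join(d, "," + base.removesuffix(",v") + ",")
    try:
        age = time.time() - os.stat(comma).st_mtime
    except FileNotFoundError:
        return False                  # no interrupted update to clean up
    if age > max_age_seconds:
        os.remove(comma)              # discard the partial rewrite
        return True
    return False                      # a live writer may still own it
```

Contrast this with in-place rewriting, where an interrupt leaves the one and only copy of the RCS file corrupt, and recovery needs a transaction log or a backup copy.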
> So the issue is what happens if the interrupt occurs as I'm
> overwriting the white space to add a tag; hmm yes;

Correct. Depending on the filesystem kind and the level of I/O, your rewrite could impact up to three file blocks plus the directory data.

> is it possible to guard against this by using a single call to
> write(2) for that?

Not for all possible filesystem types.
You'd have to guarantee that the write is atomic and flushes results completely to disk, even in the presence of things like power failures. It's hard to make this guarantee given all the buffering that goes on below the write(2) API.
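A sketch of the in-place trick being proposed, with names of my own invention: overwrite pre-reserved whitespace in the RCS symbols section with one positioned write. The code works as a demonstration, but as Paul says, a single syscall is not a single atomic disk operation.

```python
import os

def overwrite_tag_padding(rcs_path, offset, tag_bytes):
    """Sketch of the proposed in-place tag write. One pwrite() keeps
    it to a single syscall, but that is NOT atomic on disk: if
    tag_bytes spans a filesystem block boundary, a power failure can
    leave one block updated and the other not."""
    fd = os.open(rcs_path, os.O_WRONLY)
    try:
        n = os.pwrite(fd, tag_bytes, offset)  # one syscall, not one disk op
        os.fsync(fd)  # forces the data out, but gives no ordering
                      # guarantee *between* the blocks it dirtied
        return n
    finally:
        os.close(fd)
```

This is why the guarantee has to come from the filesystem (journaling, copy-on-write), not from the write(2) API, and why it cannot be promised "for all possible filesystem types."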
>> Optimizing for tagging does not seem very useful to me, as we
>> typically do not drop that many tags on our repository.
>
> In the company I work for we are very tag-heavy, but more importantly
> it is the tagging that gets in people's way and places the strain on
> the write bandwidth of the discs/RAID.
I once built a successful system that tracked desirable configurations by building lists of file/version pairs, then committing and tagging the lists. The lists were built by polling the Entries files in workspaces (after making sure there were no uncommitted changes). This was fast and efficient, and it opens you up to the optimization I mentioned earlier. And if you rely on floating tags, such lists can track the history of the tags as well.
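A minimal sketch of harvesting those file/version pairs from a CVS/Entries file, whose entry lines have the shape "/name/revision/timestamp/options/tagdate" (directory lines begin with "D"). The function name is mine, and the uncommitted-changes check Paul mentions (comparing the recorded timestamp against the working file) is deliberately left out.

```python
def parse_entries(text):
    """Sketch: extract (file, revision) pairs from CVS/Entries text.
    Entry lines look like "/name/revision/timestamp/options/tagdate";
    lines starting with "D" describe subdirectories and are skipped.
    Verifying there are no uncommitted changes is omitted here."""
    pairs = []
    for line in text.splitlines():
        if not line.startswith("/"):
            continue                      # e.g. "D/subdir////"
        fields = line.split("/")
        name, revision = fields[1], fields[2]
        if revision.startswith("-"):      # locally removed file
            continue
        pairs.append((name, revision))
    return pairs
```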
In addition, an algebra can be easily written to manipulate such lists. Combine this with a way to link these lists with your defect tracking system, and you have the tools to build a very good change control system.
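One piece of such an algebra might look like the sketch below, assuming each configuration is represented as a path-to-revision mapping (my representation, not Paul's): set-style operations over two configurations yield exactly the change report a change control system needs.

```python
def diff_configs(old, new):
    """Sketch of one operation in the 'algebra' on file/version
    lists: given two configurations as {path: revision} dicts,
    report what was added, removed, and changed between them."""
    added   = {f: v for f, v in new.items() if f not in old}
    removed = {f: v for f, v in old.items() if f not in new}
    changed = {f: (old[f], new[f])
               for f in old.keys() & new.keys() if old[f] != new[f]}
    return added, removed, changed
```

Linking each such diff to a defect-tracking ticket then gives a record of which file revisions implemented which change.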
--
Paul Sander     | "Lets stick to the new mistakes and get rid of the old
address@hidden  |  ones" -- William Brown