rdiff-backup-users

[rdiff-backup-users] [PATCH] Optimization for --check-destination


From: Josh Nisly
Subject: [rdiff-backup-users] [PATCH] Optimization for --check-destination
Date: Wed, 25 Jun 2008 20:24:03 +0600
User-agent: Thunderbird 2.0.0.14 (X11/20080505)

Actually, since the optimization affects the step that determines whether the destination needs checking, it speeds up all backups, not just --check-destination.

What is happening is that we go through the rdiff-backup-data directory looking for current_mirror files. For each file in the directory, we instantiate an RORPath object, which goes to the server to set file information. The problem is that this check doesn't need any file information, since it works on the filename alone.

Since this involves only the rdiff-backup-data directory, the size of the repository is irrelevant; what matters is the number of times it has been backed up, and the slowdown is linear in that number.

What this patch does is factor out the logic that determines, from the filename alone, whether a file is an increment, thus removing the need to go to the remote end for every file. It still goes to the remote end for each file that matches, but there are only one or two matches most of the time.
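To illustrate the idea outside of rdiff-backup, here is a minimal standalone sketch. It is not the patch itself (that follows below): parse_inc_name, find_incs, fake_stat and the sample listing are made-up names for illustration, and the regular expression is only a stand-in for Time.stringtotime, assuming time strings shaped like 2008-06-25T20:24:03+06:00.

```python
import re

# Stand-in for Time.stringtotime (assumption: W3-style time strings).
_TIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})")
INC_TYPES = ("snapshot", "dir", "missing", "diff", "data")

def parse_inc_name(basename):
    """Return (compressed, timestring, type, base) or None, using the name only."""
    parts = basename.split(".")
    compressed = parts[-1] == "gz"
    if compressed:
        parts = parts[:-1]          # drop the trailing "gz"
    if len(parts) < 3:              # need at least base, time and type
        return None
    timestring, ext = parts[-2:]
    if not _TIME_RE.fullmatch(timestring) or ext not in INC_TYPES:
        return None
    return (compressed, timestring, ext, ".".join(parts[:-2]))

def find_incs(names, base, remote_stat):
    """Call remote_stat (the expensive step) only for names that match base."""
    matches = []
    for name in names:
        info = parse_inc_name(name)
        if info and info[3] == base:
            matches.append(remote_stat(name))
    return matches

# Hypothetical directory listing and a fake "remote" call that counts its uses.
listing = [
    "current_mirror.2008-06-25T20:24:03+06:00.data",
    "session_statistics.2008-06-24T20:24:03+06:00.data",
    "mirror_metadata.2008-06-25T20:24:03+06:00.snapshot.gz",
    "increments",
    "backup.log",
]
stat_calls = []

def fake_stat(name):
    stat_calls.append(name)
    return name

found = find_incs(listing, "current_mirror", fake_stat)
print(found, len(stat_calls))
```

In this sketch five names are examined but only one of them triggers the simulated remote call, which is the effect the patch is after.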

I've been backing up to a repository daily for a year and a half, and over my 300ms latency link, --check-destination runs for over 30 minutes, then says, "Fatal Error: Destination dir does not need checking." With this patch, it takes about one minute.
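As a rough sanity check on those numbers, here is a back-of-the-envelope calculation. It assumes each remote lookup costs one full 300ms and that the lookups happen strictly one after another; both are simplifications, and the patched figure also includes fixed costs that have nothing to do with directory entries. The function name is made up for illustration.

```python
LATENCY_MS = 300  # link latency quoted above, treated as the cost of one round trip (assumption)

def implied_round_trips(elapsed_ms, latency_ms=LATENCY_MS):
    """Number of sequential round trips that would account for elapsed_ms."""
    return elapsed_ms // latency_ms

before = implied_round_trips(30 * 60 * 1000)  # roughly 30 minutes without the patch
after = implied_round_trips(60 * 1000)        # about one minute with the patch
print(before, after)  # prints: 6000 200
```

So under these assumptions the unpatched run is consistent with several thousand sequential remote lookups, which fits a cost that grows with every backup session.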

Thanks,
JoshN
--- rdiff_backup/rpath.py       10 Jun 2008 13:14:52 -0000      1.120
+++ rdiff_backup/rpath.py       25 Jun 2008 13:40:45 -0000
@@ -297,6 +300,26 @@
        assert rpath.conn is Globals.local_connection
        return open(rpath.path, "rb")
 
+def get_incfile_info(basename):
+       """Returns None or tuple of 
+       (is_compressed, timestr, type, and basename)"""
+       dotsplit = basename.split(".")
+       if dotsplit[-1] == "gz":
+               compressed = 1
+               if len(dotsplit) < 4: return None
+               timestring, ext = dotsplit[-3:-1]
+       else:
+               compressed = None
+               if len(dotsplit) < 3: return None
+               timestring, ext = dotsplit[-2:]
+       if Time.stringtotime(timestring) is None: return None
+       if not (ext == "snapshot" or ext == "dir" or
+                       ext == "missing" or ext == "diff" or ext == "data"):
+               return None
+       if compressed: basestr = ".".join(dotsplit[:-3])
+       else: basestr = ".".join(dotsplit[:-2])
+       return (compressed, timestring, ext, basestr)
+
 
 class RORPath:
        """Read Only RPath - carry information about a path
                Also sets various inc information used by the *inc* functions.
 
                """
-               if self.index: dotsplit = self.index[-1].split(".")
-               else: dotsplit = self.base.split(".")
-               if dotsplit[-1] == "gz":
-                       self.inc_compressed = 1
-                       if len(dotsplit) < 4: return None
-                       timestring, ext = dotsplit[-3:-1]
+               if self.index: basename = self.index[-1]
+               else: basename = self.base
+
+               inc_info = get_incfile_info(basename)
+
+               if inc_info:
+                       self.inc_compressed, self.inc_timestr, \
+                               self.inc_type, self.inc_basestr = inc_info
+                       return 1
                else:
-                       self.inc_compressed = None
-                       if len(dotsplit) < 3: return None
-                       timestring, ext = dotsplit[-2:]
-               if Time.stringtotime(timestring) is None: return None
-               if not (ext == "snapshot" or ext == "dir" or
-                               ext == "missing" or ext == "diff" or ext == "data"):
                        return None
-               self.inc_timestr = timestring
-               self.inc_type = ext
-               if self.inc_compressed: self.inc_basestr = ".".join(dotsplit[:-3])
-               else: self.inc_basestr = ".".join(dotsplit[:-2])
-               return 1
 
        def isinccompressed(self):
                """Return true if inc file is compressed"""
--- rdiff_backup/restore.py     7 Jul 2007 22:43:34 -0000       1.60
+++ rdiff_backup/restore.py     25 Jun 2008 13:41:36 -0000
@@ -47,8 +64,10 @@
 
        inc_list = []
        for filename in parent_dir.listdir():
-               inc = parent_dir.append(filename)
-               if inc.isincfile() and inc.getincbase_str() == basename:
+               inc_info = rpath.get_incfile_info(filename)
+               if inc_info and inc_info[3] == basename:
+                       inc = parent_dir.append(filename)
+                       assert inc.isincfile()
                        inc_list.append(inc)
        return inc_list
 
