Re: [rdiff-backup-users] --check-destination-dir taking a very long time


From: Joe Steele
Subject: Re: [rdiff-backup-users] --check-destination-dir taking a very long time
Date: Tue, 10 Sep 2019 15:25:12 -0400
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0

On 9/9/2019 9:53 PM, Walt Mankowski wrote:
I found a file named

   rdiff-backup-data/current_mirror.2019-09-08T03:01:02-04:00.data

which contained

   4351

I moved it out of the way and reran the backup command. This time it
threw an exception. The output is in the attached log file.


(Some of the following echoes what Eric Lavarde wrote a few minutes ago.)

Moving a current_mirror file out of the way is never a good thing to do. Having 2 current_mirror files is how rdiff-backup knows that the last backup failed and that a regression is necessary in order to reestablish a consistent state for the backup repository.
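
For what it's worth, you can see how many current_mirror markers the repository has at any point with something like this (assuming the repository is /backup/scruffy, as in your command line):

   $ ls -l /backup/scruffy/rdiff-backup-data/current_mirror.*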

Fortunately, it looks as though your attempt to run another backup after removing the current_mirror file did not get anywhere (based on your log).

I suggest putting the 'current_mirror.2019-09-08T03:01:02-04:00.data' file back in place (and possibly restarting systemd-resolved, as commented further below). After that, I would look to see what current_mirror files you now have (example commands below). My guess is that you will find the following:

current_mirror.2019-09-07T03:01:01-04:00.data
current_mirror.2019-09-08T03:01:02-04:00.data
current_mirror.2019-09-09T21:46:29-04:00.data

9/7/19 is your last good backup. 9/8 was the backup that failed. 9/9 was your most recent attempt to fix things.
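
Assuming the repository really is /backup/scruffy and you still know where you moved the file, putting it back and checking would look something like:

   $ sudo mv /wherever/you/moved/it/current_mirror.2019-09-08T03:01:02-04:00.data \
        /backup/scruffy/rdiff-backup-data/
   $ ls /backup/scruffy/rdiff-backup-data/current_mirror.*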

*Assuming* that I am correct about the current_mirror files that exist, then I would remove the last of those files (current_mirror.2019-09-09T21:46:29-04:00.data). Yes, that's contrary to my admonition above. But rdiff-backup cannot deal with 3 such files, and this last file is from your most recent backup that did not get anywhere, according to your log.
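
If that is indeed what you find, I would move (not delete) that third marker somewhere outside the repository, e.g.:

   $ sudo mv /backup/scruffy/rdiff-backup-data/current_mirror.2019-09-09T21:46:29-04:00.data ~/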

I would then again try 'rdiff-backup --check-destination-dir' (and cross your fingers).
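
With your paths, that would be something like the following (the extra verbosity is just so you can see that it is doing something):

   $ sudo rdiff-backup -v5 --check-destination-dir /backup/scruffy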

Your original concern was that this was taking forever (12+ hours and counting). For what it is worth, my experience is that regressions do take many hours (depending on size of your current mirror), and they leave you wondering if anything is actually happening.

As best I recall, regressing my own repository of roughly 296 GB took somewhere between 4 and 8 hours (it's been a while). If your backup is 527 GB (i.e., that's what shows up for 'MirrorFileSize' in your session_statistics.* files), then yes, I imagine that would take quite some time to regress. There are probably other factors besides size that affect the speed -- disk speed, processor speed, load, etc. I don't know whether rdiff-backup's logging verbosity is a factor, but I suspect it might be.
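
You can pull that number out quickly with something like:

   $ grep MirrorFileSize /backup/scruffy/rdiff-backup-data/session_statistics.*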

None of the above addresses your problem with "No space left on device". I would try to restore your repository to a consistent state before investigating that further. (Of course, the really frustrating thing is that if the backup fails again, you are forced to wait many hours while you repeat the regression of the failed backup.)
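
When you do get back to that, it's worth checking both free blocks and free inodes on the backup filesystem, since ext4 can run out of inodes while plenty of space still appears free (assuming /backup is where the USB drive is mounted):

   $ df -h /backup
   $ df -i /backup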

<snip>

On Mon, Sep 09, 2019 at 08:17:04PM -0400, Walt Mankowski wrote:
I ran

   $ sudo rdiff-backup -v9 --print-statistics --exclude-filelist /usr/local/etc/rdiff_exclude / /backup/scruffy 2>&1 | tee rdiff-backup.txt

This time it exited right away. I've attached the log file, where the
key message is

   Fatal Error: It appears that a previous rdiff-backup session with
   process id 4351 is still running.

Process 4351 is /lib/systemd/systemd-resolved


It would seem that you had a bit of bad luck in that a process ID that had been used for a crashed rdiff-backup session happened to now be in use again for an unrelated process (systemd-resolved).
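
You can see that kind of PID reuse directly by comparing the PID stored in the marker file with whatever currently owns that PID, e.g.:

   $ cat /backup/scruffy/rdiff-backup-data/current_mirror.2019-09-08T03:01:02-04:00.data
   4351
   $ ps -p 4351 -o pid,comm,lstart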

Is it safe to rerun it with --force?


Using --force would have gotten around the Fatal Error, but it would have also forced other things to happen that you may not want. In this instance, I would have probably restarted systemd-resolved so that it used a different PID. That should have gotten rdiff-backup past that particular error.
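
On a systemd system that restart is just:

   $ sudo systemctl restart systemd-resolved
   $ pidof systemd-resolved    # should now print something other than 4351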

<snip>

On Mon, Sep 9, 2019 at 7:47 PM Walt Mankowski <address@hidden> wrote:

On Mon, Sep 09, 2019 at 07:38:52PM -0400, Patrik Dufresne wrote:
Hum, this is strange. It should not fail with a "no space left on
device".

Agreed! That's why I originally thought it must have been some sort of
USB glitch.

Could you provide the log generated with -v9? Please provide the full
command line you used.

So kill the run with -v8?

What is the filesystem of your USB drive ?

ext4

If you try to run the backup again do you have an error?

In fact that happened last night. My normal nightly backup kicked in
while a previous attempt at running --check-destination-dir was still
running. The cronjob reported:

   Previous backup seems to have failed, regressing destination now.
   Fatal Error: Killed with signal 15

The latter message appeared when I woke up, saw that both of them were
running, and killed it.


That's interesting. It points out that rdiff-backup does not check whether a regression is already in progress before starting another one. That needs fixing.
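
Until that's fixed in rdiff-backup itself, one workaround on the cron side is to wrap the nightly command in flock(1) so a new run simply bails out while a previous backup or regression still holds the lock. A sketch, assuming your nightly job calls the same command line you showed and using a made-up lock file path:

   flock -n /var/lock/rdiff-backup-scruffy.lock \
       rdiff-backup --print-statistics --exclude-filelist /usr/local/etc/rdiff_exclude / /backup/scruffy

With -n, flock exits immediately (and that night's backup is skipped) instead of queuing up behind the run that is still in progress.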

--Joe



