Re: [rdiff-backup-users] --check-destination-dir taking a very long time


From: Joe Steele
Subject: Re: [rdiff-backup-users] --check-destination-dir taking a very long time
Date: Tue, 10 Sep 2019 15:25:12 -0400
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0

On 9/9/2019 9:53 PM, Walt Mankowski wrote:
I found a file named

   rdiff-backup-data/current_mirror.2019-09-08T03:01:02-04:00.data

which contained

   4351

I moved it out of the way and reran the backup command. This time it
threw an exception. The output is in the attached log file.


(Some of the following echoes what Eric Lavarde wrote a few minutes ago.)

Moving a current_mirror file out of the way is never a good thing to do. Having 2 current_mirror files is how rdiff-backup knows that the last backup failed and that a regression is necessary in order to reestablish a consistent state for the backup repository.
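
For what it's worth, you can see how many current_mirror markers the repository has at any point with something like this (assuming the repository is /backup/scruffy, as in your command line):

   $ ls -l /backup/scruffy/rdiff-backup-data/current_mirror.*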

Fortunately, it looks as though your attempt to run another backup after removing the current_mirror file did not get anywhere (based on your log).

I suggest putting the 'current_mirror.2019-09-08T03:01:02-04:00.data' file back in place (and possibly restarting systemd-resolved, as commented further below). After that, I would look to see what current_mirror files you now have (example commands below). My guess is that you will find the following:

current_mirror.2019-09-07T03:01:01-04:00.data
current_mirror.2019-09-08T03:01:02-04:00.data
current_mirror.2019-09-09T21:46:29-04:00.data

9/7/19 is your last good backup. 9/8 was the backup that failed. 9/9 was your most recent attempt to fix things.
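
Assuming the repository really is /backup/scruffy and you still know where you moved the file, putting it back and checking would look something like:

   $ sudo mv /wherever/you/moved/it/current_mirror.2019-09-08T03:01:02-04:00.data \
        /backup/scruffy/rdiff-backup-data/
   $ ls /backup/scruffy/rdiff-backup-data/current_mirror.*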

*Assuming* that I am correct about the current_mirror files that exist, then I would remove the last of those files (current_mirror.2019-09-09T21:46:29-04:00.data). Yes, that's contrary to my admonition above. But rdiff-backup cannot deal with 3 such files, and this last file is from your most recent backup that did not get anywhere, according to your log.
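
If that is indeed what you find, I would move (not delete) that third marker somewhere outside the repository, e.g.:

   $ sudo mv /backup/scruffy/rdiff-backup-data/current_mirror.2019-09-09T21:46:29-04:00.data ~/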

I would then again try 'rdiff-backup --check-destination-dir' (and cross your fingers).
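
With your paths, that would be something like the following (the extra verbosity is just so you can see that it is doing something):

   $ sudo rdiff-backup -v5 --check-destination-dir /backup/scruffy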

Your original concern was that this was taking forever (12+ hours and counting). For what it is worth, my experience is that regressions do take many hours (depending on size of your current mirror), and they leave you wondering if anything is actually happening.

As best I recall, regressing my own repository of roughly 296 GB took somewhere between 4 and 8 hours (it's been a while). If your backup is 527 GB (i.e., that's what shows up for 'MirrorFileSize' in your session_statistics.* files), then yes, I imagine that would take quite some time to regress. There are probably other factors besides size that affect the speed -- disk speed, processor speed, load, etc. I don't know whether rdiff-backup's logging verbosity is a factor, but I suspect it might be.
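
You can pull that number out quickly with something like:

   $ grep MirrorFileSize /backup/scruffy/rdiff-backup-data/session_statistics.*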

None of the above addresses your problem with "No space left on device". I would try to restore your repository to a consistent state before investigating that further. (Of course, the really frustrating thing is that if the backup fails again, you are forced to wait many hours while you repeat the regression of the failed backup.)
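
When you do get back to that, it's worth checking both free blocks and free inodes on the backup filesystem, since ext4 can run out of inodes while plenty of space still appears free (assuming /backup is where the USB drive is mounted):

   $ df -h /backup
   $ df -i /backup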

<snip>

On Mon, Sep 09, 2019 at 08:17:04PM -0400, Walt Mankowski wrote:
I ran

   $ sudo rdiff-backup -v9 --print-statistics --exclude-filelist /usr/local/etc/rdiff_exclude / /backup/scruffy 2>&1 | tee rdiff-backup.txt

This time it exited right away. I've attached the log file, where the
key message is

   Fatal Error: It appears that a previous rdiff-backup session with
   process id 4351 is still running.

Process 4351 is /lib/systemd/systemd-resolved


It would seem that you had a bit of bad luck in that a process ID that had been used for a crashed rdiff-backup session happened to now be in use again for an unrelated process (systemd-resolved).
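
You can see that kind of PID reuse directly by comparing the PID stored in the marker file with whatever currently owns that PID, e.g.:

   $ cat /backup/scruffy/rdiff-backup-data/current_mirror.2019-09-08T03:01:02-04:00.data
   4351
   $ ps -p 4351 -o pid,comm,lstart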

Is it safe to rerun it with --force?


Using --force would have gotten around the Fatal Error, but it would have also forced other things to happen that you may not want. In this instance, I would have probably restarted systemd-resolved so that it used a different PID. That should have gotten rdiff-backup past that particular error.
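
On a systemd system that restart is just:

   $ sudo systemctl restart systemd-resolved
   $ pidof systemd-resolved    # should now print something other than 4351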

<snip>

On Mon, Sep 9, 2019 at 7:47 PM Walt Mankowski <address@hidden> wrote:

On Mon, Sep 09, 2019 at 07:38:52PM -0400, Patrik Dufresne wrote:
Hum, this is strange. It should not fail with a "no space left on
device".

Agreed! That's why I originally thought it must have been some sort of
USB glitch.

Could you provide the log generated with -v9? Please provide the full
command line you used.

So kill the run with -v8?

What is the filesystem of your USB drive ?

ext4

If you try to run the backup again do you have an error?

In fact that happened last night. My normal nightly backup kicked in
while a previous attempt at running --check-destination-dir was still
running. The cronjob reported:

   Previous backup seems to have failed, regressing destination now.
   Fatal Error: Killed with signal 15

The latter message appeared when I woke up, saw that both of them were
running, and killed it.


That's interesting. It points out that rdiff-backup does not check whether a regression is already in progress before starting another one. That needs fixing.
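
Until that's fixed in rdiff-backup itself, one workaround on the cron side is to wrap the nightly command in flock(1) so a new run simply bails out while a previous backup or regression still holds the lock. A sketch, assuming your nightly job calls the same command line you showed and using a made-up lock file path:

   flock -n /var/lock/rdiff-backup-scruffy.lock \
       rdiff-backup --print-statistics --exclude-filelist /usr/local/etc/rdiff_exclude / /backup/scruffy

With -n, flock exits immediately (and that night's backup is skipped) instead of queuing up behind the run that is still in progress.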

--Joe



