Recently my 'rdiff-backup' developed a weird problem, where it 'hangs'
when backing up a certain file on one server. The other 40+ servers are OK;
it's just that one, and even there it has only been happening since May 3rd.
I can 'cat' the file on the originating server, and I can also 'scp' it to
the backup server - there is no problem, no error with that. However,
when 'rdiff-backup' gets to this file, it just 'hangs' and does nothing.
On the backup server I see the file 'rdiff-backup.tmp.22397', which seems
to be a partially transferred copy of the original file (524288 bytes vs.
785592 bytes for the original file).
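One way to confirm that the temporary file really is a truncated copy, and not something else, would be to compare it byte for byte against the start of the original. Below is a minimal Python sketch of that check. The function name `is_prefix_of` and the file paths in the usage note are hypothetical, and it assumes both files are readable on the same machine (for example after copying the original over with 'scp').

```python
import os


def is_prefix_of(partial_path, original_path, chunk_size=65536):
    """Return True if partial_path's contents match the start of original_path."""
    partial_size = os.path.getsize(partial_path)
    # A file longer than the original cannot be a prefix of it.
    if partial_size > os.path.getsize(original_path):
        return False
    with open(partial_path, "rb") as partial, open(original_path, "rb") as original:
        remaining = partial_size
        while remaining > 0:
            n = min(chunk_size, remaining)
            a = partial.read(n)
            b = original.read(n)
            # Any mismatch, or an unexpected end of file, means "not a prefix".
            if a != b or not a:
                return False
            remaining -= len(a)
    return True
```

It could be called as `is_prefix_of("rdiff-backup.tmp.22397", "/path/to/copy/of/original")`, where the second path is a placeholder for wherever the original was copied.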
If I 'strace' the 'python' process on the backup server, I get this:
# strace -p 7343
Process 7343 attached - interrupt to quit
read(5, ^C <unfinished ...>
Process 7343 detached
If I 'strace' the 'ssh' process, I get this:
# strace -p 7344
Process 7344 attached - interrupt to quit
select(7, [3 4], [], NULL, NULL^C <unfinished ...>
Process 7344 detached
And that's all, there is nothing else going on even if I leave 'strace'
open for 30 minutes...
And if I 'strace' the 'python' process on the originating server, I get this:
# strace -p 20518
Process 20518 attached - interrupt to quit
read(3, <unfinished ...>
strace: ptrace(PTRACE_CONT,1,133): Input/output error
Process 20518 detached
After that the process state in 'ps' changes from 'Ss' to 'Ts'
(stopped). I can change it back to 'Ss' with 'kill -CONT', but it still
doesn't do anything.
The weird thing is that it ALWAYS happens on the same file, but there is
seemingly nothing wrong with that particular file...
Any ideas? What else is there to try and get more clues?
PS: OS of the backup server is OpenSuSE 11.1 (32 bit), OS of the
'backed-up' server is CentOS release 5.2 (64 bit). Rdiff-backup version
on both is 1.2.8. I also tried removing 'rdiff-backup-data' to start all
over, but it didn't help.