[rdiff-backup-users] Re: Millions of files

From: Troels Arvin
Subject: [rdiff-backup-users] Re: Millions of files
Date: Fri, 27 May 2005 21:42:36 +0200
User-agent: Pan/ (As She Crawled Across the Table)

On Thu, 28 Apr 2005 18:46:46 -0700, Ben Escoto wrote:
>> > rsync seems to have trouble with jobs operating on millions of files:
>> > http://lists.samba.org/archive/rsync/2003-October/007546.html
>> > 
>> > If that's the case, is it also automatically the case for
>> > rdiff-backup? Does rdiff-backup create a complete list of files to be
>> > backed up before working?
> No, rdiff-backup does not have this problem :-P rdiff-backup only makes
> one pass, so memory should not increase linearly with the number of
> files.

A new backup setup I'm working on isn't completely finished yet, but one
backup job with more than 13 million files (totalling 233 GB, lasting
45 hours) ran fine. Nice.

233 GB in 45 hours is around 1½ MB/sec, if I'm calculating right. I'm not
sure I'm satisfied with that: both hard disks and modern LANs should (in
principle) be able to sustain higher throughput.
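
The arithmetic behind that estimate can be sketched as follows. This is a
rough check only, assuming decimal units (1 GB = 1000 MB), which the post
does not specify, and using a nominal 125 MB/sec for a gigabit link
(ignoring protocol overhead) purely as a point of comparison.

```python
# Rough throughput check for the figures quoted above.
# Assumption: decimal units, 1 GB = 1000 MB (the post does not say which).
total_gb = 233
hours = 45

total_mb = total_gb * 1000
seconds = hours * 3600
mb_per_sec = total_mb / seconds

# Assumption: nominal gigabit Ethernet ceiling of 1000 Mbit/sec,
# i.e. 125 MB/sec, with no allowance for protocol overhead.
gigabit_mb_per_sec = 1000 / 8
link_utilisation = mb_per_sec / gigabit_mb_per_sec

print(f"{mb_per_sec:.2f} MB/sec")
print(f"{100 * link_utilisation:.1f}% of a nominal gigabit link")
```

Under these assumptions the job used only a small fraction of what the
network could carry in principle, which is consistent with the suspicion
that the bottleneck lies elsewhere.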

I'm wondering what the most significant bottleneck(s) in a setup like the
following might be? (Servers running mostly Linux, mostly using reiserfs3
file systems.)

*Backup server (Single, relatively modern P4 CPU, 1GB RAM)
-  8 Disks in RAID5 (not sure about specs)
   (no write caching)
-  Storage controller: Adaptec Serial ATA RAID 2810SA
   (no write caching)
-  Python+librsync+rdiff-backup
-  sshd
---Gigabit network hardware and wires-----
-  ssh
-  Python+librsync+rdiff-backup
-  Disks
*Production server (Various Intel-like CPUs); sometimes
 doing hard work.

One thing to note is that no swapping is happening on the backup server.

Of course, I can (and probably will) measure various parameters myself,
but I think it would be interesting to hear what others may hypothesize.
As (hopefully) illustrated, the backup server pulls data from the
production servers via SSH. Several hosts are backed up in parallel.

I'm thinking:
 - Could SSH's crypto work be significant, or is it really peanuts
   for today's fast CPUs?
 - Is rdiff-backup performing calculations where Python's
   slowness could be a problem?
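
One way to put these questions in context is to look at the per-file
figures implied by the job above. The sketch below assumes exactly
13 million files (the post only gives this as a lower bound) and decimal
units; both are assumptions, so the results are rough bounds rather than
measurements.

```python
# Per-file figures implied by the job described above.
# Assumptions: exactly 13 million files (the post says "more than"),
# and decimal units (1 GB = 10**9 bytes).
file_count = 13_000_000
total_bytes = 233 * 10**9
seconds = 45 * 3600

files_per_sec = file_count / seconds       # lower bound on the real rate
avg_file_kb = total_bytes / file_count / 1000  # upper bound on the real mean

print(f"about {files_per_sec:.0f} files/sec")
print(f"about {avg_file_kb:.1f} KB per file on average")
```

If the average file really is this small, per-file costs (metadata
lookups, disk seeks, per-file work in Python) might matter as much as raw
bandwidth or SSH encryption. That is only a hypothesis; measuring would
be needed to confirm it.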

> rdiff-backup and rsync use completely different protocols, and they
> don't really share any code.

How about librsync? - Isn't that code shared between rdiff-backup and

Greetings from Troels Arvin, Copenhagen, Denmark
