
From: David
Subject: [rdiff-backup-users] Memory benchmarks and some questions
Date: Fri, 9 May 2008 11:34:45 +0200

Hi list.

Following up on my previous mails (memory usage problems, possible
hardlink bug issues), I ran some benchmarks and have some questions
which hopefully someone (ideally a developer) can answer.

Here are the steps I followed (and which can be re-produced by
interested parties) for the benchmarks:

1) Make temporary directory structure:

mkdir -p /tmp/test_rdiffbackup/other_server_rsync
mkdir -p /tmp/test_rdiffbackup/rsync_tmp_local
mkdir -p /tmp/test_rdiffbackup/bkp_store

Summary of dirs:
 - other_server_rsync - represents files on another server, only
accessible by rsync login
 - rsync_tmp_local - temporary staging area, where we rsync files into
before running rdiff-backup
 - bkp_store - local rdiff-backup store (with a rdiff-backup-data
directory, etc)

2) Make 10,000 temporary files on the 'other server'

# First declare a function, because we run this logic a few times
make_files() {
  i=0
  while [ $i -lt 10000 ]; do
    mktemp -p /tmp/test_rdiffbackup/other_server_rsync
    let i++
  done
}

# Now call the function
make_files

3) Rsync them from 'remote server' to rsync staging area

rsync -va /tmp/test_rdiffbackup/other_server_rsync/ /tmp/test_rdiffbackup/rsync_tmp_local/

4) Start (in another terminal) a command to measure rdiff-backup memory usage:

top -b -d 0.1 | grep rdiff-backup
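
If top's fixed-interval grep proves too coarse, the peak memory usage can
also be captured with a small polling loop around ps. A rough sketch under
the same assumptions as above (process named rdiff-backup, 0.1 s interval);
RSS is reported by ps in KB:

```shell
#!/bin/sh
# Poll the oldest process named rdiff-backup and remember its peak
# resident set size (RSS, in KB). Prints the peak once the process
# exits; prints 0 if no such process was ever seen.
peak=0
while pid=$(pgrep -o rdiff-backup); do
  rss=$(ps -o rss= -p "$pid" | tr -d ' ') || break
  [ "$rss" -gt "$peak" ] 2>/dev/null && peak=$rss
  sleep 0.1
done
echo "peak RSS: ${peak} KB"
```

Unlike the top pipeline, this reports a single peak figure per run, which
makes the run-to-run comparisons below easier to read.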

5) Run rdiff-backup:

rdiff-backup /tmp/test_rdiffbackup/rsync_tmp_local /tmp/test_rdiffbackup/bkp_store

# Last 'top' line has this memory usage: 15988  11m 2676 (VIRT/RES/SHR)

6) Re-run 'server update & backup' cycle a few times, to get some
memory statistics for normal usage:

update_bkp() {
  rsync -va --delete /tmp/test_rdiffbackup/other_server_rsync/ /tmp/test_rdiffbackup/rsync_tmp_local/
  rdiff-backup /tmp/test_rdiffbackup/rsync_tmp_local /tmp/test_rdiffbackup/bkp_store
}

update_bkp # 20240  15m 2732 (VIRT/RES/SHR)
update_bkp # 20396  16m 2732 - Increased by 156kb
update_bkp # 21176  16m 2732 - Increased by 780kb
update_bkp # 21276  16m 2732 - Increased by 100kb
update_bkp # 21612  17m 2732 - Increased by 336kb

So, it looks like there is a (roughly) linear increase in memory usage
as the number of files increases, between 100 and 780 kb per 10,000
files.
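
For scale, those deltas work out to only tens of bytes of resident memory
per file:

```shell
# Convert the observed RSS deltas (KB per 10,000 files) to bytes/file.
low_kb=100 high_kb=780 files=10000
echo "low:  $(( low_kb * 1024 / files )) bytes/file"   # 10 bytes/file
echo "high: $(( high_kb * 1024 / files )) bytes/file"  # 79 bytes/file
```

Even the high end is under 100 bytes per file, though over millions of
files that still adds up to hundreds of MB of RSS.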

A few questions at this point:

a) Is this normal?

b) Won't this cause problems when backing up huge (millions+) files to
a memory-limited backup server?

c) Is it possible for rdiff-backup to use Python generator functions
instead of lists, to keep memory usage down?

7) Next test - using hardlinks to limit disk usage on the backup server.

If the /tmp/test_rdiffbackup/bkp_store directory is massive, then we
don't want to use up the same amount of space under
/tmp/test_rdiffbackup/rsync_tmp_local. The most obvious solution to
this is to use hardlinks between unchanged files.

So, let's update our 'update_bkp()' function:

update_bkp() {
  mkdir -p /tmp/test_rdiffbackup/rsync_tmp_local
  rsync -va --link-dest=/tmp/test_rdiffbackup/bkp_store/ \
    /tmp/test_rdiffbackup/bkp_store/ /tmp/test_rdiffbackup/rsync_tmp_local/ \
    --exclude=/rdiff-backup-data
  rsync -va --delete /tmp/test_rdiffbackup/other_server_rsync/ /tmp/test_rdiffbackup/rsync_tmp_local/
  rdiff-backup /tmp/test_rdiffbackup/rsync_tmp_local /tmp/test_rdiffbackup/bkp_store
  rm -rf /tmp/test_rdiffbackup/rsync_tmp_local
}

8) Re-run the test function a few times, and gather stats from the
other terminal:

# First, clear out the directories, to speed up rdiff-backup for these
# tests (otherwise it takes a *long* time to finish)

rm -rvf /tmp/test_rdiffbackup/other_server_rsync /tmp/test_rdiffbackup/bkp_store
mkdir /tmp/test_rdiffbackup/other_server_rsync /tmp/test_rdiffbackup/bkp_store

# Next, run the tests and monitor memory usage:

update_bkp # 15972  11m 2704 (VIRT/RES/SHR)
update_bkp # 33748  29m 2760 - Increased by 17,776kb (First backup
where history already existed)
update_bkp # 34524  30m 2760 - Increased by 700kb
update_bkp # 36036  31m 2760 - Increased by 1,512kb
update_bkp # 42336  37m 2760 - Increased by 6,300kb (Also, took a lot
longer to run than the previous backups)
update_bkp # 45896  41m 2760 - Increased by 3,560kb

From the above stats it looks like rdiff-backup uses an extra 1-6 MB
per 10,000 files when hardlinks are involved.

Some questions at this point:

d) Is an extra 3-6 MB really needed per extra 10,000 files (with
hardlinks)? That's 300-600 bytes per file. Aren't there more efficient
structures that can be used? How does rsync handle hardlinks?

e) Does rdiff-backup really need to use its hardlink-handling logic
in this case? None of the files under the store are hardlinked to each
other. I assume this is happening because the hardlink count per file
is greater than 1.

f) Why does rdiff-backup go so much slower when hardlinks are involved?

g) Could rdiff-backup get a new option (e.g. --min-hard-link-count)
which sets when the hardlink logic activates? The default would be 2,
but for cases like this users could use 3 instead.

