duplicity-talk

Re: [Duplicity-talk] Incremental backup when data changes but timestamp


From: Kenneth Loafman
Subject: Re: [Duplicity-talk] Incremental backup when data changes but timestamp does not
Date: Sun, 14 May 2023 10:53:33 -0500

Nate,

I agree with @ede on this one.  It's too much of an edge case to mess with.  Someone putting out a different package with files of the same name, size, and mtime smells of malware to me. I'd be suspicious of the package and not even try to back it up.  What is it from / for?

...Ken


On Sun, May 14, 2023 at 9:56 AM edgar.soldin--- via Duplicity-talk <duplicity-talk@nongnu.org> wrote:
On 14.05.2023 07:31, Nate Eldredge via Duplicity-talk wrote:
> Returning to a thread from many years ago (https://lists.gnu.org/archive/html/duplicity-talk/2013-07/msg00015.html), I am looking for a way to do an incremental backup involving files whose data has changed but the timestamp, permissions and size stayed the same.

hi Nate :)

this sounds like a corner case. so firstly i'd really like to see examples of those files. could you provide those? maybe restore them from these backups of yours?

> This actually comes up in a real-life situation, not via some sort of deliberate timestamp abuse.  They're files from the same package in two different versions of Ubuntu.  I assume the packages were built simultaneously from the same source, but using different compiler versions, and the files for each one happened to be created within the same second.  So if you upgrade from one package to the other, the new version of the file is different, but has the same mtime and permissions and possibly even the same size.  Then `duplicity incremental` doesn't notice the change, and your backup stays with the old version.

rsync provides a `--checksum` parameter for that. but that of course is io-heavy as the file would have to be read in full to decide if there are changes. not sure if we already keep per-file-checksums in the meta-data.
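
For illustration, a minimal Python sketch of the checksum idea: hash the whole file and compare against a digest remembered from the previous run. The helper names `file_digest` and `content_changed` are hypothetical, and storing a per-file digest in the backup metadata is an assumption here, not something duplicity is confirmed to do.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks, so memory use stays flat, but every byte
    is still read once. That is the I/O cost mentioned above.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def content_changed(path, previous_digest):
    """True if the file's current digest differs from the remembered one.

    `previous_digest` is assumed to come from the last backup's metadata.
    """
    return file_digest(path) != previous_digest
```

Size and mtime play no part in this check, so a same-size, same-mtime rewrite is still detected.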

> At one time I worked around this by hacking in a command-line option which causes ROPath.__eq__ (https://gitlab.com/duplicity/duplicity/-/blob/main/duplicity/path.py#L331) to always return 0.  Then every file is treated as "changed", and so the changes in question are picked up.  For those files that haven't actually changed, the rdiff is trivial, and so the only practical impact is that the backup takes a long time and you get a big new-signatures file, which I can live with.  For me it usually only happens when I upgrade OS versions, so I would run an incremental with this option at those times. (Or, I would bite the storage-space bullet and run a full backup even if I didn't otherwise need one.)

forcing every file to be treated as changed sounds more reasonable compared to `--checksum`. it will read the file once too, but in this run will come up with the changes already. all the code would need to do is verify whether there were changes and skip adding the result to a volume if there were none. not sure how intelligent the code is already in this regard.
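
A toy model of that forced check, in Python. This is not duplicity's code: `plan_incremental` and its in-memory dictionaries are hypothetical stand-ins, and a real implementation would compare against stored signatures rather than keep old file contents around.

```python
def plan_incremental(previous, current, force=False):
    """Return the sorted paths whose data should go into the new volume.

    `previous` and `current` map path -> (metadata_tuple, content_bytes).
    Without `force`, equal metadata short-circuits the comparison and the
    content is never looked at. With `force`, every file is compared, and
    files whose content turns out identical are still skipped.
    """
    to_store = []
    for path, (meta, data) in current.items():
        old = previous.get(path)
        if old is None:
            to_store.append(path)  # new file: always stored
            continue
        old_meta, old_data = old
        if not force and meta == old_meta:
            continue  # metadata shortcut: assumed unchanged, content not read
        if data != old_data:
            to_store.append(path)  # real change: goes into the volume
    return sorted(to_store)
```

With `force=True` the cost is reading every file, but unchanged files add nothing to the volume.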

> It'd be nice to have something more efficient and robust, though.  One thought would be to check whether the ctime is newer than the date of the previous backup.

i wonder why rsync does not use ctime by default though. there may be a reason for that. fs-standard of course mandates mod-time changes only when the file is changed. c-time is supposed to be fixed.
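
The ctime idea from the quoted paragraph can be sketched in Python as below. It assumes a POSIX system, where `st_ctime` is the time of the last inode change (data or metadata) and cannot be set through `os.utime`; on Windows `st_ctime` means something else. The function name and the way the previous backup time is obtained are hypothetical.

```python
import os


def ctime_newer_than(path, last_backup_epoch):
    """True if the file's inode changed after the previous backup.

    `last_backup_epoch` is assumed to be the start time of the previous
    backup, in seconds since the epoch.
    """
    return os.stat(path).st_ctime > last_backup_epoch
```

A file flagged this way would still need a content comparison, since a plain `chmod` or rename also updates ctime.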

> We could also check the birth time on filesystems that support it.  We would get false positives in cases like replacing a disk and `cp -a`ing over all the files (which normally would preserve mtime but not ctime), but it could still be useful as an option.

that sounds fishy. i don't see how a containing filesystem change should trigger a recompare by default.

>
> I'm curious if anyone has other suggestions, or tips on how / where to implement them.

i'm still curious which files you come up with that equal in
- file name
- size
- mod time
but have a different content. not saying they do not exist, just saying it is a very rare phenomenon.
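
For testing, such a file can be produced on purpose. The Python sketch below rewrites a file and then restores its original timestamps with `os.utime`; the helper names are hypothetical, and this only simulates the situation rather than showing how the Ubuntu packages ended up that way.

```python
import os


def metadata_key(path):
    """Return (file name, size, mtime in nanoseconds) for `path`."""
    st = os.stat(path)
    return (os.path.basename(path), st.st_size, st.st_mtime_ns)


def rewrite_keeping_mtime(path, new_data):
    """Overwrite `path` with `new_data`, then restore the old atime/mtime."""
    st = os.stat(path)
    with open(path, "wb") as f:
        f.write(new_data)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))
```

If `new_data` has the same length as the old content, `metadata_key` is identical before and after the rewrite even though the bytes differ.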

in summary, the easiest way around would be a forced check, similar to how you hacked it. not sure how much effort it'd be to implement though.

sunny regards.. ede/duply.net


_______________________________________________
Duplicity-talk mailing list
Duplicity-talk@nongnu.org
https://lists.nongnu.org/mailman/listinfo/duplicity-talk
