2009/8/9 Gabriel Ambuehl <address@hidden>
On 9.8.09 David Stanaway wrote:
> E.g.: I have a logfile which gets rotated to logfile.1. That is the same
> as logfile in the previous backup, so I don't need to send it again.
> E.g.: I have some family pics that got emailed to me in my Family Maildir;
> I forward the email to someone else. The MIME-encoded attachment data is the
> same. I haven't tested this, but I would think that if you had a solid archive
> file (tar or fs dump), this kind of data duplication would drop
> out.
I would assume that these would get compressed away, but only if you had a
really giant compression dictionary?
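A rough way to test that intuition, as a sketch: DEFLATE (used by zlib and gzip) can only refer back 32 KiB, while xz/LZMA at its default preset uses an 8 MiB dictionary. The snippet stores a hypothetical 100 KiB block of random bytes twice in a row, standing in for logfile.1 next to an identical copy, and compares the two compressors. The sizes are illustrative only; real tar streams and real duplicates will behave differently.

```python
import lzma
import os
import zlib

# Hypothetical stand-in for a duplicated file: 100 KiB of incompressible
# bytes, stored twice back to back as they might appear in a tar stream.
block = os.urandom(100 * 1024)
archive = block + block

# DEFLATE can only reference the previous 32 KiB, so the second copy is
# too far back to be matched and the output roughly doubles.
zlib_size = len(zlib.compress(archive, 9))

# xz/LZMA's default preset has a much larger dictionary, so the second
# copy is encoded as back-references to the first one.
xz_size = len(lzma.compress(archive))

print(f"one copy:          {len(block)} bytes")
print(f"zlib, two copies:  {zlib_size} bytes")
print(f"xz, two copies:    {xz_size} bytes")
```

So duplicates do "drop out", but only when the second copy lands within the compressor's dictionary window of the first.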
<wild half-baked idea>
fingerprint all the files, and then when it comes to storing a file, you only store each fingerprinted file once.
so if you have 5 copies of a file, or the file moves around, then it's only backed up once.
as for log files, they could be dealt with more nicely if (e.g.) the
fingerprints were computed over chunks rather than whole files. that way the
unchanged first half of a log file would only be backed up once.
</wild half-baked idea>
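A minimal sketch of the chunked-fingerprint idea, assuming fixed 4 KiB chunks and SHA-256 as the fingerprint (both arbitrary choices here, as are the hypothetical names `ChunkStore`, `put_file`, and `get_file`). Each distinct chunk is stored once, and a file is kept as its list of chunk fingerprints.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


class ChunkStore:
    """Toy content-addressed store: each distinct chunk is kept once, keyed by SHA-256."""

    def __init__(self):
        self.chunks = {}  # fingerprint (hex) -> chunk bytes

    def put_file(self, data: bytes) -> list:
        """Split data into fixed-size chunks, store unseen ones, return the fingerprint list."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)  # only stored if this fingerprint is new
            recipe.append(fp)
        return recipe

    def get_file(self, recipe: list) -> bytes:
        """Reassemble a file from its fingerprint list."""
        return b"".join(self.chunks[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = ChunkStore()
day1 = b"".join(f"line {n}\n".encode() for n in range(5000))
day2 = day1 + b"".join(f"line {n}\n".encode() for n in range(5000, 6000))

r1 = store.put_file(day1)
r2 = store.put_file(day2)  # grown log: only the chunks past the old end are new
r3 = store.put_file(day1)  # identical copy: nothing new is stored
print(store.stored_bytes(), "bytes stored for", len(day1) * 2 + len(day2), "bytes of files")
```

Fixed-size chunks suit a log that only grows at the end, but an insertion near the start shifts every later chunk boundary; content-defined chunking, which picks boundaries with a rolling hash, is the usual way around that.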
<problems>
the first one would be correctly handling hash collisions, so that two
different chunks of data that coincidentally share the same fingerprint
don't end up with only one of them backed up
</problems>
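One way to handle that, sketched under assumptions: on a fingerprint hit, compare the actual bytes before treating the chunk as a duplicate, and keep colliding chunks side by side under the same fingerprint. The names `weak_fp` and `VerifyingStore` are hypothetical; `weak_fp` is deliberately only 8 bits wide so that collisions really happen in the demo.

```python
import hashlib
from typing import Callable


def weak_fp(data: bytes) -> str:
    # Deliberately tiny 8-bit fingerprint (256 values) to provoke collisions.
    return hashlib.sha256(data).hexdigest()[:2]


class VerifyingStore:
    """Chunk store that byte-compares on a fingerprint hit instead of trusting the hash."""

    def __init__(self, fingerprint: Callable[[bytes], str] = weak_fp):
        self.fingerprint = fingerprint
        self.buckets = {}  # fingerprint -> list of distinct chunks sharing it

    def put(self, chunk: bytes) -> tuple:
        fp = self.fingerprint(chunk)
        bucket = self.buckets.setdefault(fp, [])
        for i, existing in enumerate(bucket):
            if existing == chunk:  # genuine duplicate: reuse the stored copy
                return fp, i
        bucket.append(chunk)  # new data, or a collision: keep both
        return fp, len(bucket) - 1

    def get(self, ref: tuple) -> bytes:
        fp, i = ref
        return self.buckets[fp][i]


store = VerifyingStore()
chunks = [f"chunk {n}".encode() for n in range(1000)]  # 1000 > 256, so collisions are certain
refs = [store.put(c) for c in chunks]
assert all(store.get(r) == c for r, c in zip(refs, chunks))  # nothing half backed up
assert store.put(chunks[0]) == refs[0]  # re-storing is still deduplicated
```

The cost is an extra read of the stored chunk on every hit. With a full 256-bit hash an accidental collision is vanishingly unlikely, so some systems skip the comparison and trust the fingerprint.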