Re: [Duplicity-talk] Fwd: AssertionError on every attempt

From: Kenneth Loafman
Subject: Re: [Duplicity-talk] Fwd: AssertionError on every attempt
Date: Thu, 11 Jun 2015 09:35:09 -0500


Been thinking along the lines of SQLite as an object store for the manifest mainly.  Keeping reference ids in the manifest along with some limited metadata on each file would make the manifest much larger, but would make it much more useful.  I'm not sure it's worth the change for just one backend though.  We normally require that a backend stand alone without support from the main system.  This is moving into uncharted territory, so let's discuss this before we go too far.

As to where the cache lives, the default is based on the XDG standards for cache and config homes (see config.py).  I'm sure there is a data standard as well.  You'd need that if you wanted to keep a database local (clearing the cache is a common debug exercise).
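
For illustration, here is a minimal sketch of how cache and data directories can be resolved under the XDG Base Directory rules. It is not the code in config.py; the helper names xdg_dir and duplicity_dirs and the "duplicity" subdirectory are assumptions made for the example.

```python
import os


def xdg_dir(env_var, fallback):
    """Return the directory named by env_var, or ~/fallback.

    The XDG spec says an unset or empty variable means "use the default",
    and a relative path is invalid and should be ignored.
    """
    value = os.environ.get(env_var, "")
    if not value or not os.path.isabs(value):
        value = os.path.join(os.path.expanduser("~"), fallback)
    return value


def duplicity_dirs(app="duplicity"):
    """Hypothetical helper: return (cache_dir, data_dir) for the app."""
    cache = os.path.join(xdg_dir("XDG_CACHE_HOME", ".cache"), app)
    data = os.path.join(
        xdg_dir("XDG_DATA_HOME", os.path.join(".local", "share")), app
    )
    return cache, data
```

A database kept under the data directory would survive someone clearing the cache directory during debugging.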

I'm working on 0.8 with the new librsync hash code.  The new librsync doubles the size of the signature, which is already huge, to something nearly impossible to manage, so splitting at X size is now mandatory.  While I'm in there I'm going to rework the manifest and make it more useful.  SQLite is part of the plan, assuming it can be limited in size.

So, let's communicate and hash this thing out.

Everyone, please jump in anytime you want.


On Wed, Jun 10, 2015 at 1:10 PM, Bruce Merry <address@hidden> wrote:
On 10 June 2015 at 16:16, Tim Fletcher <address@hidden> wrote:
> I suspect that this is due to the Google storage back-end having different
> consistency guarantees for single objects vs directory listings.
> See https://cloud.google.com/storage/docs/concepts-techniques#consistency

Thanks, that's an interesting link. It's describing Cloud rather than
Drive, but it wouldn't surprise me if Drive is similar i.e. an object
store with a filesystem duct-taped on.

That makes me think that maximum robustness would be achieved by
having duplicity reference IDs internally and only use filenames for
presentation to the user. That sounds like it would need major
architectural changes though, since the list of IDs forming a backup
set would need to be recorded as part of the backup, instead of being
discovered from a directory listing.

A halfway point might be to have the client keep its own filename<->ID
cache in the Duplicity cache directory. Operations would need to query
the object by ID to validate the cache entry, but I think this would
allow for strong consistency in cases where the same client is doing
the accesses (as is the case when an upload is immediately followed by
a query - different clients are more likely to be separated in time).

I can probably have a go at implementing that in the next week or two.
Are there helper functions I should look at for the backend to
discover where the cache directory for the backup lives? And any
preferences for the format of the cache file? My personal inclination
would be to go for sqlite to get all the nice safety guarantees that
gives over just a pickle/yaml/json/xml/whatever file, but that would
introduce a dependency.
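
As a rough sketch of the filename<->ID cache idea above, assuming sqlite: the class name IdCache, the table layout, and the fetch_name_by_id callback are all hypothetical. The callback stands in for whatever query-by-ID call the backend provides and is assumed to return the object's current filename, or None if the object no longer exists.

```python
import sqlite3


class IdCache:
    """Hypothetical filename -> object ID cache backed by SQLite."""

    def __init__(self, path=":memory:"):
        # In real use, path would point at a file in the cache directory.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS ids ("
            "filename TEXT PRIMARY KEY, object_id TEXT NOT NULL)"
        )
        self.db.commit()

    def remember(self, filename, object_id):
        """Record the ID returned by the backend right after an upload."""
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO ids (filename, object_id) VALUES (?, ?)",
                (filename, object_id),
            )

    def lookup(self, filename, fetch_name_by_id):
        """Return the cached ID if the backend still confirms it, else None."""
        row = self.db.execute(
            "SELECT object_id FROM ids WHERE filename = ?", (filename,)
        ).fetchone()
        if row is None:
            return None
        object_id = row[0]
        # Validate by querying the object by ID, not by directory listing.
        if fetch_name_by_id(object_id) == filename:
            return object_id
        # Stale entry: the object was deleted or renamed, so drop it.
        with self.db:
            self.db.execute("DELETE FROM ids WHERE filename = ?", (filename,))
        return None
```

A None result means the caller has to fall back to the (possibly stale) directory listing, so the cache only ever improves on the existing behaviour.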

Dr Bruce Merry
bmerry <@> gmail <.> com
