
Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
Date: Fri, 4 Jan 2013 10:49:22 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Jan 03, 2013 at 01:51:02PM -0600, Troy Benjegerdes wrote:
> On Thu, Jan 03, 2013 at 01:39:48PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Jan 02, 2013 at 12:26:37PM -0600, Troy Benjegerdes wrote:
> > > The probability may be 'low' but it is not zero. Just because it's
> > > hard to calculate the hash doesn't mean you can't do it. If your
> > > input data is not random, the probability of a hash collision is
> > > going to get skewed.
> > 
> > The cost of catching hash collisions is an extra read for every write.
> > It's possible to reduce this with a 2nd hash function and/or caching.
> > 
> > I'm not sure it's worth it given the extremely low probability of a hash
> > collision.
> > 
> > Venti is an example of an existing system where hash collisions were
> > ignored because the probability is so low.  See 3.1. Choice of Hash
> > Function section:
> > 
> > http://plan9.bell-labs.com/sys/doc/venti/venti.html
> 
> 
> If you believe that it's 'extremely low', then please provide either:
> 
> * experimental evidence to prove your claim
> * an insurance underwriter who will pay-out if data is lost due to
> a hash collision.
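The "extra read for every write" that Stefan mentions above could look something like the following. This is a hedged sketch, not the actual QCOW2 patch code: the class, its in-memory dictionaries, and the `verify` flag are all hypothetical, standing in for the real on-disk structures.

```python
# Sketch of a dedup write path that catches hash collisions by
# re-reading and comparing the stored block -- the cost is one extra
# read for every deduplicated write. (Illustrative only; not the
# qcow2 deduplication implementation.)
import hashlib

class DedupStore:
    def __init__(self):
        self.blocks = {}        # offset -> bytes (simulated disk)
        self.index = {}         # hash digest -> offset
        self.next_offset = 0

    def write(self, data: bytes, verify: bool = True) -> int:
        digest = hashlib.sha256(data).digest()
        offset = self.index.get(digest)
        if offset is not None:
            # Hash hit: optionally pay the extra read to rule out
            # a collision before trusting the match.
            if not verify or self.blocks[offset] == data:
                return offset   # deduplicated: reuse existing block
            # A real collision (astronomically unlikely with a
            # cryptographic hash): fall through and store a new copy.
        offset = self.next_offset
        self.next_offset += len(data)
        self.blocks[offset] = data
        self.index[digest] = offset
        return offset

store = DedupStore()
a = store.write(b"x" * 4096)
b = store.write(b"x" * 4096)   # identical content -> same offset
```

With `verify=False` the extra read disappears, which is the trade-off the thread is arguing about: trust the hash alone, or pay a read per write to be certain.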

Read the paper; the point is that if the probability of a collision is
so extremely low, it's not worth worrying about, since other failure
modes (e.g. cosmic rays flipping bits) are far more likely.
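The "extremely low" claim can be checked with the standard birthday bound: with n distinct blocks and a b-bit hash, the probability of any collision is roughly n^2 / 2^(b+1). A minimal sketch, plugging in the Venti paper's own example (an exabyte of data in 8 KiB blocks, about 10^14 blocks, with 160-bit SHA-1):

```python
# Birthday-bound approximation of the probability of at least one
# hash collision among n distinct blocks with a b-bit hash:
#   p ~= n^2 / 2^(b+1)
# Numbers follow the example in the Venti paper cited above.

def collision_probability(n_blocks: int, hash_bits: int) -> float:
    """Approximate probability of any collision (birthday bound)."""
    return n_blocks ** 2 / 2 ** (hash_bits + 1)

# ~10**14 blocks (an exabyte at 8 KiB/block), 160-bit SHA-1:
p = collision_probability(10 ** 14, 160)
print(f"{p:.1e}")
```

The result is on the order of 10^-21, which is the basis for the comparison with hardware error rates such as bit flips from cosmic rays.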

TCP/IP checksums are weak and not comparable to the cryptographic hash
Benoit is using.

Stefan


