Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
From: Troy Benjegerdes
Subject: Re: [Qemu-devel] [RFC V4 00/30] QCOW2 deduplication
Date: Thu, 3 Jan 2013 13:51:02 -0600
User-agent: Mutt/1.5.20 (2009-06-14)
On Thu, Jan 03, 2013 at 01:39:48PM +0100, Stefan Hajnoczi wrote:
> On Wed, Jan 02, 2013 at 12:26:37PM -0600, Troy Benjegerdes wrote:
> > The probability may be 'low' but it is not zero. Just because it's
> > hard to calculate the hash doesn't mean you can't do it. If your
> > input data is not random, the probability of a hash collision is
> > going to get skewed.
>
> The cost of catching hash collisions is an extra read for every write.
> It's possible to reduce this with a 2nd hash function and/or caching.
>
> I'm not sure it's worth it given the extremely low probability of a hash
> collision.
>
> Venti is an example of an existing system where hash collisions were
> ignored because the probability is so low. See 3.1. Choice of Hash
> Function section:
>
> http://plan9.bell-labs.com/sys/doc/venti/venti.html
If you believe that it's 'extremely low', then please provide either:
* experimental evidence to prove your claim
* an insurance underwriter who will pay out if data is lost due to
a hash collision.
What I have heard so far is a lot of theoretical posturing and no
experimental evidence.
Please google for "When the CRC and TCP checksum disagree" (Stone and
Partridge, SIGCOMM 2000) for experimental evidence of the problems with
assuming that the probability is low. This is the abstract:
"Traces of Internet packets from the past two years show that between 1 packet
in 1,100 and 1 packet in 32,000 fails the TCP checksum, even on links where
link-level CRCs should catch all but 1 in 4 billion errors. For certain
situations, the rate of checksum failures can be even higher: in one hour-long
test we observed a checksum failure of 1 packet in 400. We investigate why so
many errors are observed, when link-level CRCs should catch nearly all of
them. We have collected nearly 500,000 packets which failed the TCP or UDP or IP
checksum. This dataset shows the Internet has a wide variety of error sources
which can not be detected by link-level checks. We describe analysis tools that
have identified nearly 100 different error patterns. Categorizing packet
errors, we can infer likely causes which explain roughly half the observed
errors. The causes span the entire spectrum of a network stack, from memory
errors to bugs in TCP. After an analysis we conclude that the checksum will fail
to detect errors for roughly 1 in 16 million to 10 billion packets. From our
analysis of the cause of errors, we propose simple changes to several protocols
which will decrease the rate of undetected error. Even so, the highly
non-random distribution of errors strongly suggests some applications should
employ application-level checksums or equivalents."
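The "extra read for every write" that Stefan mentions as the cost of catching collisions can be sketched as a verify-on-hit content-addressed store. A hypothetical illustration (the dict-backed store and the collision fallback key are invented for the example, not qcow2 code):

```python
import hashlib

def dedup_write(store: dict, data: bytes, verify: bool = True) -> str:
    """Write `data` into a content-addressed store and return its key.

    With verify=True, a hash hit costs one extra read of the stored
    block plus a byte-for-byte compare, so a real collision degrades
    to storing a second copy instead of silently aliasing the data."""
    key = hashlib.sha256(data).hexdigest()
    if key in store:
        if not verify or store[key] == data:
            return key              # genuine duplicate (or trusted hash)
        key += ":collision"         # invented fallback naming; a real
                                    # design needs a proper chain here
    store[key] = data
    return key
```

With SHA-256 the verify branch is almost always wasted I/O, which is exactly the trade-off under debate: whether that read is insurance worth paying for.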