
From: Austin S. Hemmelgarn
Subject: Re: [Bug-tar] stat() on btrfs reports the st_blocks with delay (data loss in archivers)
Date: Wed, 6 Jul 2016 07:37:15 -0400
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1

On 2016-07-05 05:28, Joerg Schilling wrote:
> Andreas Dilger <address@hidden> wrote:
>
>> I think in addition to fixing btrfs (because it needs to work with existing
>> tar/rsync/etc. tools) it makes sense to *also* fix the heuristics of tar
>> to handle this situation more robustly.  One option is if st_blocks == 0 then
>> tar should also check if st_mtime is less than 60s in the past, and if yes
>> then it should call fsync() on the file to flush any unwritten data to disk,
>> or assume the file is not sparse and read the whole file, so that it doesn't
>> incorrectly assume that the file is sparse and skip archiving the file data.
>
> A broken filesystem is a broken filesystem.
>
> If you try to change gtar to work around a specific problem, it may fail in
> other situations.

The problem with this is that tar is assuming things that are not guaranteed to be true. There is absolutely nothing that says that st_blocks has to be non-zero if there's data in the file. In fact, the behavior that BTRFS used to have of reporting st_blocks as 0 for files entirely inlined in the metadata is absolutely correct given the description of the field by POSIX, because there _are_ no blocks allocated to the file (the metadata block is technically equivalent to the inode, which isn't counted by st_blocks). This is yet another example of an old interface being short-sighted: in this case the interface for sparse file detection, which is effectively non-existent.
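
To make the assumption concrete, here is a minimal Python sketch of the kind of shortcut being discussed. It is not GNU tar's actual code: the helper name naive_looks_fully_sparse and the FakeStat stand-in are hypothetical and exist only for illustration. It shows that a small file stored inline in metadata and a file that really is one large hole can look identical to a check based on st_blocks alone.

```python
from collections import namedtuple

# Stand-in for the two os.stat_result fields used here (hypothetical, illustration only).
FakeStat = namedtuple("FakeStat", ["st_size", "st_blocks"])


def naive_looks_fully_sparse(st):
    """Shortcut under discussion: a non-empty file with zero allocated blocks."""
    return st.st_size > 0 and st.st_blocks == 0


# A 100-byte file stored inline in filesystem metadata may report st_blocks == 0.
inlined = FakeStat(st_size=100, st_blocks=0)
# A file that really consists of one big hole reports the same values.
hole = FakeStat(st_size=1 << 20, st_blocks=0)

# Both calls give the same answer, so the check cannot tell the two cases apart.
print(naive_looks_fully_sparse(inlined), naive_looks_fully_sparse(hole))
```

With a real file, the same function could be passed the result of os.stat(path); the point is only that the answer says nothing about whether the file has data.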

The proper fix for this is that tar (and anything else that handles sparse files differently) should be scanning the file's contents regardless of what st_blocks reports. It has to anyway for a normal sparse file to figure out where the sparse regions are, and optimizing for a file that's completely sparse (and therefore probably pre-allocated with fallocate) is not all that reasonable considering that this is going to be a very rare case in normal usage.
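
As a rough sketch of what scanning the contents could look like, the Python function below reads a file block by block and records the runs that contain any non-zero byte. The name find_data_regions and the 512-byte block size are assumptions made for this example, and this is not how GNU tar is implemented; a real archiver might instead ask the kernel for data and hole offsets via lseek() where the platform supports that.

```python
def find_data_regions(path, block_size=512):
    """Return a list of (offset, length) runs whose blocks contain non-zero bytes.

    The file is always read in full, so the result does not depend on st_blocks.
    """
    regions = []
    offset = 0
    run_start = None  # start offset of the current run of data blocks, if any
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break  # end of file
            has_data = any(block)  # True if at least one byte is non-zero
            if has_data and run_start is None:
                run_start = offset
            elif not has_data and run_start is not None:
                regions.append((run_start, offset - run_start))
                run_start = None
            offset += len(block)
    if run_start is not None:
        # The file ended in the middle of a data run.
        regions.append((run_start, offset - run_start))
    return regions
```

A file made entirely of zeros yields an empty list, which is the only case where skipping the data would be safe, and it is discovered by reading rather than by trusting the block count.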



