bug-tar

From: Austin S. Hemmelgarn
Subject: Re: [Bug-tar] stat() on btrfs reports the st_blocks with delay (data loss in archivers)
Date: Wed, 6 Jul 2016 11:12:24 -0400
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.1.1

On 2016-07-06 10:53, Joerg Schilling wrote:
> Antonio Diaz Diaz <address@hidden> wrote:
>
>> Joerg Schilling wrote:
>>> POSIX requires st_blocks to be != 0 in case that the file contains data.
>>
>> Please, could you provide a reference? I can't find such requirement at
>> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html
>>
>>         blkcnt_t st_blocks      Number of blocks allocated for this object.
>
> It should be obvious that a file that offers content also has allocated blocks.

What you mean then is that POSIX _implies_ that this is the case, but does not say whether or not it is required. There are all kinds of counterexamples to this too. procfs is a POSIX compliant filesystem (every POSIX certified system has it), yet it does not display the behavior that you expect: every single file in /proc, for example, reports 0 for both st_blocks and st_size, and yet all of them very obviously have content.
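
For anyone who wants to check the procfs case themselves, here is a minimal sketch in Python. The helper name size_and_blocks is made up for illustration, and it assumes a Unix-like system where os.stat() exposes st_blocks.

```python
import os

def size_and_blocks(path):
    """Return (st_size, st_blocks) exactly as stat() reports them.

    Hypothetical helper for illustration only. st_blocks is only
    available on Unix-like platforms.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks

def has_content(path):
    """Check for content by actually reading, ignoring stat() entirely."""
    with open(path, "rb") as f:
        return len(f.read(1)) > 0
```

On a Linux box, comparing size_and_blocks("/proc/version") against has_content("/proc/version") should show the mismatch described above, assuming that path exists on your system.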

> Blocks are "allocated" when the OS decides whether the new data will fit on the
> medium. The fact that some filesystems may have data in a cache but not yet on
> the medium does not matter here. This is how UNIX worked since st_block has
> been introduced nearly 40 years ago.

Tradition is the corpse of wisdom. Backwards compatibility is a problem just as much as a good thing.

In all seriousness though, this started out because stuff wasn't cached to anywhere near the degree it is today, and there was no such thing as delayed allocation. When you said to write, the filesystem allocated the blocks, regardless of when it actually wrote the data. IOW, the behavior that GNU tar is relying on is an implementation detail, not an API. Just like df, this breaks under modern designs, not because they chose to break it, but because it wasn't designed for use with such implementations.
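
To make the dependency concrete, here is a rough sketch of the kind of check an archiver could make from stat() data alone. This is not GNU tar's actual code. The function name is hypothetical, and the 512-byte unit for st_blocks is an assumption (it is the Linux convention, POSIX does not fix it).

```python
ST_BLOCK_UNIT = 512  # assumed unit of st_blocks; not mandated by POSIX

def looks_sparse(st_size, st_blocks, unit=ST_BLOCK_UNIT):
    """Guess sparseness purely from stat() fields.

    A file is treated as sparse when fewer blocks are allocated than
    would be needed to hold st_size bytes.
    """
    blocks_needed = -(-st_size // unit)  # ceiling division
    return st_blocks < blocks_needed
```

With delayed allocation, a freshly written 4096-byte file can report st_blocks == 0, so looks_sparse(4096, 0) is true and the file looks like one big hole even though it has real data.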

In the case of tar and similar things though, I'd argue that it's not sensible to special case files that are 'sparse'. It should store any long enough run of zeroes as a sparse region, then provide an option to not make those files sparse when restored.
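
A minimal sketch of that content-based approach, assuming the data is already in memory. The function name and the min_hole threshold are made up for illustration, and a real archiver would stream the file instead.

```python
import re

def data_extents(data, min_hole=512):
    """Return (offset, length) pairs for the regions that must be stored.

    Any run of at least min_hole zero bytes is treated as a hole,
    regardless of what st_blocks says about the file.
    """
    hole = re.compile(rb"\x00{%d,}" % min_hole)
    extents = []
    pos = 0
    for m in hole.finditer(data):
        if m.start() > pos:
            extents.append((pos, m.start() - pos))
        pos = m.end()
    if pos < len(data):
        extents.append((pos, len(data) - pos))
    return extents
```

Because this looks only at the bytes, it gives the same answer whether or not the filesystem has allocated anything yet. The cost is that every file has to be read in full.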

> A new filesystem cannot introduce new rules just because people believe it would
> save time.

Saying the file has no blocks when there are no blocks allocated for it is not to 'save time', it's absolutely accurate. Suppose SVR4 UFS had a way to pack file data into the inode if it was small enough. In that case, it would be perfectly reasonable to return 0 for st_blocks because the inode table in UFS is a fixed pre-allocated structure, and therefore nothing is allocated to the file itself except the inode. The same applies in the case of a file packed into its own metadata block on BTRFS: nothing is allocated to that file beyond the metadata block it has to have to store the inode. In the case of delayed allocation where the file hasn't been flushed, there is nothing allocated, so st_blocks based on a strict interpretation of its description in POSIX _should_ be 0, because nothing is allocated yet.
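
If you want to observe the delayed allocation case on a given filesystem, a small probe along these lines should do, again as a sketch only. The function name and the 64 KiB payload are arbitrary choices, and what it returns depends entirely on the filesystem and kernel in use.

```python
import os
import tempfile

def blocks_before_and_after_fsync(directory=None, payload=b"x" * 65536):
    """Write payload to a temp file and report st_blocks before and after fsync.

    Returns (before, after). Pass a directory on the filesystem you want
    to probe. Assumes a Unix-like system where st_blocks is available.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, payload)
        before = os.fstat(fd).st_blocks  # may still be 0 with delayed allocation
        os.fsync(fd)                     # force the data out to the medium
        after = os.fstat(fd).st_blocks
        return before, after
    finally:
        os.close(fd)
        os.unlink(path)
```

A filesystem that allocates at write() time would be expected to give the same nonzero value twice. One that delays allocation can report 0 first and the real count only after the flush.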


