bug-grep

From: Jim Meyering
Subject: Re: SEEK_HOLE defined but useless on linux-3.4+/ext4 [Re: small ascii files can be sparse
Date: Tue, 31 Jul 2012 14:10:09 +0200

Jim Meyering wrote:
> Paul Eggert wrote:
>> On further thought, the heuristic is also incorrect for file
>> systems that compress their data.  So I installed this further
>> patch.
>>
>> Oh, well.  At least the code is simpler now.  Simple and slow
>> is better than complicated and fast and occasionally wrong.
> ...
>> Subject: [PATCH] grep: don't falsely report compressed text files as binary
>>
>> * NEWS: Document this.
>> * src/main.c (file_is_binary): Remove the heuristic based on
>> st_blocks, as it does not work for compressed file systems.
>> On Solaris, it'd be cheap to test whether the file system is known
>> to be uncompressed, which would allow the heuristic, but Solaris has
>> SEEK_HOLE so there's little point.
>
> Hi Paul,
>
> Without the st_blocks-based heuristic, grep's big-hole test now fails
> (exhausts memory and exits with status 2) on an ext4 file system with
> a recent linux kernel.
> That happens because while SEEK_HOLE and SEEK_DATA are now defined,
> the kernel's ext4 lseek/SEEK_HOLE support is just a stub that simply
> returns the length of the file.
>
> For the record, the SEEK_HOLE support for btrfs and xfs in
> linux-3.4.4 (F17) works the way I would expect, and it looks
> like ocfs2 is fine, too.
>
> Here's a demo:
>
> SEEK_HOLE works (detects the hole) with btrfs (SEEK_HOLE == 4):
>
>     $ perl -e '$f=*STDERR; sysseek($f,2**22,0); syswrite($f,"a");' \
>       -e 'print 0+sysseek($f,0,4)' 2> j; stat -f --fo=\ %T .
>     0 btrfs
>
> SEEK_HOLE is not usable (reports "hole" at EOF) with ext4.
> stat -f reports ext2/ext3, but that's only looking at the magic number;
> it's really ext4:
>
>     $ perl -e '$f=*STDERR; sysseek($f,2**22,0); syswrite($f,"a");' \
>       -e 'print 0+sysseek($f,0,4)' 2> j; stat -f --fo=\ %T .
>     4194305 ext2/ext3
>
> tmpfs uses the same code:
>
>     4194305 tmpfs
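
For readers unfamiliar with the heuristic that the quoted patch removed:
the rough idea is to treat a file as containing holes when its allocated
blocks cannot cover its apparent size.  The Python sketch below is an
illustration only, not grep's actual C code.  The function names are
hypothetical, and the 512-byte unit for st_blocks is an assumption that
holds on Linux but is not guaranteed by POSIX.

```python
import os

# Assumption: st_blocks counts 512-byte units (true on Linux).
ST_BLOCK_BYTES = 512


def fewer_blocks_than_size(st_size, st_blocks, block_bytes=ST_BLOCK_BYTES):
    """Return True when the allocated bytes cannot cover the apparent size."""
    return st_blocks * block_bytes < st_size


def looks_holey(path):
    """Apply the check to a real file, using the values reported by stat."""
    st = os.stat(path)
    return fewer_blocks_than_size(st.st_size, st.st_blocks)


# A 4 MiB + 1 byte file with a single 4 KiB block allocated: mostly hole.
print(fewer_blocks_than_size(2**22 + 1, 8))
# A fully allocated 4 KiB file: no hole.
print(fewer_blocks_than_size(4096, 8))
# A 1 MiB text file stored compressed in 64 KiB: no hole, yet the check fires.
print(fewer_blocks_than_size(2**20, 128))
```

The last call shows why such a check misfires on a compressing file
system: a fully written text file whose data compresses well also
reports fewer blocks than its size would require.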

A quick update:
At least with recent Linux kernels (3.5.0+), tmpfs now does
have SEEK_HOLE support.  Confirmed on Fedora Rawhide.
Thanks to Jeff Layton for the tip.
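
To re-check a particular file system, the Perl probe from the quoted
demo can be sketched in Python as follows.  This is a minimal sketch:
the helper names are hypothetical, and it assumes a Python build and
platform that expose os.SEEK_HOLE (Python 3.3 or later on Linux).  It
writes one byte at offset 2**22 in a temporary file and asks where the
first hole starts.

```python
import os
import sys
import tempfile


def probe_first_hole(directory, data_offset=2**22):
    """Write one byte at data_offset in a temporary file under directory.

    Returns (hole, size): the offset SEEK_HOLE reports when searching
    from offset 0, and the resulting file size.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.lseek(fd, data_offset, os.SEEK_SET)
        os.write(fd, b"a")
        size = os.fstat(fd).st_size
        hole = os.lseek(fd, 0, os.SEEK_HOLE)
        return hole, size
    finally:
        os.close(fd)
        os.unlink(path)


def describe(hole, size):
    """Summarize the probe result in the terms used in the demo."""
    if hole == 0:
        return "hole detected at offset 0"
    if hole == size:
        return "no hole reported before EOF (SEEK_HOLE is not usable here)"
    return "first hole reported at offset %d" % hole


if __name__ == "__main__":
    # Defaults to the system temporary directory, which is often tmpfs.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    hole, size = probe_first_hole(target)
    print(hole, size, describe(hole, size))
```

A result equal to the file size (4194305 here) corresponds to the stub
behavior shown for ext4 above; a result of 0 corresponds to the btrfs
case.  Pass a directory on the file system of interest as the first
argument.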


