bug-grep

Re: SEEK_HOLE defined but useless on linux-3.4+/ext4 [Re: small ascii fi


From: Jim Meyering
Subject: Re: SEEK_HOLE defined but useless on linux-3.4+/ext4 [Re: small ascii files can be sparse
Date: Tue, 31 Jul 2012 09:05:38 +0200

Paul Eggert wrote:

> On 07/30/2012 12:33 PM, Jim Meyering wrote:
>
>>   - the interface is cumbersome (putting it mildly)
>
> Yes, and I remember that FIEMAP had some real bugs when the
> data structure on disk didn't match the data structure in
> memory.  Dunno if they're fixed.  Even if they are fixed,
> I'd reeeally rather just deal with SEEK_HOLE -- it's a
> *much* nicer interface.
>
>> it may be enough to use the old heuristic, but treat a file
>> as non-sparse when it has st.st_size <= ST_BLKSIZE(st).
>
> That would mishandle compressed file systems.  Say the file is
> 5 MB of text, but file system compression squashes it down to 1 MB.
> Then st_size is 5 MB whereas st_blocks is just 1 MB,
> and grep would incorrectly think that the file has a hole
> and therefore is a binary file.
>
> Since the test is marked as expensive, how about if we just
> leave things as-is?  Most people don't run expensive tests,
> and people who run them on inadequate file systems and with
> inadequate kernels that can't do 'ulimit -v' will just have
> to watch out (or buy machines with 10 TB of RAM ...).

:-)
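
For reference, the kind of SEEK_HOLE probe Paul prefers might look roughly like the sketch below. It is in Python rather than grep's C, and the helper name `has_hole` is made up for illustration. It assumes a platform where `os.SEEK_HOLE` is defined; as the subject line says, a file system without real support may just report end of file, in which case the probe finds nothing.

```python
import os

def has_hole(path):
    """Return True if SEEK_HOLE reports a hole before end of file.

    Hypothetical helper, not grep's code. Returns False when the file
    is empty or when the kernel or file system rejects the request,
    i.e. "no hole that we can detect".
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            # lseek(fd, 0, SEEK_HOLE) fails with ENXIO at or past EOF.
            return False
        try:
            # Offset of the first hole at or after offset 0. A fully
            # allocated file reports the implicit hole at end of file.
            first_hole = os.lseek(fd, 0, os.SEEK_HOLE)
        except OSError:
            return False
        return first_hole < size
    finally:
        os.close(fd)
```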

I think that for now, at least with ext2, ext3, ext4 and tmpfs, grep can
resort to a file system type check (cached per-device statvfs.f_fsid).
Hmm... it may be better to test "is local_fs and ! is_compressing_fs_type(f_fsid)",
since there aren't many compressing file systems, while we'll probably want
to use the heuristic also for FAT*, NTFS, HFS, etc.
Once we know that we're on one of those non-compressing file systems,
the legacy heuristic will work.
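
To make the idea concrete, here is a rough sketch of the legacy heuristic with the small-file exception, gated on a cached per-device check. It is Python, not grep's C, and everything named here is an assumption for illustration: `looks_sparse`, `heuristic_allowed`, the `lookup_fs_type` callback, and the contents of `COMPRESSING_FS_TYPES`. It also assumes st_blocks counts 512-byte units, as on Linux.

```python
# Illustrative only: the message does not say which types belong here.
COMPRESSING_FS_TYPES = {"example-compressing-fs"}

# Per-device cache of the "may we trust the heuristic?" decision.
_heuristic_ok = {}

def looks_sparse(st_size, st_blocks, st_blksize):
    """Legacy heuristic: fewer allocated bytes than st_size implies a hole.

    Files no larger than one I/O block are treated as non-sparse.
    Assumes st_blocks is in 512-byte units.
    """
    if st_size <= st_blksize:
        return False
    return st_blocks * 512 < st_size

def heuristic_allowed(st_dev, lookup_fs_type):
    """Decide once per device whether the heuristic can be trusted.

    lookup_fs_type is a hypothetical callback that returns
    (fs_type_name, is_local) for a device.
    """
    if st_dev not in _heuristic_ok:
        fs_type, is_local = lookup_fs_type(st_dev)
        _heuristic_ok[st_dev] = is_local and fs_type not in COMPRESSING_FS_TYPES
    return _heuristic_ok[st_dev]
```

With Paul's numbers, `looks_sparse(5 * 2**20, 2048, 4096)` is true even though the file has no hole, which is why the check on the file system type has to come first.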

Otherwise, I find it too onerous to search a hierarchy and watch
grep appear to hang while it consumes all virtual memory --
only to die (exit 2 or OOM-kill), interrupting the search.

>> The arguments for switching from ext4 to btrfs are adding up...
>
> I rely on you for notes from the bleeding edge....

I would have switched a year or so ago if ext4 weren't so much
faster at, e.g., removing many small files.


