bug-grep

Re: small ascii files can be sparse


From: Paul Eggert
Subject: Re: small ascii files can be sparse
Date: Fri, 27 Jul 2012 12:29:27 -0700
User-agent: Mozilla/5.0 (X11; Linux i686; rv:14.0) Gecko/20120714 Thunderbird/14.0

On 07/27/2012 07:36 AM, Martin Carroll wrote:
> a "used" value of 0 for small ascii files is technically within spec

That's not clear.  The NFSv3 spec surely does not grant
permission to the server to (say) report a used count of
zero at all times, claiming that this is technically
within spec.

But you're right that 'grep' should interoperate with
these servers, so I pushed the following patch into the
grep master.  It'd be nice to generalize this to other apps
but that's a bigger project.

Thanks for the bug report.


From 2f0255e9f4cc5cc8bd619d1f217902eb29b30bc2 Mon Sep 17 00:00:00 2001
From: Paul Eggert <address@hidden>
Date: Fri, 27 Jul 2012 12:14:14 -0700
Subject: [PATCH] grep: don't falsely report tiny text files as binary

* NEWS: Document this.
* src/main.c (file_is_binary): When we are already at apparent
EOF, skip the file-size check, as some servers use zero blocks
to store tiny files.  Reported by Martin Carroll in
<http://lists.gnu.org/archive/html/bug-grep/2012-07/msg00016.html>.
---
 NEWS       |    5 +++++
 src/main.c |   17 ++++++++++++-----
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/NEWS b/NEWS
index c7922ff..753aedc 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,11 @@ GNU grep NEWS                                    -*- outline -*-
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Bug fixes
+
+  'grep' no longer falsely reports tiny text files as being binary
+  on file systems that store tiny files' contents in metadata.
+
 
 * Noteworthy changes in release 2.13 (2012-07-04) [stable]
 
diff --git a/src/main.c b/src/main.c
index dda7c9b..96e4f37 100644
--- a/src/main.c
+++ b/src/main.c
@@ -476,11 +476,18 @@ file_is_binary (char const *buf, size_t bufsize, int fd, struct stat const *st)
          represent its data, then it must have at least one hole.  */
       if (HAVE_STRUCT_STAT_ST_BLOCKS)
         {
-          off_t nonzeros_needed = st->st_size - cur + bufsize;
-          off_t full_blocks = nonzeros_needed / ST_NBLOCKSIZE;
-          int partial_block = 0 < nonzeros_needed % ST_NBLOCKSIZE;
-          if (ST_NBLOCKS (*st) < full_blocks + partial_block)
-            return 1;
+          /* Some servers store tiny files using zero blocks, so skip
+             this check at apparent EOF, to avoid falsely reporting
+             that a tiny zero-block file is binary.  */
+          off_t not_yet_read = st->st_size - cur;
+          if (0 < not_yet_read)
+            {
+              off_t nonzeros_needed = not_yet_read + bufsize;
+              off_t full_blocks = nonzeros_needed / ST_NBLOCKSIZE;
+              int partial_block = 0 < nonzeros_needed % ST_NBLOCKSIZE;
+              if (ST_NBLOCKS (*st) < full_blocks + partial_block)
+                return 1;
+            }
         }
 
       /* Look for a hole after the current location.  */
-- 
1.7.6.5
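
To make the effect of the patched check easier to see outside of grep,
here is a minimal standalone sketch of the same arithmetic.  It is not
the code in src/main.c: the function name must_have_hole and the
constant BLOCK_UNIT are hypothetical, and BLOCK_UNIT assumes that
st_blocks counts 512-byte units, which is common but not guaranteed
(grep itself uses ST_NBLOCKSIZE and ST_NBLOCKS for this).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Assumed size in bytes of one st_blocks unit (hypothetical constant;
   512 is typical, but the real value is system-dependent).  */
#define BLOCK_UNIT 512

/* Return true if a file of SIZE bytes with NBLOCKS allocated blocks
   cannot hold BUFSIZE already-read bytes plus everything after offset
   CUR without containing a hole.  Mirrors the patched logic: when
   nothing is left to read (apparent EOF), the check is skipped.  */
bool
must_have_hole (off_t size, off_t cur, size_t bufsize, off_t nblocks)
{
  assert (0 <= size && 0 <= cur && 0 <= nblocks);

  off_t not_yet_read = size - cur;
  if (not_yet_read <= 0)
    return false;  /* At apparent EOF: a zero-block tiny file is fine.  */

  off_t nonzeros_needed = not_yet_read + (off_t) bufsize;
  off_t full_blocks = nonzeros_needed / BLOCK_UNIT;
  int partial_block = 0 < nonzeros_needed % BLOCK_UNIT;
  return nblocks < full_blocks + partial_block;
}
```

With these assumptions, a 100-byte file that has been read in full and
reports zero blocks (size 100, cur 100, bufsize 100, nblocks 0) is no
longer flagged, whereas the pre-patch arithmetic would have needed one
partial block and returned 1.  A 1 MiB file with only 8 blocks
allocated and 32 KiB read so far is still flagged as having a hole.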
