Re: [PATCH] grep: sparse files are now considered binary

bug-grep

From:	Jim Meyering
Subject:	Re: [PATCH] grep: sparse files are now considered binary
Date:	Tue, 15 May 2012 19:15:10 +0200

Paul Eggert wrote:
> No further comment, so I pushed the following
> slightly-improved version of that patch.

Thanks for the nice patch.
Sorry I didn't realize you were waiting for review.

> * NEWS: Document this.
> * doc/grep.texi (File and Directory Selection): Likewise.
> * bootstrap.conf (gnulib_modules): Add stat-size.
> * src/main.c: Include stat-size.h.
> (usable_st_size): New function, mostly stolen from coreutils.
> (fillbuf): Use it.
> (file_is_binary): New function, which looks for holes too.
> (grep): Use it.
> * tests/Makefile.am (TESTS): Add big-hole.
> * tests/big-hole: New file.
> ---
>  NEWS              |    6 +++-
>  bootstrap.conf    |    1 +
>  doc/grep.texi     |    7 +++--
>  src/main.c        |   77 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  tests/Makefile.am |    1 +
>  tests/big-hole    |   31 +++++++++++++++++++++
>  6 files changed, 117 insertions(+), 6 deletions(-)
>  create mode 100755 tests/big-hole
>
> diff --git a/NEWS b/NEWS
...
> @@ -45,7 +50,6 @@ GNU grep NEWS                                    -*- outline -*-
>    use -R if you prefer the old behavior of following all symlinks and
>    defaulting to reading all devices.
>
> -

Oops.  This hunk removed a blank line from the "old" part of NEWS,
which triggers a "make syntax-check" failure.
The easiest fix is to restore the two-blank-line separator.

In coreutils I even added this rule in cfg.mk:

# Ensure that the end of each release's section is marked by two empty lines.
sc_NEWS_two_empty_lines:
        @sed -n 4,/Noteworthy/p $(srcdir)/NEWS                          \
            | perl -n0e '/(^|\n)\n\n\* Noteworthy/ or exit 1'           \
          || { echo '$(ME): use two empty lines to separate NEWS sections' \
                 1>&2; exit 1; } || :

...
> +/* Return 1 if a file is known to be binary for the purpose of 'grep'.
> +   BUF, of size BUFSIZE, is the initial buffer read from the file with
> +   descriptor FD and status ST.  */
> +static int
> +file_is_binary (char const *buf, size_t bufsize, int fd, struct stat const *st)
> +{

Nice function.

...

> diff --git a/tests/big-hole b/tests/big-hole
> new file mode 100755
> index 0000000..47e36e1
> --- /dev/null
> +++ b/tests/big-hole
> @@ -0,0 +1,31 @@
> +#!/bin/sh
> +# Check that grep --binary-file=without-match quickly skips files with holes.
> +
> +. "${srcdir=.}/init.sh"; path_prepend_ ../src
> +
> +expensive_
> +
> +# Try to make this test not THAT expensive, on typical hosts.
> +virtual_memory_KiB=10240
> +if echo x | (ulimit -v $virtual_memory_KiB && grep x) >/dev/null 2>&1; then
> +  ulimit -v $virtual_memory_KiB
> +fi
> +
> +# Create a file that starts with at least a buffer's worth of text,
> +# but has a big hole later.
> +ten='1 2 3 4 5 6 7 8 9 10'
> +x='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
> +(for i in $ten; do
> +   for j in $ten; do
> +     for k in $ten; do
> +       echo $x
> +     done
> +   done
> + done
> + echo x | dd bs=1024k seek=8000000
> +) >8T-or-so || skip_ 'cannot create big sparse file'

After wading through thousands of lines of shell debug output from
the likes of the above that I've written, I now prefer to use awk, e.g.,

  awk 'BEGIN{ for (i=0;i<1000;i++) printf "%080d\n", 0 }'
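
As a quick sanity check, the sketch below counts what that one-liner emits:
1000 lines of 80 zeros plus a newline each.  The helper name gen is
hypothetical, and the 32 KiB figure in the comment is an assumption about
grep's initial buffer size, not something stated in this thread.

```sh
#!/bin/sh
# Hypothetical wrapper around the awk one-liner, so it can be run twice.
gen() {
  ${AWK-awk} 'BEGIN{ for (i=0;i<1000;i++) printf "%080d\n", 0 }'
}

# The arithmetic expansion strips any padding that wc may print.
lines=$(($(gen | wc -l)))
bytes=$(($(gen | wc -c)))

# 1000 lines * 81 bytes each; well above an assumed 32 KiB buffer.
echo "lines=$lines bytes=$bytes"
```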

> +grep --binary-file=without-match x 8T-or-so >/dev/null
> +test $? -eq 1 || fail=1
> +
> +Exit $fail
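
The dd seek= idiom in the quoted test can be tried at a much smaller scale.
This is only a sketch: the file name and the seek=8 value are chosen for
illustration, and whether the skipped region is stored as a real hole
depends on the file system.

```sh
#!/bin/sh
tmpdir=$(mktemp -d) || exit 1
f=$tmpdir/sparse

# Seek 8 blocks of 1 MiB into a fresh file, then write the 2 bytes "x\n".
echo x | dd bs=1024k seek=8 2>/dev/null >"$f"

# Apparent size: 8 * 1048576 bytes skipped plus 2 bytes written.
size=$(($(wc -c < "$f")))
echo "size=$size"

rm -rf "$tmpdir"
```

The test in the patch uses seek=8000000 with the same block size, which is
where the "8T-or-so" file name comes from.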

So how about this?

From 888a869035ecb55668e1a7fdfc474b62ed6e7457 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Tue, 15 May 2012 19:12:35 +0200
Subject: [PATCH] maint: avoid nit-picky syntax-check test failure; tweak
 big-hole test

* NEWS: Restore deleted newline in "old" NEWS, to fix a syntax-check
test failure.
* tests/big-hole: Use awk, rather than a shell loop: saves 3000 lines
of verbose shell output in the .log file.
---
 NEWS           |  2 ++
 tests/big-hole | 12 ++----------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index f515e84..6926276 100644
--- a/NEWS
+++ b/NEWS
@@ -19,6 +19,7 @@ GNU grep NEWS                                    -*- outline -*-
   Bootstrapping with Makefile.boot has been broken since grep 2.6,
   and was removed.

+
 * Noteworthy changes in release 2.12 (2012-04-23) [stable]

 ** Bug fixes
@@ -50,6 +51,7 @@ GNU grep NEWS                                    -*- outline -*-
   use -R if you prefer the old behavior of following all symlinks and
   defaulting to reading all devices.

+
 * Noteworthy changes in release 2.11 (2012-03-02) [stable]

 ** Bug fixes
diff --git a/tests/big-hole b/tests/big-hole
index 47e36e1..c509878 100755
--- a/tests/big-hole
+++ b/tests/big-hole
@@ -13,16 +13,8 @@ fi

 # Create a file that starts with at least a buffer's worth of text,
 # but has a big hole later.
-ten='1 2 3 4 5 6 7 8 9 10'
-x='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
-(for i in $ten; do
-   for j in $ten; do
-     for k in $ten; do
-       echo $x
-     done
-   done
- done
- echo x | dd bs=1024k seek=8000000
+(${AWK-awk} 'BEGIN{ for (i=0;i<1000;i++) printf "%080d\n", 0 }'
+  echo x | dd bs=1024k seek=8000000
 ) >8T-or-so || skip_ 'cannot create big sparse file'

 grep --binary-file=without-match x 8T-or-so >/dev/null
--
1.7.10.2.484.gcd07cc5
