bug-grep
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] grep: -r no follows symlinks


From: Jim Meyering
Subject: Re: [PATCH] grep: -r no follows symlinks
Date: Mon, 12 Mar 2012 19:08:45 +0100

Paul Eggert wrote:
> Here's a proposed change to GNU grep so that 'grep -r'
> no longer follows symlinks, and by default does not
> read devices, which is more convenient in the typical
> use case.  Symlinks and devices on the command line are
> still dereferenced; the change affects only symlinks and
> devices encountered recursively.  Users who want dereferencing
> for all files can use -R, which retains its old meaning.
>
> I considered adding --dereference, --no-dereference,
> --dereference-command-line, etc, but that was a lot of extra complexity
> (particularly in the documentation) for little payoff --
> anybody who wants something that complicated can use 'find',
> as the whole point of 'grep -r' is that it should be simple.
>
> As a side effect this patch changes grep to use fts,
> which should make it more robust with large directory
> hierarchies.  This is why the patch subtracts more lines
> than it adds.

As you mentioned below, it is more efficient.
For example, to search a dir-only hierarchy of depth 2040 on a
tmpfs file system, I see ~20-30x speed-up (0.60 elapsed vs 0.02)
The hierarchy looks like z/z/z/.../z with 2039 slashes.

  $ env time grep -r FFF z
  0.03user 0.55system 0:00.59elapsed 98%CPU (0avgtext+0avgdata 5744maxresident)k
  0inputs+0outputs (0major+1482minor)pagefaults 0swaps
  $ env time ~/w/co/grep/src/grep -r FFF z
  0.00user 0.01system 0:00.02elapsed 91%CPU (0avgtext+0avgdata 1560maxresident)k
  0inputs+0outputs (0major+421minor)pagefaults 0swaps

[those times remained consistent across repeated trials]

Making it slightly deeper (when the name reached length 4096/PATH_MAX),
the result became too long for the old version, which failed with
"File name too long".

With 500,000 empty files on a tmpfs file system, created like this:
    mkdir j; cd j; seq 500000|xargs touch
I see that env time grep -r FFF . is 4-5% faster with elapsed
time going from (old-to-new) 1.22s to 1.15s.

> Subject: [PATCH 1/2] build: update gnulib submodule to latest
...
> Subject: [PATCH 2/2] grep: -r no follows symlinks

s/no/no longer/

Please mention fts in the subject, too, since rewriting the
dir-traversal code is pretty fundamental.  Maybe this:

    grep: -r no longer follows symlinks; use fts

> Change -r to follow only command-line symlinks, and by default to
> reads only devices named on the command line.  This is a simple
> way to get a more-useful behavior when searching random
> directories; the idea is to use 'find' if you want something fancy.
> -R acts as before and gets a new alias --dereference-recursive.
> The code now uses fts internally, so it should be more robust and

s/should be/is/  ;-)

> faster with large hierarchies.
> * .gitignore: Remove lib/savedir.c, lib/savedir.h.
> * tests/symlink: New file
> * Makefile.boot (LIB_OBJS_core): Remove isdir.o, savedir.o.
> Perhaps other changes are needed too, but I'm not sure what
> this makefile is for.
> * NEWS: Document changes.
> * doc/grep.texi (File and Directory Selection): Likewise.
> * bootstrap.conf (gnulib_modules): Remove dirent, dirname, isdir, open.
> Add fstatat, fts, openat-safer.
> * lib/Makefile.am (libgreputils_a_SOURCES): Remove savedir.c, savedir.h.
> * lib/savedir.c, lib/savedir.h: Remove.
> * po/POTFILES.in: Add lib/openat-die.c.
> * src/main.c: Include fcntl-safer.h, fts_.h.  Don't include
> isdir.h, savedir.h.
> (struct stats, stats_base): Remove.
> (long_options, usage, main): Add --dereference-recursive and
> implement -r vs -R.
> (filename_prefix_len, fts_options): New static vars.
> (basic_fts_options, READ_COMMAND_LINE_DEVICES): New constants.
> (devices): Now defaults to READ_COMMAND_LINE_DEVICES.
> (reset, grep): Now takes just struct stat rather than file name and
> struct stats.  All callers changed.
> (fillbuf): Now takes struct stat reather than struct stats.
> All callers changed.
> (grep): Don't worry about recursing too deeply; fts and grepdesc
> handle this now.
> (grepdirent, grepdesc, grep_command_line_args): New functions.
> (grepfile): New args DIRDESC, FOLLOW, COMMAND_LINE.  Remove struct stats
> arg.  All callers changed.  Use openat_safer rather than open.
> Use desc == STDIN_FILENO to tell whether we're reading "-".
> Don't worry about EINTR when closing -- not possible, since we're
> not catching signals.
> * tests/Makefile.am (TESTS): Add symlink.
> * tests/symlink: New file.

I should finish the review tonight.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]