[bug #58197] "find" fails to optimize "-path /usr/foo -o -path /usr/bar"
From: Spencer Baugh
Subject: [bug #58197] "find" fails to optimize "-path /usr/foo -o -path /usr/bar" to "-regex '/usr/\(foo\|bar\)'"
Date: Wed, 19 Jul 2023 16:52:01 -0400 (EDT)
Follow-up Comment #3, bug #58197 (project findutils):
One use case is GNU Emacs which heavily uses find, for example in M-x rgrep.
Emacs often constructs find commands which look like this by default:
find -H . \( -path \*/SCCS/\* -o -path \*/RCS/\* -o -path \*/CVS/\* -o -path
\*/MCVS/\* -o -path \*/.src/\* -o -path \*/.svn/\* -o -path \*/.git/\* -o
-path \*/.hg/\* -o -path \*/.bzr/\* -o -path \*/_MTN/\* -o -path \*/_darcs/\*
-o -path \*/\{arch\}/\* -o -path \*/.\#\* -o -path \*.o -o -path \*\~ -o -path
\*.bin -o -path \*.lbin -o -path \*.so -o -path \*.a -o -path \*.ln -o -path
\*.blg -o -path \*.bbl -o -path \*.elc -o -path \*.lof -o -path \*.glo -o
-path \*.idx -o -path \*.lot -o -path \*.fmt -o -path \*.tfm -o -path \*.class
-o -path \*.fas -o -path \*.lib -o -path \*.mem -o -path \*.x86f -o -path
\*.sparcf -o -path \*.dfsl -o -path \*.pfsl -o -path \*.d64fsl -o -path
\*.p64fsl -o -path \*.lx64fsl -o -path \*.lx32fsl -o -path \*.dx64fsl -o -path
\*.dx32fsl -o -path \*.fx64fsl -o -path \*.fx32fsl -o -path \*.sx64fsl -o
-path \*.sx32fsl -o -path \*.wx64fsl -o -path \*.wx32fsl -o -path \*.fasl -o
-path \*.ufsl -o -path \*.fsl -o -path \*.dxl -o -path \*.lo -o -path \*.la -o
-path \*.gmo -o -path \*.mo -o -path \*.toc -o -path \*.aux -o -path \*.cp -o
-path \*.fn -o -path \*.ky -o -path \*.pg -o -path \*.tp -o -path \*.vr -o
-path \*.cps -o -path \*.fns -o -path \*.kys -o -path \*.pgs -o -path \*.tps
-o -path \*.vrs -o -path \*.pyc -o -path \*.pyo \) -prune -o -type f
-print0
That is, it lists a bunch of file extensions to ignore (from
grep-find-ignored-files) and passes them to find.
Because find does not optimize this, evaluating the list of ignore patterns
completely dominates the cost of the find execution. On my system, running
find over a particularly common directory, reducing the full set of ignores
to just one changes the find execution time from:
real 0m7.807s
user 0m7.505s
sys 0m0.359s
to:
real 0m0.491s
user 0m0.221s
sys 0m0.293s
Could you reconsider optimizing this case?
If not, how should Emacs be invoking find instead? Should we collapse the
patterns into a single regex ourselves before passing them to find? That is a
slightly worse user experience, because we let the user edit the find
invocation before it runs, but it may be worth it for the speed improvement.
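
To make the question concrete, here is a rough sketch of what such a collapsed invocation might look like for just three of the patterns above. It assumes GNU find with its default Emacs-style regex syntax, and file names without newlines. The scratch tree and the variable names are made up for illustration. It only checks that the two forms select the same files; I have not measured whether the regex form is actually faster.

```shell
#!/bin/sh
# Hypothetical scratch tree, used only to compare the two forms.
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/.git/objects"
touch "$tmp/src/main.c" "$tmp/src/main.o" "$tmp/src/mod.pyc" "$tmp/.git/objects/x"

# Form Emacs generates today, cut down to three ignore patterns.
with_path=$(cd "$tmp" && find -H . \( -path '*/.git/*' -o -path '*.o' -o -path '*.pyc' \) \
    -prune -o -type f -print | sort)

# Hand-collapsed form: -regex must match the whole path, so each glob
# becomes one alternative behind a leading ".*".
with_regex=$(cd "$tmp" && find -H . -regex '.*\(/\.git/.*\|\.o\|\.pyc\)' \
    -prune -o -type f -print | sort)

printf 'path form:  %s\n' "$with_path"
printf 'regex form: %s\n' "$with_regex"

rm -rf "$tmp"
```

On this tree both forms print only ./src/main.c. The regex is noticeably harder to read and edit by hand than the list of -path tests, which is the user-experience cost mentioned above.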
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?58197>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/