bug-gnu-emacs

bug#44983: Truncate long lines of grep output


From: Dmitry Gutov
Subject: bug#44983: Truncate long lines of grep output
Date: Wed, 9 Dec 2020 22:06:01 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0

On 09.12.2020 21:17, Juri Linkov wrote:
I think until a long string is inserted to the buffer, truncating the
string in the variable in xref--collect-matches-1 should be much faster.

It would surely be faster, but how would that overhead compare to the
whole operation?

Could be negligible, except in the most extreme cases. After all, the main
slowdown factor with long strings is the display engine, and it won't be in
play there.

The upside is we'd be able to support column limiting with Grep too. Which
is the default configuration. And we'd extract the cutoff column into
a more visible user option.

This is exactly what we need.  After that this bug report/feature request
can be closed.

Perhaps you would like to come up with the name for the new user option? The changes to xref--collect-matches-1 should be straightforward (it will include a choice, though: whether to cut off matches when they don't fit). Since you're the one who has experienced poor performance because of this, though, you can do the benchmarking. Basically, what we need to know is whether the new option indeed makes performance acceptable.
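
To make the choice mentioned above concrete, here is a minimal sketch of truncating a line string before it is ever inserted into a buffer. It is not the actual change to xref--collect-matches-1: the function name `my/xref-truncate-line' and its arguments LIMIT and MATCH-END are hypothetical, used only for illustration.

```elisp
;; Hypothetical helper; not part of xref.el.
(defun my/xref-truncate-line (line limit &optional match-end)
  "Return LINE truncated to LIMIT characters.
If MATCH-END (an index into LINE) is non-nil, keep at least that
many characters, so a match straddling LIMIT is not cut off."
  (let ((cut (if match-end (max limit match-end) limit)))
    (if (> (length line) cut)
        ;; Only the string is shortened; no buffer is involved yet.
        (substring line 0 cut)
      line)))
```

Calling it without MATCH-END corresponds to cutting matches off at the limit; passing the end of the match corresponds to extending the cutoff so the match stays whole.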

BTW, for sorting currently xref-search-program-alist uses:

     "| sort -t: -k1,1 -k2n,2"

but fortunately ripgrep has a special option to do the same with:

     "--sort path"

Somehow, that option turned out to be consistently slower in my benchmarking, even when the results are only a few lines (that's actually when the difference should be most apparent, because with many lines Elisp takes up most of the CPU time). You can try it yourself:

(benchmark 10 '(project-find-regexp ":package-version '(xref"))

  0.86 with '| sort'
  1.33 with '--sort path'

$ rg --version
ripgrep 12.1.1 (rev 7cb211378a)
-SIMD -AVX (compiled)
+SIMD +AVX (runtime)

We can also document it in the docstring, though, for those who don't have 'sort' installed.

They should be merged into one regexp indeed.  Because after customizing
it to the rg regexp, grep output doesn't highlight matches anymore (I use
both grep and rg interchangeably by different commands).
Currently their separate regexps are:
grep:
"\033\\[0?1;31m
   \\(.*?\\)
   \033\\[[0-9]*m"
rg:
"\033\\[[0-9]*m
   \033\\[[0-9]*1m
   \033\\[[0-9]*1m
   \\(.*?\\)
   \033\\[[0-9]*0m"
That could be combined into one regexp:
"\033\\[[0-9?;]*m
   \\(?:\033\\[[0-9]*1m\\)\\{0,2\\}
   \\(.*?\\)
   \033\\[[0-9]*0?m"

Makes sense. Is the parsing performance the same?

Performance is not a problem.  The problem is that a laxer regexp
causes more false positives.  So the above regexp highlighted even
the separator colons (':') between file names and column numbers.
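
The false-positive problem can be reproduced in isolation with `string-match'. The sketch below joins the grep regexp and the combined regexp quoted above into single strings and runs them on two sample lines. The sample lines and all `my/' names are assumptions made up for illustration; they are not taken from etc/grep.txt or from grep.el.

```elisp
;; The grep regexp as quoted above, joined into one string.
(defconst my/grep-match-re
  "\033\\[0?1;31m\\(.*?\\)\033\\[[0-9]*m")

;; The proposed combined regexp as quoted above.
(defconst my/combined-match-re
  (concat "\033\\[[0-9?;]*m"
          "\\(?:\033\\[[0-9]*1m\\)\\{0,2\\}"
          "\\(.*?\\)"
          "\033\\[[0-9]*0?m"))

(defun my/first-highlight (regexp line)
  "Return the text REGEXP would highlight in LINE, or nil if none."
  (when (string-match regexp line)
    (match-string 1 line)))

;; Hypothetical samples: a match colored with SGR 01;31, and a
;; separator colon colored with some other SGR code (36 here).
(defconst my/sample-match "\033[01;31mfoo\033[m")
(defconst my/sample-separator "\033[36m:\033[m")
```

With these samples, (my/first-highlight my/grep-match-re my/sample-separator) returns nil, while the same call with my/combined-match-re returns ":", because its first part accepts any color code rather than only the one used for matches. Both regexps return "foo" for my/sample-match.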

BTW, it's possible to see all highlighted parts of the output
by changing the argument 'MODE' of 'compilation-start' in 'grep'
from #'grep-mode to t (so it uses comint-mode in grep buffers).

Because ansi-color-process-output is in comint-output-filter-functions?

Anyway, I found the shortest change needed to support ripgrep,
and pushed to master.

Excellent.

Also, with the increased complexity, I'd rather we added a couple of tests,
or a comment with output examples. Or maybe both.

Fortunately, we have all possible cases listed in etc/grep.txt,
so it was easy to check if everything is highlighted correctly now.
Also I added ripgrep samples to etc/grep.txt.

Thanks!




