bug#21700: new snapshot available: grep-2.21.78-7da30


From: Gary Johnson
Subject: bug#21700: new snapshot available: grep-2.21.78-7da30
Date: Thu, 22 Oct 2015 16:49:34 -0700
User-agent: Mutt/1.5.20 (2009-06-14)

On 2015-10-21, Jim Meyering wrote:
> On Wed, Oct 21, 2015 at 1:09 PM, Gary Johnson wrote:
> > On 2015-10-18, Jim Meyering wrote:
> >> > I built the snapshot on two systems, a fairly old one running Ubuntu
> >> > 10.04.4 and a newer one running an up-to-date Linux Mint 17.2.
> >> > 'make check' reported the same two failures on both:
> >> >
> >> >    XFAIL: backref-alt
> >> >    XFAIL: triple-backref
> >>
> >> Thanks for building and reporting.
> >> Each of those "XFAIL"s indicates an expected failure, so that is the
> >> expected test result, for now.
> >
> > OK, thanks.
> >
> > I also built the snapshot successfully on a Fedora 17 system that I
> > use for real work.  I just ran a performance test, FWIW.  I searched
> > recursively in our source hierarchy of 6044 regular files and 1102
> > directories for a simple string.
> >
> >     time grep -Rin mystring src > /dev/null
> >
> > Here are the results, averaged over three trials each, not including
> > any slow times clearly due to updating caches.
> >
> >             2.12    2.21    2.21.78-7da30
> >             -----   -----   -------------
> >     real    18.0s   1.08s   2.36s
> >     user    17.8s   0.96s   2.24s
> >     sys     0.12s   0.11s   0.10s
> >
> > Version 2.12 was /bin/grep.  The other two versions I built myself.
> 
> Thank you for the timings. Next time, please include the following:

This is kind of long, so I'll summarize here.  The relatively poor
performance I observed with grep-2.21.78 appears to have been caused
by my having built it in an environment tainted with CFLAGS left over
from the build of another project.  A clean build of grep-2.21.78
resulted in performance only slightly worse than grep-2.21.

>   - CPU type/speed

From lshw (probably more than you wanted):

     *-cpu:0
          description: CPU
          product: Quad-Core Xeon 5xxx
          vendor: Intel Corp.
          physical id: 5
          bus info: address@hidden
          version: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
          slot: CPU0 PROCESSOR
          size: 1596MHz
          capacity: 2128MHz
          width: 64 bits
          clock: 505MHz
          capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce 
cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss 
ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl 
xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 
cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority 
ept vpid cpufreq
          configuration: cores=4 enabledcores=4 threads=4
        *-cache:0
             description: L1 cache
             physical id: 7
             slot: L1 Cache
             size: 256KiB
             capacity: 256KiB
             capabilities: burst internal write-through unified
        *-cache:1
             description: L2 cache
             physical id: 8
             slot: L2 Cache
             size: 1MiB
             capacity: 1MiB
             capabilities: burst internal write-back unified
        *-cache:2
             description: L3 cache
             physical id: 9
             slot: L3 Cache
             size: 4MiB
             capacity: 4MiB
             capabilities: burst internal write-back unified
     *-cpu:1
          description: CPU
          product: Quad-Core Xeon 5xxx
          vendor: Intel Corp.
          physical id: 6
          bus info: address@hidden
          version: Intel(R) Xeon(R) CPU E5506 @ 2.13GHz
          slot: CPU1 PROCESSOR
          size: 1596MHz
          capacity: 2128MHz
          width: 64 bits
          clock: 505MHz
          capabilities: x86-64 fpu fpu_exception wp vme de pse tsc msr pae mce 
cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss 
ht tm pbe syscall nx rdtscp constant_tsc arch_perfmon pebs bts rep_good nopl 
xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 
cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt lahf_lm tpr_shadow vnmi flexpriority 
ept vpid cpufreq
          configuration: cores=4 enabledcores=4 threads=4
        *-cache:0
             description: L1 cache
             physical id: a
             slot: L1 Cache
             size: 256KiB
             capacity: 256KiB
             capabilities: burst internal write-through unified
        *-cache:1
             description: L2 cache
             physical id: b
             slot: L2 Cache
             size: 1MiB
             capacity: 1MiB
             capabilities: burst internal write-back unified
        *-cache:2
             description: L3 cache
             physical id: c
             slot: L3 Cache
             size: 4MiB
             capacity: 4MiB
             capabilities: burst internal write-back unified

>   - file system type (and SSD or spinning rust)

Type: ext4
Size: 1.1 TB
Spinning rust

The file system resides on an LVM logical volume composed of two
physical volumes.  One physical volume is on a Seagate ST3250318AS
and the other is on a Western Digital WDC WD1002FAEX-0.  I didn't
build the system, so I don't know very much about this.

>   - OS version

Fedora 17
Kernel: 3.3.4-5.fc17.x86_64

>   - options with which you configured/built grep

Version 2.21:
    ./configure --prefix=$HOME/src/grep-2.21
    make

Version 2.21.78-7da30:
    ./configure --prefix=$HOME/src/grep-2.21.78
    make

gcc is:
    gcc (GCC) 4.7.0 20120507 (Red Hat 4.7.0-5)

>   - your current locale

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

> While you see a performance degradation going from 2.21 to the
> first 2.22 release candidate, I see the opposite trend, albeit barely
> measurable:
> 
> Searching the following hierarchies, I see a consistent 1% improvement
> going from 2.21 to 2.22 on an Intel(R) Core(TM) i7-4770S CPU @ 3.10GHz.
> The files I searched were on an ext4 file system residing on an SSD
> (OCZ-VERTEX3).
> This system is using fedora rawhide.
> 
> $ find [a-g]* -type f|wc -l
> 335065
> $ find [a-g]* -type d|wc -l
> 9667
> $ du -shc [a-g]*
> 25M     autoconf
> 125M    automake
> 129M    bison
> 74M     cppi
> 437M    cu
> 103M    diffutils
> 732M    emacs
> 2.3G    gcc
> 345M    glibc
> 252M    gnulib
> 187M    grep
> 90M     gzip
> 4.7G    total
> 
> Both grep binaries were compiled with gcc-6.0.something (built from git)
> using ./configure --enable-gcc-warnings && make
> 
> Here are best-of-3 timings running this command:
> 
>   env LC_ALL=en_US.UTF-8 time grep -ri mystring [a-g]* > /dev/null
> 
> grep-2.21: 8.05user 1.10system 0:09.17elapsed 99%CPU
> (0avgtext+0avgdata 32876maxresident)k
> 0inputs+0outputs (0major+9986minor)pagefaults 0swaps
> 
> grep-2.22: 8.04user 1.04system 0:09.10elapsed 99%CPU
> (0avgtext+0avgdata 32940maxresident)k
> 0inputs+0outputs (0major+9988minor)pagefaults 0swaps
> 
> It is critical to mention the locale you use.
> As you see above, I explicitly set LC_ALL=en_US.UTF-8.
> Note that when I switch to LC_ALL=C, it halves those times,
> although the ~1% win with 2.22 still remains.
> 
> Would you please compile 2.21 yourself, too? Otherwise, the timing may
> be biased by the fact that distribution-provided binaries are often
> better optimized than those one gets when building from sources with
> the default options. If we can identify a modern system for which
> there is anywhere near a 2x performance regression, I would be very
> interested to learn more.

Version 2.21 is one I compiled myself.  The distribution-provided
version is 2.12.

Your comments encouraged me to pay more attention to what I was
doing.  I compared the config.log files from the grep-2.21 and
grep-2.21.78-7da30 directories and noticed that the environments and
results were slightly different.  I noticed that CFLAGS had been set
to "-g -DFEAT_CONCEAL" for a Vim build and had been used when I
built grep-2.21.78.  Also, I had built grep-2.21 back in February
and couldn't be sure that nothing relevant had changed on the system
since then.
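
For anyone wanting to repeat that comparison, something like the
following sketch should show the CFLAGS that configure recorded in each
config.log.  The function name recorded_cflags is made up for
illustration, and the exact set of variables present may vary with the
autoconf version.  With gcc, configure normally defaults to "-g -O2"
when CFLAGS is unset, so an inherited "-g ..." setting would presumably
have dropped optimization.

```shell
# Print the CFLAGS-related settings that configure recorded in a config.log.
# (recorded_cflags is a hypothetical helper name, not part of grep or autoconf.)
recorded_cflags() {
    grep -E '^(CFLAGS|ac_cv_env_CFLAGS_set|ac_cv_env_CFLAGS_value)=' "$1"
}

# Possible usage, assuming the two build directories sit side by side:
#   recorded_cflags grep-2.21/config.log
#   recorded_cflags grep-2.21.78-7da30/config.log
#
# To configure without an inherited CFLAGS, unset it in a subshell:
#   ( unset CFLAGS; ./configure --prefix=$HOME/src/grep-2.21.78 && make )
```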

So I opened a new xterm window, created two new build directories
and untarred, configured and made both grep versions from scratch.
New measurements showed no difference between the two 2.21 builds,
but a significant improvement in the 2.21.78 times.  Here are the
new results.  The times of successive runs were very close, so I
just chose a representative example of each.  In short, 2.21.78
appears _slightly_ slower than 2.21, but not enough (for me) to
worry about.

====================================================================

$ time ~/grep-2.21-new/bin/grep -ri mystring src > /dev/null

real    0m0.814s
user    0m0.725s
sys     0m0.081s

$ time LC_ALL=en_US.UTF-8 ~/grep-2.21-new/bin/grep -ri mystring src > /dev/null

real    0m0.817s
user    0m0.720s
sys     0m0.090s

$ time LC_ALL=C ~/grep-2.21-new/bin/grep -ri mystring src > /dev/null

real    0m0.350s
user    0m0.252s
sys     0m0.094s

====================================================================

$ time ~/grep-2.21.78-new/bin/grep -ri mystring src > /dev/null

real    0m0.849s
user    0m0.756s
sys     0m0.086s

$ time LC_ALL=en_US.UTF-8 ~/grep-2.21.78-new/bin/grep -ri mystring src > /dev/null

real    0m0.849s
user    0m0.751s
sys     0m0.090s

$ time LC_ALL=C ~/grep-2.21.78-new/bin/grep -ri mystring src > /dev/null

real    0m0.354s
user    0m0.267s
sys     0m0.082s

====================================================================
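
To put a rough number on "slightly", the real times above can be
compared with a small helper.  This is only a sketch: pct_slower is a
made-up name, and single representative runs like these are not a
rigorous benchmark.

```shell
# Percentage by which the second time exceeds the first, to one decimal.
# (pct_slower is a hypothetical helper name used only for this example.)
pct_slower() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

# Real times from the runs above: 2.21 first, 2.21.78 second.
pct_slower 0.814 0.849    # default locale (en_US.UTF-8)
pct_slower 0.350 0.354    # LC_ALL=C
```

By that measure 2.21.78 comes out roughly 4% slower in the UTF-8 locale
and roughly 1% slower with LC_ALL=C.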

I'm sorry for wasting your time on a wild goose chase.  (But my new
grep works better now!)

Regards,
Gary
