Re: [bug-gawk] 4.1.3->4.1.4 = Linux-libre's deblob-check grows huge and takes forever


From: arnold
Subject: Re: [bug-gawk] 4.1.3->4.1.4 = Linux-libre's deblob-check grows huge and takes forever
Date: Thu, 13 Jul 2017 11:49:51 -0600
User-agent: Heirloom mailx 12.4 7/29/08

Where can I get the files from?

Thanks,

Arnold

"Andrew J. Schorr" <address@hidden> wrote:

> Hi,
>
> I grabbed the file. It's broken both in master and in stable.
>
> gawk 4.1.3:
>
> bash-4.2$ /bin/time ./deblob-check --use-awk linux-libre-4.12-gnu.tar.bz2 
> 472.19user 96.71system 6:59.57elapsed 135%CPU (0avgtext+0avgdata 876968maxresident)k
> 0inputs+0outputs (0major+63559391minor)pagefaults 0swaps
>
> Master branch (I ctrl-c'ed after 13 minutes):
>
> bash-4.2$ /bin/time ./deblob-check --use-awk linux-libre-4.12-gnu.tar.bz2 
> ^C813.84user 17.63system 13:23.65elapsed 103%CPU (0avgtext+0avgdata 1122292maxresident)k
> 0inputs+0outputs (0major+11885984minor)pagefaults 0swaps
>
> Stable branch (I ctrl-c'ed after 13 minutes):
>
> bash-4.2$ /bin/time ./deblob-check --use-awk linux-libre-4.12-gnu.tar.bz2 
> ^C828.07user 23.17system 13:35.58elapsed 104%CPU (0avgtext+0avgdata 1590252maxresident)k
> 456inputs+72outputs (4major+15541814minor)pagefaults 0swaps
>
> Kind of a pain to bisect, since each iteration will be so slow. I haven't
> tried yet.
>
> -Andy
>
> On Thu, Jul 13, 2017 at 01:42:21AM -0600, address@hidden wrote:
> > If neither of those is any better, then let's work offline to isolate
> > when things broke. "git bisect" is quite good at that.  :-) If possible,
> > I'd prefer to fix the problem instead of leaving things alone.
> > 
> > Thanks,
> > 
> > Arnold
> > 
> > address@hidden wrote:
> > 
> > > Hi.
> > >
> > > Can you try building from the gawk-4.1-stable branch in the git repo
> > > and let me know if you still have the problem?
> > >
> > > I'm also curious if you build from master in the repo what happens.
> > >
> > > Thanks,
> > >
> > > Arnold
> > >
> > > Alexandre Oliva <address@hidden> wrote:
> > >
> > > > Hi,
> > > >
> > > > I've upgraded the root in which I create and verify GNU Linux-libre
> > > > tarballs from Fedora/Freed-ora 25 to 26, which brought gawk from 4.1.3
> > > > to 4.1.4.
> > > >
> > > > With 4.1.3, it used about 1GB of RAM and took some 15 minutes to run.
> > > >
> > > > With 4.1.4, I gave up after 2 hours of CPU time, and the process was at
> > > > 6GB and growing.
> > > >
> > > > I saw a number of regexp changes in the gawk 4.1.3-4.1.4 diff, so I took the
> > > > Fedora 25 binary and it's running on the Fedora 26 root with the
> > > > previous memory use.
> > > >
> > > > The command I use to perform this check is:
> > > >
> > > > deblob-check --use-awk linux-libre-4.12.tar.bz2
> > > >
> > > > deblob-check and the tarball can be downloaded from
> > > > http://linux-libre.fsfla.org/pub/linux-libre/releases/4.12-gnu/
> > > >
> > > > The script generates and runs a gawk script with monster regexps that
> > > > match known blobs, known false positives, and patterns that catch likely
> > > > blobs, and it's running that generated script that's taking up a lot of
> > > > RAM and time.
> > > >
> > > > deblob-check can use sed, python or perl instead of gawk, but gawk used
> > > > to be the best choice for this final checking, because of the low memory
> > > > use compared with sed, and the DFA-based regexp not available in python
> > > > and perl.  (for deblobbing proper, python turns out to be better due to
> > > > the much lower start-up time compiling the monster regexp)
> > > >
> > > > I haven't checked whether gawk 4.1.4 still beats the memory efficiency
> > > > of sed, but sed was barely usable for this purpose back then, and gawk
> > > > 4.1.4 is unfortunately turning out to be unusable too.
> > > >
> > > > Any recommendations as to how we could avoid this huge performance
> > > > regression in gawk, short of switching to a different regexp processing
> > > > engine?
> > > >
> > > > Thanks in advance,
> > > >
> > > > -- 
> > > > Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
> > > > You must be the change you wish to see in the world. -- Gandhi
> > > > Be Free! -- http://FSFLA.org/   FSF Latin America board member
> > > > Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
>
> -- 
> Andrew Schorr                      e-mail: address@hidden
> Telemetry Investments, L.L.C.      phone:  917-305-1748
> 545 Fifth Ave, Suite 1108          fax:    212-425-5550
> New York, NY 10017-3630
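
Since each run takes many minutes, the bisect suggested earlier in the
thread could be left to run unattended with "git bisect run" and a time
limit. What follows is only a minimal sketch under stated assumptions, not
something anyone in the thread actually ran: the function names, the
10-minute cutoff, the LIMIT variable, and the idea that deblob-check picks
up the freshly built gawk (for example through PATH) are all hypothetical.
It also assumes GNU timeout and an already configured gawk source tree.

```shell
#!/bin/sh
# Hypothetical helper for "git bisect run"; names and limits are assumptions.

# Map the exit status of a time-limited run to a git-bisect verdict:
#   0    -> 0   (good: the check finished within the limit)
#   124  -> 1   (bad: GNU timeout killed the run, i.e. the slowdown)
#   else -> 125 (skip: build problem or some unrelated failure)
classify_status() {
    case "$1" in
        0)   echo 0 ;;
        124) echo 1 ;;
        *)   echo 125 ;;
    esac
}

# One bisect step: rebuild gawk, then run the check under a time limit.
bisect_step() {
    limit=${LIMIT:-10m}   # assumed cutoff; the 4.1.3 run above took about 7 minutes
    make -s >/dev/null 2>&1 || return 125   # cannot build this commit: skip it
    # Assumption: deblob-check finds the gawk that was just built.
    timeout "$limit" ./deblob-check --use-awk linux-libre-4.12-gnu.tar.bz2 \
        >/dev/null 2>&1
    status=$?
    return "$(classify_status "$status")"
}
```

If the functions were saved as bisect-step.sh (a made-up file name), a
session might look like "git bisect start", "git bisect bad" on the 4.1.4
release commit, "git bisect good" on the 4.1.3 release commit, and then
"git bisect run sh -c '. ./bisect-step.sh; bisect_step'". Treating a
timeout as "bad" only makes sense if the cutoff sits comfortably above the
good run time on the machine in use.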


