gprofng-gui-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Bug sighting: Infinite loop in gprofng display gui while reading exp


From: Vladimir Mezentsev
Subject: Re: Bug sighting: Infinite loop in gprofng display gui while reading experiment
Date: Tue, 15 Aug 2023 00:42:29 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.0



On 8/14/23 15:39, Peter Kessler OS wrote:

Indeed you did fix that bug.  Thanks.

 

Now I seem to have hit the next bug. :-)

 

$ /home/pkessler/Work/GProfNG-GUI/Repos/gprofng-gui/INSTALL/bin/gprofng display gui

Unrecognized option "--gprofngdir=/home/pkessler/Work/GProfNG-GUI/Repos/gprofng-gui/INSTALL/bin/"


Sorry.

Fixes for  bug 64151: gprofng GUI and gprofng must be in same directory
were in gprofng and gprofng GUI.
I updated gprofng but not gprofng GUI.

I just pushed my fixes to git.savannah.gnu.org/srv/git/gprofng-gui.git

-Vladimir


 

To get around that,  I'm using

 

$ /home/pkessler/Work/GProfNG-GUI/Repos/gprofng-gui/INSTALL/bin/gp-display-gui

 

(That is, changing `gprofng display gui` to `gp-display-gui`.)  That command can open my previously problematic experiments just fine.

 

                                                ... peter

 

From: Vladimir Mezentsev <vladimir.mezentsev@oracle.com>
Date: Monday, August 14, 2023 at 09:29
To: Peter Kessler OS <peter.kessler@os.amperecomputing.com>, gprofng-gui-devel@gnu.org <gprofng-gui-devel@gnu.org>
Subject: Re: Bug sighting: Infinite loop in gprofng display gui while reading experiment

Hi Peter,

On 8/13/23 22:59, Peter Kessler OS wrote:

I found a sort of a workaround to "Exception in thread "IPCReader" java.lang.StringIndexOutOfBoundsException" issue with `gprofng display gui`.


https://savannah.gnu.org/bugs/index.php?64152 - GUI hang on large experiment

This bug should be fixed.
The fixes are in sourceware.org/git/binutils-gdb.git:

commit c476793d5bb9e7a0945411149128657a56d4b06c
Author: Vladimir Mezentsev <vladimir.mezentsev@oracle.com>
Date:   Thu Jun 29 13:11:09 2023 -0700

    gprofng: fix data race
    
    In our GUI project (https://savannah.gnu.org/projects/gprofng-gui), we use
    the output of gprofng to display the data. Sometimes this data is corrupted.
    
    gprofng/ChangeLog
    2023-06-29  Vladimir Mezentsev  <vladimir.mezentsev@oracle.com>
    
            * src/ipc.cc (ipc_doWork): Fix data race.
            * src/ipcio.cc (IPCresponse::print): Fix data race.
            Remove unused variables and functions.
            * src/ipcio.h: Declare two variables.
            * src/StringBuilder.cc (StringBuilder::write): New function.
            * src/StringBuilder.h: Likewise.





 

When `gprofng display gui` gets into a loop opening a complex experiment, I run

 

                $ for i in top-level.er/*.er ; do gprofng display text "${i}" -overview | grep -e 'Experiment      :' -e '  Target        :' ; done

 

and rummage through that output to find the "leaf" experiment that I am interested in, e.g.,

 

                Experiment      :top-level.er/_f158_x1_x3_x1_x1.er

                Target        : 'java ...'

 

and have always (so far :-) found that I can then use

 

                $ gprofng display gui top-level.er/_f158_x1_x3_x1_x1.er


If you are using the old gp-display-text (prior to my 2023-06-29 fixes,
there is no guarantee that the GUI will not hanged.


 

to bring up a perfectly useable gprofng gui.  (Thank you again for this tool!)

I'm sure someone is going to tell me not to have 158 process forks, and however many of whatever the "x1_x3_x1_x1" indicates, but that is the structure I get when I wrap `gprofng collect app` around the command that I usually run.  I recognize a lot of the sub-experiment targets, but digging down to where this particular Java process starts to wrap just that Java invocation with `gprofng collect app` is prohibitive.  Often the process I am interested in is started programmatically, and I don't control those programs.  (For example, the Java Microbenchmark Harness, JMH, where I first found this issue.)  It is much more tractable to find the leaf experiment after the fact.

 

Maybe this workaround would be useful for other `gprofng display gui` users.  Maybe the workaround suggests why the top-level experiment gets itself in trouble when reading the leaf experiments.  If you wanted, I could go through the individual leaf-level experiments to see if any of them individually gets into trouble.  In the case I am dealing with today, the top-level experiment has 528 (!) nested leaf experiments.



Maybe the -F option can help:
"gprofng collect app -F =java"


% man gp-collect-app.1
...
       -F {off|on|=regex}
           Control whether descendant processes should have their data recorded.  To disable/enable this feature, use off/on.
           Use =regex to record data on those processes whose executable name matches the regular _expression_.  Only the basename
           of the executable is used, not the full path.  If spaces or characters interpreted by the shell are used, enclose the
           regex in single quotes.  The default is -F on.


Thank you,
-Vladimir





 

                                                ... peter

 

From: Vladimir Mezentsev <vladimir.mezentsev@oracle.com>
Date: Thursday, May 25, 2023 at 00:14
To: Peter Kessler OS <peter.kessler@os.amperecomputing.com>, gprofng-gui-devel@gnu.org <gprofng-gui-devel@gnu.org>
Subject: Fwd: Bug sighting: Infinite loop in gprofng display gui while reading experiment

Resend to Peter.



-------- Forwarded Message --------

Subject:

Re: Bug sighting: Infinite loop in gprofng display gui while reading experiment

Date:

Thu, 25 May 2023 00:11:11 -0700

From:

Vladimir Mezentsev <vladimir.mezentsev@oracle.com>

To:

gprofng-gui-devel@gnu.org



 Hi Peter,

I am evaluating the similar problem on the big experiment.
So far no success.
It looks like both processes (gui and gp-display-text) are hanging and both are waiting for input.
This is usually OK, except when it is reported that N bytes will be sent, but less than N are sent.


Thank you,
-Vladimir




On 5/24/23 18:07, Peter Kessler OS via Gprofng-gui-devel wrote:

I have experiments that I collected while running a Java application under the Java Microbenchmark Harness (JMH).  JMH forks new JVM's so that the harness does not affect the benchmark, and can serially fork multiple JVM to measure run-to-run variation, startup time, etc.

 

I am seeing an infinite loop in the gprofng display gui, I think while reading an experiment.  I am using my own build of the https://savannah.gnu.org/projects/gprofng-gui/ repo that I built on an Ampere Altra running Fedora 36, and using the standard OpenJDK build of jdk-17.0.4.1 to run the gprofng display gui application.

 

I have tried up to 5 forks under JMH and gprofng display text seems to be able to handle them all.  Here's a run with 1 fork showing the process view:

 

Altra $ /home/pkessler/Work/GProfNG-GUI/Repos/INSTALL/bin/gprofng display text forks-1.er -processes

Warning: History and command editing is not supported on this system.

Objects sorted by metric: Exclusive Total CPU Time

 

Excl. Total    Name

CPU

sec.      %

2.642 100.00   <Total>

1.631  61.74   java, Process   4, PID 1195111

0.961  36.36   java, Process   1, PID 1195043

0.030   1.14   java, Process   2, PID 1195068

0.020   0.76   java, Process   3, PID 1195090

 

My benchmark is process 4, that takes around 1.6 seconds.  Here is a run with 5 forks:

 

Altra $ /home/pkessler/Work/GProfNG-GUI/Repos/INSTALL/bin/gprofng display text forks-5.er -processes

Warning: History and command editing is not supported on this system.

Objects sorted by metric: Exclusive Total CPU Time

 

Excl. Total    Name

CPU

sec.      %

9.176 100.00   <Total>

1.651  17.99   java, Process   5, PID 1195779

1.631  17.78   java, Process   4, PID 1195751

1.581  17.23   java, Process   6, PID 1195808

1.561  17.01   java, Process   8, PID 1195865

1.551  16.90   java, Process   7, PID 1195837

1.161  12.65   java, Process   1, PID 1195684

0.020   0.22   java, Process   2, PID 1195708

0.020   0.22   java, Process   3, PID 1195730

 

If I try to look at the same experiments with gprofng display gui it usually hangs reading the experiment.

 

Altra $ /home/pkessler/Work/GProfNG-GUI/Repos/INSTALL/bin/gprofng display gui \

        --jdkhome "/home/pkessler/Deployed/OpenJDK/jdk-17.0.4.1"              \

        forks-1.er

fdhome: /home/pkessler/Work/GProfNG-GUI/Repos/INSTALL

gp-display-text: /home/pkessler/Work/GProfNG-GUI/Repos/INSTALL/bin/gp-display-text

 

That run hung with the progress bar bouncing back and forth saying

 

Loading CallStack Data: _x1_x1.er

 

until I used File/Exit to quit and then it produced the exception stack trace

 

Exception in thread "IPCReader" java.lang.StringIndexOutOfBoundsException: offset 18, count 16842752, length 54

      at java.base/java.lang.String.checkBoundsOffCount(String.java:4589)

      at java.base/java.lang.String.<init>(String.java:523)

      at java.base/java.lang.String.<init>(String.java:1415)

      at org.gprofng.mpmt.ipc.IPCProtocol.decodeSVal(IPCProtocol.java:109)

      at org.gprofng.mpmt.ipc.IPCResult.progress(IPCResult.java:197)

      at org.gprofng.mpmt.ipc.IPCResult.fireResponse(IPCResult.java:220)

      at org.gprofng.mpmt.ipc.IPCReader.run(IPCReader.java:228)

      at java.base/java.lang.Thread.run(Thread.java:833)

 

Trying the same thing again (because :-) worked flawlessly, allowing me to export the Process view

 

ExperimentIDs sorted by metric: Exclusive Total CPU Time

 

Exclusive        

Total CPU Time    Name

sec.             

2.642 (100.00%)   <Total>

1.631 ( 61.74%)   java, Process   4, PID 1195111

0.961 ( 36.36%)   java, Process   1, PID 1195043

0.030 (  1.14%)   java, Process   2, PID 1195068

0.020 (  0.76%)   java, Process   3, PID 1195090

 

matching the gprofng display text shown above, with subtle formatting differences.

 

When gprofng display gui hangs on input (80% (?) of the time with my forks-1.er, 100% of the time with my forks-5.er), the status bar in the lower right shows different stages.  Here are a bunch of attempts to display the 5 fork experiment

 

Altra $ /home/pkessler/Work/GProfNG-GUI/Repos/INSTALL/bin/gprofng display gui \

        --jdkhome "/home/pkessler/Deployed/OpenJDK/jdk-17.0.4.1"              \

        forks-5.er

 

which produced the status bars

 

Loading Profile Data: _x2_x1.er

Loading CallStack Data: _x1_x1.er

Loading CallStack Data: _x2_x1.er

Loading CallStack Data: _x6_x1.er

Loading CallStack Data: _x6_x1.er 34%

Loading CallStack Data: _x3_x1.er

Loading CallStack Data: _x4_x1.er

Loading CallStack Data: _x4_x1.er 69%

Loading CallStack Data: _x5_x1.er

Loading CallStack Data: _x4_x1.er 34%

Loading CallStack Data: _x6_x1.er 52%

Processing Load Object Data

 

I have tried letting it run (for hours) and it never finishes loading the experiment.  But sometimes it just works: loading the experiment (forks-1.er) essentially instantly.  (Fortunately, the one time I was demoing it to my manager! :-)  I hate intermittent bugs.

 

When I have to use File/Exit to quit the GUI, it always reports the same exception stack trace shown above.  The count 16842752 argument looks distinctly wrong.  (!(offset < 0) and !(count < 0), but (offset > length - count)).  Maybe count is an uninitialized variable?  Maybe the looping behavior is because someone is catching the exception (or a superclass) but not fixing the cause and trying again.  I shouldn't speculate.

 

The experiments are too large to attach to an email.  I could send you the code if you have Maven and can download JMH and try to reproduce it yourself.  Or, if you know how to write JMH benchmarks, you might be able to reproduce the problem on your own.  Let me know if you want me to send you anything that will help you debug this issue.

 

                                                ... peter

 

 

 

 

 

 



reply via email to

[Prev in Thread] Current Thread [Next in Thread]