How To Analyze Performance, Hangs, and High CPU Utilization on Linux

Introduction

Native Linux/Unix tools are often invaluable for analyzing performance issues, especially high CPU utilization and server hangs. This article discusses several tools executed by the two attached shell scripts, linmon.sh and linperf.sh. The scripts come from an IBM support site and are part of IBM's standard information-gathering process (https://www-304.ibm.com/support/docview.wss?uid=swg21115785#show-hide). The purpose of this article is to explain what the native tools in these scripts do and how you can use them to troubleshoot performance issues on your server.

The Scripts

Both scripts use some of the same native tools but serve different purposes. Both are included in the attached zip file.

linmon.sh

Linmon.sh is used to monitor your system over an extended period and alert you when a problem is developing. By default it runs for one day, collecting metrics every five minutes. It uses the ps and vmstat commands to monitor process size, CPU usage, and paging (we'll discuss these in detail later in this article). When these metrics rise to a certain level, it is a signal that an event is imminent and time to run linperf.sh. The script writes two files, ps_mon.out and vmstat_mon.out. These files can grow quite large over a day, so delete or rotate them daily if you run the script for multiple days.
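The IBM script itself is in the attachment; the following is only a rough sketch of the kind of loop it runs. The interval and sample count are shortened here so the example finishes quickly (linmon.sh defaults to roughly five-minute samples for a day), and the filenames match those described above.

```shell
#!/bin/sh
# Sketch of a linmon.sh-style collection loop (illustrative, not the IBM script).
# linmon.sh defaults to roughly INTERVAL=300 (5 min) and SAMPLES=288 (24 hours);
# small values are used here so the example runs in a few seconds.
INTERVAL=2
SAMPLES=2
i=0
while [ "$i" -lt "$SAMPLES" ]; do
    date >> ps_mon.out
    ps -eLf >> ps_mon.out                      # per-thread process listing
    date >> vmstat_mon.out
    vmstat 1 2 | tail -n 1 >> vmstat_mon.out   # current sample (skips the since-boot line)
    i=$((i + 1))
    sleep "$INTERVAL"
done
```

Appending a timestamp before each sample makes it possible to correlate a spike in the .out files with application log entries later.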

linperf.sh

This script captures metrics just before and/or during an event. In addition to the ps and vmstat commands, it runs whoami, invokes top with two different sets of parameters, and triggers java thread dumps.
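The real script takes the application server pid as an argument; a simplified sketch of its capture phase, with illustrative output filenames, looks like this:

```shell
#!/bin/sh
# Sketch of a linperf.sh-style capture (illustrative, not the IBM script).
# Pass the JVM's pid; the collectors run concurrently in the background.
capture() {
    pid=$1
    vmstat 5 12 > vmstat.out 2>&1 &                      # system-wide CPU/memory/paging
    top -bc -d 60 -n 5 > top.out 2>&1 &                  # busiest processes, full commands
    top -bH -d 5 -n 49 -p "$pid" > topdashH.out 2>&1 &   # per-thread view of the JVM
    for i in 1 2 3; do                                   # three javacores, a minute apart
        kill -3 "$pid"
        sleep 60
    done
    wait    # let the background collectors finish
}
# Example: capture 2956
if [ -n "$1" ]; then capture "$1"; fi
```

Taking several thread dumps a minute apart is the key design choice: a thread that appears in the same stack in consecutive dumps is genuinely stuck, not just passing through.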

The Native Tools

Process Status - ps -eLf

ps is the standard command for listing processes and is most often used to find a process id (pid), as in "ps -ef | grep java" to find the pid of a running application server. The -L flag tells it to list all threads and adds two columns, LWP and NLWP. LWP (light weight process) is the thread id; NLWP is the number of threads running in that process.

Search for the process id of the running JVM process to find the number of threads associated with it.

(formatted for legibility)
UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
root      2956     1  2956  0  115 Jan21 ?        00:00:00 /opt/IBM/WebSphere/AppServer/java/bin/java
 -Declipse.security
 -Dwas.status.socket=52485
 -Dosgi.install.area=/opt/IBM/WebSphere/AppServer
 -Dosgi.configuration.area=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/servers/server1/configuration
 -Djava.awt.headless=true
 -Dosgi.framework.extensions=com.ibm.cds,com.ibm.ws.eclipse.adaptors
 -Xshareclasses:name=webspherev85_1.6_32_%g,nonFatal
 -Dcom.ibm.xtq.processor.overrideSecureProcessing=true
 -Xbootclasspath/p:/opt/IBM/WebSphere/AppServer/java/jre/lib/ibmorb.jar
 -classpath /opt/IBM/WebSphere/AppServer/profiles/AppSrv01/properties
           :/opt/IBM/WebSphere/AppServer/properties
           :/opt/IBM/WebSphere/AppServer/lib/startup.jar
           :/opt/IBM/WebSphere/AppServer/lib/bootstrap.jar
           :/opt/IBM/WebSphere/AppServer/lib/jsf-nls.jar
           :/opt/IBM/WebSphere/AppServer/lib/lmproxy.jar
           :/opt/IBM/WebSphere/AppServer/lib/urlprotocols.jar
           :/opt/IBM/WebSphere/AppServer/deploytool/itp/batchboot.jar
           :/opt/IBM/WebSphere/AppServer/deploytool/itp/batch2.jar
           :/opt/IBM/WebSphere/AppServer/java/lib/tools.jar
 -Dibm.websphere.internalClassAccessMode=allow
 -Xms50m -Xmx256m -Xscmaxaot4M -Xscmx60M
 -Dws.ext.dirs=/opt/IBM/WebSphere/AppServer/java/lib
              :/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/classes
              :/opt/IBM/WebSphere/AppServer/classes:/opt/IBM/WebSphere/AppServer/lib
              :/opt/IBM/WebSphere/AppServer/installedChannels
              :/opt/IBM/WebSphere/AppServer/lib/ext
              :/opt/IBM/WebSphere/AppServer/web/help
              :/opt/IBM/WebSphere/AppServer/deploytool/itp/plugins/com.ibm.etools.ejbdeploy/runtime
 -Dderby.system.home=/opt/IBM/WebSphere/AppServer/derby
 -Dcom.ibm.itp.location=/opt/IBM/WebSphere/AppServer/bin
 -Djava.util.logging.configureByServer=true
 -Duser.install.root=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01
 -Djava.ext.dirs=/opt/IBM/WebSphere/AppServer/tivoli/tam
                :/opt/IBM/WebSphere/AppServer/java/jre/lib/ext
 -Djavax.management.builder.initial=com.ibm.ws.management.PlatformMBeanServerBuilder
 -Dpython.cachedir=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/temp/cachedir
 -Dwas.install.root=/opt/IBM/WebSphere/AppServer
 -Djava.util.logging.manager=com.ibm.ws.bootstrap.WsLogManager
 -Dserver.root=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01
 -Dcom.ibm.security.jgss.debug=off -Dcom.ibm.security.krb5.Krb5Debug=off
 -Djava.library.path=/opt/IBM/WebSphere/AppServer/lib/native/linux/x86_32/
                    :/opt/IBM/WebSphere/AppServer/java/jre/lib/i386/default
                    :/opt/IBM/WebSphere/AppServer/java/jre/lib/i386
                    :/opt/IBM/WebSphere/AppServer/bin
                    :/opt/IBM/WebSphere/AppServer/nulldllsdir
                    :/usr/lib:
 -Djava.endorsed.dirs=/opt/IBM/WebSphere/AppServer/endorsed_apis
                     :/opt/IBM/WebSphere/AppServer/java/jre/lib/endorsed
 -Djava.security.auth.login.config=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/properties/wsjaas.conf
 -Djava.security.policy=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/properties/server.policy com.ibm.wsspi.bootstrap.WSPreLauncher
 -nosplash
 -application com.ibm.ws.bootstrap.WSLauncher
              com.ibm.ws.runtime.WsServer /opt/IBM/WebSphere/AppServer/profiles/AppSrv01/config
              nullNode01Cell nullNode01 server1

The number of threads for pid 2956 is 115 (the NLWP column). If this number reaches the total number of threads allocated to the application, thread starvation has occurred. To find out what is holding these threads you must analyze the thread dumps contained in the javacore files; common causes include threads blocked on a slow or unresponsive backend resource, deadlocks, and excessive garbage collection caused by a memory leak.
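To pull just the thread count for a pid, you can ask ps for the NLWP column directly. This sketch uses the shell's own pid ($$) so it is runnable anywhere; in practice you would substitute the JVM's pid (2956 in the example above).

```shell
# Thread count for a process, two ways. $$ (this shell's pid) is used so the
# example runs anywhere; substitute the JVM pid in practice.
PID=$$
ps -o nlwp= -p "$PID"                                    # just the NLWP value
ps -eLf | awk -v p="$PID" '$2 == p { print $6; exit }'   # same value from ps -eLf
```

Sampling this value periodically (as linmon.sh effectively does) shows whether the thread count is stable or creeping toward the pool limit.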

vmstat 5 12

This runs vmstat for 12 iterations with a 5-second delay between them. Here is a summary of the fields:

  1. Procs
    • r: The number of processes waiting for run time.
    • b: The number of processes in uninterruptible sleep.
  2. Memory
    • swpd: the amount of virtual memory used.
    • free: the amount of idle memory.
    • buff: the amount of memory used as buffers.
    • cache: the amount of memory used as cache.
    • inact: the amount of inactive memory. (-a option)
    • active: the amount of active memory. (-a option)
  3. Swap
    • si: Amount of memory swapped in from disk (/s).
    • so: Amount of memory swapped to disk (/s).
  4. IO
    • bi: Blocks received from a block device (blocks/s).
    • bo: Blocks sent to a block device (blocks/s).
  5. System
    • in: The number of interrupts per second, including the clock.
    • cs: The number of context switches per second.
  6. CPU
    • us: Time spent running non-kernel code. (user time, including nice time)
    • sy: Time spent running kernel code. (system time)
    • id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
    • wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
    • st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
1  2      0 6911696 175108 636960   0    0     0     1   13   15  0  0 100  0  0
0  0      0 6907868 175136 640272   0    0   201   488  302 1786  1  1  98  0  0
0  0      0 6907884 175136 640280   0    0     0    18  127  204  0  0 100  0  0
0  0      0 6908008 175136 640292   0    0     0     9  129  209  0  0 100  0  0
0  0      0 6908008 175136 640304   0    0     0    10  128  199  0  0 100  0  0
0  0      0 6908008 175136 640312   0    0     0    33  143  204  0  0 100  0  0
1  0      0 6907884 175140 640324   0    0     0    11  134  212  0  0 100  0  0
0  0      0 6907884 175140 640332   0    0     0    11  150  248  0  0 100  0  0
0  0      0 6907356 175140 640348   0    0     0    10  176  293  0  0 100  0  0
0  0      0 6906628 175140 640356   0    0     0    13  247  244  0  1  99  0  0
0  0      0 6906628 175140 640364   0    0     0    18  209  245  0  1  99  0  0
0  0      0 6906628 175140 640380   0    0     0    10  126  207  0  0 100  0  0
1  0      0 6906496 175140 640388   0    0     0    19  204  226  2  0  98  0  0
1  0      0 6906504 175140 640420   0    0     0    14  139  207  0  0 100  0  0
0  0      0 6906504 175140 640428   0    0     0    11  133  201  0  0 100  0  0
0  0      0 6906496 175140 640436   0    0     0    10  136  212  0  0 100  0  0
0  0      0 6906504 175140 640448   0    0     0     9  133  201  0  0 100  0  0
0  0      0 6906008 175140 640456   0    0     0    10  145  203  0  0  99  0  0
1  0      0 6906008 175140 640468   0    0     0    14  132  208  0  0 100  0  0
0  0      0 6906008 175140 640480   0    0     0    14  130  201  0  0 100  0  0
0  0      0 6906016 175140 640488   0    0     0    11  137  209  0  0 100  0  0

Since the server in this example is idle, the CPU is mostly 100% idle. If, however, the CPU runs at 95% or above for extended periods, application performance is almost certainly poor and you should search for the cause. The other interesting metrics are swap in and swap out (the si and so columns). If the system is swapping memory to disk at a high rate, you are approaching or have reached the limit of physical memory, and performance will suffer.
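A quick way to scan vmstat output for those two conditions is a small awk filter. This is a sketch: check_vmstat is a name invented here, the thresholds are illustrative, and the column positions (si=$7, so=$8, id=$15) assume the default Linux vmstat layout shown above.

```shell
# Flag vmstat samples showing CPU saturation or swapping (sketch; check_vmstat
# is a hypothetical helper, not part of the IBM scripts).
check_vmstat() {
    awk 'NR > 2 {    # skip the two header lines
        if ($15 < 5)          print "CPU saturated (idle " $15 "%)"
        if ($7 > 0 || $8 > 0) print "swapping (si=" $7 ", so=" $8 ")"
    }'
}
# Typical use: vmstat 5 12 | check_vmstat
# Demo with one healthy sample and one saturated, swapping sample:
printf '%s\n' 'header1' 'header2' \
  '0 0    0 6907884 175136 640280   0  0 0 18 127  204  0 0 100 0 0' \
  '9 1 5000 1200000 175136 640280 120 80 0 18 900 5000 95 4   1 0 0' | check_vmstat
# prints: CPU saturated (idle 1%)
#         swapping (si=120, so=80)
```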

kill -3

This command sends SIGQUIT to the JVM, producing a java thread dump that contains the state of every thread in the JVM at the time of the dump. With the Oracle/Sun JDK the thread dump is written to stdout, so you will find it in the application server log file. The IBM JDK produces javacore files. The location of the javacore files depends on the OS; to find them, use "find / -name "*javacore*"".
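The dump-and-locate steps can be wrapped in a small helper. This is a sketch: dump_threads is a name invented here, and on a large filesystem you would want to narrow the find to the server's working directory.

```shell
# Sketch: request a javacore from an IBM JVM, then locate the files.
# dump_threads is a hypothetical helper, not part of the IBM scripts.
dump_threads() {
    pid=$1
    kill -3 "$pid" || return 1    # SIGQUIT: the JVM writes a javacore and keeps running
    sleep 2                       # give the JVM a moment to finish writing
    find / -name "*javacore*" 2>/dev/null
}
# Example: dump_threads 2956
```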

whoami

This simply records which user is running the script.

[root@infra-was8-forgh ~]# whoami
root
[root@infra-was8-forgh ~]#

top -bc -d 60 -n 5

top displays a listing of the most CPU-intensive tasks on the system. The -b flag puts it into batch mode so the output can be redirected to a file, and -c makes it display the full command line for each process instead of just the program name. -d 60 sets a 60-second delay between refreshes, and -n 5 stops it after five refreshes.

top - 14:54:08 up 7 days,  3:42,  2 users,  load average: 0.00, 0.01, 0.00
Tasks: 250 total,   1 running, 249 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.1%sy,  0.0%ni, 99.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8313644k total,  1405676k used,  6907968k free,   175136k buffers
Swap: 16383992k total,        0k used, 16383992k free,   640004k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND          
 2956 root      20   0  466m 215m  62m S 19.6  2.7  13:39.16 /opt/IBM/WebSphere/
30375 root      20   0  2672 1084  768 R  2.0  0.0   0:00.01 top -bc -d 60 -n 5
    1 root      20   0  2852 1424 1208 S  0.0  0.0   0:04.07 /sbin/init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 [kthreadd]
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.03 [migration/0]
    4 root      20   0     0    0    0 S  0.0  0.0   0:00.00 [ksoftirqd/0]

top -bH -d 5 -n 49 -p 2956

Here the -p flag tells top to report only on process 2956 (the application server process), and the -H flag makes it show every thread belonging to that pid individually.

top - 15:46:23 up 10 days,  4:35,  2 users,  load average: 0.01, 0.02, 0.00
Tasks: 114 total,   0 running, 114 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us,  0.2%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8313644k total,  2819832k used,  5493812k free,   266804k buffers
Swap: 16383992k total,        0k used, 16383992k free,  1824068k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
2956 root      20   0  466m 216m  62m S  0.0  2.7   0:00.00 java
2957 root      20   0  466m 216m  62m S  0.0  2.7   0:15.31 P=557864:O=0:CT
2958 root      20   0  466m 216m  62m S  0.0  2.7   0:00.23 Signal Reporter
2960 root      20   0  466m 216m  62m S  0.0  2.7   0:08.43 JIT Compilation
2961 root      20   0  466m 216m  62m S  0.0  2.7   0:04.35 JIT Compilation
2962 root      20   0  466m 216m  62m S  0.0  2.7   0:00.00 JIT Compilation
2963 root      20   0  466m 216m  62m S  0.0  2.7   1:20.71 JIT Sampler
2964 root      20   0  466m 216m  62m S  0.0  2.7   0:05.58 JIT IProfiler
2965 root      20   0  466m 216m  62m S  0.0  2.7   0:00.00 Signal Dispatch
2966 root      20   0  466m 216m  62m S  0.0  2.7   0:00.12 Finalizer maste
2967 root      20   0  466m 216m  62m S  0.0  2.7   0:34.41 Concurrent Mark
2968 root      20   0  466m 216m  62m S  0.0  2.7   0:09.31 GC Slave
2969 root      20   0  466m 216m  62m S  0.0  2.7   0:09.36 GC Slave
2970 root      20   0  466m 216m  62m S  0.0  2.7   0:09.28 GC Slave
2973 root      20   0  466m 216m  62m S  0.0  2.7   0:08.57 Thread-8
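The thread ids in the PID column above are what you match against a javacore: javacores record each thread's native ID in hex, while top shows it in decimal, so convert before grepping. Thread 2963 (the JIT Sampler from the listing above) is used here as the example.

```shell
# Convert a decimal tid from "top -H" to the hex form used in IBM javacores.
TID=2963                   # the "JIT Sampler" thread in the listing above
printf '0x%X\n' "$TID"     # prints 0xB93; grep the javacore for it (case-insensitively)
```

This is how you tie a hot thread seen in top back to a Java stack trace: find the javacore entry whose native thread ID matches the hex value.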
