Introduction
There are many times when native Linux/Unix tools are very useful for analyzing performance issues, especially high CPU usage or server hangs. This article discusses several tools executed by the two attached shell scripts, linmon.sh and linperf.sh. The scripts come from an IBM support site and are part of IBM's standard information-gathering process (https://www-304.ibm.com/support/docview.wss?uid=swg21115785#show-hide). The purpose of this article is to explain what the native tools in these scripts do and how you can use them to help troubleshoot performance issues on your server.
The Scripts
Both scripts use some of the same native tools but have different purposes. The attached zip file includes them.
linmon.sh
Linmon.sh monitors your system over an extended period so you can see when a problem is developing. By default it runs for one day and collects metrics every five minutes. It uses the ps and vmstat commands to monitor process size, CPU usage, and paging (we'll discuss these in detail later in this article). When these metrics rise to a certain level it is a signal that an event is imminent, and it's time to run linperf.sh. Linmon.sh writes two files, ps_mon.out and vmstat_mon.out. These files are likely to get quite large after a day, so delete or rotate them daily if you run the script for multiple days.
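The collection loop inside linmon.sh amounts to something like the following sketch. This is not the IBM script itself: the output directory, variable names, and the short interval/iteration defaults here are illustrative (chosen so the sketch finishes quickly), where linmon.sh samples every five minutes for a day.

```shell
#!/bin/sh
# Sketch of a linmon.sh-style collection loop (not the IBM script):
# append timestamped ps and vmstat samples to the same two files
# linmon.sh writes.
OUTDIR=${OUTDIR:-/tmp/linmon}
INTERVAL=${INTERVAL:-2}      # linmon.sh uses 300 seconds (5 minutes)
ITERATIONS=${ITERATIONS:-2}  # linmon.sh runs for a day (288 samples)
mkdir -p "$OUTDIR"
i=0
while [ "$i" -lt "$ITERATIONS" ]; do
    date >> "$OUTDIR/ps_mon.out"
    ps -eLf >> "$OUTDIR/ps_mon.out"          # per-thread process list
    date >> "$OUTDIR/vmstat_mon.out"
    vmstat >> "$OUTDIR/vmstat_mon.out" 2>&1  # one-shot memory/CPU summary
    i=$((i + 1))
    if [ "$i" -lt "$ITERATIONS" ]; then
        sleep "$INTERVAL"
    fi
done
```

Because each sample is prefixed with a timestamp, you can later correlate a spike in thread count or swap activity with the exact time an event began.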
linperf.sh
This script is meant to capture metrics just before and/or during an event. It uses the same ps and vmstat commands, adds whoami and top (run with two different sets of parameters), and triggers java thread dumps.
The Native Tools
Process Status - ps -eLf
ps is the standard command to list processes and is most often used to find a process id (pid), as in "ps -ef | grep java" to find the pid of a running application server. The L flag tells it to list all threads and adds two columns: LWP, the light weight process (thread) ID, and NLWP, the number of threads running in that process.
Search for the process id of the running JVM process to find the number of threads associated with it.
(formatted for legibility)
UID   PID   PPID  LWP   C  NLWP  STIME  TTY  TIME      CMD
root  2956  1     2956  0  115   Jan21  ?    00:00:00  /opt/IBM/WebSphere/AppServer/java/bin/java
  -Declipse.security -Dwas.status.socket=52485
  -Dosgi.install.area=/opt/IBM/WebSphere/AppServer
  -Dosgi.configuration.area=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/servers/server1/configuration
  -Djava.awt.headless=true
  -Dosgi.framework.extensions=com.ibm.cds,com.ibm.ws.eclipse.adaptors
  -Xshareclasses:name=webspherev85_1.6_32_%g,nonFatal
  -Dcom.ibm.xtq.processor.overrideSecureProcessing=true
  -Xbootclasspath/p:/opt/IBM/WebSphere/AppServer/java/jre/lib/ibmorb.jar
  -classpath /opt/IBM/WebSphere/AppServer/profiles/AppSrv01/properties
    :/opt/IBM/WebSphere/AppServer/properties
    :/opt/IBM/WebSphere/AppServer/lib/startup.jar
    :/opt/IBM/WebSphere/AppServer/lib/bootstrap.jar
    :/opt/IBM/WebSphere/AppServer/lib/jsf-nls.jar
    :/opt/IBM/WebSphere/AppServer/lib/lmproxy.jar
    :/opt/IBM/WebSphere/AppServer/lib/urlprotocols.jar
    :/opt/IBM/WebSphere/AppServer/deploytool/itp/batchboot.jar
    :/opt/IBM/WebSphere/AppServer/deploytool/itp/batch2.jar
    :/opt/IBM/WebSphere/AppServer/java/lib/tools.jar
  -Dibm.websphere.internalClassAccessMode=allow
  -Xms50m -Xmx256m -Xscmaxaot4M -Xscmx60M
  -Dws.ext.dirs=/opt/IBM/WebSphere/AppServer/java/lib
    :/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/classes
    :/opt/IBM/WebSphere/AppServer/classes
    :/opt/IBM/WebSphere/AppServer/lib
    :/opt/IBM/WebSphere/AppServer/installedChannels
    :/opt/IBM/WebSphere/AppServer/lib/ext
    :/opt/IBM/WebSphere/AppServer/web/help
    :/opt/IBM/WebSphere/AppServer/deploytool/itp/plugins/com.ibm.etools.ejbdeploy/runtime
  -Dderby.system.home=/opt/IBM/WebSphere/AppServer/derby
  -Dcom.ibm.itp.location=/opt/IBM/WebSphere/AppServer/bin
  -Djava.util.logging.configureByServer=true
  -Duser.install.root=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01
  -Djava.ext.dirs=/opt/IBM/WebSphere/AppServer/tivoli/tam
    :/opt/IBM/WebSphere/AppServer/java/jre/lib/ext
  -Djavax.management.builder.initial=com.ibm.ws.management.PlatformMBeanServerBuilder
  -Dpython.cachedir=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/temp/cachedir
  -Dwas.install.root=/opt/IBM/WebSphere/AppServer
  -Djava.util.logging.manager=com.ibm.ws.bootstrap.WsLogManager
  -Dserver.root=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01
  -Dcom.ibm.security.jgss.debug=off
  -Dcom.ibm.security.krb5.Krb5Debug=off
  -Djava.library.path=/opt/IBM/WebSphere/AppServer/lib/native/linux/x86_32/
    :/opt/IBM/WebSphere/AppServer/java/jre/lib/i386/default
    :/opt/IBM/WebSphere/AppServer/java/jre/lib/i386
    :/opt/IBM/WebSphere/AppServer/bin
    :/opt/IBM/WebSphere/AppServer/nulldllsdir
    :/usr/lib:
  -Djava.endorsed.dirs=/opt/IBM/WebSphere/AppServer/endorsed_apis
    :/opt/IBM/WebSphere/AppServer/java/jre/lib/endorsed
  -Djava.security.auth.login.config=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/properties/wsjaas.conf
  -Djava.security.policy=/opt/IBM/WebSphere/AppServer/profiles/AppSrv01/properties/server.policy
  com.ibm.wsspi.bootstrap.WSPreLauncher -nosplash
  -application com.ibm.ws.bootstrap.WSLauncher
  com.ibm.ws.runtime.WsServer /opt/IBM/WebSphere/AppServer/profiles/AppSrv01/config nullNode01Cell nullNode01 server1
The number of threads (NLWP) for pid 2956 is 115. If this number reaches the total number of threads allocated to the application, thread starvation has occurred. To find out what is holding those threads you must analyze the thread dumps contained in the javacore files. Most often this occurs because of a memory leak.
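A quick way to pull just the NLWP value for one process is to filter the ps -eLf output with awk. In this sketch the current shell's own pid ($$) stands in for a real JVM pid, so it can run anywhere; in practice you would use the pid found with "ps -ef | grep java".

```shell
# Count threads for one process with ps -eLf. $$ (this shell's own pid)
# stands in for a real JVM pid so the example is safe to run anywhere.
PID=$$
# In "ps -eLf" output, field 2 is PID and field 6 is NLWP.
NLWP=$(ps -eLf | awk -v pid="$PID" '$2 == pid { print $6; exit }')
echo "pid $PID has $NLWP threads"
```

Sampling this value over time (as linmon.sh does) shows whether the thread count is creeping toward the pool limit.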
vmstat 5 12
This runs vmstat 12 times with a 5-second delay between iterations. Here is a summary of the fields:
- Procs
- r: The number of processes waiting for run time.
- b: The number of processes in uninterruptible sleep.
- Memory
- swpd: The amount of virtual memory used.
- free: The amount of idle memory.
- buff: The amount of memory used as buffers.
- cache: The amount of memory used as cache.
- inact: The amount of inactive memory. (-a option)
- active: The amount of active memory. (-a option)
- Swap
- si: Amount of memory swapped in from disk (/s).
- so: Amount of memory swapped to disk (/s).
- IO
- bi: Blocks received from a block device (blocks/s).
- bo: Blocks sent to a block device (blocks/s).
- System
- in: The number of interrupts per second, including the clock.
- cs: The number of context switches per second.
- CPU
- us: Time spent running non-kernel code. (user time, including nice time)
- sy: Time spent running kernel code. (system time)
- id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
- wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
- st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in    cs us sy  id wa st
 1  2      0 6911696 175108  636960    0    0     0     1   13    15  0  0 100  0  0
 0  0      0 6907868 175136  640272    0    0   201   488  302  1786  1  1  98  0  0
 0  0      0 6907884 175136  640280    0    0     0    18  127   204  0  0 100  0  0
 0  0      0 6908008 175136  640292    0    0     0     9  129   209  0  0 100  0  0
 0  0      0 6908008 175136  640304    0    0     0    10  128   199  0  0 100  0  0
 0  0      0 6908008 175136  640312    0    0     0    33  143   204  0  0 100  0  0
 1  0      0 6907884 175140  640324    0    0     0    11  134   212  0  0 100  0  0
 0  0      0 6907884 175140  640332    0    0     0    11  150   248  0  0 100  0  0
 0  0      0 6907356 175140  640348    0    0     0    10  176   293  0  0 100  0  0
 0  0      0 6906628 175140  640356    0    0     0    13  247   244  0  1  99  0  0
 0  0      0 6906628 175140  640364    0    0     0    18  209   245  0  1  99  0  0
 0  0      0 6906628 175140  640380    0    0     0    10  126   207  0  0 100  0  0
 1  0      0 6906496 175140  640388    0    0     0    19  204   226  2  0  98  0  0
 1  0      0 6906504 175140  640420    0    0     0    14  139   207  0  0 100  0  0
 0  0      0 6906504 175140  640428    0    0     0    11  133   201  0  0 100  0  0
 0  0      0 6906496 175140  640436    0    0     0    10  136   212  0  0 100  0  0
 0  0      0 6906504 175140  640448    0    0     0     9  133   201  0  0 100  0  0
 0  0      0 6906008 175140  640456    0    0     0    10  145   203  0  0  99  0  0
 1  0      0 6906008 175140  640468    0    0     0    14  132   208  0  0 100  0  0
 0  0      0 6906008 175140  640480    0    0     0    14  130   201  0  0 100  0  0
 0  0      0 6906016 175140  640488    0    0     0    11  137   209  0  0 100  0  0
Since the server in this example is idle, the CPU is mostly 100% idle; but if the CPU runs at 95% or above for extended periods, application performance is almost certainly suffering and you need to search for the cause. The other interesting metrics are swap in and swap out (the si and so columns). If the system is swapping memory to disk at a high rate, you are at or approaching the limits of physical memory and performance will suffer.
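Those two checks are easy to automate by piping vmstat output through awk. In this sketch the thresholds (idle below 5%, any swap-out) are illustrative, and the second data row is a synthetic "overloaded" sample added so both warnings fire; in live use you would pipe "vmstat 5 12" in instead of the here-document.

```shell
# Flag trouble in vmstat output: warn when idle CPU (field 15, "id")
# drops below 5% or anything is being swapped out (field 8, "so").
# The second data row is synthetic, added so both warnings trigger.
awk 'NR > 2 {
    so = $8; id = $15
    if (id < 5) print "WARN: CPU nearly saturated, idle=" id "%"
    if (so > 0) print "WARN: swapping to disk, so=" so "/s"
}' <<'EOF'
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
 1  2      0 6911696 175108 636960    0    0     0     1   13   15  0  0 100  0  0
 9  1      0  102400 175108 636960    0  512     0   300  900 4100 96  2   2  0  0
EOF
```

Run as written, it prints two WARN lines for the second (synthetic) sample and nothing for the healthy first one.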
kill -3
This command produces a java thread dump containing the state of all threads in the JVM at the time of the dump. With the Oracle/Sun JDK the thread dumps are written to stdout, so you will find them in the application server log file. The IBM JDK produces javacore files instead. The location of the javacore files depends on the OS; to find them, use: find / -name "*javacore*"
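A safe way to see the mechanics is to signal a throwaway background process instead of a real JVM. Everything here is illustrative: on a real server you would use the app server pid from "ps -ef | grep java" and search from / rather than /tmp.

```shell
# Demonstration only: a throwaway "sleep" stands in for the JVM so this
# is safe to run anywhere.
sleep 60 &
PID=$!
kill -l 3            # prints "QUIT": signal 3 is SIGQUIT
kill -3 "$PID"       # a JVM intercepts SIGQUIT and writes a thread dump
                     # instead of exiting; plain "sleep" just terminates
# IBM JDK: locate the javacore files afterwards (searching /tmp here
# to keep the example quick; use / on a real box)
find /tmp -name "javacore*" 2>/dev/null
```

The key point is that the JVM installs a handler for SIGQUIT, so kill -3 is non-destructive for Java processes: the server keeps running after writing the dump.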
whoami
This simply records which user ran the script.
[root@infra-was8-forgh ~]# whoami
root
[root@infra-was8-forgh ~]#
top -bc -d 60 -n 5
Top displays a listing of the most CPU-intensive tasks on the system. The b flag puts it into batch mode to facilitate writing the output to disk. The c flag tells it to display the full command line for each process instead of just the program name. The d flag tells it to delay 60 seconds between refreshes and the n flag tells it to stop after five refreshes.
top - 14:54:08 up 7 days, 3:42, 2 users, load average: 0.00, 0.01, 0.00
Tasks: 250 total, 1 running, 249 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.1%sy, 0.0%ni, 99.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8313644k total, 1405676k used, 6907968k free, 175136k buffers
Swap: 16383992k total, 0k used, 16383992k free, 640004k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 2956 root  20   0  466m 215m  62m S 19.6  2.7 13:39.16 /opt/IBM/WebSphere/
30375 root  20   0  2672 1084  768 R  2.0  0.0  0:00.01 top -bc -d 60 -n 5
    1 root  20   0  2852 1424 1208 S  0.0  0.0  0:04.07 /sbin/init
    2 root  20   0     0    0    0 S  0.0  0.0  0:00.00 [kthreadd]
    3 root  RT   0     0    0    0 S  0.0  0.0  0:00.03 [migration/0]
    4 root  20   0     0    0    0 S  0.0  0.0  0:00.00 [ksoftirqd/0]
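In the script this invocation is redirected to a file so it can be reviewed after the event. Here is a quick sketch of the same capture with a shorter delay and count so it finishes in seconds; the output filename is illustrative.

```shell
# Capture batch-mode top output to a file, linperf.sh-style.
# The script itself uses "-d 60 -n 5"; the short values here are
# only so the example completes quickly.
top -bc -d 1 -n 2 > top_output.out
head -1 top_output.out   # first summary line as a sanity check
```

Because -b disables the interactive screen handling, the file contains plain text snapshots that are easy to grep later.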
top -bH -d 5 -n 49 -p 2956
Here the p flag tells top to show only process 2956 (the app server process), and the H flag tells it to show every thread of that process on its own line. With -d 5 -n 49 it samples every five seconds for 49 refreshes, roughly four minutes.
top - 15:46:23 up 10 days, 4:35, 2 users, load average: 0.01, 0.02, 0.00
Tasks: 114 total, 0 running, 114 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.2%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8313644k total, 2819832k used, 5493812k free, 266804k buffers
Swap: 16383992k total, 0k used, 16383992k free, 1824068k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
 2956 root  20   0  466m 216m  62m S  0.0  2.7  0:00.00 java
 2957 root  20   0  466m 216m  62m S  0.0  2.7  0:15.31 P=557864:O=0:CT
 2958 root  20   0  466m 216m  62m S  0.0  2.7  0:00.23 Signal Reporter
 2960 root  20   0  466m 216m  62m S  0.0  2.7  0:08.43 JIT Compilation
 2961 root  20   0  466m 216m  62m S  0.0  2.7  0:04.35 JIT Compilation
 2962 root  20   0  466m 216m  62m S  0.0  2.7  0:00.00 JIT Compilation
 2963 root  20   0  466m 216m  62m S  0.0  2.7  1:20.71 JIT Sampler
 2964 root  20   0  466m 216m  62m S  0.0  2.7  0:05.58 JIT IProfiler
 2965 root  20   0  466m 216m  62m S  0.0  2.7  0:00.00 Signal Dispatch
 2966 root  20   0  466m 216m  62m S  0.0  2.7  0:00.12 Finalizer maste
 2967 root  20   0  466m 216m  62m S  0.0  2.7  0:34.41 Concurrent Mark
 2968 root  20   0  466m 216m  62m S  0.0  2.7  0:09.31 GC Slave
 2969 root  20   0  466m 216m  62m S  0.0  2.7  0:09.36 GC Slave
 2970 root  20   0  466m 216m  62m S  0.0  2.7  0:09.28 GC Slave
 2973 root  20   0  466m 216m  62m S  0.0  2.7  0:08.57 Thread-8
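A handy follow-up (not part of the scripts themselves): IBM javacore files record each thread's native ID in hexadecimal, while top -H shows thread pids in decimal. Converting a hot thread's pid to hex lets you find that exact thread in the dump. Using the JIT Sampler thread from the sample above:

```shell
# Convert a thread pid from "top -H" (decimal) to the hex form used
# for native thread IDs in IBM javacore files. 2963 is the JIT
# Sampler thread from the sample output above.
printf '0x%X\n' 2963   # prints 0xB93
```

Searching the javacore for "native ID:0x..." with that value identifies which Java thread was burning the CPU.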