Axional Server runs on top of a Java virtual machine. You can use several tools to monitor the Java VM live, depending on your needs:

  • Standard Java visual consoles
  • Axional Server Consoles using any of the available interfaces (Eclipse SWT, WEB or TCP).
  • OS & Java command line tools

In general, you will use:

  1. Standard Java visual consoles, to monitor a Java VM during normal operations.
  2. Axional Server consoles, to get a deep view of server internals.
  3. Command line tools, to deal with live problems and perform automatic health checks.

In the following sections we explain how to monitor the Java VM using command line tools.

1 Production problem resolution

One of the most common “reflexes” of Java EE production support teams is that a Java VM / middleware restart is often the first recovery action performed. While a premature restart can quickly eliminate the business impact, it can also prevent you from getting all the technical facts, reducing your capability to identify the root cause and exposing the platform to future re-occurrences of the problem.

Before pulling the trigger and shutting down your Java VM process, ask yourself the following question:

Do I have all the data available to perform a root cause analysis after the restart?

If the answer is no, then our recommendation is to review and improve your current platform monitoring and/or troubleshooting approach.

In the following sections we will show how to examine a Java virtual machine using command line tools to either:

  • Detect a problem
  • Analyze the causes of a problem

For the above tasks we will use several command line tools.

Type Tool Description

Standard Linux commands

Linux top top (table of processes) is a task manager program found in many Unix-like operating systems. It produces an ordered list of running processes selected by user-specified criteria, and updates it periodically.

Standard Java tools that should be accessible from the command PATH

Java jps The jps tool lists the instrumented HotSpot Java Virtual Machines (JVMs) on the target system. The tool is limited to reporting information on JVMs for which it has access permissions. To monitor a Java VM you must know or determine its process identification (PID); you can find the PID of your Java process with jps (or just the normal ps command).

Java jstack jstack prints Java stack traces of Java threads for a given Java process, core file or remote debug server. For each Java frame, the full class name, method name, 'bci' (byte code index) and line number, if available, are printed.

Java jcmd The jcmd utility sends diagnostic command requests to the JVM; these requests are useful for controlling Java Flight Recordings and for troubleshooting and diagnosing the JVM and Java applications. It must be used on the same machine where the JVM is running, and with the same effective user and group identifiers that were used to launch the JVM.

Java jstat jstat reads monitoring statistics from a running JVM; in this document we use it mainly to read statistics about the garbage collector.

Java jmap jmap prints shared object memory maps or heap memory details of a given process, core file or remote debug server.

Tools provided in the server distribution as shell commands under the bin directory

Shell jtop.sh A Linux shell script to determine the top consuming threads that exceed a maximum CPU usage value.

Java (jar) jvmtop jvmtop is a Java command line tool (jar) that shows the top CPU consuming threads on a virtual machine.
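
As a quick reference, the following invocations (where <PID> stands for the PID of the target Java process) illustrate typical use of the standard tools; exact options may vary slightly between JDK versions.

$ jps -l                             # list local JVMs with their PIDs and main classes
$ jstack <PID> > stack.txt           # capture a full thread dump to a file
$ jstat -gcutil <PID> 1000 5         # GC utilization, 5 samples at 1 second intervals
$ jcmd <PID> help                    # list the diagnostic commands the target JVM accepts
$ jmap -heap <PID>                   # heap configuration and usage summary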

2 CPU utilization

We can monitor Java CPU utilization using different command line tools. In the following sections we will use:

  • top linux command + jstack java command
  • jvmtop command line java tool

2.1 Monitoring thread activity

To obtain a thread dump using jstack, run the following command:

$ jstack <PID>
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode):

"Attach Listener" #72 daemon prio=9 os_prio=31 tid=0x00007fdfe1088800 nid=0x14ae3 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"pool-1-thread-1" #70 prio=5 os_prio=31 tid=0x00007fdfe201c800 nid=0x15603 waiting on condition [0x000070000469f000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000005c0139120> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

"pool-20-thread-1" #69 prio=5 os_prio=31 tid=0x00007fdfe2102800 nid=0x15403 waiting on condition [0x000070000459c000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000005d83b3820> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
...
"VM Thread" os_prio=31 tid=0x00007fdfe102c800 nid=0x3403 runnable 

"GC task thread#0 (ParallelGC)" os_prio=31 tid=0x00007fdfe3808000 nid=0x2403 runnable 

"GC task thread#1 (ParallelGC)" os_prio=31 tid=0x00007fdfe3808800 nid=0x2603 runnable 

"GC task thread#2 (ParallelGC)" os_prio=31 tid=0x00007fdfe3809000 nid=0x2803 runnable 

"GC task thread#3 (ParallelGC)" os_prio=31 tid=0x00007fdfe380a000 nid=0x2a03 runnable 

"GC task thread#4 (ParallelGC)" os_prio=31 tid=0x00007fdfe380a800 nid=0x2c03 runnable 

"GC task thread#5 (ParallelGC)" os_prio=31 tid=0x00007fdfe380b000 nid=0x2e03 runnable 

"GC task thread#6 (ParallelGC)" os_prio=31 tid=0x00007fdfe380b800 nid=0x3003 runnable 

"GC task thread#7 (ParallelGC)" os_prio=31 tid=0x00007fdfe0804000 nid=0x3203 runnable 

"VM Periodic Task Thread" os_prio=31 tid=0x00007fdfe082a000 nid=0x5203 waiting on condition 

JNI global references: 7672

2.2 Monitoring heavy load

If the overall server response is slow, you may need to determine whether the CPU is heavily loaded. You can use the Linux top command to look for that situation, but it is difficult to determine which thread is consuming the most resources.

In the following sections we will learn how to determine the top consuming thread and, more importantly, what it is doing.

2.2.1 A sample Java program

Let's consider the following Java program, which we will use for CPU monitoring. Compile and run it.

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;
import java.util.UUID;

/**
 * thread that does some heavy lifting
 */
class HeavyThread implements Runnable {

        private long length;

        public HeavyThread(long length) {
                this.length = length;
                new Thread(this, "Heavy").start();
        }

        @Override
        public void run() {
                while (true) {
                        String data = "";

                        // build up a large random string
                        for (int i = 0; i < length; i++) {
                                data += UUID.randomUUID().toString();
                        }

                        MessageDigest digest;
                        try {
                                digest = MessageDigest.getInstance("MD5");
                        } catch (NoSuchAlgorithmException e) {
                                throw new RuntimeException(e);
                        }

                        // hash the string
                        digest.update(data.getBytes());
                }
        }
}

/**
 * thread that does little work. just count & sleep
 */
class LightThread implements Runnable {

        public LightThread() {
                new Thread(this, "Light").start();
        }

        @Override
        public void run() {
                Long l = 0l;
                while(true) {
                        l++;
                        try {
                                Thread.sleep(new Random().nextInt(10));
                        } catch (InterruptedException e) {
                                e.printStackTrace();
                        }
                        if(l == Long.MAX_VALUE) {
                                l = 0l;
                        }
                }
        }
}

public class cputest {

    public static void main(String[] args) {
            // start 1 heavy ...
            new HeavyThread(1000);

            // ... and 3 light threads
            new LightThread();
            new LightThread();
            new LightThread();
    }
}

2.2.2 Determine the Java process PID

To monitor any Java process you need to determine its operating system PID. Use the jps command to list all running virtual machines.

$ jps
4048 Jps
3871 cputest
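
If jps is not available on the PATH, the standard ps command mentioned above can be used instead; the filter below is just one illustrative way to do it.

$ ps -eo pid,args | grep '[j]ava'    # the [j] trick keeps the grep process itself out of the result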

2.2.3 Determine top consuming thread using top and jstack

Given a Java process, we can use top to list all of its threads. To do that we run top with the following flags:

  • -b Runs in batch mode
  • -n Runs a specified number of times
  • -H When this toggle is On, all individual threads will be displayed. Otherwise, top displays a summation of all threads in a process.
  • -p Monitor only processes with specified process IDs

Let's now examine the Java process.

  1. To examine which threads are consuming the most CPU, run top with the -H option against the PID of the Java process.

    top will show a per-thread breakdown of the CPU usage. The PID column in the top output shows the thread ID and the %CPU column shows the percentage of CPU usage for each thread.

    $ top -b -n 1 -Hp 3871
    top - 22:08:33 up 42 days, 13:29,  3 users,  load average: 0.54, 0.17, 0.09
    Threads:  19 total,   2 running,  17 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 24.6 us,  1.4 sy,  0.0 ni, 74.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem :  8175288 total,  1810664 free,  2191828 used,  4172796 buff/cache
    KiB Swap:  2097148 total,  1918904 free,   178244 used.  5199924 avail Mem 
    
      PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND                          
     3886 deister   20   0 4793104 712400  12080 S 98.7  8.7   0:48.73 java                             
     3887 deister   20   0 4793104 712400  12080 S  1.8  8.7   0:00.54 java                             
     3888 deister   20   0 4793104 712400  12080 S  1.3  8.7   0:00.52 java                             
     3877 deister   20   0 4793104 712400  12080 S  1.0  8.7   0:00.38 java                             
     3889 deister   20   0 4793104 712400  12080 S  1.0  8.7   0:00.52 java                             
     3873 deister   20   0 4793104 712400  12080 S  0.3  8.7   0:00.12 java                             
     3874 deister   20   0 4793104 712400  12080 R  0.3  8.7   0:00.10 java                             
     3876 deister   20   0 4793104 712400  12080 S  0.3  8.7   0:00.11 java                             
     3871 deister   20   0 4793104 712400  12080 S  0.0  8.7   0:00.00 java 
     3872 deister   20   0 4793104 712400  12080 S  0.0  8.7   0:00.06 java                             
     3875 deister   20   0 4793104 712400  12080 R  0.0  8.7   0:00.10 java                             
     3878 deister   20   0 4793104 712400  12080 S  0.0  8.7   0:00.00 java                             
     3879 deister   20   0 4793104 712400  12080 S  0.0  8.7   0:00.00 java                             
     3880 deister   20   0 4793104 712400  12080 S  0.0  8.7   0:00.00 java                             
     3881 deister   20   0 4793104 712400  12080 S  0.0  8.7   0:00.35 java                             
     3882 deister   20   0 4793104 712400  12080 S  0.0  8.7   0:00.18 java                             
     3883 deister   20   0 4793104 712400  12080 S  0.0  8.7   0:00.09 java                             
     3884 deister   20   0 4793104 712400  12080 S  0.0  8.7   0:00.00 java                             
     3885 deister   20   0 4793104 712400  12080 S  0.0  8.7   0:00.03 java
  2. Convert the PID of the top CPU consumer to hexadecimal; in our sample it is thread 3886:
    $ echo "obase=16; 3886" | bc
    F2E -> this is the top consumer!
  3. Perform a stack dump of the Java process and look for the thread whose nid equals 0xf2e:

    if jstack fails

    Under heavy load jstack may fail with the following message:

    24760: Unable to open socket file: target process not responding or HotSpot VM not loaded

    In that case use the -F option, as suggested by jstack, to force the stack dump.

    $ jstack 3871
    2017-04-29 22:18:58
    Full thread dump OpenJDK 64-Bit Server VM (25.121-b13 mixed mode):
    
    "Attach Listener" #14 daemon prio=9 os_prio=0 tid=0x00007fabf4001000 nid=0x104d waiting on condition [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
    
    "DestroyJavaVM" #13 prio=5 os_prio=0 tid=0x00007fac34008800 nid=0xf20 waiting on condition [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
    
    "Light" #12 prio=5 os_prio=0 tid=0x00007fac3410d000 nid=0xf31 waiting on condition [0x00007fac0eae9000]
       java.lang.Thread.State: TIMED_WAITING (sleeping)
    	at java.lang.Thread.sleep(Native Method)
    	at LightThread.run(cputest.java:56)
    	at java.lang.Thread.run(Thread.java:745)
    
    "Light" #11 prio=5 os_prio=0 tid=0x00007fac3410b000 nid=0xf30 waiting on condition [0x00007fac0ebea000]
       java.lang.Thread.State: TIMED_WAITING (sleeping)
    	at java.lang.Thread.sleep(Native Method)
    	at LightThread.run(cputest.java:56)
    	at java.lang.Thread.run(Thread.java:745)
    
    "Light" #10 prio=5 os_prio=0 tid=0x00007fac34109800 nid=0xf2f runnable [0x00007fac0eceb000]
       java.lang.Thread.State: RUNNABLE
    	at java.lang.Thread.sleep(Native Method)
    	at LightThread.run(cputest.java:56)
    	at java.lang.Thread.run(Thread.java:745)
    
    "Heavy" #9 prio=5 os_prio=0 tid=0x00007fac34107800 nid=0xf2e runnable [0x00007fac0edec000]
       java.lang.Thread.State: RUNNABLE
    	at sun.security.provider.SHA.implCompress(SHA.java:117)
    	at sun.security.provider.SHA.implDigest(SHA.java:98)
    	at sun.security.provider.DigestBase.engineDigest(DigestBase.java:181)
    	at sun.security.provider.DigestBase.engineDigest(DigestBase.java:160)
    	at java.security.MessageDigest$Delegate.engineDigest(MessageDigest.java:592)
    	at java.security.MessageDigest.digest(MessageDigest.java:365)
    	at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:244)
    	- locked <0x000000008345b5e8> (a sun.security.provider.SecureRandom)
    	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:534)
    	at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)
    	at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:220)
    	at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
    	at java.util.UUID.randomUUID(UUID.java:145)
    	at HeavyThread.run(cputest.java:25)
    	at java.lang.Thread.run(Thread.java:745)
    
    "Service Thread" #8 daemon prio=9 os_prio=0 tid=0x00007fac340dc000 nid=0xf2c runnable [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
    
    "C1 CompilerThread2" #7 daemon prio=9 os_prio=0 tid=0x00007fac340cf000 nid=0xf2b waiting on condition [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
    
    "C2 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007fac340cd800 nid=0xf2a waiting on condition [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
    
    "C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007fac340c0000 nid=0xf29 waiting on condition [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
    
    "Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007fac340be000 nid=0xf28 runnable [0x0000000000000000]
       java.lang.Thread.State: RUNNABLE
    
    "Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007fac34093000 nid=0xf27 in Object.wait() [0x00007fac0f4f3000]
       java.lang.Thread.State: WAITING (on object monitor)
    	at java.lang.Object.wait(Native Method)
    	- waiting on <0x0000000083408c18> (a java.lang.ref.ReferenceQueue$Lock)
    	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
    	- locked <0x0000000083408c18> (a java.lang.ref.ReferenceQueue$Lock)
    	at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
    	at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
    
    "Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007fac3408e800 nid=0xf26 in Object.wait() [0x00007fac0f5f4000]
       java.lang.Thread.State: WAITING (on object monitor)
    	at java.lang.Object.wait(Native Method)
    	- waiting on <0x0000000083408e48> (a java.lang.ref.Reference$Lock)
    	at java.lang.Object.wait(Object.java:502)
    	at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    	- locked <0x0000000083408e48> (a java.lang.ref.Reference$Lock)
    	at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
    
    "VM Thread" os_prio=0 tid=0x00007fac34084800 nid=0xf25 runnable 
    
    "GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fac3401e000 nid=0xf21 runnable 
    
    "GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fac3401f800 nid=0xf22 runnable 
    
    "GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007fac34021800 nid=0xf23 runnable 
    
    "GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007fac34023800 nid=0xf24 runnable 
    
    "VM Periodic Task Thread" os_prio=0 tid=0x00007fac340de800 nid=0xf2d waiting on condition 
    
    JNI global references: 13
    

We have found the problematic thread under nid=0xf2e, corresponding to thread 3886:

"Heavy" #9 prio=5 os_prio=0 tid=0x00007fac34107800 nid=0xf2e runnable [0x00007fac0edec000]
   java.lang.Thread.State: RUNNABLE
	at sun.security.provider.SHA.implCompress(SHA.java:117)
	at sun.security.provider.SHA.implDigest(SHA.java:98)
	at sun.security.provider.DigestBase.engineDigest(DigestBase.java:181)
	at sun.security.provider.DigestBase.engineDigest(DigestBase.java:160)
	at java.security.MessageDigest$Delegate.engineDigest(MessageDigest.java:592)
	at java.security.MessageDigest.digest(MessageDigest.java:365)
	at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:244)
	- locked <0x000000008345b5e8> (a sun.security.provider.SecureRandom)
	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:534)
	at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)
	at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:220)
	at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
	at java.util.UUID.randomUUID(UUID.java:145)
	at HeavyThread.run(cputest.java:25)
	at java.lang.Thread.run(Thread.java:745)

Putting it all together

We can combine all these operations into a single shell command. Let's first take a step by step approach.

  1. Get the threads for the given process 3871 and extract the first one (the top consumer).
    top -b -n 1 -Hp 3871 | grep '^\s[0-9]' -m 1
    3886 deister    20   0 4859668 733696  12168 R 93.3  9.0  90:09.77 java
  2. Now transform the thread id of the top thread into its hex nid:
    top -b -n 1 -Hp 3871 | grep '^\s[0-9]' -m 1 | awk '{ printf "nid=0x%x\n", $1 }'
    nid=0xf2e
  3. Finally, get the stack trace for this thread:
    jstack 3871 | grep -A 10 nid=0xf2e
    "Heavy" #9 prio=5 os_prio=0 tid=0x00007fac34107800 nid=0xf2e runnable [0x00007fac0edec000]
       java.lang.Thread.State: RUNNABLE
    	at sun.security.provider.SHA.implDigest(SHA.java:98)
    	at sun.security.provider.DigestBase.engineDigest(DigestBase.java:181)
    	at sun.security.provider.DigestBase.engineDigest(DigestBase.java:160)
    	at java.security.MessageDigest$Delegate.engineDigest(MessageDigest.java:592)
    	at java.security.MessageDigest.digest(MessageDigest.java:365)
    	at sun.security.provider.SecureRandom.engineNextBytes(SecureRandom.java:244)
    	- locked <0x000000008345b5e8> (a sun.security.provider.SecureRandom)
    	at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:534)
    	at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)

A single shell command to get the stack trace of the top consuming Java thread is:

jstack 3871 | grep -A 10 `top -b -n 1 -Hp 3871 | grep '^\s[0-9]' -m 1 | awk '{ printf "nid=0x%x\n", $1 }'`

A jtop shell

Finally, we can use all the previous knowledge to build a Linux shell script that automatically scans Java virtual machines using jps and, for each VM, determines whether any of its threads exceeds a maximum CPU usage.

CPUMAX=75
jps | \
while read LINE; do
    PID=`echo $LINE | awk '{ print $1 }'`
    CMD=`echo $LINE | awk '{ print $2 }'`
    TOP=`top -b -n 1 -Hp $PID |
        grep '^\s*[0-9]' -m 2 |
        awk -v cpumax=$CPUMAX '{ if ($9 > cpumax) { printf ("nid=0x%x %f\n", $1, $9) } }'`

    if [ "X$TOP" != "X" ]; then
       TID=`echo $TOP | awk '{ print $1 }'`
       TLD=`echo $TOP | awk '{ print $2 }'`
       printf "WARNING %7d [%30s] HIGH LOAD %s ON THREAD %s\n" $PID $CMD $TLD $TID
       jstack $PID | grep -A 8 $TID
    else
       printf "        %7d [%30s] GOOD\n" $PID $CMD
    fi
done
.         11834 [                           Jps] GOOD
WARNING    3871 [                       cputest] HIGH LOAD 99.900000 ON THREAD nid=0xf2e
"Heavy" #9 prio=5 os_prio=0 tid=0x00007f2a6c107800 nid=0x1aea runnable [0x00007f2a4dd01000]
   java.lang.Thread.State: RUNNABLE
	at sun.security.provider.SHA.implCompress(SHA.java:117)
...

2.2.4 Determine top consuming thread using jvmtop

A simple way to monitor top CPU usage on a Java virtual machine is the jvmtop tool, an open-source GitHub project.

In the output below, the Heavy thread is shown as the top consumer, holding >97% of the CPU.

$ bin/jvmtop.sh 3871
JvmTop 0.8.0 alpha - 20:01:43, x86_64,  8 cpus, Mac OS X 10.12., load avg 2.31
 http://code.google.com/p/jvmtop

 PID 3871: cputest 
 ARGS: 
 VMARGS: -Dfile.encoding=ISO-8859-1
 VM: Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_65
 UP:  0: 0m  #THR: 13   #THRPEAK: 13   #THRCREATED: 14   USER: deister
 GC-Time:  0: 0m   #GC-Runs: 1066      #TotalLoadedClasses: 1562    
 CPU: 13.31% GC:  0.11% HEAP: 270m /7282m NONHEAP:  14m /  n/a

  TID   NAME                                    STATE    CPU  TOTALCPU BLOCKEDBY
     10 Heavy                                RUNNABLE 97.94%    88.57%       
     18 RMI TCP Connection(1)-192.168.       RUNNABLE  1.12%     1.10%       
     11 Light                           TIMED_WAITING  0.64%     0.59%       
     12 Light                           TIMED_WAITING  0.64%     0.59%       
     13 Light                           TIMED_WAITING  0.59%     0.60%       
     20 JMX server connection timeout   TIMED_WAITING  0.08%     0.04%       
     19 RMI Scheduler(0)                TIMED_WAITING  0.00%     0.00%       
     17 RMI TCP Accept-0                     RUNNABLE  0.00%     0.00%       
     15 Attach Listener                      RUNNABLE  0.00%     0.22%       
     14 DestroyJavaVM                        RUNNABLE  0.00%     0.10%

3 Memory utilization

The JVM uses memory in a number of different ways. The primary, but not the only, use of memory is the heap. Outside of the heap, memory is also consumed by Metaspace and the thread stacks. A sample launch command combining the relevant flags follows the list below.

  • Java Heap - The heap is where your class instantiations or “Objects” are stored. Instance variables are stored in Objects. When discussing Java memory and optimization we most often discuss the heap because we have the most control over it and it is where Garbage Collection (and GC optimizations) take place. Heap size is controlled by the -Xms and -Xmx JVM flags.
  • Java Stack - Each thread has its own call stack. The stack stores primitive local variables and object references along with the call stack (method invocations) itself. The stack is cleaned up as stack frames move out of context, so no GC is performed here. The -Xss JVM option controls how much memory gets allocated for each thread's stack.
  • Metaspace - Metaspace stores the class definitions of your Objects. The initial size of Metaspace is controlled by -XX:MetaspaceSize and its maximum size by -XX:MaxMetaspaceSize.
  • Additional JVM overhead - In addition to the above values there is some memory consumed by the JVM itself. This holds the C libraries for the JVM and some C memory allocation overhead that it takes to run the rest of the memory pools above. Visibility tools that run on the JVM won't show this overhead, so while they can give an idea of how an application uses memory, they can't show the total memory use of the JVM process.
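
For reference, a launch command combining these flags might look like the sketch below; the sizes and the main class name are purely illustrative and must be adapted to your server's startup script.

# illustrative sizes and class name; adjust to your environment
$ java -Xms2g -Xmx2g -Xss512k -XX:MetaspaceSize=128m -XX:MaxMetaspaceSize=256m MainClass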

3.1 Profiling memory use of a Java application

It is important to understand how an application will use memory in both a development and production environment. The majority of memory issues can be reproduced in any environment without significant effort. It is often easier to troubleshoot memory issues on your local machine because you’ll have access to more tools and won’t have to be as concerned with side effects that monitoring tools may cause.

There are a number of tools available for gaining insight into Java application memory use. Some are packaged with the Java runtime itself so should already be on your development machine. Some are available from 3rd parties.

This is not meant to be an exhaustive list, but rather a starting point to your exploration of these tools. Tools that come with the Java runtime include jmap for doing heap dumps and gathering memory statistics, jstack for inspecting the threads running at any given time, jstat for general JVM statistic gathering, and jhat for analyzing heap dumps.

3.1.1 jstat

jstat is a monitoring tool for the HotSpot JVM. It does not only display GC information; it also provides class loader and Just-In-Time compiler statistics. Among all the information jstat can provide, this article only covers its GC monitoring functionality.

jstat has several options that show different classes of information. They are listed below.

gc Shows the current size for each heap area and its current usage (Eden, survivor, old, etc.), the total number of GCs performed, and the accumulated time for GC operations.
gccapacity Shows the minimum size (ms) and maximum size (mx) of each heap area, the current size, and the number of GCs performed for each area. (Does not show current usage and accumulated time for GC operations.)
gccause Shows the information provided by -gcutil, plus the reason for the last GC and the reason for the current GC.
gcnew Shows the GC performance data for the new area.
gcnewcapacity Shows statistics for the size of the new area.
gcold Shows the GC performance data for the old area.
gcoldcapacity Shows statistics for the size of the old area.
gcmetacapacity Shows statistics for the metaspace (gcpermcapacity, for the permanent area, on older JVMs).
gcutil Shows the usage for each heap area as a percentage. Also shows the total number of GCs performed and the accumulated time for GC operations.

To interpret jstat results you can use an online tool such as a jstat garbage collection visualizer.
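
Besides a single snapshot, jstat accepts an interval and a sample count. For example, the following would print the -gccause output every second, ten times (the interval may also be written with a unit, e.g. 1s):

$ jstat -gccause <PID> 1000 10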

3.1.2 jmap

The Java jmap utility provides a number of useful options to summarize heap usage and get a breakdown of objects in the new and old generations. Most importantly:

jmap can acquire a memory dump from a running Java VM to be analyzed offline.
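
For a quick view of which classes dominate the heap, without taking a full dump, jmap can also print a per-class histogram; note that the :live suffix forces a full GC first, so use it with care on a loaded server.

$ jmap -histo:live <PID> | head -20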

3.2 Monitor memory usage

You can monitor the memory usage of a given Java VM using the jstat command line tool and its -gc (garbage-collected heap statistics) option.

$ jstat -gc <PID>
S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC        MU      CCSC   CCSU       YGC     YGCT  FGC    FGCT      GCT   
8704.0 9728.0  0.0   9704.9 1219584.0 331789.7  489984.0   264269.6  188160.0 175473.1 24832.0 20906.4    329    4.774  240   115.657  120.431

The Garbage-collected heap statistics columns indicate:

Position Column Type Description
1 S0C CAPACITY Current survivor space 0 capacity (KB).
2 S1C CAPACITY Current survivor space 1 capacity (KB).
3 S0U UTILIZATION Survivor space 0 utilization (KB).
4 S1U UTILIZATION Survivor space 1 utilization (KB).
5 EC CAPACITY Current eden space capacity (KB).
6 EU UTILIZATION Eden space utilization (KB).
7 OC CAPACITY Current old space capacity (KB).
8 OU UTILIZATION Old space utilization (KB).
9 MC CAPACITY Metaspace capacity (KB).
10 MU UTILIZATION Metaspace utilization (KB).
11 CCSC CAPACITY Compressed class space capacity (KB).
12 CCSU UTILIZATION Compressed class space used (KB).
13 YGC COUNT Number of young generation GC Events.
14 YGCT TIME Young generation garbage collection time.
15 FGC COUNT Number of full GC events.
16 FGCT TIME Full garbage collection time.
17 GCT TIME Total garbage collection time.

Now we can get the sum of the used memory across the different pools.

jstat -gc <PID> | tail -n 1 | awk '{split($0,a," "); sum=a[3]+a[4]+a[6]+a[8]+a[10]; print sum}'
619715

The command basically sums up:

  • S0U: Survivor space 0 utilization (KB).
  • S1U: Survivor space 1 utilization (KB).
  • EU: Eden space utilization (KB).
  • OU: Old space utilization (KB).
  • MU: Metaspace utilization (KB).

To also include the compressed class space utilization, add a[12] (CCSU) to the awk sum, as shown below.
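
With that change the one-liner would become:

jstat -gc <PID> | tail -n 1 | awk '{split($0,a," "); sum=a[3]+a[4]+a[6]+a[8]+a[10]+a[12]; print sum}'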

Now let's get a per-pool report of the memory size and usage of the heap spaces and the metaspace.

jstat -gc <PID> | tail -n 1 | awk '{split($0,a," "); \
sSize=a[1] + a[2]; \
sUsed=a[3] + a[4]; \
eSize=a[5];  \
eUsed=a[6];  \
oSize=a[7];  \
oUsed=a[8];  \
mSize=a[9];  \
mUsed=a[10]; \
sPcnt=(sUsed / sSize) * 100; \
ePcnt=(eUsed / eSize) * 100; \
oPcnt=(oUsed / oSize) * 100; \
mPcnt=(mUsed / mSize) * 100; \
printf (" Memory pool     Size        Used         %% used\n");  \
printf (" --------------  ----------- ------------ -------\n");  \
printf (" Survivor Space: %8d MB  %8.1f MB %5.1f%%\n", sSize/1024, sUsed/1024, sPcnt);  \
printf ("     Eden Space: %8d MB  %8.1f MB %5.1f%%\n", eSize/1024, eUsed/1024, ePcnt);  \
printf ("        Old Gen: %8d MB  %8.1f MB %5.1f%%\n", oSize/1024, oUsed/1024, oPcnt);  \
printf ("      Metaspace: %8d MB  %8.1f MB %5.1f%%\n", mSize/1024, mUsed/1024, mPcnt);  \
}'
 Memory pool     Size        Used         % used
 --------------  ----------- ------------ -------
 Survivor Space:      321 MB       0.0 MB   0.0%
     Eden Space:      683 MB     305.0 MB  44.6%
        Old Gen:      690 MB     102.4 MB  14.8%
      Metaspace:       86 MB      72.9 MB  84.1%

3.2.1 Interpreting results

The Old Gen space should have free space during normal operations. If it shows a permanently high percentage of usage, you need to add more memory to the JVM or review the application logic.

3.3 Monitor GC usage

We can monitor GC performance using the -gcutil option (adding -t to show a timestamp).

$ jstat -gcutil -t <PID>
Timestamp         S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT   
       317392.4  31.23   0.00  21.22  62.34  80.13  73.80    614   14.875    42    7.761   22.635

The garbage collection statistics columns indicate:

Position Column Type Description
1 Timestamp TIME Time since JVM was started.
2 S0 UTILIZATION Survivor space 0 utilization as a percentage of the space's current capacity.
3 S1 UTILIZATION Survivor space 1 utilization as a percentage of the space's current capacity.
4 E UTILIZATION Eden space utilization as a percentage of the space's current capacity.
5 O UTILIZATION Old space utilization as a percentage of the space's current capacity.
6 M UTILIZATION Metaspace utilization as a percentage of the space's current capacity.
7 CCS UTILIZATION Compressed class space utilization as a percentage.
8 YGC COUNT Number of young generation GC Events.
9 YGCT TIME Young generation garbage collection time.
10 FGC COUNT Number of full GC events.
11 FGCT TIME Full garbage collection time.
12 GCT TIME Total garbage collection time.

Now evaluate the following values:

$$\text{Minor GC time}=\frac{YGCT}{YGC}$$ $$\text{Minor GC freq}=\frac{YGC}{\text{Timestamp}}$$ $$\text{Full GC time}=\frac{FGCT}{FGC}$$ $$\text{Full GC freq}=\frac{FGC}{\text{Timestamp}}$$

3.3.1 Interpreting results

If the GC execution time meets all of the following conditions, GC tuning is not required.

  • Minor GC is processed quickly (within 50 ms / 0.05 s).
  • Minor GC is not frequently executed (about once every 10 seconds).
  • Full GC is processed quickly (within 1 second).
  • Full GC is not frequently executed (once per 10 minutes / 600 seconds).

A heavily loaded system may be caused by excessive GC. If during CPU analysis you determine that GC threads such as:

GC task thread#0 (ParallelGC)

are consuming a lot of CPU, you are experiencing GC problems.
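
Building on the earlier top + jstack one-liner, a quick sketch to check whether the top consuming thread is actually a GC thread is to look up its nid in the stack dump and see if the matching line is a "GC task thread" (3871 is the sample PID used in this document):

NID=`top -b -n 1 -Hp 3871 | grep '^\s*[0-9]' -m 1 | awk '{ printf "nid=0x%x", $1 }'`
jstack 3871 | grep "$NID" | grep -c "GC task thread"    # a non-zero count means a GC thread is the top consumer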

3.3.2 Putting it all together

Now we can put it all together in a single command line that computes all of these values.

jstat -gcutil -t <PID> | tail -n 1 | 
awk '{split($0,a," "); \
MGCtime=a[9]/a[8]; \
MGCfreq=a[8]/a[1]; \
FGCtime=a[11]/a[10]; \
FGCfreq=a[10]/(a[1]/60); \
printf("MinorGCtime=%f secs\nMinorGCfreq=%f times/secs\nFullGCtime=%f secs\nFullGCfreq=%f times/min\n", MGCtime, MGCfreq, FGCtime, FGCfreq) }'
MinorGCtime=0.023574 secs
MinorGCfreq=0.002055 times/secs
FullGCtime=0.174174 secs
FullGCfreq=0.008657 times/min

3.4 Acquire a heap dump for diagnostics

If you experience memory problems, acquire a heap dump of the troubled Java virtual machine instance.

jmap prints shared object memory maps or heap memory details of a given process, core file or remote debug server. The generated file can be analyzed with tools like Eclipse MAT.

For a given process (PID) you can generate a heap dump as follows:

$ jmap -dump:format=b,file=heapdump.hprof [pid]
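
Alternatively, on JDK 8 and later the same dump can be requested through jcmd (the output path below is illustrative):

$ jcmd <PID> GC.heap_dump /tmp/heapdump.hprof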

3.4.1 Heap dump on OutOfMemoryError

Under high load, and especially during Apache FOP rendering or when using tools like Apache POI, GraphViz or HtmlUnit, servers can experience OutOfMemory situations. If it is necessary to determine the cause of a server failing with OutOfMemory errors, you can set up the JVM to generate a heap dump.

To acquire a heap dump on OutOfMemoryError, add the -XX:+HeapDumpOnOutOfMemoryError parameter to the startup script that launches the Java VM.

By default the heap dump is created in a file called java_pid<pid>.hprof in the working directory of the VM. You can specify an alternative file name or directory with the -XX:HeapDumpPath= option.

For example, -XX:HeapDumpPath=/disk2/dumps will cause the heap dump to be generated in the /disk2/dumps directory.
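
For illustration, a startup line enabling both options might look like the following sketch; the heap size and the launch target (server.jar) are placeholders for whatever your startup script actually uses.

# illustrative only: adapt heap size, dump directory and launch target to your startup script
$ java -Xmx4g -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/disk2/dumps -jar server.jar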

4 Disk space

Each Axional Server instance uses temporary disk space under the directory defined by the standard java.io.tmpdir system property. The name of the subdirectory used by a server takes the form

axional-PID

where PID is the process id of the Java VM for the server instance.

java.io.tmpdir is defined (and may be changed) in the server startup shell.

This directory may grow to hold temporary data used by application components, especially JDBC caches of BLOBs and the memory-mapped SQL cache (if enabled; the default is an in-memory cache).

Try to keep at least 500 MB free for each server instance, and increase this if needed. The space may be cleaned on shutdown, as no permanent data is stored in it.
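
To check how much space an instance is actually using, you can run something like the following (assuming the default java.io.tmpdir of /tmp; adjust the path if your startup shell overrides it):

$ du -sh /tmp/axional-<PID>    # space used by the instance temporary directory
$ df -h /tmp                   # free space on the filesystem holding java.io.tmpdir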

5 Database server problems

Any Axional Server is mainly used for database intensive operations. In fact, the Axional Studio framework (previously called webstudio) is fully oriented to running database applications, and even the application server configuration is stored in a central database.

If the database experiences problems, responds slowly or hangs, the application server will experience problems too and will stop responding until the database recovers. For Axional Studio, the database acts like a disk: if the disk fails, the system fails; if the disk hangs, the server appears to hang.

6 In case of server failure

If you experience problems on a server, before stopping it (a collection sketch follows this list):

  1. Verify CPU consumption using the top command and the jtop script shown in this document.
  2. Verify the server is not blocked waiting on a non-responding database server or network operation. If running on Informix, run onstat to ensure the primary database server is online.
  3. Take a few stack traces with jstack, a few seconds apart, to see the thread states and their evolution; save them to files (like stack.1, stack.2, ..., stack.n) and keep them for diagnostics.
  4. Determine the memory used with jstat and keep the collected verbose GC log if enabled.
  5. Perform a jmap memory dump to a file for diagnostics.
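
A minimal collection sketch for a given PID follows; the file names and the 5 second spacing are only illustrative.

PID=$1
for i in 1 2 3; do
    jstack $PID > stack.$i               # thread dumps a few seconds apart, to see thread evolution
    jstat -gcutil -t $PID >> gc.stats    # GC utilization snapshots
    sleep 5
done
jmap -dump:format=b,file=heapdump.hprof $PID   # heap dump for offline analysis (e.g. Eclipse MAT)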