Configuring Axional Server for high load, whether for load testing or for production, requires tuning the operating system, the JVM, the network and the load generation.
The following document describes the major parameters to configure on a server machine.
A server-class machine is defined as a machine with the following:
- 2 or more physical processors
- 2 or more GB of physical memory
1 Operating System
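Linux does a reasonable job of self-configuring TCP/IP, but there are a few limits and defaults that are best increased. These can mostly be configured in /etc/security/limits.conf (per-user resource limits such as the number of open files) or via sysctl (kernel parameters). Values set with sysctl -w are lost on reboot; add them to /etc/sysctl.conf (or a file under /etc/sysctl.d/) to make them persistent.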
1.1 TCP buffer sizes
These should be increased to at least 16 MB for 10 Gbit/s paths, and the autotuning limits raised accordingly (although bufferbloat now needs to be considered):
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 16384 16777216"
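To check that the settings above took effect, each sysctl key can be read back from procfs, where it maps to a file under /proc/sys with the dots replaced by slashes. The sketch below assumes a Linux host; the class and method names (SysctlCheck, procPath, maxField) are hypothetical, and the 16 MB target is the one suggested above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reads Linux sysctl values through /proc/sys.
class SysctlCheck {
    static final long TARGET_BYTES = 16L * 1024 * 1024; // 16 MB = 16777216 bytes

    // net.core.rmem_max -> /proc/sys/net/core/rmem_max
    static String procPath(String key) {
        return "/proc/sys/" + key.replace('.', '/');
    }

    // Raw value; tcp_rmem and tcp_wmem hold three fields: "min default max".
    static String read(String key) throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get(procPath(key)));
        return new String(raw, StandardCharsets.US_ASCII).trim();
    }

    // Last whitespace-separated field: the max of a triple, or a single value.
    static long maxField(String value) {
        String[] fields = value.trim().split("\\s+");
        return Long.parseLong(fields[fields.length - 1]);
    }

    public static void main(String[] args) throws IOException {
        String[] keys = {"net.core.rmem_max", "net.core.wmem_max",
                         "net.ipv4.tcp_rmem", "net.ipv4.tcp_wmem"};
        for (String key : keys) {
            long max = maxField(read(key));
            System.out.printf("%s = %d (%s)%n", key, max,
                    max >= TARGET_BYTES ? "ok" : "below 16 MB");
        }
    }
}
```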
1.2 Queue Sizes
net.core.somaxconn controls the size of the connection listening queue. The default value is 128 on older kernels (4096 since Linux 5.4). If you are running a high-volume server and connections are being refused at the TCP level, increase it. The right value depends on the load: too high and you'll get resource problems, as many connections remain pending while the server is notified of them; too low and you'll get refused connections:
sysctl -w net.core.somaxconn=4096
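The kernel limit is only half of the picture: the backlog actually used is the one the application passes to listen(), which Linux silently truncates to net.core.somaxconn (see the listen(2) man page). In Java that value is the backlog argument of ServerSocket, and the JDK substitutes 50 when it is less than 1. A minimal sketch; the class name, the helper and the value 4096 are illustrative assumptions:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class BacklogDemo {
    // The JDK replaces a backlog < 1 with 50; Linux then caps it at somaxconn.
    static int effectiveBacklog(int requested, int somaxconn) {
        int asked = requested < 1 ? 50 : requested;
        return Math.min(asked, somaxconn);
    }

    public static void main(String[] args) throws IOException {
        int requested = 4096; // hypothetical value matching the sysctl above
        try (ServerSocket server = new ServerSocket()) {
            // Port 0 picks any free port; the second argument is the listen backlog.
            server.bind(new InetSocketAddress(0), requested);
            System.out.println("Listening on port " + server.getLocalPort()
                    + "; with somaxconn=128 the effective backlog would be "
                    + effectiveBacklog(requested, 128));
        }
    }
}
```

Raising somaxconn therefore only helps if the server also requests a large enough backlog.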
net.core.netdev_max_backlog controls the maximum number of incoming packets queued when the network interface receives packets faster than the kernel can process them. The default (1000) may be increased, and related parameters adjusted, with:
sysctl -w net.core.netdev_max_backlog=16384
sysctl -w net.ipv4.tcp_max_syn_backlog=8192
sysctl -w net.ipv4.tcp_syncookies=1
1.3 Ports
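If many outgoing connections are made (e.g. on load generators), the operating system may run low on local ports. It is therefore best to widen the local port range and allow the reuse of sockets in TIME_WAIT for new outgoing connections:
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
sysctl -w net.ipv4.tcp_tw_reuse=1
Avoid net.ipv4.tcp_tw_recycle: it breaks connections from clients behind NAT and was removed in Linux 4.12.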
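A back-of-the-envelope estimate shows why the range matters. Each outgoing connection to the same destination address and port needs its own local port, and on Linux a closed connection keeps its port in TIME_WAIT for 60 seconds. Assuming every port spends that full period unavailable, the sketch below (class and method names are hypothetical) computes the sustainable rate of new connections to one destination, comparing the common default range "32768 60999" with the wider one above.

```java
class PortBudget {
    static final int TIME_WAIT_SECONDS = 60; // Linux TIME_WAIT length

    // Number of ports in an inclusive range such as "1024 65535".
    static int portCount(int low, int high) {
        return high - low + 1;
    }

    // New connections per second to one destination before ports run out,
    // assuming each port stays unavailable for the whole TIME_WAIT period.
    static double maxNewConnectionsPerSecond(int low, int high) {
        return (double) portCount(low, high) / TIME_WAIT_SECONDS;
    }

    public static void main(String[] args) {
        System.out.println(portCount(32768, 60999) + " ports -> "
                + maxNewConnectionsPerSecond(32768, 60999) + " new connections/s");
        System.out.println(portCount(1024, 65535) + " ports -> "
                + maxNewConnectionsPerSecond(1024, 65535) + " new connections/s");
    }
}
```

tcp_tw_reuse raises this ceiling by letting safe TIME_WAIT ports be reused, and persistent HTTP connections avoid most of the churn in the first place.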
2 Network Tuning
Ensure persistent HTTP/1.1
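Intermediaries such as nginx may use non-persistent HTTP/1.0 connections (nginx, for example, talks HTTP/1.0 to upstream servers unless proxy_http_version 1.1 is configured). Make sure that persistent HTTP/1.1 connections are used.
3 JVM Memory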
Probably the most important tuning we can apply to the Java™ Virtual Machine (JVM) is configuring how much heap memory (used for allocations at runtime) to use. If the server needs more memory than is currently available, the JVM must grow the heap (if it can) or perform garbage collection to free memory for new allocations. Resizing the heap and collecting garbage are expensive processes, and thus detrimental to performance. Sizing the heap so that enough memory is available, while keeping it manageable for the garbage collector, is important to overall performance.
As a baseline, if the server has less than 2 gigabytes (GB) of available memory, configure the Java heap with a minimum of 256 megabytes (MB) and a maximum of 1 GB.
This means that when the server is launched, 256 MB are available for newly created objects. If more than 256 MB are required, the JVM garbage collects and expands the heap (if necessary) until it reaches 1 GB. At that point, the JVM must garbage collect to free released memory so it can be reused.
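To confirm which heap limits a running JVM actually received, the standard Runtime API reports them. A minimal sketch with a hypothetical class name; run it with, for example, java -Xms256m -Xmx1g HeapReport to match the baseline above. maxMemory() approximates -Xmx and may report slightly less, because some collectors exclude one survivor space from it.

```java
class HeapReport {
    static final long MB = 1024L * 1024L;

    // Bytes currently occupied inside the committed heap.
    static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Committed heap right now: starts near -Xms and may grow toward -Xmx.
        long committed = rt.totalMemory();
        // Upper bound the heap may grow to: roughly -Xmx.
        long max = rt.maxMemory();
        System.out.printf("committed=%d MB, used=%d MB, max=%d MB%n",
                committed / MB, usedBytes(rt) / MB, max / MB);
    }
}
```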
Understanding the JVM memory model and Java memory management is essential if you want to understand how Java garbage collection works.
3.1 JDK 8 from PermGen to Metaspace
The JDK 8 HotSpot JVM now uses native memory to represent class metadata; this area is called Metaspace, similar to the Oracle JRockit and IBM JVMs.
The Metaspace replaces the classic PermGen space present in JDK 7 and earlier.
Figure: JDK 7 memory layout.
Figure: JDK 8 memory layout.
3.1.1 PermGen space situation
- This memory space is completely removed
- The -XX:PermSize and -XX:MaxPermSize JVM arguments are ignored and a warning is issued if present at start-up.
The old behavior (a fixed upper bound) can be simulated with the new -XX:MaxMetaspaceSize flag. In general, however, unless there is a memory leak, Metaspace is allowed to grow dynamically as required.
3.1.2 Metaspace memory allocation model
- Most allocations for the class metadata are now allocated out of native memory.
- By default class metadata allocation is limited by the amount of available native memory.
3.1.3 Metaspace capacity
- A new flag is available (-XX:MaxMetaspaceSize), allowing you to limit the amount of native memory used for class metadata. If you don’t specify this flag, the Metaspace will dynamically resize depending on the application demand at runtime.
3.1.4 Metaspace garbage collection
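- Garbage collection of dead classes and classloaders is triggered once class metadata usage reaches the Metaspace high-water mark, which starts at -XX:MetaspaceSize and is adjusted after each collection. If -XX:MaxMetaspaceSize is set and reached, a collection is also attempted before an OutOfMemoryError (Metaspace) is thrown.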
- Proper monitoring & tuning of the Metaspace will obviously be required in order to limit the frequency or delay of such garbage collections. Excessive Metaspace garbage collections may be a symptom of classes, classloaders memory leak or inadequate sizing for your application.
3.1.5 Java heap space impact
- Some miscellaneous data has been moved to the Java heap space. This means you may observe an increase in Java heap usage after upgrading to JDK 8.
3.1.6 Metaspace monitoring
- Metaspace usage is available from the HotSpot 1.8 verbose GC log output.
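- jstat and JVisualVM had not been updated at the time of our testing with the JDK 8 early-access build b75, and the old PermGen space references were still present. Released JDK 8 versions of jstat report Metaspace capacity and usage in the MC and MU columns of jstat -gc.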
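Metaspace can also be read from inside the JVM through the standard java.lang.management API. On HotSpot JDK 8 and later the pool is named "Metaspace"; that name is an implementation detail, so this sketch (class name hypothetical) treats a missing pool as a valid outcome. A maximum of -1 typically means that no -XX:MaxMetaspaceSize limit was set.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class MetaspaceProbe {
    // Usage of the pool named "Metaspace", or null if this JVM has no such pool.
    static MemoryUsage metaspaceUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        MemoryUsage usage = metaspaceUsage();
        if (usage == null) {
            System.out.println("No Metaspace pool (not a HotSpot JDK 8+ JVM?)");
            return;
        }
        // getMax() returns -1 when the maximum is undefined (no explicit limit).
        String max = usage.getMax() < 0 ? "unlimited" : (usage.getMax() / 1024) + " KB";
        System.out.printf("Metaspace used=%d KB committed=%d KB max=%s%n",
                usage.getUsed() / 1024, usage.getCommitted() / 1024, max);
    }
}
```

Sampling this periodically under load is one way to spot the classloader leaks mentioned above.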
3.2 Detailed memory view
The JVM memory areas are specified in The Java Virtual Machine Specification, Java SE 8 Edition, mainly in sections “2.5 Runtime Data Areas” and “2.6 Frames”.
The heap space holds object data, the method area holds class code, and the native area holds references to the code and object data.
Memory part | Description |
---|---|
Young Generation | Young generation is the place where all the new objects are created. When young generation is filled, garbage collection is performed. This garbage collection is called Minor GC. Young Generation is divided into three parts – Eden Memory and two Survivor Memory spaces. |
Old Generation | Old Generation memory contains objects that are long-lived and have survived many rounds of Minor GC. Garbage collection is usually performed in Old Generation memory when it is full. Old Generation garbage collection is called Major GC and usually takes longer. |
Permanent Generation | (JDK 7 and earlier) Permanent Generation or “Perm Gen” contains the application metadata required by the JVM to describe the classes and methods used in the application. Note that Perm Gen is not part of Java heap memory. Perm Gen is populated by the JVM at runtime based on the classes used by the application, and also contains Java SE library classes and methods. Perm Gen objects are garbage collected in a full garbage collection. |
Method Area | Method Area stores class structure (runtime constants and static variables) and the code for methods and constructors. In JDK 7 and earlier it is part of Perm Gen; in JDK 8 class metadata lives in Metaspace. |
Memory Pool | Memory pools are created by JVM memory managers to hold pools of immutable objects, if the implementation supports it. The String Pool is a good example of this kind of memory pool. A memory pool can belong to the heap or to Perm Gen, depending on the JVM memory manager implementation. |
Runtime Constant Pool | The runtime constant pool is the per-class runtime representation of the constant pool in a class file. It contains constants ranging from numeric literals known at compile time to method and field references that must be resolved at run time. The runtime constant pool is part of the method area. |
Java Stack Memory | Java stack memory is used for the execution of a thread. It contains method-specific values that are short-lived and references to objects in the heap that are referred to from the method. |
3.3 JVM memory switches
Java provides many memory switches that we can use to set memory sizes and their ratios. Some of the commonly used memory switches are:
VM switch | Description |
---|---|
-Xmx | Max heap size |
-Xms | Initial heap size |
-XX:NewSize | Initial young generation size |
-XX:SurvivorRatio | For providing ratio of Eden space and Survivor Space, for example if Young Generation size is 10m and VM switch is -XX:SurvivorRatio=2 then 5m will be reserved for Eden Space and 2.5m each for both the Survivor spaces. The default value is 8. |
-XX:NewRatio | For providing ratio of old/new generation sizes. The default value is 2. |
-Xss | Thread stack size. The platform-specific default (ThreadStackSize) is 1024 KB on 64-bit Linux; on platforms where it is 0, the OS default is used. |
-XX:MaxDirectMemorySize | Sets a limit on the total amount of memory that can be reserved for all direct byte buffers. Once the limit is reached, a new direct byte buffer can be allocated only after enough old buffers are freed to make room for it. |
-XX:+UseLargePages | The goal of large page support is to optimize processor Translation-Lookaside Buffers. |
-XX:+UseCompressedOops | Enables compressed object pointers in 64-bit JVMs to reduce heap footprint. Enabled by default when the heap is smaller than about 32 GB. |
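-XX:PermSize | Initial Perm Gen size. Removed since JDK 1.8, as Metaspace replaces PermGen; the flag is ignored with a warning. |
-XX:MaxPermSize | Maximum Perm Gen size. Removed since JDK 1.8; use -XX:MaxMetaspaceSize instead. |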
To see the default (initial) values of your Java installation:
$ java -XX:+PrintFlagsFinal -version | grep -iE 'heapsize|permsize|threadstacksize'
intx CompilerThreadStackSize = 0 {pd product}
uintx ErgoHeapSizeLimit = 0 {product}
uintx HeapSizePerGCThread = 87241520 {product}
uintx InitialHeapSize := 536870912 {product}
uintx LargePageHeapSizeThreshold = 134217728 {product}
uintx MaxHeapSize := 8589934592 {product}
intx ThreadStackSize = 1024 {pd product}
intx VMThreadStackSize = 1024 {pd product}
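The same flag values can be read from inside a running JVM through com.sun.management.HotSpotDiagnosticMXBean, which is specific to HotSpot-based JDKs. A minimal sketch; the class name and the chosen flags are illustrative:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class VmFlags {
    // Current value of a HotSpot flag as a string, e.g. "8589934592".
    // getVMOption throws IllegalArgumentException for unknown flag names.
    static String flag(String name) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return hotspot.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"InitialHeapSize", "MaxHeapSize", "ThreadStackSize"}) {
            System.out.println(name + " = " + flag(name));
        }
    }
}
```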
3.4 Tuning the JVM Heap
Without getting into too much detail on the generational memory management model in Java™, the basic principle is that new objects are created in the "young" generation and are garbage collected when they are no longer used. If an object is created and a reference to it is maintained, that object is eventually moved to the "old" generation.
The old generation is typically more expensive to clean up than the young generation because the old generation is cleaned only during a full garbage collection (meaning that the JVM has almost reached the value of -Xmx and the entire heap must be cleaned). However, the young generation is garbage collected more frequently and with multiple threads by default (on multi-core machines), so the pauses for collections are shorter.
By default, the JVM tends to size the generations biased toward the old generation, giving it most of the total heap space. This means objects are moved more frequently from the young generation into the old generation to make space for new objects, and "full" collections happen more often as the old generation fills up. By telling the JVM to give more memory to the "young" generation, we reduce the frequency of the more costly full collections.
To do this we can either specify fixed values for the size of the young generation or modify the ratio of young generation to old generation. To specify a fixed value for the young generation, use the -XX:NewSize= and -XX:MaxNewSize= arguments.
These arguments are to the young generation what -Xms and -Xmx are to the entire heap. Like -Xms, -XX:NewSize= defines the initial (or minimum) size. And, like -Xmx, -XX:MaxNewSize= defines the maximum size. And yes, the same reasoning applies when adjusting these values.
To specify a ratio between the old and new generation sizes, use the -XX:NewRatio= argument. For example, setting -XX:NewRatio=3 means that the ratio between the young and old generation is 1:3. Put another way, the size of the young generation is one fourth of the total heap size.
- Fixed heap size:
  -Xms2048m -Xmx2048m
- Fixed heap size with 50% young generation bias using NewSize:
  -Xms2048m -Xmx2048m -XX:NewSize=1024m -XX:MaxNewSize=1024m
- Fixed heap size with 50% young generation bias using NewRatio:
  -Xms2048m -Xmx2048m -XX:NewRatio=1
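The NewRatio arithmetic described above (old:young = N:1, so young = heap / (N + 1)) can be checked with a few lines of plain arithmetic. This sketch ignores the rounding and alignment the JVM applies to real generation sizes, and the class and method names are hypothetical.

```java
class GenerationSizing {
    // -XX:NewRatio=N means old:young = N:1, so young = heap / (N + 1).
    static long youngFromNewRatio(long heapMb, int newRatio) {
        return heapMb / (newRatio + 1);
    }

    static long oldFromNewRatio(long heapMb, int newRatio) {
        return heapMb - youngFromNewRatio(heapMb, newRatio);
    }

    public static void main(String[] args) {
        long heap = 2048; // -Xms2048m -Xmx2048m
        for (int ratio : new int[] {1, 2, 3}) {
            System.out.printf("NewRatio=%d -> young=%d MB, old=%d MB%n",
                    ratio, youngFromNewRatio(heap, ratio), oldFromNewRatio(heap, ratio));
        }
    }
}
```

With NewRatio=1 the young generation gets half of the 2048 MB heap, the same split as the NewSize example.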
3.5 Tuning stack size
To determine the default thread stack size of your JVM, run:
$ java -XX:+PrintFlagsFinal -version | grep ThreadStackSize
intx ThreadStackSize = 1024 {pd product}
In the previous example the system shows a stack size of 1024 KB (1 MB), since ThreadStackSize is expressed in kilobytes. To increase the stack size, start the JVM with the -Xss option, for example -Xss2m.
The server uses regular expressions to analyze SQL code. In some circumstances, when processing (splitting) large SQL code, the regular expression may fail with a StackOverflowError. In such cases you can increase the stack size.
In the reference implementation of the Pattern class (which comes with Oracle's JRE, OpenJDK, and a number of other JVMs), greedy and lazy quantifiers are implemented with recursion when the repeated pattern is non-trivial. Therefore, you will run into a StackOverflowError when the input string is long enough.
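The effect can be reproduced, and contained, without raising -Xss for every thread: Thread has a constructor that takes a requested stack size, so only the matching work gets the bigger stack. A minimal sketch, assuming (a|b)* as a stand-in for a non-trivial repeated group; the class and helper names are hypothetical, and the stack size argument is only a hint that some platforms may ignore.

```java
import java.util.regex.Pattern;

class RegexStack {
    // Runs the match on a new thread with the requested stack size in bytes.
    // Returns null if the match overflowed the stack (or the wait was interrupted).
    static Boolean matchesWithStack(String regex, String input, long stackBytes) {
        Boolean[] result = new Boolean[1];
        Thread worker = new Thread(null, () -> {
            try {
                result[0] = Pattern.compile(regex).matcher(input).matches();
            } catch (StackOverflowError e) {
                result[0] = null; // the recursive matcher exhausted this thread's stack
            }
        }, "regex-worker", stackBytes);
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to result[0] visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        return result[0];
    }

    static String repeat(String s, int times) {
        StringBuilder sb = new StringBuilder(s.length() * times);
        for (int i = 0; i < times; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = repeat("ab", 50_000);
        // (a|b)* recurses once per repetition and may overflow a 1 MB stack.
        System.out.println("1 MB stack:   " + matchesWithStack("(a|b)*", input, 1L << 20));
        System.out.println("256 MB stack: " + matchesWithStack("(a|b)*", input, 256L << 20));
        // A character class is matched iteratively, so stack size does not matter.
        System.out.println("[ab]*:        " + Pattern.matches("[ab]*", input));
    }
}
```

The exact input length at which the recursive pattern overflows depends on the JVM and on JIT compilation, so the sizes above are illustrative. Where the pattern allows it, rewriting the repeated group as a character class removes the problem instead of moving its threshold.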
3.6 Garbage collection
Keeping track of garbage collection statistics is vital to optimum Java performance, especially if you run the JVM with large heap sizes. Tuning the garbage collector for your use case is often a critical performance practice prior to deployment. Likewise, knowing what baseline garbage collection behavior looks like and monitoring for behavior outside of normal tolerances will keep you apprised of potential memory leaks and other pathological memory usage.
Moreover, the best way to minimize the performance impact of garbage collection is to keep heap usage small. Maintaining a small heap can save countless hours of garbage collection tuning and will provide much higher stability and predictability across your entire application.
3.6.1 Selecting a collector
Java 8 includes four different types of collectors, each with different performance characteristics.
Collector | Description |
---|---|
-XX:+UseSerialGC | The serial collector uses a single thread to perform all garbage collection work, which makes it relatively efficient because there is no communication overhead between threads. It is best-suited to single processor machines, because it cannot take advantage of multiprocessor hardware, although it can be useful on multiprocessors for applications with small data sets (up to approximately 100 MB) |
-XX:+UseParallelGC | The parallel collector (also known as the throughput collector) performs minor collections in parallel, which can significantly reduce garbage collection overhead. It is intended for applications with medium-sized to large-sized data sets that are run on multiprocessor or multithreaded hardware |
-XX:+UseConcMarkSweepGC | The mostly concurrent collector performs most of its work concurrently (for example, while the application is still running) to keep garbage collection pauses short. It is designed for applications with medium-sized to large-sized data sets in which response time is more important than overall throughput because the techniques used to minimize pauses can reduce application performance. |
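-XX:+UseG1GC | The Garbage-First (G1) collector is a server-style collector targeted at multiprocessor machines with large memories. It attempts to meet garbage collection pause-time goals with high probability while achieving high throughput. |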
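Which collector is active, and how much time it has spent, can be checked from inside the JVM with the standard GarbageCollectorMXBean interface, which reports each collector's name, collection count and accumulated time. On HotSpot the names are typically "Copy"/"MarkSweepCompact" for the serial collector, "PS Scavenge"/"PS MarkSweep" for the parallel collector, "ParNew"/"ConcurrentMarkSweep" for CMS and "G1 Young Generation"/"G1 Old Generation" for G1. A minimal sketch with a hypothetical class name:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcReport {
    // One line per collector: name, number of collections, accumulated time.
    static List<String> summary() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(String.format("%s: count=%d time=%d ms",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        summary().forEach(System.out::println);
    }
}
```

Logging this summary at regular intervals gives the baseline garbage collection behavior against which abnormal patterns can be spotted.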
3.6.2 JVM GC options
These options are general to the Sun JVM, and work in a JDK 8 installation. They provide good information about how your JVM is running; based on that initial information, you can then tune more finely.
- To print the implicit flags with which the JVM is configured:
-XX:+PrintCommandLineFlags
- To print the date and time stamps of GC activity with high details:
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution
- To print GC activity with less detail:
-verbose:gc
- To log GC details to a file:
-Xloggc:<path/to/gc.log>
- To use the concurrent mark-sweep (CMS) collector and start a concurrent collection when the old generation is 80% full:
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80
3.7 Large pages (-XX:+UseLargePages)
Most of the major operating systems support large memory pages. This setting improves performance for the following reasons:
- increased performance through increased Translation Lookaside Buffer (TLB) hits
- pages are locked in memory and are never swapped out, which guarantees that the whole JVM heap remains in RAM. The same guarantee cannot be given for direct ByteBuffer memory.
- contiguous pages are pre-allocated and cannot be used for anything other than System V shared memory, for example the JVM heap.
- less bookkeeping work for the kernel in that part of virtual memory due to larger page sizes.
Please note that large memory pages need to be enabled on the machine before using this option. See Oracle's documentation on large page support to understand how to enable large pages on different operating systems, including Linux, Windows and Solaris.
To use this option, reserve a large-page pool slightly bigger than the Java heap size you are considering. This way, the whole Java heap is pinned in memory and the OS won't page its contents in and out.
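On Linux the large-page pool is sized with the vm.nr_hugepages sysctl, counted in pages of the size reported as Hugepagesize in /proc/meminfo (2048 kB is common on x86-64). The sketch below turns "slightly bigger than the heap" into a page count; the 5% margin, the 4 GB heap and the class name are hypothetical choices for illustration.

```java
class HugePages {
    static final long MB = 1024L * 1024L;

    // Huge pages needed to back heapBytes plus a safety margin (in percent), rounded up.
    static long pagesFor(long heapBytes, long pageBytes, int marginPercent) {
        long needed = heapBytes + heapBytes * marginPercent / 100;
        return (needed + pageBytes - 1) / pageBytes;
    }

    public static void main(String[] args) {
        long heap = 4096 * MB; // -Xmx4g
        long page = 2 * MB;    // Hugepagesize: 2048 kB
        System.out.println("sysctl -w vm.nr_hugepages=" + pagesFor(heap, page, 5));
    }
}
```

The margin leaves room for JVM structures placed next to the heap; adjust it after checking HugePages_Free in /proc/meminfo with the server running.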
4 JVM options for production environment
Finally, let's look at the list of options to consider when starting up a JVM for an enterprise server:
-server -Xms<heap size>[g|m|k] -Xmx<heap size>[g|m|k] -XX:MaxMetaspaceSize=<metaspace size>[g|m|k] -Xmn<young size>[g|m|k] -XX:SurvivorRatio=<ratio> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=<percent> -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark -XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails -Xloggc:"<path to log>" -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<path to dump>`date`.hprof -Djava.rmi.server.hostname=<external IP> -Dcom.sun.management.jmxremote.port=<port> -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
Java VM Memory Setting | Small | Medium | Large |
---|---|---|---|
-Xms | 1g | 2g | 4g |
-Xmx | 1g | 2g | 4g |
4.1 Make Server a Server
-server
Enables Java VM features specific to server applications, such as the more sophisticated JIT compiler. Though this option is implicitly enabled for 64-bit virtual machines, it still makes sense to specify it, since according to the documentation this behavior may change in the future.
4.2 Make your Heap Explicit
-Xms<heap size>[g|m|k] -Xmx<heap size>[g|m|k]
To avoid dynamic heap resizing and the lags it can cause, we explicitly specify the minimum and maximum heap size. The JVM then sizes the heap once at startup instead of repeatedly growing and shrinking it.
-XX:MaxMetaspaceSize=<metaspace size>[g|m|k]
By default Metaspace in Java VM 8 is not limited, though for the sake of system stability it makes sense to limit it with some finite value.
-Xmn<young size>[g|m|k]
Explicitly defines the size of the young generation.
-XX:SurvivorRatio=<ratio>
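Ratio that determines the size of each survivor space relative to the eden size (-XX:SurvivorRatio=8 makes eden eight times as large as one survivor space). Since the young generation consists of eden plus two survivor spaces, the ratio can be calculated using the following formula: $$\text{SurvivorRatio} = \frac{\text{young size}}{\text{survivor size}} - 2$$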
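The formula follows from young = eden + 2 × survivor and SurvivorRatio = eden / survivor. The sketch below applies it in both directions and reproduces the earlier table example (a 10 MB young generation with -XX:SurvivorRatio=2 gives 5 MB of eden and 2.5 MB per survivor space). It is plain arithmetic that ignores JVM alignment; the names are hypothetical.

```java
class SurvivorSizing {
    // young = eden + 2 * survivor and SurvivorRatio = eden / survivor,
    // so survivor = young / (ratio + 2) and eden = ratio * survivor.
    static double survivor(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    static double eden(double youngMb, int survivorRatio) {
        return survivorRatio * survivor(youngMb, survivorRatio);
    }

    // Inverse: SurvivorRatio = young / survivor - 2, as in the formula above.
    static double ratioFor(double youngMb, double survivorMb) {
        return youngMb / survivorMb - 2;
    }

    public static void main(String[] args) {
        System.out.println(eden(10, 2) + " MB eden, " + survivor(10, 2) + " MB per survivor");
        System.out.println("Default ratio 8, 1024 MB young: survivor = " + survivor(1024, 8) + " MB");
        System.out.println("Ratio for 1024 MB young and 128 MB survivors = " + ratioFor(1024, 128));
    }
}
```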
4.3 Make GC Right
As response time is critical for server applications, a concurrent collector fits web applications best. We can choose the newer G1 collector or still use the Concurrent Mark-Sweep (CMS) collector.
4.3.1 Using G1
-XX:+UseG1GC
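Enables the G1 collector.
-XX:InitiatingHeapOccupancyPercent=<percent>
Percentage of total heap occupancy at which G1 starts a concurrent marking cycle (the default is 45).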
4.3.2 Using Concurrent Mark-Sweep
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
By default, CMS uses a set of heuristic rules to trigger garbage collection. This makes GC less predictable and usually tends to delay collection until the old generation is almost full.
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=<percent>
-XX:CMSInitiatingOccupancyFraction tells the Java VM when CMS should be triggered. Basically, it creates a buffer in the heap that can be filled with data while CMS is working. The percentage should therefore be back-calculated from the rate at which memory is consumed in the old generation under production load.
Choose this percentage carefully: if it is too small, CMS will run too often; if it is too big, CMS will be triggered too late and a concurrent mode failure may occur.
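One way to back-calculate the percentage, sketched under simplifying assumptions: the old generation fills at a steady promotion rate (measured from GC logs under production load), a concurrent cycle takes a roughly known time, and the free space left when the cycle starts must cover what is promoted during it, multiplied by a safety factor. The class name, the safety factor and the sample figures are hypothetical.

```java
class CmsTrigger {
    // Highest old-generation occupancy (percent) at which a CMS cycle can still
    // finish before the old generation fills, given a steady promotion rate.
    static int initiatingOccupancyPercent(double oldGenMb, double promotionMbPerSec,
                                          double cycleSeconds, double safetyFactor) {
        double headroomMb = promotionMbPerSec * cycleSeconds * safetyFactor;
        double percent = 100.0 * (1.0 - headroomMb / oldGenMb);
        // A result of 0 means the old generation is too small for this load.
        return (int) Math.max(0, Math.floor(percent));
    }

    public static void main(String[] args) {
        // Hypothetical figures: 3 GB old generation, 20 MB/s promoted, 10 s cycle, 2x margin.
        System.out.println("-XX:CMSInitiatingOccupancyFraction="
                + initiatingOccupancyPercent(3072, 20, 10, 2.0));
    }
}
```

The result is a starting point to validate against GC logs, looking for concurrent mode failures, rather than a final value.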
-XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark
Instructs the garbage collector to collect the young generation before a Full GC or the CMS remark phase. This improves their performance, because there is no need to check references between the young generation and the tenured generation.
4.4 GC Logging
-XX:+PrintGCDateStamps -verbose:gc -XX:+PrintGCDetails -Xloggc:"<path to log>"
These options make Java log garbage collector activity to the specified file. Every record is prefixed with a human-readable date and time. There is no need to add -XX:+PrintGCTimeStamps, which prefixes each record with the time elapsed since the Java application started, a value that is harder to correlate with other logs.
4.5 Dump on Out of Memory
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<path to dump>`date`.hprof
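If a server application ever fails with an out-of-memory error in production, you do not want to wait for another chance to reproduce the problem. These options instruct the Java VM to dump the heap to a file when an OutOfMemoryError occurs. Note that `date` is expanded by the shell when the JVM is launched, so the file name records the start time rather than the time of the failure.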
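The same HotSpot MXBean that exposes VM flags can confirm that the dump options are active and can also write a heap dump on demand, for example before a suspected leak turns into an OutOfMemoryError. A minimal sketch for HotSpot JVMs; the class name is hypothetical, and the output file must not already exist (newer JDKs also require the .hprof extension).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

class HeapDumps {
    static HotSpotDiagnosticMXBean hotspot() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    // True if the JVM was started with -XX:+HeapDumpOnOutOfMemoryError.
    static boolean dumpOnOomEnabled() {
        return Boolean.parseBoolean(hotspot().getVMOption("HeapDumpOnOutOfMemoryError").getValue());
    }

    // Writes a dump on demand; live=true keeps only reachable objects,
    // which typically forces a full GC first.
    static void dump(String path) throws IOException {
        hotspot().dumpHeap(path, true);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("HeapDumpOnOutOfMemoryError=" + dumpOnOomEnabled()
                + ", HeapDumpPath=" + hotspot().getVMOption("HeapDumpPath").getValue());
        if (args.length > 0) {
            dump(args[0]); // e.g. /tmp/manual.hprof
        }
    }
}
```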
4.6 Make JMX work
-Djava.rmi.server.hostname=<external IP>
The IP address embedded into the RMI stubs sent to clients. Clients later use these stubs to communicate with the server via RMI. As machines in modern data centers often have two IPs (internal and external), you should explicitly specify which IP to use; otherwise the JVM makes its own choice. A correct value of this property is a precondition for using JMX successfully.
-Dcom.sun.management.jmxremote.port=<port> -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
To make JMX available for remote access, a specific listen port must be provided. To avoid additional connection troubles, these settings also disable standard authentication and SSL, so make sure, using a firewall, that only authorized users can connect to the environment.
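A quick way to verify the setup is to connect from another machine with the standard javax.management.remote API. The out-of-the-box agent started by -Dcom.sun.management.jmxremote.port registers its connector under the name "jmxrmi" in an RMI registry on that port. A minimal sketch; the class name and the default host and port values are hypothetical placeholders for the external IP and port configured above.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class JmxProbe {
    // Service URL for the out-of-the-box RMI connector on host:port.
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad host or port: " + host + ":" + port, e);
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "127.0.0.1";            // external IP
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9010;    // jmxremote.port
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Proxy for the remote JVM's MemoryMXBean.
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    mbs, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            System.out.println("Remote heap: " + memory.getHeapMemoryUsage());
        }
    }
}
```

If the connection hangs or is refused even though the port is reachable, the java.rmi.server.hostname value embedded in the stubs is the first thing to check.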
5 Additional reading
Tuning the JVM has a long, experience-based learning curve. The following readings may help you better understand JVM internals.