Ultra low latency tuning summary

The latest version of this page is on the wiki: https://github.com/baoyingwang/OrderBook/wiki/Ultra-low-latency-tuning-summary

Basic idea

1. Build the app
2. Build the profiling script
3. Analyze the profiling result
4. Refactor your app, upgrade the profiling scripts (if required), and re-profile
5. Go to #3

Before tuning, set up the following

Print GC

Why: monitor the GC behavior and understand it.
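Besides the JVM's own GC log, collection counts and accumulated pause times can also be read from inside the process. The following is a minimal sketch using the standard `java.lang.management` API; the class name `GcStats` is made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcStats {

    /** Returns one line per collector: name, collection count, accumulated time in ms. */
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        snapshot().forEach(System.out::println);
    }
}
```

Taking a snapshot before and after a test run and comparing the two gives a quick view of how many collections the run triggered.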

Make the relevant performance data easy to measure

Why: know your app, and make it easy to verify different options.
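One way to make latency easy to measure is a small recorder that stores `System.nanoTime()` deltas and reports percentiles. This is only a sketch; `LatencyRecorder` and its methods are hypothetical names, and the percentile uses the simple nearest-rank method.

```java
import java.util.Arrays;

final class LatencyRecorder {
    private final long[] samples;   // preallocated so recording does not allocate
    private int count;

    LatencyRecorder(int capacity) {
        samples = new long[capacity];
    }

    /** Stores one latency sample in nanoseconds; drops samples once the buffer is full. */
    void record(long nanos) {
        if (count < samples.length) {
            samples[count++] = nanos;
        }
    }

    /** Returns the given percentile (0 < p <= 100) using the nearest-rank method. */
    long percentile(double p) {
        if (count == 0) {
            throw new IllegalStateException("no samples");
        }
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * count / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        LatencyRecorder r = new LatencyRecorder(100_000);
        for (int i = 0; i < 100_000; i++) {
            long start = System.nanoTime();
            Math.sqrt(i);                          // stand-in for the code under test
            r.record(System.nanoTime() - start);
        }
        System.out.println("p50=" + r.percentile(50) + "ns p99=" + r.percentile(99) + "ns");
    }
}
```

Sorting happens only when a percentile is requested, so the recording path stays cheap.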

Big picture: fewer layers (thread model)

How: fewer layers, and less transformation between them, e.g. keep everything as bytes.

Understand the hardware and your app

How many CPUs, and at what clock speed (Hz)? How does the CPU work? What about the L1/L2/L3 caches?
How much memory? How does memory work? How does it communicate with the CPU (and the L1/L2/L3 caches)?
How is IO related to memory and CPU? Why is an SSD much faster than an HDD?
Your application's thread model, and the layers on the critical path.
How much computation and memory your app will use. Convert this to the maximum possible performance given the CPU and memory limits.
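The last point can be turned into back-of-envelope arithmetic. The sketch below estimates the allocation rate and the rough interval between minor GCs; all numbers (1 KiB per message, 100k messages per second, 512 MiB Eden) are assumptions chosen for illustration, and `CapacityEstimate` is a hypothetical helper.

```java
final class CapacityEstimate {

    /** Bytes allocated per second at the given message rate. */
    static long allocationBytesPerSecond(long bytesPerMessage, long messagesPerSecond) {
        return bytesPerMessage * messagesPerSecond;
    }

    /** Seconds until Eden fills, i.e. a rough interval between minor GCs. */
    static double secondsBetweenMinorGcs(long edenBytes, long allocBytesPerSecond) {
        return (double) edenBytes / allocBytesPerSecond;
    }

    public static void main(String[] args) {
        // Assumed workload: 1 KiB allocated per message, 100k messages per second.
        long alloc = allocationBytesPerSecond(1_024, 100_000);
        // Assumed heap layout: 512 MiB Eden.
        double interval = secondsBetweenMinorGcs(512L * 1024 * 1024, alloc);
        System.out.println(alloc + " bytes/s, minor GC roughly every " + interval + " s");
    }
}
```

Replacing the assumed figures with measured ones shows quickly whether a target rate is even plausible on the given hardware.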

Tune the following aspects

Disk IO

Avoid any disk IO on critical path
Use SSD
Log at info level
Increase the log buffer size. It reduces kernel/user mode switches.
Avoid string concatenation in debug statements
Avoid calling toString() in debug statements
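The cost of string concatenation and toString() in debug statements can be shown with a counter. This sketch uses `java.util.logging` only to stay self-contained (the page does not say which logging library the app uses); `LazyLogDemo` and `Order` are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LazyLogDemo {
    static int toStringCalls = 0;

    /** Hypothetical order type whose toString() is expensive enough to matter. */
    static final class Order {
        @Override
        public String toString() {
            toStringCalls++;
            return "Order{...}";
        }
    }

    /** Logs the order at FINE eagerly and lazily; returns how often toString() ran. */
    static int logBothWays(Logger log, Order order) {
        toStringCalls = 0;
        log.fine("eager: " + order);          // string is built even if FINE is off
        log.fine(() -> "lazy: " + order);     // supplier runs only if FINE is on
        return toStringCalls;
    }

    public static void main(String[] args) {
        Logger log = Logger.getLogger("demo");
        log.setLevel(Level.INFO);             // debug-level (FINE) output disabled
        System.out.println("toString() calls: " + logBothWays(log, new Order()));
    }
}
```

With FINE disabled, only the eager statement pays for toString(); the lazy one costs a level check.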

Network IO

CPU

Busy spin, or block wait
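The two waiting styles can be contrasted in a few lines. This is a minimal sketch, assuming Java 9+ for `Thread.onSpinWait()`; `WaitStrategies` is a hypothetical class, and the value 0 is used as the "empty" marker for the spin slot.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

final class WaitStrategies {

    /** Busy spin: burns a core polling the slot, but reacts without a thread wake-up. */
    static long awaitSpinning(AtomicLong slot) {
        long v;
        while ((v = slot.get()) == 0L) {
            Thread.onSpinWait();   // Java 9+: hints to the CPU that this is a spin loop
        }
        return v;
    }

    /** Block wait: the thread sleeps until an element arrives, freeing the core. */
    static long awaitBlocking(BlockingQueue<Long> queue) {
        try {
            return queue.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // restore the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong slot = new AtomicLong();                  // 0 means "empty" here
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(1);

        Thread producer = new Thread(() -> {
            slot.set(42L);
            queue.offer(43L);
        });
        producer.start();

        System.out.println("spin got " + awaitSpinning(slot));
        System.out.println("block got " + awaitBlocking(queue));
        producer.join();
    }
}
```

Busy spinning usually only pays off when a core can be dedicated to the waiting thread; otherwise the blocking variant is the safer default.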

Less GC and/or fast GC

Know your app's characteristics: how many long-lived objects there are, and how long they live.
Know the basics: minor GC / major GC
(ParNew + CMS) or G1

Application stops because of biased locking

Disable it if there are many such stops

Application stops because of safepoints

Java program

Mutable objects are preferred on the critical path, to avoid creating extra temporary objects.
Avoid log.debug("{}", obj.toString()); use log.debug("{}", obj) instead. This saves the toString() call when debug logging is disabled.
Pass less data to your executor, so that less of the old generation is used. In the code below, the #1 handleER is better, since it passes only a short string (128) to the executor, whereas the #2 handleER passes the whole fixER message (1k?) to the executor. If the executor is slow for some reason, the old generation grows much faster with #2.

    int bufferSize = 16*1024*1024;
    BufferedOutputStream output = TestToolUtil.setupOutputLatencyFile(_latencyDataFile, bufferSize);

    // #1: build the short latency record on the calling thread,
    // so the submitted task captures only that small string
    void handleER(String fixER, long recvTimeNano){

        if(fixER.indexOf("\u000156="+MatchingEngineApp.LATENCY_ENTITY_PREFIX) > 0){

            String latencyRecord= TestToolUtil.getLantecyRecord(fixER,recvTimeNano);

            _latencyWritingExecutor.submit(()->{
                try {
                    output.write(latencyRecord.getBytes());
                }catch (Exception e){
                    log.error("fail to write", e);
                }
            });
        }
    }

    // #2: build the record inside the task, so the task captures the whole
    // fixER message and keeps it alive until the executor runs it
    void handleER(String fixER, long recvTimeNano){

        if(fixER.indexOf("\u000156="+MatchingEngineApp.LATENCY_ENTITY_PREFIX) > 0){

            _latencyWritingExecutor.submit(()->{
                try {
                    String latencyRecord= TestToolUtil.getLantecyRecord(fixER,recvTimeNano);
                    output.write(latencyRecord.getBytes());
                }catch (Exception e){
                    log.error("fail to write", e);
                }
            });
        }
    }
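The first point in the list above, preferring mutable objects on the critical path, can be sketched as follows. `MutableReuseDemo` and `MutableQuote` are hypothetical names; the idea is simply that one holder is allocated up front and overwritten per message instead of creating a new object each time.

```java
final class MutableReuseDemo {

    /** Hypothetical mutable holder, reused for every message on the critical path. */
    static final class MutableQuote {
        long priceTicks;
        long quantity;

        MutableQuote set(long priceTicks, long quantity) {
            this.priceTicks = priceTicks;
            this.quantity = quantity;
            return this;
        }
    }

    private final MutableQuote scratch = new MutableQuote();   // allocated once

    /** Sums price * quantity over all messages without creating a quote per message. */
    long totalNotional(long[] prices, long[] quantities) {
        long total = 0;
        for (int i = 0; i < prices.length; i++) {
            MutableQuote q = scratch.set(prices[i], quantities[i]);   // reuse, no "new"
            total += q.priceTicks * q.quantity;
        }
        return total;
    }

    public static void main(String[] args) {
        MutableReuseDemo demo = new MutableReuseDemo();
        System.out.println(demo.totalNotional(new long[]{100, 101}, new long[]{5, 10}));
    }
}
```

The reused object must stay confined to one thread and must not be stored by downstream code, since its contents change on the next message.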

tools at hand

thoughts

Things are very different from years ago. Everything is easy (for getting started, at least).

It is easy to set up and run an application now, thanks to new libraries, including Spring Boot (for both standalone apps and web apps).
It is easy to set up a dev project from scratch with IntelliJ + Gradle + a Maven repo, and to integrate popular libraries, e.g. Spring Boot.
It is easy to write JavaScript, e.g. with the Angular framework, C3 charts, etc.

But some aspects are still difficult, especially some business-oriented libraries, e.g. QFJ.

QFJ has been upgraded and published to the Maven repo. Good.
But its getting-started guide is still NOT well documented. I am writing a wiki to make it easier.

Find good tools/libraries to free yourself.

setup application

Spring Boot
Angular for browser side
QuickFixJ for FIX interface(both client and server)

dev env

Gradle - saves much time by avoiding copying libs or setting up projects by hand
Gradle fat jar - saves time setting up the runtime classpath
IntelliJ - a good IDE with Gradle support. Supports Python highlighting (and Python projects) and bash highlighting
Eclipse - not very good with Gradle

performance

Disruptor - good at busy CPU spin. With the sleeping strategy it behaves similarly to the JDK BlockingQueue (in which case the extra complexity is not needed).
ChronicleQ - uses files. Good for bytes. But within the same JVM, marshalling/unmarshalling bytes is also a big burden. Good for sharing across JVMs. It has a util, LongPause, which adapts the wait intervals (based on input): it steps the sleep/park time from ns, to us, to ms, ...
Netty - NIO framework.
Guava EventBus (sync/async mode) - not good for ultra-low latency (us level), but it greatly simplifies our code. In memory.
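The adaptive waiting described for ChronicleQ's LongPause can be illustrated with a small backoff helper. This is a hypothetical sketch of the idea only, not Chronicle's API: it spins for a few idle polls, then parks for intervals that double up to a cap.

```java
import java.util.concurrent.locks.LockSupport;

/** Hypothetical adaptive pauser; illustrates the idea only, not Chronicle's API. */
final class BackoffPauser {
    private final int spinLimit;
    private final long minParkNanos;
    private final long maxParkNanos;
    private int idleCount;
    private long parkNanos;

    BackoffPauser(int spinLimit, long minParkNanos, long maxParkNanos) {
        this.spinLimit = spinLimit;
        this.minParkNanos = minParkNanos;
        this.maxParkNanos = maxParkNanos;
        this.parkNanos = minParkNanos;
    }

    /** Call when a poll found no work. Returns the park time used (0 while spinning). */
    long pause() {
        if (idleCount < spinLimit) {
            idleCount++;
            Thread.onSpinWait();
            return 0L;
        }
        long used = parkNanos;
        LockSupport.parkNanos(used);
        parkNanos = Math.min(parkNanos * 2, maxParkNanos);   // back off: ns -> us -> ms
        return used;
    }

    /** Call when work was found, so the next idle period starts with spinning again. */
    void reset() {
        idleCount = 0;
        parkNanos = minParkNanos;
    }

    public static void main(String[] args) {
        BackoffPauser pauser = new BackoffPauser(3, 1_000L, 1_000_000L);
        for (int i = 0; i < 8; i++) {
            System.out.println("idle poll " + i + " parked " + pauser.pause() + " ns");
        }
    }
}
```

A consumer loop would call `pause()` after each empty poll and `reset()` after each successful one, trading a little latency after long idle periods for lower CPU usage.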

profiling

BTrace - recommended instead of injecting lines into the source code. Please use Sampled for fast methods to reduce the burden on the application; see http://btraceio.github.io/btrace/2015/02/sampled-profiling/

graph

C3 for web graph
Python matplotlib/seaborn/etc.

statistics

python pandas

sys monitor

linux vmstat
windows - performance monitor (define your own collector, and write to csv file)

jvm monitor

Collect the status via Java JMX MBeans; refer to baoying.orderbook.app.SysPerfDataCollectionEngine
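As a rough illustration of MBean-based collection (not a copy of what SysPerfDataCollectionEngine does), the sketch below samples heap usage and thread count through the platform MXBeans and prints CSV lines. `JvmStatusSampler` and the chosen columns are assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

final class JvmStatusSampler {

    /** Returns one CSV line: epoch millis, heap used bytes, heap committed bytes, live threads. */
    static String sampleCsv() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return System.currentTimeMillis() + "," + heap.getUsed() + ","
                + heap.getCommitted() + "," + threads.getThreadCount();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("timeMs,heapUsed,heapCommitted,threads");
        for (int i = 0; i < 3; i++) {
            System.out.println(sampleCsv());
            Thread.sleep(1000);
        }
    }
}
```

The CSV output can be loaded directly into the graphing and statistics tools listed above.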

jvm analyze

MAT for heap analysis
jvisualvm for CPU usage, per-thread memory allocation, etc.
