
Ultra low latency tuning summary

Baoying Wang edited this page Jan 19, 2018 · 12 revisions

https://github.com/baoyingwang/OrderBook/wiki/Ultra-low-latency-tuning-summary has the latest update.

Basic idea

  • Build app
  • Build profiling script
  • Analyze profiling result
  • Refactor your app, update the profiling script (if required), and re-profile
  • Go back to step 3 (analyze the profiling result)

Before tuning, set up the following

Print GC

Why: monitor the GC behavior and understand it
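On a HotSpot JDK 8, GC activity is usually printed with flags such as -XX:+PrintGCDetails and -Xloggc:<file> (JDK 9+ uses -Xlog:gc*). As a complement, the same counters can be read from inside the process. The sketch below is a minimal example using the standard GarbageCollectorMXBean; the class name GcSnapshot is hypothetical and not part of the OrderBook repo.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper (not from the OrderBook repo): dumps cumulative GC counters. */
final class GcSnapshot {

    /** One line per collector: name, collection count, approximate accumulated collection time in ms. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean gc : beans) {
            sb.append(gc.getName())
              .append(" count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.gc(); // only a hint to the JVM; used here so the counters are likely non-zero
        System.out.print(describe());
    }
}
```

Calling describe() before and after a test run and comparing the two snapshots gives a quick view of how many collections the run triggered.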

Make the related performance data easy to measure

Why: to know your app and to verify different options easily
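One way to make latency easy to measure is a small recorder that stores System.nanoTime() deltas in a preallocated array and reports percentiles afterwards. This is a sketch only; LatencyRecorder and its methods are hypothetical names, and the nearest-rank percentile is one of several possible definitions.

```java
import java.util.Arrays;

/** Hypothetical recorder: preallocated storage, so recording itself does not allocate. */
final class LatencyRecorder {
    private final long[] samples;
    private int size;

    LatencyRecorder(int capacity) {
        samples = new long[capacity];
    }

    /** Records one latency in nanoseconds; samples beyond the capacity are dropped. */
    void record(long nanos) {
        if (size < samples.length) {
            samples[size++] = nanos;
        }
    }

    /** Nearest-rank percentile for p in (0, 100]. Sorts a copy, so call it off the critical path. */
    long percentile(double p) {
        if (size == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        long[] sorted = Arrays.copyOf(samples, size);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * size / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        LatencyRecorder recorder = new LatencyRecorder(100_000);
        for (int i = 0; i < 100_000; i++) {
            long start = System.nanoTime();
            Math.sqrt(i); // stand-in for the operation being measured
            recorder.record(System.nanoTime() - start);
        }
        System.out.println("p50=" + recorder.percentile(50) + "ns"
                + " p99=" + recorder.percentile(99) + "ns"
                + " max=" + recorder.percentile(100) + "ns");
    }
}
```

Comparing p50, p99 and max between two runs with different options shows whether a change helped the tail or only the average.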

  • Big picture: fewer layers (thread model). How: use fewer layers and less transformation, e.g. keep everything as bytes

Understand the hardware and your app

  • How many CPUs, and at what clock speed (Hz)? How does a CPU work? L1/L2/L3 cache.
  • How much memory? How does memory work? How does it communicate with the CPU (and L1/L2/L3 cache)?
  • How is IO related to memory and CPU? Why is an SSD much faster than an HDD?
  • Your application's thread model, and the layers on the critical path
  • How much calculation and memory your app uses. Convert that to the maximum possible performance given the CPU and memory limits.
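The last point above, converting CPU and memory cost into a maximum possible performance, can be sketched as a back-of-envelope calculation. Every number in main below (2 us of CPU per message, 1 KiB of garbage per message, a 512 MiB young generation) is an assumption for illustration and should be replaced with measured values; CapacityEstimate is a hypothetical name.

```java
/** Hypothetical back-of-envelope calculator; all inputs are assumptions to replace with measurements. */
final class CapacityEstimate {

    /** Upper bound on throughput if each message costs cpuNanosPerMsg of CPU and work spreads evenly over the cores. */
    static long maxMsgsPerSecond(int cores, long cpuNanosPerMsg) {
        return cores * (1_000_000_000L / cpuNanosPerMsg);
    }

    /** Seconds until a young generation of youngGenBytes fills at the given rate and allocation per message. */
    static double secondsBetweenMinorGcs(long youngGenBytes, long msgsPerSecond, long bytesPerMsg) {
        return (double) youngGenBytes / (msgsPerSecond * bytesPerMsg);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        System.out.println("cores=" + cores + " maxHeapBytes=" + maxHeapBytes);

        // Assumed figures, for illustration only: a single-threaded critical path, 2 us CPU per message.
        long rate = maxMsgsPerSecond(1, 2_000);
        System.out.println("single-thread upper bound msgs/s=" + rate);

        // Assumed figures, for illustration only: 1 KiB of garbage per message, 512 MiB young generation.
        System.out.println("seconds between minor GCs="
                + secondsBetweenMinorGcs(512L * 1024 * 1024, rate, 1024));
    }
}
```

If the measured throughput is far below such a bound, the gap points at waiting (IO, locks, GC) rather than at raw computation.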

Tune from the following aspects

Disk IO

  • Avoid any disk IO on critical path
  • Use SSD
  • Use log level info
  • Increase the log buffer size. It reduces kernel/user mode switches.
  • Avoid string concatenation in debug logging
  • Avoid calling toString() in debug logging
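The last two points can be illustrated with java.util.logging, which is used here only because it ships with the JDK; the same idea applies to other logging APIs. The sketch counts toString() calls while the logger is set to INFO. LazyDebugLogging and Order are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical demo: shows when a debug message is actually built while debug logging is off. */
final class LazyDebugLogging {
    static final Logger LOG = Logger.getLogger("latency.demo");

    static {
        LOG.setLevel(Level.INFO); // FINE (debug) messages are disabled
    }

    /** Counts toString() calls so the cost of building the message becomes visible. */
    static final class Order {
        static int toStringCalls = 0;

        @Override
        public String toString() {
            toStringCalls++;
            return "Order{...}";
        }
    }

    /** The argument is concatenated before fine() is called, so toString() always runs. */
    static void eager(Order o) {
        LOG.fine("received " + o);
    }

    /** The supplier is only invoked if FINE is enabled. */
    static void lazy(Order o) {
        LOG.fine(() -> "received " + o);
    }

    /** Explicit level check before the message is built. */
    static void guarded(Order o) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("received " + o);
        }
    }

    public static void main(String[] args) {
        Order o = new Order();
        eager(o);
        lazy(o);
        guarded(o);
        System.out.println("toString() calls: " + Order.toStringCalls);
    }
}
```

Only the eager variant pays for the string on the critical path when debug logging is off.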

Network IO

CPU

  • Busy spin, or blocking wait
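The trade-off between the two can be sketched with a consumer waiting on a sequence counter. Busy spinning reacts fastest but occupies a core; parking frees the core at the cost of wake-up delay. WaitStrategies is a hypothetical name, and Thread.onSpinWait() requires Java 9 or later.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical sketch of two ways to wait for a producer to publish a sequence number. */
final class WaitStrategies {

    /** Busy spin: lowest wake-up latency, but burns one core while waiting. Returns the number of spins. */
    static long spinUntil(AtomicLong sequence, long target) {
        long spins = 0;
        while (sequence.get() < target) {
            Thread.onSpinWait(); // Java 9+ hint to the CPU that this is a spin loop
            spins++;
        }
        return spins;
    }

    /** Park between polls: frees the core, but each park adds wake-up delay. Returns the number of parks. */
    static long parkUntil(AtomicLong sequence, long target, long parkNanos) {
        long parks = 0;
        while (sequence.get() < target) {
            LockSupport.parkNanos(parkNanos);
            parks++;
        }
        return parks;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong sequence = new AtomicLong();
        Thread producer = new Thread(() -> {
            LockSupport.parkNanos(1_000_000L); // simulate roughly 1 ms of work before publishing
            sequence.set(1);
        });
        producer.start();
        long spins = spinUntil(sequence, 1);
        producer.join();
        System.out.println("spins while waiting: " + spins);
    }
}
```

Busy spinning usually only pays off when the waiting thread has a core to itself; otherwise it competes with the producer it is waiting for.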

Less GC and/or fast GC

  • Know your app's characteristics: how many long-lived objects there are, and how long they live.
  • Know the basics: minor GC / major GC
  • (ParNew + CMS) or G1
  • Application stops caused by biased locking
  • Disable biased locking if there are many such stops
  • Application stops caused by safepoints

java program

  • Mutable objects are preferred on the critical path, to avoid creating more temporary objects
  • Avoid log.debug("{}", obj.toString()); use log.debug("{}", obj) instead, which skips the toString() call when debug is disabled.
  • Put less data into your executor, to use less old generation. E.g. in the code below, #1 handleER is better, since it passes only a short string (128) to the executor, while #2 handleER passes the whole fixER message (1k?) to the executor. If the executor is slow for some reason, the old generation grows much faster with #2.
    int bufferSize = 16*1024*1024;
    BufferedOutputStream output = TestToolUtil.setupOutputLatencyFile(_latencyDataFile, bufferSize);

    // #1: build the short latency record on the calling thread,
    //     so the queued task holds only that short string
    void handleER(String fixER, long recvTimeNano){

        if(fixER.indexOf("\u000156="+MatchingEngineApp.LATENCY_ENTITY_PREFIX) > 0){

            String latencyRecord= TestToolUtil.getLantecyRecord(fixER,recvTimeNano);

            _latencyWritingExecutor.submit(()->{
                try {
                    output.write(latencyRecord.getBytes());
                }catch (Exception e){
                    log.error("fail to write", e);
                }
            });
        }
    }

    // #2: build the record inside the task,
    //     so the queued task holds the whole fixER message until it runs
    void handleER(String fixER, long recvTimeNano){

        if(fixER.indexOf("\u000156="+MatchingEngineApp.LATENCY_ENTITY_PREFIX) > 0){

            _latencyWritingExecutor.submit(()->{
                try {
                    String latencyRecord= TestToolUtil.getLantecyRecord(fixER,recvTimeNano);
                    output.write(latencyRecord.getBytes());
                }catch (Exception e){
                    log.error("fail to write", e);
                }
            });
        }
    }
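The first point in the list above, preferring mutable objects on the critical path, can be sketched with an encoder that reuses one buffer instead of building a new String per record. This is an illustrative sketch, not code from the OrderBook repo: ReusableRecordEncoder and the "id,latencyNanos" record layout are assumptions, and the class is not thread-safe, so it suits a single-threaded critical path only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical encoder that reuses one buffer, so encoding a record creates no temporary objects. */
final class ReusableRecordEncoder {
    // 64 bytes is enough for two non-negative longs (at most 19 digits each) plus ',' and '\n'.
    private final ByteBuffer buf = ByteBuffer.allocate(64);

    /** Writes "id,latencyNanos\n" as ASCII into the reused buffer and returns the number of bytes. */
    int encode(long id, long latencyNanos) {
        buf.clear();
        putDigits(id);
        buf.put((byte) ',');
        putDigits(latencyNanos);
        buf.put((byte) '\n');
        return buf.position();
    }

    /** Appends the decimal digits of a non-negative value without creating a String. */
    private void putDigits(long v) {
        if (v < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        int start = buf.position();
        do {
            buf.put((byte) ('0' + v % 10));
            v /= 10;
        } while (v > 0);
        // Digits were written least-significant first; reverse them in place.
        for (int i = start, j = buf.position() - 1; i < j; i++, j--) {
            byte tmp = buf.get(i);
            buf.put(i, buf.get(j));
            buf.put(j, tmp);
        }
    }

    /** Allocates a String; meant for tests and debugging, not for the critical path. */
    String asString() {
        return new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        ReusableRecordEncoder encoder = new ReusableRecordEncoder();
        int length = encoder.encode(42, 1500);
        System.out.print(length + " bytes: " + encoder.asString());
    }
}
```

Because the buffer is overwritten on every call, its content has to be consumed (for example written to a stream) before the next encode, and it must not be handed to another thread without copying.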

tools at hand

thoughts

Things are very different from years ago. Everything is easy (to get started with, at least).

  • Easy to set up and run an application now, because of new libraries, including Spring Boot (both standalone apps and web apps)
  • Easy to set up a dev project from scratch with Intellij + Gradle + Maven repo, and to integrate popular libraries, e.g. Spring Boot, etc.
  • Easy to write JavaScript, e.g. with the Angular framework, C3 chart, etc.

But some aspects are still difficult, especially some business libraries, e.g. QFJ.

  • QFJ has been upgraded and pushed to the Maven repo. Good.
  • But its getting-started guide is still NOT well documented. I am writing a wiki to make it easier.

Find good tools/libraries to free yourself.

setup application

  • Spring boot
  • Angular for browser side
  • QuickFixJ for FIX interface(both client and server)

dev env

  • Gradle - saves much time by avoiding copying libs or setting up projects. Gradle fat jar - saves time setting up the runtime classpath.
  • Intellij - good IDE with Gradle. Supports Python highlighting (and Python projects). Supports bash highlighting. Eclipse is not very good with Gradle.

performance

  • Disruptor - good at busy CPU spin. With the sleep strategy it is similar to the JDK BlockingQueue (so the extra complexity is not required).
  • Chronicle Queue - uses files. Good for bytes. But within the same JVM, marshalling/unmarshalling the bytes is also a big burden. Good for sharing across JVMs. It has a util, LongPauser, which adapts the wait intervals (based on input): it steps the sleep/park time from ns, to us, to ms, ...
  • Netty - NIO framework.
  • Guava EventBus (sync/async mode) - in memory. Not good for ultra-low latency (us level), but it greatly simplifies our code.
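The adaptive wait described in the Chronicle bullet above can be sketched in plain Java. This is not Chronicle's implementation or API, only an illustration of the idea: spin first, then park for intervals that grow while no work arrives, and start over once work is found. BackoffPauser and its parameters are hypothetical, and Thread.onSpinWait() requires Java 9 or later.

```java
import java.util.concurrent.locks.LockSupport;

/** Hypothetical adaptive pauser: spins first, then parks for growing intervals until reset. */
final class BackoffPauser {
    private final int spinLimit;
    private final long minParkNanos;
    private final long maxParkNanos;
    private int idleCount;
    private long nextParkNanos;

    BackoffPauser(int spinLimit, long minParkNanos, long maxParkNanos) {
        this.spinLimit = spinLimit;
        this.minParkNanos = minParkNanos;
        this.maxParkNanos = maxParkNanos;
        reset();
    }

    /** Call when a poll found no work. Returns the park time requested in ns (0 while still spinning). */
    long pause() {
        if (idleCount < spinLimit) {
            idleCount++;
            Thread.onSpinWait(); // Java 9+ spin-loop hint
            return 0;
        }
        long park = nextParkNanos;
        LockSupport.parkNanos(park);
        nextParkNanos = Math.min(maxParkNanos, park * 2); // double the wait, up to the cap
        return park;
    }

    /** Call when work was found, so the next idle period starts with spinning again. */
    void reset() {
        idleCount = 0;
        nextParkNanos = minParkNanos;
    }

    public static void main(String[] args) {
        BackoffPauser pauser = new BackoffPauser(3, 1_000L, 1_000_000L);
        for (int i = 0; i < 8; i++) {
            System.out.println("idle poll " + i + " -> requested park ns: " + pauser.pause());
        }
    }
}
```

A polling loop would call pause() after each empty poll and reset() after each successful one, which keeps latency low under load and CPU usage low when idle.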

profiling

graph

  • C3 for web graph
  • Python matplotlib/seaborn/etc

statistics

  • python pandas

sys monitor

  • linux vmstat
  • windows - performance monitor (define your own collector, and write to csv file)

jvm monitor

  • Collect the status via Java JMX MBeans - refer to baoying.orderbook.app.SysPerfDataCollectionEngine
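A minimal sketch of such a collector, using only the platform MXBeans from java.lang.management, is shown below. It is not the repo's SysPerfDataCollectionEngine; JvmStatusSampler and the CSV column order are assumptions chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical sampler: one CSV line of JVM status read from the platform MXBeans. */
final class JvmStatusSampler {

    /** Columns: epochMillis, heapUsedBytes, heapCommittedBytes, liveThreads, systemLoadAverage. */
    static String sampleCsv() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        int threads = ManagementFactory.getThreadMXBean().getThreadCount();
        // The load average is negative on platforms where it is not available.
        double load = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
        return System.currentTimeMillis() + "," + heap.getUsed() + "," + heap.getCommitted()
                + "," + threads + "," + load;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("epochMillis,heapUsedBytes,heapCommittedBytes,liveThreads,systemLoadAverage");
        for (int i = 0; i < 3; i++) {
            System.out.println(sampleCsv());
            Thread.sleep(200);
        }
    }
}
```

Writing one such line per second to a file gives a time series that can be loaded into pandas and plotted next to the latency data.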

jvm analysis

  • MAT for heap analysis
  • jvisualvm for CPU usage, memory allocation of each thread, etc

Code le