Restful API(spring boot)性能测试与分析 on v1.1 20180408

介绍

今天(20180408)对Web接口做了一次性能测试，并试图进行调优。这里记录一些有用的总结，以及这个过程的细节数据。代码结构基本是，由spring boots restful接收order，交给vertx eventloop处理， restful接口直接返回（不等待成交结果).event loop处理完成后，发送到内部一个simple order message server上处理（当前版本不发出到其他系统）。

注意：测试用所生成的execution report只是在撮合引擎jvm内存处理，而没有被发送出来。market data也只有snapshot每3秒左右发送一次，如果打开了localhost:8080/main.html的话。但是真正的系统中，execution report要通过message bus, 或者restful response方法返回，将使系统performance低于这次的测试结果。

测试结果

先给出结果。有兴趣的话，再看后面的测试方法等。

40个客户，5个或者default200个tomcat线程，支持5000 ord/second是没有问题的。
但是无法达到6000/s，卡在5700/s左右上不去了需要调查是否测试工具问题。因为没有测试报告中无error；内存100M， cpu15～20%等没看出问题。 6000/s还远没有达到网络瓶颈。

下面是default tomcat threads(max 200, real:67 http-nio) + 40 clients(20 buy + 20 offer) + 5000 order/s的jmeter测试报告

Label	# Samples	Average	Max	Std. Dev.	Error %	Throughput	Received KB/sec	Sent KB/sec	Avg. Bytes
place BACKGROUND - Offer order	274874	0	137	1.77	0.00%	2481.03619	443.85	499.11	183.2
place BACKGROUND - Bid order	274841	0	138	1.75	0.00%	2480.76073	443.8	494.21	183.2
place latency - Bid order	21	8	57	13.62	0.00%	0.19441	0.03	0.04	180
TOTAL	549736	0	138	1.76	0.00%	4961.96408	887.68	993.36	183.2

结果上看，一切安好。无error，测试速率为4961，与5000的误差为 0.8%以内

下面是5 tomcat threads + 40 clients(20 buy + 20 offer) + 5000 order/s的jmeter测试报告. 基本与default tomcat thread报告一直，std dev略小，但是我觉得这个差距可以忽略。

Label	# Samples	Average	Max	Std. Dev.	Error %	Throughput	Received KB/sec	Sent KB/sec	Avg. Bytes
place BACKGROUND - Offer order	352115	0	72	1.29	0.00%	2492.83191	444.26	501.49	182.5
place BACKGROUND - Bid order	352197	0	44	1.26	0.00%	2493.46539	444.37	496.75	182.5
place latency - Bid order	24	1	3	0.76	0.00%	0.19996	0.04	0.04	179.4
TOTAL	704336	0	72	1.28	0.00%	4986.41426	888.64	998.26	182.5

测试方法

客户端：通过jmeter测试脚本模拟服务器端：直接通过Intellij启动MatchingEngine v1.1（为什么不build一个release package，再启动？因为懒...) 都在同一台笔记本电脑上，所以netowrk 是loopback

基本流程：

启动服务器端之后，测试脚本将

发送一些大额交易充实order book
按照期望的系统压力发送订单(全部与orderbook中的订单成交) . 买方，卖方个占一半流量
使用jmeter的summary report实时查看throughput, 并且http://localhost:8080/statistics.html也支持查看撮合服务器内部处理的order/execution/marketdata速率

测试环境的关键数据

客户连接数量（设置与jmeter脚本中) 设定为40个。关键问题：有没有必要测试更多的客户数量，如200个，400个？

从需求的角度，没有
从研究的角度，可以做，而且做了，结果不好(400个客户，3000ord/s，错误率10%）。

为什么说从需求的角度没有？因为从架构上看，撮合服务器的前端为1. web服务器（web客户) 2. API Gateway(API客户). 每个前端与撮合服务器保持1个链接。所以web 服务器与 API Gateway的数量一般来说不会太多，我希望前期web服务器在20台以内（好吧，都是意淫的，没有任何公司部署这个系统呢）， API Gateway在初期2台。

tomcat线程数(设置与springboot中）默认200(https://stackoverflow.com/questions/25399400/maximum-client-request-thread-pool-size-in-spring) 但是可以通过server.tomcat.max-threads设置(create a new file application.properties, set server.tomcat.max-threads=5 or other value you expect)

如果我们的业务线程直接基于tomcat的线程中运行，而且很慢，那么这个线程数大一些比较好。但是对于当前(v1.1)的撮合程序来说，速度极快(纳秒级), 其实多个线程并没有帮助。反而可能因为线程上下文切换而引起性能损失(暂时没有证据哦)

log打印目前只打印了一条比较短的订单信息（e.g. 2018-04-08 15:14:27,022 INFO b.o.a.MatchingEngineWebWrapper [http-nio-8080-exec-2] received order request, symbol:USDJPY, client ordID:BACKGROUND-Buy_1523171168129_5352739, ordID:USDJPY_BACKGROUND-Buy_1523171168129_5352740) 每次打印完，都将flush

        <!-- 
            section : Rollover Strategies of  https://logging.apache.org/log4j/2.x/manual/appenders.html 
            the bufferSize does not work, since immediateFlush is true(as default)
        -->
        <RollingFile name="RollingFile"
                     fileName="log/${app_name}.log"
                     filePattern="logarchive/${app_name}-%d{MM-dd-yyyy}.log.gz"
                     ignoreExceptions="false"
                     immediateFlush="true"
                     bufferSize="8192000">
            <PatternLayout>
                <Pattern>%d %p %c{1.} [%t] %m%n</Pattern>
            </PatternLayout>
            <TimeBasedTriggeringPolicy />
        </RollingFile>

正式生产环境中，我希望打印所有的输入输出消息。这些消息有助于我们线上问题排查，以及线上问题线下重现(通过replay这些消息）

收到的order详细信息打印出来
产生的每一条execution report打印出来
发送的每一条market data打印出来

其他的只打印

启动/关闭过程过程
错误/警告信息
系统资源使用情况，如cpu，heap，thread数量

测试过程

其实测试过程还是满琐碎的，经历了一些坑，譬如

上来就测试400个客户连接。如上所述，并没有这个业务需求，或者说优先级极低。
基于400个客户连接，抓取了cpu使用情况(jvisualvm cpu sample), 并试图调试log打印immediateFlush="false"(因为write使用了10%的时间 3秒 over 32 秒）。但是，我是期望实时打印这些log的。而且调整后，对测试结果影响极小，可以忽略。

一些琐碎的细节（包括抓取的jvisualvm截图）放在这里了 Spring Restful Tuning Tracking at 20180408 https://note.wiz.cn/web?dc=ef8dce79-e49a-400d-9413-8844a841be32&kb=&cmd=km%2C

尝试试把restful接口更改为直接返回execution report

但是performance downgrade的厉害。勉强支撑100/秒，200/秒都无法处理。同步返回execution report的代码如下

placeOrder(arguements ignored){

	....some lines ignored

	final CountDownLatch latch = new CountDownLatch(1);
	final List<OrderBook.MEExecutionReportMessageFlag> matchResult = new ArrayList<>();
	_vertxContext.runOnContext((v)->{

		final List<OrderBook.MEExecutionReportMessageFlag> reports = _engine.matchOrder(originalOrder);
		matchResult.addAll(reports);
		latch.countDown();

	});

	try {
		latch.await(3, TimeUnit.SECONDS);
	}catch (InterruptedException e){
		log.error("interrupted while processing symbol:{}, client ordID:{}, ordID:{}", new Object[]{symbol,clientOrdID,orderID });
	}
	Gson gson = new GsonBuilder().create();
	String jsonString = gson.toJson(matchResult);
	return jsonString;
	
}

从jvisulvm的cpu sample结果上，并无法明显看出这段代码占用过多时间。例如：在每秒100 ord压力下，测试22秒，某http-nio线程（共5个）调用place_order 2.7秒（10+%），其余时间为blockQueue.take(). 在这2.7秒中，countdownlatch await:2.1秒，toJason:0.4秒，log.info:0.1秒， runOnContext:0.1秒。即这个countdownlatch的通知效率很低。

下一步

(high) 研究能否直接高性能的直接返回match结果给restful接口。本文尝试过一次，但是效果很差。可以看一下vertx的restful接口的实现方式，是否有能有所改善。在将来v2.0的系统中，我们需要将execution report通过数据流的方式发送出去，如果能够通过restful response方式发送回去，将大大简化代码开发（前提是performance够用)
(middle)研究给系统增大压力到6000 以及更大。我怀疑jmeter是瓶颈；换一个测试工具试一下。不是high priority因为我觉得（直觉）5000 order/s可以应付大多数业务需求了。如果有这个测试的需求，可以继续做(交给专门的测试工程师？），否则可以放一放。
(low)作为兴趣，研究为何在400个链接的情况下，Error那么多。是哪个环节导致了问题。怀疑是Spring那里卡住了。

Provide feedback

Saved searches

Use saved searches to filter your results more quickly