Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perfomance analysis about #551 #559

Closed
sbernard31 opened this issue Feb 22, 2018 · 26 comments
Closed

Perfomance analysis about #551 #559

sbernard31 opened this issue Feb 22, 2018 · 26 comments

Comments

@sbernard31
Copy link
Contributor

This issue aims to regroup discussion/tests about performance analysis of #551.

Test 1

Protocol:
testing cf-benchmark server behavior. (as it does not changed in #551) and see how modifications impact it.

java -Xms4096m -Xmx4096m -jar cf-benchmark-2.0.0-SNAPSHOT.jar

using cf-extplugtest-client as client (I always use the same version of the client commit 4e86856, only version of server change)

java -Xms4096m -Xmx4096m -cp cf-extplugtest-client-2.0.0-SNAPSHOT.jar  org.eclipse.californium.extplugtests.BenchmarkClient "coap://localhost:5683/benchmark" 500 2000

For each run, I launched client command 3 times in a row. (For visual VM chart I terminate performing a manual GC).

Results:

Using cf-benchmark before #551. (commit 4e86856)
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
354436 requests (35443 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
808318 requests (45388 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 24445ms, 40908 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 24445ms, 0 notifies/s
249 clients with 1983 to 1999 requests.
250 clients with 2000 to 2019 requests.
  1 clients with 2020 requests.
===========================================================
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
360784 requests (36078 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
835320 requests (47453 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 23803ms, 42011 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 23803ms, 0 notifies/s
246 clients with 1981 to 1999 requests.
250 clients with 2000 to 2019 requests.
  4 clients with 2020 to 2027 requests.
=============================================================
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
397611 requests (39761 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
858955 requests (46134 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 23790ms, 42034 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 23790ms, 0 notifies/s
  6 clients with 1978 to 1979 requests.
270 clients with 1980 to 1999 requests.
194 clients with 2000 to 2019 requests.
 23 clients with 2020 to 2037 requests.
  7 clients with 2042 to 2059 requests

serverref

Using cf-benchmark from #551 (commit 54828e7 rebased on 4e86856 to benefit of maven log fix)
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
321568 requests (32156 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
705669 requests (38410 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 27755ms, 36029 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 27755ms, 0 notifies/s
265 clients with 1981 to 1999 requests.
218 clients with 2000 to 2019 requests.
 17 clients with 2020 to 2036 requests.
===================================
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
272947 requests (27294 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
585271 requests (31232 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
933816 requests (34854 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 32561ms, 30711 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 32561ms, 0 notifies/s
  1 clients with 1978 requests.
259 clients with 1983 to 1999 requests.
228 clients with 2000 to 2019 requests.
 12 clients with 2020 to 2033 requests.
===============================================
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
330138 requests (33013 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
698271 requests (36813 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 29033ms, 34443 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 29033ms, 0 notifies/s
243 clients with 1982 to 1999 requests.
254 clients with 2000 to 2018 requests.
  3 clients with 2020 to 2027 requests.

server551

Using cf-benchmark from #551 removing all `new Throwable`
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
275643 requests (27564 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
717956 requests (44231 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 27212ms, 36748 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 27212ms, 0 notifies/s
243 clients with 1982 to 1999 requests.
256 clients with 2000 to 2019 requests.
  1 clients with 2026 requests.
================================
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
335922 requests (33592 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
734060 requests (39813 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 27296ms, 36635 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 27296ms, 0 notifies/s
248 clients with 1982 to 1999 requests.
247 clients with 2000 to 2019 requests.
  5 clients with 2021 to 2033 requests.
===============================
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
317951 requests (31795 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
713891 requests (39594 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 27423ms, 36465 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 27423ms, 0 notifies/s
275 clients with 1980 to 1999 requests.
199 clients with 2000 to 2019 requests.
 25 clients with 2020 to 2036 requests.
  1 clients with 2046 requests.

server551withoutthrowable

Interpretations:
We note there is more CPU activities and especially more GC activities.
It seems we also allocate more heap space.
The number of request/seconds seems lower too.

Removing call to new Throwable sounds to improve it a bit but this seems not so smooth than before.

@boaks
Copy link
Contributor

boaks commented Feb 22, 2018

Removing call to new Throwable sounds to improve it a bit but this seems not so smooth than before.

Do you remember, if there are new Throwable left, if Exchange.DEBUG is false?

@sbernard31
Copy link
Contributor Author

As far as I know there is only 2 new Throwable and both can be avoided setting Exchange.DEBUG to false.

@boaks
Copy link
Contributor

boaks commented Feb 23, 2018

Some of the memory usage seems to be caused by over-use of AtomicReference.
I introduced them before I recognised, that "more" synchronization is required.

@sbernard31
Copy link
Contributor Author

Did you plan to push a commit with less AtomicReference ? Or do you prefer thant I try to do it on my side ?
Then I can launch my tests.

@boaks
Copy link
Contributor

boaks commented Feb 23, 2018

I currently play with that. I guess I will push beginning next week.

@boaks
Copy link
Contributor

boaks commented Feb 25, 2018

The gc/heap stuff seems to be improved with using a smaller stripe object and not the Exchange itself.
The performance penalty of the StripedExecutor is still the case.

@boaks
Copy link
Contributor

boaks commented Feb 25, 2018

See new commits in #551

@sbernard31
Copy link
Contributor Author

I looked at the github version of stripedExecutor and it uses an IdentityHashMap instead of HashMap
Maybe this could have impact and in our case I think we want to use == for Exchange and not equals
(This remarks is maybe obsolete, with your last changes)

I also began to discuss about performance issue with the creator of StripedExecutorService.
https://github.com/kabutz/javaspecialists/issues

I will test tomorrow the changes you done in #551.

@boaks
Copy link
Contributor

boaks commented Feb 26, 2018

I guess, the HashMap is used because of scandium, which uses InetSocketAddress as stripe.
My change only improves the heap/gc consumption, the execution time still get's a penalty for the stripes :-).

@sbernard31
Copy link
Contributor Author

My last results for Test1:

Using cf-benchmark from #551 with last modification (commit dca4c1d) and DEBUG = false (so without` new Throwable`
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
349419 requests (34941 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
775733 requests (42631 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 26009ms, 38448 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 26009ms, 0 notifies/s
  2 clients with 1973 to 1978 requests.
278 clients with 1981 to 1999 requests.
190 clients with 2000 to 2019 requests.
 28 clients with 2020 to 2038 requests.
  2 clients with 2051 to 2053 requests.
============================
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
343985 requests (34398 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
768012 requests (42402 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 26262ms, 38077 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 26262ms, 0 notifies/s
  3 clients with 1975 to 1979 requests.
260 clients with 1980 to 1999 requests.
213 clients with 2000 to 2019 requests.
 22 clients with 2020 to 2039 requests.
  1 clients with 2044 requests.
  1 clients with 2071 requests.
==============================
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
332781 requests (33278 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
755433 requests (42265 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 25993ms, 38471 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 25993ms, 0 notifies/s
  1 clients with 1979 requests.
247 clients with 1980 to 1999 requests.
246 clients with 2000 to 2019 requests.
  6 clients with 2020 to 2024 requests.

server551modif1withoutthrowable

Using cf-benchmark from #551 with last modification (commit dca4c1d) and DEBUG = false (so without` new Throwable` and without using stripedExecutor
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
383125 requests (38312 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
820346 requests (43722 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 24557ms, 40721 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 24557ms, 0 notifies/s
  1 clients with 1979 requests.
230 clients with 1984 to 1999 requests.
268 clients with 2000 to 2018 requests.
  1 clients with 2020 requests.
=================================
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
352237 requests (35223 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
785943 requests (43370 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 25443ms, 39303 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 25443ms, 0 notifies/s
262 clients with 1981 to 1999 requests.
233 clients with 2000 to 2019 requests.
  5 clients with 2022 to 2028 requests.
================================
Create 500 benchmark clients, expect to send 1000000 request overall to coap://localhost:5683/benchmark
Benchmark clients created.
Benchmark started.
340208 requests (34020 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
771022 requests (43081 reqs/s, 0 retransmissions (0,00%), 0 transmission errors (0,00%), 500 clients)
Benchmark clients finished.
1000000 requests sent, 1000000 expected
1000000 requests in 25647ms, 38990 reqs/s
0 notifies sent, 0 expected, 0 observe requests
0 notifies in 25647ms, 0 notifies/s
  6 clients with 1977 to 1979 requests.
249 clients with 1980 to 1999 requests.
233 clients with 2000 to 2019 requests.
 11 clients with 2020 to 2026 requests.
  1 clients with 2040 requests.

server551modif1withoutthrowablenostriped

Interpretations:
It confirms that "a smaller stripe object and not the Exchange itself" behaves better in terms of Heap Memory/GC behavior. Slightly better in terms of CPU and request/seconds too.

Removing StripedExecutor show this globally behaves better in terms of performance (especially GC cpu) but this is still less than the reference test.

@boaks
Copy link
Contributor

boaks commented Mar 5, 2018

Any further progress?

@sbernard31
Copy link
Contributor Author

Nope, I tried several little changes without real impact. I plan to try to reproduce a test environment near than what we have in production to see how this behaves.

@boaks
Copy link
Contributor

boaks commented Mar 20, 2018

I plan to try to reproduce a test environment near than what we have in production to see how this behaves.

Any news?

FMPOV, having a "small" performance penalty (it will be even smaller, if coaps, larger messages and/or deduplication is used), is the smaller problem than having leaks.

@sbernard31
Copy link
Contributor Author

I'm working on it but first tests seems to show that performance penalty is limited.
And I agree that fixing leaks is more valuable.

@boaks
Copy link
Contributor

boaks commented Apr 10, 2018

@sbernard31
Are you still on it? Or could it be closed?

@sbernard31
Copy link
Contributor Author

We can close it. I will reopen it if needed.

@joechen2010
Copy link

Hi there, why closed it , was it fixed in new code? seems we met the same issue. The cpus are not busy, and lots of DTLS-Connection-Handler are waiting for task.

Cpu: 6.7, threadId: 310827, "DTLS-Connection-Handler-2" #120 daemon prio=5 os_prio=0 tid=0x00000000016f2000 nid=0x4be2b waiting on condition [0x00007f8b4e6e7000]
Cpu: 6.7, threadId: 310828, "DTLS-Connection-Handler-3" #121 daemon prio=5 os_prio=0 tid=0x00000000016f5000 nid=0x4be2c runnable [0x00007f8b4e5e6000]
Cpu: 6.7, threadId: 310832, "DTLS-Connection-Handler-7" #125 daemon prio=5 os_prio=0 tid=0x00000000016fc800 nid=0x4be30 waiting on condition [0x00007f8b4e1e2000]
Cpu: 6.7, threadId: 310834, "DTLS-Connection-Handler-9" #127 daemon prio=5 os_prio=0 tid=0x0000000001700800 nid=0x4be32 waiting on condition [0x00007f8b4dfe0000]
Cpu: 6.7, threadId: 310842, "DTLS-Connection-Handler-16" #135 daemon prio=5 os_prio=0 tid=0x000000000170e800 nid=0x4be3a waiting on condition [0x00007f8b4d7d8000]
Cpu: 6.7, threadId: 310844, "DTLS-Connection-Handler-18" #137 daemon prio=5 os_prio=0 tid=0x0000000001712000 nid=0x4be3c waiting on condition [0x00007f8b4d5d6000]
Cpu: 6.7, threadId: 310846, "DTLS-Connection-Handler-20" #139 daemon prio=5 os_prio=0 tid=0x0000000001715800 nid=0x4be3e waiting on condition [0x00007f8b4d3d4000]
Cpu: 6.7, threadId: 310849, "DTLS-Connection-Handler-23" #142 daemon prio=5 os_prio=0 tid=0x000000000171b000 nid=0x4be41 waiting on condition [0x00007f8b4d0d1000]
Cpu: 6.7, threadId: 310852, "DTLS-Connection-Handler-26" #145 daemon prio=5 os_prio=0 tid=0x0000000001720000 nid=0x4be44 waiting on condition [0x00007f8b4cdce000]
Cpu: 6.7, threadId: 310865, "DTLS-Connection-Handler-39" #158 daemon prio=5 os_prio=0 tid=0x0000000001738800 nid=0x4be51 waiting on condition [0x00007f8b47ffe000]
Cpu: 6.7, threadId: 310868, "DTLS-Connection-Handler-42" #161 daemon prio=5 os_prio=0 tid=0x000000000173e800 nid=0x4be54 waiting on condition [0x00007f8b47cfb000]

@boaks boaks reopened this Apr 10, 2018
@boaks
Copy link
Contributor

boaks commented Apr 10, 2018

The cpus are not busy, and lots of DTLS-Connection-Handler are waiting for task.

Maybe you can give more information about your setup?

The benchmark results above are from plain coap, so it's no wonder that the
"DTLS-Connection-Handler" are waiting, they have just nothing to do.

@sbernard31
Copy link
Contributor Author

@boaks is right your issue is not linked to this performances tests.

You should open a new issue and give us more information to help to understand.
(e.g. a thread dump to see stack trace and see on which condition threads are blocked could help)

@joechen2010
Copy link

joechen2010 commented Apr 11, 2018

Hi, boaks, thanks for the quick response.
we benchmark the coap server which support both coap and coaps with different two different ports in a 8 vCPUs server, and simulate about 1000 coapclients (actually, whatever how many, we have excluded the clients cause this issue) with jmeter , the server response a hello immediately. The throughput for coap is around 40K but coaps is around 16k, and we found many UDP packets was dropped in server, seems the coap server can not handle the messages as fase as expect. and then we found most of DTLS-Connection-Handler are free.
we check out the code, we doubt the bottleneck is the StripedExecutorService which introduce a global lock that was reported by sbernard31 before (kabutz/javaspecialists#1), but we can't sure.

thread-dump.txt

@boaks
Copy link
Contributor

boaks commented Apr 11, 2018

How many "ConnectionThread"s do you use for the DTLSConnector?

@joechen2010
Copy link

it's default, should be 6 * 8Cpu = 48

@boaks
Copy link
Contributor

boaks commented Apr 11, 2018

So let me point out:
This issue was created to check, if the introduction of the striped exchange execution (PR #590) causes some unacceptable resource/performance penalty. The heap consumption was optimized easily (based on simon's first results), but for the CPU penalty of the "stripped execution" I didn't find such an easy optimization (my guess is, that the penalty is not based on the stripped lock, but I have no analysis on that). So our conclusion was, that the penalty is acceptable (also, because nobody was able to spend time into implement alternatives).

For a more general performance discussion I updated the demos and pushed the current benchmark setup to PR #607.

Using that, I start the cf-extplugtest-server/ExtendedTestServer with:

-d64 -Xmx4g -XX:+UseG1GC -onlyDtlsLoopback -noPlugtest

and the cf-extplugtest-client/BenchmarkClient

-d64 -Xmx4g -XX:+UseG1GC coaps://localhost:5784/benchmark?rlen=200 1000 5000

My results are:

1000 benchmark clients finished.
5000000 requests sent, 5000000 expected
5000000 requests in 292066ms, 17119 reqs/s
2172 retransmissions (0,04%)

My CPU monitor shows, that the CPU is almost busy, so I don't see, that there are principal "bottlenecks" caused by "waiting threads". For an improvement in that particular scenario, it would be required to find something "waiting busy", but I hope this is not the case.

So, maybe you try to adapt your benchmark setup closer to PR #607 and we will see, if there could be something found, to improve the performance. But I agree with @sbernard31, that this general issue is only related to particular issue. So if you want to provide your results, please open an other issue.

@boaks boaks closed this as completed Apr 11, 2018
@sbernard31
Copy link
Contributor Author

@joechen2010 Did you try my experimental "lockless" implementation of StripedExecutor ? Personally, I didn't find any positive impact with my tests, that's why I leave this way for now :/

@joechen2010
Copy link

Thanks for your comments.
@sbernard31 Yes, we tried that but seems nothing change.
@boaks We will try that setup later.

@joechen2010
Copy link

Hi, we found the main performance penalty is the decrypt the message in CCMBlockCipher. the cipher init every message and it 's a synchronized action, should be move to DTLSession when it creating.
Cipher cipher = Cipher.getInstance(CIPHER_NAME); cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, KEY_TYPE));

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants