Can't scale out origin over 64 rtmp streams #999
Comments
Your ffmpeg is encoding with x264. Is your hardware powerful enough for 100 simultaneous encodes? Check your CPU stats. Is your network performance good enough? Also check whether the kernel options are tuned, and check the socket buffer size and ulimit.
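For anyone following along, a minimal sketch of how the ulimit and socket buffer values mentioned above could be read on a Linux host. The function name `read_limits` and the choice of sysctl keys are assumptions made for illustration; they are not taken from OvenMediaEngine or from this thread.

```python
import resource
from pathlib import Path

# Kernel tunables often looked at for socket buffer sizing (Linux only).
# This selection is an assumption, not a list recommended in the thread.
SYSCTL_KEYS = (
    "net/core/rmem_max",
    "net/core/wmem_max",
    "net/core/somaxconn",
)


def read_limits(proc_sys=Path("/proc/sys")):
    """Return the open-file limits and a few socket-related sysctl values."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    report = {"nofile_soft": soft, "nofile_hard": hard}
    for key in SYSCTL_KEYS:
        name = key.replace("/", ".")
        try:
            report[name] = (proc_sys / key).read_text().strip()
        except OSError:
            # Key not present (non-Linux host or restricted container).
            report[name] = None
    return report


if __name__ == "__main__":
    for name, value in read_limits().items():
        print(f"{name} = {value}")
```

Running it inside the container as well as on the host may be worthwhile, since the two can report different limits.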
@getroot We could scale out to 300 ffmpeg instances, but only without trying to watch the video.
Have you tuned the network kernel values? Please check this first. It is hard to analyze the problem with just that part of the log. I often test with 200+ RTMP inputs and have not encountered this problem yet. When you encode with ffmpeg, does the result change if you set the bitrate to CBR at 1 Mbps? And what is the CPU usage per thread of the OME process when the problem occurs?
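One way to collect the per-thread CPU usage asked about above is to sample `/proc/<pid>/task/*/stat` twice and compare. This is a rough sketch assuming a Linux host; `parse_stat`, `thread_cpu_ticks`, and `busiest_threads` are hypothetical helper names, and the PID of the OME process has to be supplied by the caller.

```python
import os
import time
from pathlib import Path


def parse_stat(line):
    """Return (thread_name, utime_ticks, stime_ticks) from a /proc stat line."""
    # The name sits in parentheses and may itself contain spaces or ')'.
    close = line.rindex(")")
    name = line[line.index("(") + 1 : close]
    rest = line[close + 2 :].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return name, int(rest[11]), int(rest[12])


def thread_cpu_ticks(pid):
    """Map (tid, thread_name) to the CPU ticks consumed so far."""
    ticks = {}
    for stat in Path(f"/proc/{pid}/task").glob("*/stat"):
        try:
            name, utime, stime = parse_stat(stat.read_text())
        except (OSError, ValueError, IndexError):
            continue  # the thread exited while we were reading it
        ticks[(stat.parent.name, name)] = utime + stime
    return ticks


def busiest_threads(pid, interval=1.0):
    """Return [(cpu_percent, tid, name)] measured over `interval` seconds."""
    hz = os.sysconf("SC_CLK_TCK")
    before = thread_cpu_ticks(pid)
    time.sleep(interval)
    after = thread_cpu_ticks(pid)
    usage = [
        (100.0 * (after[key] - before.get(key, 0)) / hz / interval, key[0], key[1])
        for key in after
    ]
    return sorted(usage, reverse=True)
```

A single thread sitting near 100% while overall CPU looks idle would point to a per-thread bottleneck rather than a lack of hardware.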
@getroot I tried many different cases, both with the default kernel and with the setup suggested in this bug. I don't think traffic is the issue; 100 ffmpeg instances don't actually create much load. We modified our test case to play a pre-recorded video instead of encoding a test stream. We were able to feed 600 RTMP streams without traffic from the edges, but the problems start at 64+ RTMP streams:
I am going to email you a video with our steps and setup to reproduce the issue.
Here are the kernel settings that we apply. The only difference is net.ipv4.tcp_fin_timeout=30, to be safe with GCP NAT:
Thanks for the detailed report. I'll look into this.
I saw the video and configuration file in the email you sent. Could you explain the repro path again, step by step? Your email is below.
Also, the log in the video is hard to follow. The player is trying to play /push/rtsp, but the log prints nothing about it, only a trail of failed attempts to play /push/load500. Is ws://localhost:3333/push/rtsp making a request to another OvenMediaEngine server that isn't getting recorded video? Is the setup Origin-Edge, with only the Origin appearing in the video and the Edge using a different configuration? If so, is the Server.xml you sent the configuration for the Origin?
@getroot The idea was to give you the simplest configuration that reproduces the issue with 64+ streams. As you can see in the video, the actual number is somewhere between 70 and 75.
Enable RTMP Provider and RTSP Pull Provider
We experience the same problem in our cluster with multiple origins and edges:
Reproducible with both the dev and latest docker containers.
Step 1:
Step 2:
Step 3:
Step 4 (Repeat step 2):
Step 5:
Step 6:
Thanks for the detailed reproduction path. I'll start on this as soon as the task I'm doing now is finished.
Thanks for reporting the bug. I patched a problem where stream IDs generated by different types of providers could be duplicated, and committed the fix to the master branch. Please check whether the image airensoft/ovenmediaengine:dev solves your problem.
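To illustrate the class of bug described above, here is a toy sketch of how IDs issued independently by different provider types can collide, and how a single shared issuer avoids that. This is not OvenMediaEngine code; `Provider` and `SharedIdIssuer` are hypothetical names, and the actual patch may work differently.

```python
import itertools


class Provider:
    """Hypothetical provider that numbers its own streams starting at 1."""

    def __init__(self, name):
        self.name = name
        self._ids = itertools.count(1)

    def new_stream_id(self):
        return next(self._ids)


class SharedIdIssuer:
    """One counter for the whole process, so no two streams share an id."""

    def __init__(self):
        self._ids = itertools.count(1)

    def new_stream_id(self):
        return next(self._ids)


# Each provider type counts on its own, so both hand out id 1.
rtmp, rtsp_pull = Provider("rtmp"), Provider("rtsp_pull")
per_provider = [rtmp.new_stream_id(), rtsp_pull.new_stream_id()]

# With a shared issuer, the second stream gets a distinct id.
issuer = SharedIdIssuer()
shared = [issuer.new_stream_id(), issuer.new_stream_id()]

print("per-provider ids:", per_provider)
print("shared ids:", shared)
```

Any table keyed by stream ID alone would mix up the two streams in the first case, which is consistent with the symptom only appearing once both RTMP and RTSP Pull providers are enabled.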
Hi @getroot, thanks for the update. The dev build looks good! We couldn't reproduce the issue!
PROBLEM
Origin performs very well until we ingest ~64 RTMP streams, but as soon as it gets a few more connections, the OVT connections are dropped in a cascade.
EXPECTATION
The CPU and memory are barely utilized because we don't use transcoding, so it would be great to be able to scale beyond 64 streams.
ENVIRONMENTAL INFORMATION
The latest docker
Dockerfile:
Just 100 rtmp streams generated by ffmpeg with h264:
Oven configuration:
no player is needed
Reproducible on Mac and Ubuntu with Docker above ^^^
SETUP INFORMATION AND LOGS
Please upload Server.xml.
config.zip
[2023-01-12 00:16:18.620] W [OutboundWorker:31] MediaRouter | mediarouter_stream.cpp:1039 | [#dev#dev/streamXX] Detected out of order DTS of packet. track_id:0 dts:536201->536201
[2023-01-12 00:16:18.659] W [OutboundWorker:31] MediaRouter | mediarouter_stream.cpp:1039 | [#dev#dev/streamXX] Detected out of order DTS of packet. track_id:0 dts:536202->536201
[2023-01-12 00:16:51.457] I [SPRTMP-T1935:14] Provider | stream.cpp:49 | Unknown/(50) has been started stream
[2023-01-12 00:16:51.458] I [SPRTMP-T1935:14] RTMPProvider | rtmp_provider.cpp:152 | A RTMP client has connected from <ClientSocket: 0x7f66d6fe5610, #50, Connected, TCP, Nonblocking, IP1:38518>
[2023-01-12 00:16:51.722] I [SPRTMP-T1935:14] AccessController | access_controller.cpp:183 | AdmissionWebhooks queried http://{host}/v1/ac whether client 10.138.0.20:38518 could access rtmp://{host}:1935/dev/streamXX. (Result : Allow Elapsed : 8 ms)
[2023-01-12 00:17:04.564] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e2c610, #38, Closed, TCP, Nonblocking, IP2:26662>
[2023-01-12 00:17:04.564] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e2ec10, #41, Closed, TCP, Nonblocking, IP2:26698>
[2023-01-12 00:17:05.565] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e05810, #32, Closed, TCP, Nonblocking, IP2:10424>
[2023-01-12 00:17:05.565] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e2fc10, #36, Closed, TCP, Nonblocking, IP2:26642>
[2023-01-12 00:17:05.565] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e2e610, #35, Closed, TCP, Nonblocking, IP2:26628>
[2023-01-12 00:17:05.565] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e2d210, #34, Closed, TCP, Nonblocking, IP2:10432>
[2023-01-12 00:17:05.565] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e05610, #33, Closed, TCP, Nonblocking, IP2:10430>
[2023-01-12 00:17:05.565] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e2d610, #37, Closed, TCP, Nonblocking, IP2:26648>
[2023-01-12 00:17:05.608] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:140 | OvtProvider is connected : <ClientSocket: 0x7f66d9e05610, #32, Connected, TCP, Nonblocking, IP2:24836>
[2023-01-12 00:17:05.609] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e06410, #39, Closed, TCP, Nonblocking, IP2:26678>
[2023-01-12 00:17:05.609] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:225 | OvtProvider is disconnected(1) : <ClientSocket: 0x7f66d9e2be10, #40, Closed, TCP, Nonblocking, IP2:26688>
[2023-01-12 00:17:05.610] I [SPOvtPub-T9000:9] OVT | ovt_publisher.cpp:140 | OvtProvider is connected : <ClientSocket: 0x7f66d9e06410, #33, Connected, TCP, Nonblocking, IP2:24840>
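The cascade in the log above is easier to see when connect and disconnect events are counted per second. A small sketch, assuming the log line layout shown above; `LINE_RE` and `count_ovt_events` are hypothetical names, not part of OvenMediaEngine.

```python
import re
from collections import Counter

# Matches the timestamp (to the second) and the OVT connect/disconnect event.
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d{3}\] .*"
    r"OvtProvider is (?P<event>connected|disconnected)"
)


def count_ovt_events(lines):
    """Count OvtProvider events, keyed by (second, event)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match.group("ts"), match.group("event"))] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_ovt.py < ovenmediaengine.log
    for (second, event), n in sorted(count_ovt_events(sys.stdin).items()):
        print(f"{second}  {event:<12} {n}")
```

A burst of disconnects within the same second, followed by reconnects from the same peer, matches the pattern reported in this issue.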
OTHER HELPFUL INFORMATION
Please write any information that anyone viewing this issue can reference.