
Imp: try to fix too many files open error #797

Merged — 7 commits merged into apache:1.4 on Nov 17, 2020
Conversation

wenxuwan (Member)

When concurrency is high, a large number of TCP connections are created, leaving many sockets in TIME_WAIT and eventually triggering the too many open files error, as shown in the screenshots below.

[screenshots: the error "A maximum of 1024 files are allowed to open" and the open TCP connection counts]
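For context, this failure mode is easy to reproduce outside of dubbo-go. Below is a minimal, hypothetical Go sketch (the provider address 127.0.0.1:20880 and the concurrency of 5000 are made-up values, not from this PR) in which every concurrent request dials its own TCP connection:

package main

import (
	"fmt"
	"net"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	// Each in-flight request holds its own socket, so the number of open
	// file descriptors grows with concurrency instead of staying bounded.
	for i := 0; i < 5000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			conn, err := net.Dial("tcp", "127.0.0.1:20880")
			if err != nil {
				// With the default ulimit -n 1024, dials start failing
				// with "socket: too many open files".
				fmt.Println(err)
				return
			}
			defer conn.Close()
			// ... send one request, then drop the connection ...
		}()
	}
	wg.Wait()
}

A connection pool addresses this by bounding the number of sockets and reusing them across requests, which is what this PR introduces.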

@AlexStocks AlexStocks requested review from pantianying and fangyincheng and removed request for pantianying October 18, 2020 03:53
@AlexStocks AlexStocks changed the title try to fix too many files open error Imp: try to fix too many files open error Oct 18, 2020
wenxuwan (Member, Author) commented Oct 19, 2020

Adding more information for the review.

The QPS and TCP connection count in v1.4.0 are shown below:

[screenshot: QPS in v1.4.0]

The next picture shows the number of TCP connections:

[screenshot: TCP connection count in v1.4.0]

I set poolSize to 2 in my branch; the test results are below:

[screenshots: QPS and TCP connection count with poolSize = 2]

By the way, the QPS values may not be accurate because they are limited by the Dubbo provider.

[8 review threads on protocol/dubbo/pool.go and protocol/dubbo/poolqueue.go — resolved]
@zouyx zouyx added this to the v1.5.5 milestone Oct 30, 2020
@zouyx zouyx added the bug, good first issue and improve labels and removed the bug label Nov 3, 2020
@watermelo watermelo self-requested a review November 3, 2020 10:35
[2 review threads on protocol/dubbo/client.go and protocol/dubbo/pool.go — resolved]
@gaoxinge (this comment has been minimized; it is quoted in full in the reply below)

wenxuwan (Member, Author) commented Nov 10, 2020

@wenxuwan Nice job, and I approve your pull request. But I still have three questions:

  1. Why do you use CAS + busy waiting instead of a lock to protect the enqueue of an SPMC queue?
  • CAS + busy waiting:

failNumber := 0
for {
	ok := atomic.CompareAndSwapUint32(&c.pool.pushing, 0, 1)
	if ok {
		c.pool.poolQueue.pushHead(conn)
		c.pool.pushing = 0
		c.pool.ch <- struct{}{}
		return
	}
	failNumber++
	if failNumber%10 == 0 {
		time.Sleep(1e6) // 1e6 ns = 1ms of back-off after every 10 failed CAS attempts
	}
}

  • lock:

lock() // i.e. some mutex guarding the queue
c.pool.poolQueue.pushHead(conn)
c.pool.ch <- struct{}{}
unlock()
  2. Why do you choose an SPMC queue instead of an MPMC queue?
  3. Why do you use a channel + queue instead of only one channel to implement the connection pool?
  • I think you can put connections into a channel directly, but I'm not sure. The code might look like this:
func (p *gettyRPCClientPool) get(protocol, addr string) *gettyRPCClient {
	once.Do(func() {
		// pre-fill the channel with maxSize nil placeholders
		p.ch = make(chan *gettyRPCClient, p.maxSize)
		for i := 0; i < p.maxSize; i++ {
			p.ch <- nil
		}
	})

	conn := <-p.ch
	if conn == nil {
		return newGettyRPCClientConn(p, protocol, addr)
	}
	return conn
}

func (p *gettyRPCClientPool) put(conn *gettyRPCClient) {
    p.ch <- conn
}

@gaoxinge, thank you for your careful review.

Why do you use CAS + busy waiting instead of a lock to protect the enqueue of an SPMC queue?

I just wanted to use atomics instead of a lock; with the sleep, there is probably no big difference in their performance. A micro-benchmark like the sketch below could check this.
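One hedged way to compare the two strategies is a Go benchmark. This is only a sketch under my own assumptions (a fixed-size buffer standing in for the queue, and no back-off sleep, which the PR's version does have):

package pool

import (
	"sync"
	"sync/atomic"
	"testing"
)

var (
	mu      sync.Mutex
	pushing uint32
	buf     [1024]int
	idx     int
)

// pushMutex guards the shared buffer with a mutex.
func pushMutex(v int) {
	mu.Lock()
	buf[idx%len(buf)] = v
	idx++
	mu.Unlock()
}

// pushCAS spins on a CAS flag, mirroring the PR's approach (minus the sleep).
func pushCAS(v int) {
	for !atomic.CompareAndSwapUint32(&pushing, 0, 1) {
	}
	buf[idx%len(buf)] = v
	idx++
	atomic.StoreUint32(&pushing, 0)
}

func BenchmarkMutex(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			pushMutex(1)
		}
	})
}

func BenchmarkCAS(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			pushCAS(1)
		}
	})
}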

Why do you choose an SPMC queue instead of an MPMC queue?

I'm not familiar with MPMC queues; I will read the code you provided in the comments.

Why do you use a channel + queue instead of only one channel to implement the connection pool?

This is a good question. In my opinion, a single channel would work, but consider this scenario: maxSize is three, and when three tasks arrive, three new connections are created. Then task1 and task2 fail while task3 succeeds. Now the values in the channel are nil, nil, conn, so when a new task arrives it will create a new connection even though an available conn exists. In other words, if we only use a channel, connections will not be reused until the channel is filled with conns. So in my code the queue stores the available connections, while the channel controls the number of concurrent tasks. A sketch of this design follows below.
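To make that design concrete, here is a minimal, hypothetical sketch (my own names, not the actual PR code; it uses a plain mutex and slice where the PR uses a lock-free SPMC queue). The buffered channel is a semaphore bounding how many connections may exist at once, and the queue holds the connections that are actually reusable:

package pool

import "sync"

type conn struct{} // stands in for *gettyRPCClient

type connPool struct {
	mu    sync.Mutex
	idle  []*conn       // queue of available, reusable connections
	slots chan struct{} // capacity == maxSize; one token per allowed connection
}

func newConnPool(maxSize int) *connPool {
	p := &connPool{slots: make(chan struct{}, maxSize)}
	for i := 0; i < maxSize; i++ {
		p.slots <- struct{}{}
	}
	return p
}

func (p *connPool) get() *conn {
	<-p.slots // block until a connection slot is free
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.idle); n > 0 { // reuse an idle connection if one exists
		c := p.idle[n-1]
		p.idle = p.idle[:n-1]
		return c
	}
	return &conn{} // otherwise dial a new one (dialing elided)
}

// put releases the slot token in every case, but only a healthy connection
// goes back into the queue. This is why the "nil, nil, conn" problem from
// the scenario above cannot occur: failed connections never shadow live ones.
func (p *connPool) put(c *conn, healthy bool) {
	p.mu.Lock()
	if healthy {
		p.idle = append(p.idle, c)
	}
	p.mu.Unlock()
	p.slots <- struct{}{}
}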

gaoxinge commented Nov 10, 2020

@wenxuwan Thank you for your answers. You have designed and implemented the connection pool carefully and thoroughly, so I have no other questions.

@zouyx (Member) left a comment

What about now? If there are no more questions, I will merge it into 1.4 and release it.

@wenxuwan wenxuwan force-pushed the master branch 2 times, most recently from 7232064 to 36f92db Compare November 14, 2020 14:55
codecov-io commented Nov 16, 2020

Codecov Report

Merging #797 (7232064) into 1.4 (e80c2e4) will decrease coverage by 2.23%.
The diff coverage is 64.70%.


@@            Coverage Diff             @@
##              1.4     #797      +/-   ##
==========================================
- Coverage   67.10%   64.87%   -2.24%     
==========================================
  Files         175      176       +1     
  Lines        9355     8054    -1301     
==========================================
- Hits         6278     5225    -1053     
+ Misses       2465     2235     -230     
+ Partials      612      594      -18     
Impacted Files Coverage Δ
protocol/dubbo/poolqueue.go 54.00% <54.00%> (ø)
protocol/dubbo/pool.go 79.05% <70.58%> (-2.23%) ⬇️
protocol/dubbo/client.go 66.45% <76.92%> (-2.64%) ⬇️
protocol/dubbo/dubbo_invoker.go 81.03% <80.00%> (-1.23%) ⬇️
common/extension/health_checker.go 50.00% <0.00%> (-16.67%) ⬇️
config_center/nacos/facade.go 58.62% <0.00%> (-13.26%) ⬇️
config_center/nacos/factory.go 0.00% <0.00%> (-12.50%) ⬇️
config_center/zookeeper/factory.go 0.00% <0.00%> (-12.50%) ⬇️
filter/filter_impl/metrics_filter.go 87.50% <0.00%> (-12.50%) ⬇️
...ter/cluster_impl/registry_aware_cluster_invoker.go 62.50% <0.00%> (-12.50%) ⬇️
... and 170 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e80c2e4...c8cf468

@AlexStocks AlexStocks merged commit 4b73658 into apache:1.4 Nov 17, 2020
cityiron pushed a commit that referenced this pull request Apr 11, 2021
Imp: try to fix too many files open error