Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full support for websocket reconnection/resubscription #1966

Closed
wants to merge 8 commits into from

Conversation

gabmontes
Copy link
Contributor

Description

Websocket disconnection is a long standing issue in the 1.0 branch of this library and is the source of many issues when developing applications that for any reason use a not-so-reliable connection to the nodes.

Solving the disconnection and reconnection means basically resolving several major problems:

  • Reconnecting the transport layer, this is the websocket connection itself.
  • Reattaching client's event listeners that handle the different websocket connection states.
  • Resubscribing to higher-level events like incoming blocks, events.

This PR addresses these problems by:

  • Updating the events management functions of the websocket provider, adding once and updating on to use addEventListener instead of just overwriting the default event listener. That was needed to properly manage the error events that needed to be sent to every RequestManagers using the provider and later allow them to listen for the new connect events.
  • Update the RequestManager's error handling logic so it does the same steps when the connection was already established and an error occurs or when the connection cannot be established on start due to a recoverable cause. This was needed to resubscribe in all scenarios. In addition there was two small bugs that was fixed so the resubscriptions can actually be processed and issues are avoided in the case several resubscriptions are tried simultaneously (no function reentry).
  • Once errors are sent by the websocker provider, those have to be properly broadcasted and handled by all subscriptions to trigger unsubscription/resubscription logic.
  • In order to avoid extensive changes to the websocket provider, the websocket object used was -optionally- replaced by a wrapper object exposing the W3C Websocket API that automatically handles the reconnection on recoverable error conditions.
  • On Node.js environments, the websocket package used has the ability to send keepalive packets and detect a zombie connection (connection was lost but since no packages are sent, no errors are triggered). In this scenarios, the reconnection is also triggered. Unfortunately, browser implementations do not have keepalive options so client's code will need to detect zombie connections and force a restart.

The changes were extensively tested and work as expected, reconnecting and resubscribing under several different circumstances.

The commits of this PR have extensive comments explaining the rationale behind each one.

Fixes #1085
Fixes #1391
Fixes #1558
Fixes #1852
Fixes #1933

And possibly addresses others issues as well.

@nivida @frozeman Please review and evaluate this PR. I will be happy to address any comment you might have on it.

Thanks!

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • I have selected the correct base branch.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have made corresponding changes to the documentation.
  • My changes generate no new warnings.
  • Any dependent changes have been merged and published in downstream modules.
  • I ran npm run test with success and extended the tests if necessary.
  • I ran npm run build and tested the resulting file from dist folder in a browser.
  • I have tested my code on the live network.

When using `.on<event>=fn` to attach listeners, only one listener can be
set at the same time. Since multiple request managers can use the same
provider, the EventTarget API has to be used to ensure all of them
receive the events emitted from the provider.

This is needed on both the `on` and `removeListener` functions.
The method `once` is required to allow the subscription logic to
identify if the provider is able to reconnect/resubscribe and then
attach to the following `connect` event the function to resubscribe.
When the subscription fails on start and when it fails after it was
successfully established, the same logic needs to be executed: remove
subscription, listen for the next `connect` event if available to
actually subscribe again, emit the error and call the callback.

Prior code did that only for established subscriptions so if a
subscription was unable to be set right on start, no resubscription was
ever tried.

The logic was moved to a single method to avoid duplication of code.
In addition reentry is avoided by checking and properly clearing the
`_reconnectIntervalId` variable.
On subscribe, if there is an existing `id`, the subscription listeners
are removed. In the case of a resubscription, the listeners have to be
kept. Therefore, the `id` property -that will change anyway- must be
cleared so the listeners are not removed.

Then, after the subscription object resubscribes, the listeners set by
the subscription user code remain untouched, making the resubscription
transparent to the user code.
When the request manager removes a subscription due to an error, it
tries to send an unsubscribe package, which can also fail if i.e. the
network is down. In such a case, the function must not allow reentry.
Removing the subscription first ensures it will not do so.

In addition, if the subscription was already removed, the callback shall
be called anyway.
When error events are emitted by the provider, all subscriptions shall
receive the event and trigger the unsubscription/resubscription logic.
By wrapping the available WebSocket implementation (native WebSocket
object or `websocket` package) with `websocket-reconnector`, the
provider is given a WebSocket that will automatically reconnect on
errors.

A new option was added to the WebSocket provider to controll whether it
should automatically reconnect or it should behave as usual.
In the case any websocket call takes too long to return and a timeout
was set for the provider to timeout, the provider should try to restart
the connection.

This could happen, for instance, if the client loses connection with the
server, the server closes the connection and later, the connectivity is
up again but since the client did not receive the closing frame *and*
the client does not attempt to send any package to the server, no error
is observed.

`websocket` implementation for Node.js has an option to send keep-alive
frames and detect such scenarios, but the standard browser W3C WebSocket
does not, so it is "vulnerable" to this kind of failure which will
mostly affect web3 subscriptions.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.3%) to 82.839% when pulling ae17773 on bloq:reconnect-resubscribe into 326e42f on ethereum:1.0.

@frozeman
Copy link
Contributor

Looks very interesting! Thanks for describing your PR so well.
Be aware that reconnecting needs to warms the developer properly as subscriptions are based on sockets. So silent reconnects would lead to canceled subscriptions (or better those are sent to an non existing socket now) so people would wonder why they don’t get logs etc anymoe.

@nivida you probably need to manually incorporate this into your work on the ethereum provider.

@gabmontes
Copy link
Contributor Author

@frozeman thanks for taking a look at it! You made a very key point here. Most of the complexity behind this PR was to ensure the lib resubscribes after websocket reconnects.

This PR handles both automatic websocket reconnection and automatic unsubscription/resubscription on websocket reconnection so i.e. new header events and logs are received again after reconnection.

My rationale was not to notify the client/developer if some error condition is being automatically handled. That worked for me but to be honest I also thought on the possibility of adding some additional events so the client/developer is aware of the reconnection/resubscription taking place in the background.

By the way, Travis is failing only in Node.js 5 due to a rest argument in the websocket reconnection library. I will work on that so all tests pass.

@monitz87
Copy link

Does this also reattach SolidityEvent listeners? Is that a possibility if not?

@gabmontes
Copy link
Contributor Author

gabmontes commented Oct 1, 2018

@monitz87 yes it does!

Once the underlying websocket reconnects, all subscriptions (i.e. web3.eth.subscribe(), myContract.events.MyEvent()) are resend to the Ethereum node over the new websocket connection so events are received again.

@nivida nivida self-assigned this Oct 9, 2018
@nivida nivida self-requested a review October 9, 2018 05:31
@monitz87
Copy link

Maybe this is covered, but better safe than sorry. When I was toying with this concept, reattaching event listeners caused the queue to grow uncontrollably until the memory ran out.

Just a heads up. I hope this pans out and your solution gets merged soon.

@nivida
Copy link
Contributor

nivida commented Nov 20, 2018

@gabmontes First of all thanks for submitting this PR! This was on my roadmap for the next release! The problem is that I've refactored close to the entire code architecture in the ethereumProvider branch because the current code is not as good as it could be ;) (see here: #2000)
Could you do these changes against the ethereumProvider branch? I think it shouldn't be a big thing to change it to ES6 etc. sorry :)

@nivida
Copy link
Contributor

nivida commented Jan 21, 2019

This got reimplemented and improved in the ethereumProvider branch and because of this I will close your PR. Thanks for the inspiration and your efforts it speeded up the process a lot.

@nivida nivida closed this Jan 21, 2019
@gabmontes
Copy link
Contributor Author

Awesome!

@7-of-9
Copy link

7-of-9 commented Jan 23, 2019

@nivida Any word on when the ethereumProvider implementation of this will be available in an npm package version?

I just tried a server disconnect/reconnect test with npm v1 beta 37 but there wasn't any reconnect on the client when the server was put back online. Is the fix present in that release, or any special incantations required to get it to operate?

thanks in advance!

@kielabokkie
Copy link

@nivida Is this going to be part of beta.38 or will it be a later version?

@7-of-9
Copy link

7-of-9 commented Jan 25, 2019

+1 -- would be great to get this in v38 !

@7-of-9
Copy link

7-of-9 commented Jan 30, 2019

Hi @nivida , fyi -- still can't get this to work intuitively (or at all) in beta 38 or 41 (Chrome, Windows 10).

Network right click disable...
provider.on("error") fires (expected), subscription callbacks (.on('data')) stop firing (expected).

Network, right click, enable...
subscription(s) on('data') {} don't fire; also web3.eth.subscribe(.... () => {}) callbacks also don't fire.

I can see another websocket is setup and is streaming -- issue is that the subscription callbacks aren't being called (expected sections marked below /###/). Is there any "reconnect" or similar event that needs to be trapped and/or subscriptions recreated?

`
web3s[x] = new Web3(new Web3.providers.WebsocketProvider(web3_pendingTx_ws_config[x]))
const web3 = web3s[x]
const provider = web3.currentProvider

            // set disconnect/error handlers
            provider.on("error", err => { 
                console.warn(`## web3 ${x} socket error, err=`, err)
                diag_networkConnected[x] = false
                NetworkStatusChanged(x)
            })
            provider.on("end", err => {
                console.warn(`## web3 ${x} socket end, err=`, err)
                diag_networkConnected[x] = false
                NetworkStatusChanged(x)
            })
            
            provider.on("connect", data => {
                console.log(`DIAG WS ${x} - web3 - socket connect, data=`, data)
                diag_networkConnected[x] = true //web3.eth.net.isListening() // always true, even when no connection

                    var sub1 = web3.eth.subscribe('pendingTransactions', { /*options*/ }, (err, txid) => { /*###*/ })
                    sub1.on('data', (txid) => {
                       /*###*/

                        console.log(`DIAG WS ${x} - web3 - pendingTransactions`, txid) 
                        NetworkStatusChanged(x, txid)
                    })

                    var sub2 = web3.eth.subscribe('newBlockHeaders', { /*POTIONS!*/ }, (err, evt) => {  /*###*/ })
                    sub2.on('data', (evt) => {
                        console.log(`DIAG WS ${x} - web3 - newBlockHeaders`, evt) 
                         /*###*/
                        }
                    })
                }

`

@gabmontes
Copy link
Contributor Author

@nivida what do you guys think about revisiting this PR but targeting branch 1.x?

It was based on 1.0.0-beta.37 so I think it is still relevant to that branch. Of course is not compatible with 2.x due to the re-architecture.

@gabmontes
Copy link
Contributor Author

@alcuadrado please consider adding this PR to #3070 list.

@cgewecke
Copy link
Collaborator

cgewecke commented Oct 7, 2019

@gabmontes This topic is currently being tracked there and is a high priority fwiw. I make sure this PR is ref'd though. Thanks for pinging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Includes improvements or optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants