Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggested modifications of EIP758 #963

Closed
reductionista opened this issue Apr 2, 2018 · 21 comments
Closed

Suggested modifications of EIP758 #963

reductionista opened this issue Apr 2, 2018 · 21 comments

Comments

@reductionista
Copy link
Contributor

reductionista commented Apr 2, 2018

Having made it most of the way through in implementing EIP758, I've run into a few places where I'd recommend some slight modifications of the original proposal. 3 of them have question marks next to them because I'm unsure whether they are the right solution but it's an idea.

eth_newReturnDataFilter

The first is to maintain a consistent JSON-RPC interface, by using eth_newReturnDataFilter instead of eth_newFilter with params=["ReturnData"].

Presently eth_newFilter creates a log filter, eth_newBlockFilter creates a block filter, and eth_newPendingTransactionFilter creates a new pending transaction filter. Ideally, eth_newFilter would be named eth_newLogFilter but I assume we are stuck there due to backwards compatibility requirements with older versions where logs were the only filter type you could create. Since there are no such concerns with the new return data filter type, it seems pretty straightforward that it should be named eth_newReturnDataFilter (or perhaps eth_newTransactionFilter, see below)

Add from (and to?) param to eth_newReturnDataFilter ?

The biggest issue I see with the present EIP758 is that it assumes the node can tell which client sent which transactions over http (so that only the return data from those transactions would be returned upon calling eth_getFilterChanges). However, this is not the case--at least, not unless other unspecified modifications to the JSON-RPC interface are added.

One solution to this problem is to add a from param to eth_newReturnDataFilter that specifies a list of wallet addresses the client controls and might send from. Then the node would know to associate any transactions originating from those addresses with this subscription id. It would save return data only from transactions with a from address matching one in the list, and store them for retrieval with eth_getFilterChanges. If the client really wants to get return data from all transactions that gets added to the blockchain, it could simply leave off the from param or set it to 0. Or, we could decide not to allow that as it might have greater potential for mis-use and make it easier to launch a DoS attack aimed at wasting memory resources on the node.

Instead add subID param to eth_SendTransaction & eth_SendRawTransaction ?

This could either be an alternative to the "from" param or in addition to it. My instinct is that we should avoid modifying these two JSON-RPC commands simply to get a new command to work properly. But if it's important to have eth_newReturnDataFilter function like eth_subscribe, where the set of transactions being filtered for is determined by which client sent them rather than the from address, I think this may be the only way.

Add noEmpty param to eth_newReturnDataFilter

I propose we add a noEmpty param (suggestions for other names welcome) to eth_newReturnDataFilter. This would be very easy to add to the current implementation, and provide more general functionality. If noEmpty is set, the node would notify the client only about transactions where the return data is non-empty. Presently, it returns return data for any transaction sent from the client, even if there is no data returned from that transaction (in which case the data field is just []). It's easy to imagine some cases where a client would want to know when any of the transactions it submits complete, and other cases where it would only care about ones that have non-empty return data.

NewTransactionFilter or NewCompletedTransactionFilter?

If we do decide to add the noEmpty flag, then it seems logical to explore taking this one step further. Since the functionality is then more general, we could rename eth_newReturnDataFilter to eth_NewTransactionFilter. This would reflect the more general use case where a client wants to get notified of any transaction it submits as soon as it completes execution. The internal naming of the kind of data structure returned by the node would also make more sense if they were changed too. For example, in geth instead of a ReturnData struct, which includes a tx hash, a flag indicating whether the transaction was removed due to a chain reorg, and any return data associated with the completed transaction, it could simply be named CompletedTransaction struct. (I assume the same could be said for parity.) This puts subscription to pending transactions (NewPendingTransactionFilter) and completed transactions (NewTransactionFilter or perhaps NewCompletedTransactionFilter) on the same footing. You can make a filter for either, and subscribe to either, and the code is already basically written to do this.

Also, if we opt for this idea, then it would probably make sense to call the noEmpty flag something like hasReturnData instead.

@Arachnid
Copy link
Contributor

Arachnid commented Apr 2, 2018

I'd suggest making this an issue or a PR on @pirapira's repository.

@reductionista
Copy link
Contributor Author

reductionista commented Apr 2, 2018

It looks like his fork of the EIPs repo hasn't been updated since May 2017, while the original EIP758 this is referencing was created in Nov 2017 and merged in here in Jan 2018. But if you're sure that's the best place, I'll go ahead and add it there.

@pirapira
Copy link
Member

pirapira commented Apr 3, 2018

To me, it sounds like @reductionista and @tinybike need to talk.

@reductionista
Copy link
Contributor Author

We've been talking--he's on board with the first one, but hasn't really had time to think about the rest yet. He was the one who suggested I open this Issue.

@adisney
Copy link

adisney commented Apr 11, 2018

I would suggest that the from parameter be added as an element in a filter object that you can pass to both the websocket and the polling filter. In fact, I suggest that filter allow one more field: to.

For example,

{
    "from": "0x00000000000000000000000000000000000000000000000000000000deadbeef",
    "to": "0x0000000000000000000000000000000000000000000000000000000000cabbac"
}

Both fields would be optional.

@reductionista
Copy link
Contributor Author

@adisney I agree. The from field is more important, but if we're going to add one we may as well add the other.

By the way, the eth_newReturnDataFilter change has now been merged.

@tinybike
Copy link
Member

@reductionista I'm on board with #2. Should we add from and to fields to the new subscription as well? I think it's good to have comparable functionality for http rpc and pubsub.

The subID thing isn't a bad idea but I agree that we shouldn't modify sendtx and sendrawtx just for this. Let's hold off on that IMO.

I like the "new completed transaction filter" proposal, and I think a hasReturnData flag sounds great.

@adisney is this all reasonable on your end? Trying to avoid moving goalposts on you guys!

@adisney
Copy link

adisney commented May 15, 2018

#2 makes sense to me. I agree the filtering should be available in both rpc and pubsub.

I think the hasReturnData flag makes sense and I have no opinion regarding the "new completed transaction filter" rename. That's a fairly straightforward update.

@reductionista
Copy link
Contributor Author

@tinybike I'd lean toward saying yes, we should include from and to fields with eth_subscribe. It's not really needed there since it already filters by rpc client--but it makes sense to me to make them as parallel as possible.

@reductionista
Copy link
Contributor Author

reductionista commented May 16, 2018

Oh, and I have one more proposal for modifying the EIP, which I only thought of after opening this issue:

I think when removed=true (signaling the transaction was removed from canonical chain), there is no reason to include the actual return data. It requires re-executing extra transactions again after the transaction was removed, which is a waste of CPU, plus wasting extra bandwidth sending information that was already sent to the client previously (when the transaction was mined and removed=false). The only new information is that the transaction was removed, so it's really just the removed=true flag that matters (and the tx hash and subscription id). My opinion is that the rpc client should be responsible for remembering what that return data was, if it needs to (although most of the time I would expect, there would be no reason for it to care about the return data any more, as the transaction was not even included in the canonical chain).

@tinybike
Copy link
Member

I think that makes sense, re: not re-executing the transaction on removed=true.

@adisney
Copy link

adisney commented May 16, 2018

I don't know that we can assume that the data has already been sent to the client in the case of removed=false. I could see situations in which a client could take advantage of the removed data without being aware of the data as it was originally executed.

FWIW, I've already implemented that portion of the EIP for Parity, so I'm biased. To ease the burden on the CPU I'm caching the return data after the block is replayed the first time. The additional CPU load could be a problem not only for replaying txns when the block is removed but also if multiple clients are connected and subscribed to return_data.

In general, it seems like there are members of the Ethereum community who are pushing to have node client designs take into account that not everyone will be running their own node. That means there will be more public nodes out there (e.g. Infura) and so node implementations should take into account that many clients may be connected to a single node. This is why Parity asked me to add caching to my EIP pull request. Incidentally, that caching addresses the CPU load concern in the case of a block being removed from the canonical chain.

@tinybike
Copy link
Member

tinybike commented May 16, 2018

I don't know that we can assume that the data has already been sent to the client in the case of removed=false. I could see situations in which a client could take advantage of the removed data without being aware of the data as it was originally executed.

FWIW, I've already implemented that portion of the EIP for Parity, so I'm biased.

@adisney These are both fair points. Okay, let's keep the removed=true return data requirement in. (Pinging @reductionista as well.)

@reductionista
Copy link
Contributor Author

@adisney Good point about caching--yes, that would avoid the extra CPU time. I guess I'm also biased having not implemented any transaction replays, but as long as the node is behaving according to the EIP758 spec, I don't see any case where a removed=true could arrive with no previous removed=false. The only transactions it gets notified about are ones which it submitted after the subscription begins. If the client submits a transaction, and then it subscribes to return data, it should not receive any notifications about that transaction--if it did, that would be in violation of the EIP758 spec. Would you agree?

As I see it, the removed=true is a way for the node to signal the rpc client saying "please forget about that data I just sent you!" If it never received any return data in the first place, then there would be no reason to send this signal.

"The additional CPU load could be a problem not only for replaying txns when the block is removed but also if multiple clients are connected and subscribed to return_data."

Personally, I don't see a good reason why a tx would ever need to be replayed--so this has not been a relevant issue for me so far with geth (which has never had to replay transactions for any reason, so far). I should probably look at your Parity code to see how you accomplish this, but from my investigations into it so far it looks to me like it would be a complex and costly operation. You can't guarantee that the return data will be accurate unless you replay all prior transactions in the same block, which is often hundreds of transactions. So from my perspective, the problem isn't multiple vs single clients, it's that you shouldn't be re-running hundreds of transactions just to get the return data from a single transaction, when it could have just been sent as soon as the transaction completed originally.

@adisney
Copy link

adisney commented May 17, 2018

The only transactions it gets notified about are ones which it submitted after the subscription begins. If the client submits a transaction, and then it subscribes to return data, it should not receive any notifications about that transaction--if it did, that would be in violation of the EIP758 spec. Would you agree?

I don't believe I agree. My thinking is that we should send the return data for all txns that were on blocks that were added and send return data with removed: true for all blocks that were removed. The client would not need to know if it sent return_data for a txn that was on an added block before determining if it should send return_data for a invalidated block. It seems like that would simplify things as you don't need to track txns sent to the clients.

Regarding txn replays, all of your points regarding replaying txns are valid. Unfortunately, in the case of Parity, the location at which the subscriptions and HTTP RPC functionality lives does not have access to the txns as they execute. In fact, it doesn't have access to the txn return data at all unless the txns are replayed. Since a txn may depend on state that was changed by a txn in the same block, getting the return_data for a specific txn requires executing all of the txns that came before it in that block. Or, at the very least, restoring the state the txn saw when it began execution.

For that reason, replaying the txns is unavoidable without significant modification to Parity and the CPU load is a given, which is why the caching is so important.

@reductionista
Copy link
Contributor Author

@adisney I think in your first paragraph, you are disagreeing with something that was stated twice in EIP-785 as it currently stands. Both times the word "after" was emphasized in italics, which to me indicates that this was considered an important requirement.

First time:

Notifications are sent to the client for all transactions submitted from the client that are sealed after subscribing.

and again later:

All transactions submitted by the client that were sealed after the initial eth_newReturnDataFilter request are included in this array.

Given that the EIP as it stands is not fully implementable, we have to make some changes. So I'm not saying there's anything wrong with disagreeing with the current EIP. Just pointing out that what you're suggesting is a major change, and would change the original vision (which is what my implementation is based around).

"It seems like that would simplify things as you don't need to track txns sent to the clients."

Tracking transactions submitted by the rpc client is one of the central features of the way I designed mine, so I guess I'm biased in wanting to keep this; but it does seem useful to me. I saw it as a necessary requirement of the EIP, and already invested the time at the very beginning in designing it to work that way, but if @tinybike thinks it's not necessary we can change that I guess.

"Unfortunately, in the case of Parity, the location at which the subscriptions and HTTP RPC functionality lives does not have access to the txns as they execute."

The same is true in geth, rpc module is somewhat isolated from the blockchain itself. However, in geth there is an easy way to pass messages back and forth using Go channels, which is what I use to pass return data back to the rpc module. I assume there is some way for two different modules to communicate in Rust, but don't know if the architecture of Parity might make that more difficult. If so, I guess that explains why we've come at this project from such different angles.

@adisney
Copy link

adisney commented May 17, 2018

@adisney I think in your first paragraph, you are disagreeing with something that was stated twice in EIP-785 as it currently stands. Both times the word "after" was emphasized in italics, which to me indicates that this was considered an important requirement.

Fair point!

Fortunately, for txns in newly added blocks, my implementation follows the spec as written. However, my implementation does send return_data for blocks being removed that may have initially been sealed prior to the subscription. If it's determined that the EIP should specify that return data be sent on removed blocks and it's important that the removed txn data not be sent for blocks sealed prior to the subscription then I will have to make some changes. If the EIP is updated so that return data need not be sent when a block is removed, then I won't have to change as much.

@reductionista
Copy link
Contributor Author

reductionista commented Jun 11, 2018

I've revised the EIP with all of the changes we've come to agreement about here (from/to filtering, hasReturnData, and renaming ReturnData to CompletedTransaction and NewReturnDataFilter to NewCompletedTransactionFilter).

I didn't make the removed=true change, as it seems we are still in disagreement about that. I'm still in favor of it, but @tinybike has yet to respond to my latest arguments for it.

#1148

@reductionista
Copy link
Contributor Author

reductionista commented Jun 11, 2018

Copying over here an important point I made in recent Discord discussions:

Here's my philosophy about what these events should mean... feel free to disagree with any of this. Purpose of this subscription or filter is for the client to get notified of new completed transaction events. Purpose of the removed=true version of the event is to say "oh wait, nevermind about that last one I just notified you about... please disregard!". So if the client does receive a nevermind message without the original, it's outside of the intended behavior, and should just ignore it anyway. But I do think the "never send a removed=true without first sending a removed=false" rule should be enforced as much as possible on the server side also. It's already enforced in my implementation for eth_subscribe, and I don't think it would be difficult to ensure it's also enforced for polling, if we care about that. Original EIP does stipulate this is enforced, but we can decide to relax that if we want.

This was in response to the argument that we should include return data for removed=true in case the client never received a removed=false.

@reductionista
Copy link
Contributor Author

Update regarding the last remaining issue: @tinybike has convinced me now that it's better to include ReturnData with removed=true events, in order to be consistent with existing log filter/notification behavior. While I think it probably would be better if they both didn't include the Data field, I withdraw my suggestion to change it for EIP758--let's just keep it as is.

I think everything is settled now as far as the specs! But I'll leave this Issue open until my latest revision of the EIP gets merged, in case review of it brings up any other further concerns.

@reductionista
Copy link
Contributor Author

All changes have been merged now--closing issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants
@Arachnid @pirapira @adisney @tinybike @reductionista and others