Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upstream v2.60.7 #117

Merged
merged 28 commits into from
Sep 13, 2024
Merged

Upstream v2.60.7 #117

merged 28 commits into from
Sep 13, 2024

Conversation

boyuan-chen
Copy link

@boyuan-chen boyuan-chen commented Sep 12, 2024

Upstream v2.60.7

domiwei and others added 28 commits August 15, 2024 14:29
Fixed issue with setup diagnostics client on erigon start
…erigontech#11669)

I have done specific integration test(on e2) and compare the result with
etherscan (block number 0x3D08FF: 3999999).
etherscan link:

https://etherscan.io/tx/0x535bd65cfee8655a1ffd8a28e065b47bcf557b10641491a541ab1dd08496709e#statechange
for account 0xEce701C76bD00D1C3f96410a0C69eA8Dfcf5f34E if use
block_number + 1 the debug_accountRange() returns correctly for storage
0x0000000000000000000000000000000000000000000000000000000000000004 the
new value:
0x000000000000000000000000000000000000000000000064a66fad67042ceb4a;
using current software returns the old value
(0x000000000000000000000000000000000000000000000064a3a922764918eb4b).
debug_accountRange() returns :
 "storage": {
     ......
"0x0000000000000000000000000000000000000000000000000000000000000004":
"64a66fad67042ceb4a",
      ----
     }
 If approved I will make change for e3
…ontech#11649)

- Added functionality to grab the heap profile by calling the
diagnostics endpoint
- Added support to pass heap profile file to diagnostics UI through
WebSocket
…chan… (erigontech#11810)

cherry-pick from PR erigontech#11061 from main (the regression is present also in
release 2.60)

Regression caused by this change in uint256 library:


holiman/uint256@f24ed59

It was causing trace json to encode stack values as quoted (string)
decimal values rather than a hex encoded value.

Co-authored-by: bgelb <ben.gelb@gmail.com>
In the first run, init validators was skipped due to the check. Updated
it by checking the erigon3 code to init snapshot for 1st block (this
logic is borrowed from main branch).

cherry-pick adaptation of
erigontech#11725
…ontech#11813)

**Existing behaviour:**
- Add up the possible value that user must pay beforehand to buy gas
- Deduct that amount from the sender's account in `intraBlockState`,
but:
- Don't deduct the gas value amount if the user doesn't have enough, and
`gasBailout` is set

**New behaviour:**
- Don't check if sender's balance is enough to pay gas value amount, nor
deduct it, if `gasBailout` is set

**More rationale**
This would mean the sender's account would show `"balance": "="` in
`trace_call` rpc method, that is, no change, if gas is the only thing
the user pays for. This is fine because the gas price can fluctuate in a
real transaction. This also removes the inconsistency of sometimes
having to bother deducting the amount if it is less than sender's
balance, thereby causing a bug/inconsistency.
…h#10672) (erigontech#11892)

relates to erigontech#11890
cherry pick from E3
[2f2ce6a](erigontech@2f2ce6a)

-----

#### Issue:
At line 129 in `logsfilter.go`, we had the following line of code:
```go
_, addrOk := filter.addrs[gointerfaces.ConvertH160toAddress(eventLog.Address)]
```


This line caused a panic due to a fatal error:
```logs
fatal error: concurrent map read and map write

goroutine 106 [running]:
github.com/ledgerwatch/erigon/turbo/rpchelper.(*LogsFilterAggregator).distributeLog.func1({0xc009701db8?, 0x8?}, 0xc135d26050)
github.com/ledgerwatch/erigon/turbo/rpchelper/logsfilter.go:129 +0xe7
github.com/ledgerwatch/erigon/turbo/rpchelper.(*SyncMap[...]).Range(0xc009701eb0?, 0xc009701e70?)
github.com/ledgerwatch/erigon/turbo/rpchelper/subscription.go:97 +0x11a
github.com/ledgerwatch/erigon/turbo/rpchelper.(*LogsFilterAggregator).distributeLog(0x25f4600?, 0xc0000ce090?)
github.com/ledgerwatch/erigon/turbo/rpchelper/logsfilter.go:131 +0xc7
github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).OnNewLogs(...)
github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go:547
github.com/ledgerwatch/erigon/cmd/rpcdaemon/rpcservices.(*RemoteBackend).SubscribeLogs(0xc0019c2f50, {0x32f0040, 0xc001b4a280}, 0xc001c0c0e0, 0x0?)
github.com/ledgerwatch/erigon/cmd/rpcdaemon/rpcservices/eth_backend.go:227 +0x1d1
github.com/ledgerwatch/erigon/turbo/rpchelper.New.func2()
github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go:102 +0xec
created by github.com/ledgerwatch/erigon/turbo/rpchelper.New
github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go:92 +0x652
```

This error indicates that there were simultaneous read and write
operations on the `filter.addrs` map, leading to a race condition.


#### Solution:
To resolve this issue, I implemented the following changes:

- Moved SyncMap to erigon-lib common library: This allows us to utilize
a thread-safe map across different packages that require synchronized
map access.
- Refactored logsFilter to use SyncMap: By replacing the standard map
with SyncMap, we ensured that all map operations are thread-safe, thus
preventing concurrent read and write errors.
- Added documentation for SyncMap usage: Detailed documentation was
provided to guide the usage of SyncMap and related refactored
components, ensuring clarity and proper utilization.

Co-authored-by: Bret <787344+bretep@users.noreply.github.com>
…ontech#10718) (erigontech#11894)

relates to erigontech#11890
cherry pick from E3
erigontech@adf3f43

-----

- Introduced `GetOrCreateGaugeVec` function to manage Prometheus
`GaugeVec` metrics.
- Implemented various subscription filter limits (max logs, max headers,
max transactions, max addresses, and max topics).
- Updated various test files to utilize the new `FiltersConfig` and
added comprehensive tests for the new filter limits.
- Added new flags for subscription filter limits.
- Improved log filtering and event handling with better memory
management and metric tracking.
- Added configuration to handle RPC subscription filter limits.
- Minor typo fixes.

-------
RPC nodes would frequently OOM. Based on metrics and pprof, I observed
high memory utilization in the rpchelper:
```text
File: erigon
Type: inuse_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 69696.67MB, 95.37% of 73083.31MB total
Dropped 1081 nodes (cum <= 365.42MB)
Showing top 10 nodes out of 43
      flat  flat%   sum%        cum   cum%
65925.80MB 90.21% 90.21% 65925.80MB 90.21%  github.com/ledgerwatch/erigon/turbo/rpchelper.(*LogsFilterAggregator).distributeLog.func1
 1944.80MB  2.66% 92.87%  1944.80MB  2.66%  github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddLogs.func1
 1055.97MB  1.44% 94.31%  1055.97MB  1.44%  github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddPendingTxs.func1
  362.04MB   0.5% 94.81%   572.20MB  0.78%  github.com/ledgerwatch/erigon/core/state.(*stateObject).GetCommittedState
  257.73MB  0.35% 95.16%   587.85MB   0.8%  encoding/json.Marshal
   57.85MB 0.079% 95.24%  1658.66MB  2.27%  github.com/ledgerwatch/erigon/core/vm.(*EVMInterpreter).Run
   51.48MB  0.07% 95.31%  2014.50MB  2.76%  github.com/ledgerwatch/erigon/rpc.(*handler).handleCallMsg
   23.01MB 0.031% 95.34%  1127.21MB  1.54%  github.com/ledgerwatch/erigon/core/vm.opCall
   11.50MB 0.016% 95.36%  1677.68MB  2.30%  github.com/ledgerwatch/erigon/core/vm.(*EVM).call
    6.50MB 0.0089% 95.37%   685.70MB  0.94%  github.com/ledgerwatch/erigon/turbo/transactions.DoCall
(pprof) list distributeLog.func1
Total: 71.37GB
ROUTINE ======================== github.com/ledgerwatch/erigon/turbo/rpchelper.(*LogsFilterAggregator).distributeLog.func1 in github.com/ledgerwatch/erigon/turbo/rpchelper/logsfilter.go
   64.38GB    64.38GB (flat, cum) 90.21% of Total
         .          .    206:   a.logsFilters.Range(func(k LogsSubID, filter *LogsFilter) error {
         .          .    207:           if filter.allAddrs == 0 {
         .          .    208:                   _, addrOk := filter.addrs.Get(gointerfaces.ConvertH160toAddress(eventLog.Address))
         .          .    209:                   if !addrOk {
         .          .    210:                           return nil
         .          .    211:                   }
         .          .    212:           }
         .          .    213:           var topics []libcommon.Hash
         .          .    214:           for _, topic := range eventLog.Topics {
   27.01GB    27.01GB    215:                   topics = append(topics, gointerfaces.ConvertH256ToHash(topic))
         .          .    216:           }
         .          .    217:           if filter.allTopics == 0 {
         .          .    218:                   if !a.chooseTopics(filter, topics) {
         .          .    219:                           return nil
         .          .    220:                   }
         .          .    221:           }
   37.37GB    37.37GB    222:           lg := &types2.Log{
         .          .    223:                   Address:     gointerfaces.ConvertH160toAddress(eventLog.Address),
         .          .    224:                   Topics:      topics,
         .          .    225:                   Data:        eventLog.Data,
         .          .    226:                   BlockNumber: eventLog.BlockNumber,
         .          .    227:                   TxHash:      gointerfaces.ConvertH256ToHash(eventLog.TransactionHash),
(pprof) %
```

- Added Metrics Implemented metrics to gain visibility into memory
utilization.
- Optimized Allocations Optimized object and slice allocations,
resulting in significant improvements in memory usage.
```text
File: erigon
Type: inuse_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 25.92GB, 94.14% of 27.53GB total
Dropped 785 nodes (cum <= 0.14GB)
Showing top 10 nodes out of 70
      flat  flat%   sum%        cum   cum%
   15.37GB 55.82% 55.82%    15.37GB 55.82%  github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddLogs.func1
    8.15GB 29.60% 85.42%     8.15GB 29.60%  github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddPendingTxs.func1
    0.64GB  2.33% 87.75%        1GB  3.65%  github.com/ledgerwatch/erigon/turbo/rpchelper.(*LogsFilterAggregator).distributeLog
    0.50GB  1.82% 89.57%     0.65GB  2.35%  github.com/ledgerwatch/erigon/core/state.(*stateObject).GetCommittedState
    0.36GB  1.32% 90.89%     0.36GB  1.32%  github.com/ledgerwatch/erigon/turbo/rpchelper.(*LogsFilterAggregator).distributeLog.func1
    0.31GB  1.12% 92.01%     0.31GB  1.12%  bytes.growSlice
    0.30GB  1.08% 93.09%     0.30GB  1.08%  github.com/ledgerwatch/erigon/core/vm.(*Memory).Resize (inline)
    0.15GB  0.56% 93.65%     0.35GB  1.27%  github.com/ledgerwatch/erigon/core/state.(*IntraBlockState).AddSlotToAccessList
    0.07GB  0.24% 93.90%     0.23GB  0.84%  github.com/ledgerwatch/erigon/core/types.codecSelfer2.decLogs
    0.07GB  0.24% 94.14%     1.58GB  5.75%  github.com/ledgerwatch/erigon/core/vm.(*EVMInterpreter).Run
(pprof) list AddLogs.func1
Total: 27.53GB
ROUTINE ======================== github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddLogs.func1 in github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go
   15.37GB    15.37GB (flat, cum) 55.82% of Total
         .          .    644:   ff.logsStores.DoAndStore(id, func(st []*types.Log, ok bool) []*types.Log {
         .          .    645:           if !ok {
         .          .    646:                   st = make([]*types.Log, 0)
         .          .    647:           }
   15.37GB    15.37GB    648:           st = append(st, logs)
         .          .    649:           return st
         .          .    650:   })
         .          .    651:}
         .          .    652:
         .          .    653:// ReadLogs reads logs from the store associated with the given subscription ID.
(pprof) list AddPendingTxs.func1
Total: 27.53GB
ROUTINE ======================== github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddPendingTxs.func1 in github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go
    8.15GB     8.15GB (flat, cum) 29.60% of Total
         .          .    686:   ff.pendingTxsStores.DoAndStore(id, func(st [][]types.Transaction, ok bool) [][]types.Transaction {
         .          .    687:           if !ok {
         .          .    688:                   st = make([][]types.Transaction, 0)
         .          .    689:           }
    8.15GB     8.15GB    690:           st = append(st, txs)
         .          .    691:           return st
         .          .    692:   })
         .          .    693:}
         .          .    694:
         .          .    695:// ReadPendingTxs reads pending transactions from the store associated with the given subscription ID.
(pprof) %
```
- Identified unbound slices that could grow indefinitely, leading to
memory leaks. This issue occurs if subscribers do not request updates
using their subscription ID, especially when behind a load balancer that
does not pin clients to RPC nodes.

1. Architecture Design Changes
- Implement architectural changes to pin clients to RPC Nodes ensuring
subscribers that request updates using their subscription IDs will hit
the same node, which does clean up the objecs on each request.
- Reason for not choosing: This still relies on clients with
subscriptions to request updates and if they never do and do not call
the unsubscribe function it's an unbound memory leak. The RPC node
operator has no control over the behavior of the clients.
2. Implementing Timeouts
- Introduce timeouts for subscriptions to automatically clean up
unresponsive or inactive subscriptions.
- Reason for not choosing: Implementing timeouts would inadvertently
stop websocket subscriptions due to the current design of the code,
leading to a potential loss of data for active users. This solution
would be good but requires a lot more work.
3. Configurable Limits (Chosen Solution)
- Set configurable limits for various subscription parameters (e.g.,
logs, headers, transactions, addresses, topics) to manage memory
utilization effectively.
- Reason for choosing: This approach provides flexibility to RPC node
operators to configure limits as per their requirements. The default
behavior remains unchanged, making it a non-breaking change.
Additionally, it ensures current data is always available by pruning the
oldest data first.

```bash
	--rpc.subscription.filters.maxlogs=60
	--rpc.subscription.filters.maxheaders=60
	--rpc.subscription.filters.maxtxs=10_000
	--rpc.subscription.filters.maxaddresses=1_000
	--rpc.subscription.filters.maxtopics=1_000
```
```text
File: erigon
Type: inuse_space
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1146.51MB, 89.28% of 1284.17MB total
Dropped 403 nodes (cum <= 6.42MB)
Showing top 10 nodes out of 124
      flat  flat%   sum%        cum   cum%
  923.64MB 71.92% 71.92%   923.64MB 71.92%  github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddPendingTxs.func1
   54.59MB  4.25% 76.18%    54.59MB  4.25%  github.com/ledgerwatch/erigon/rlp.(*Stream).Bytes
   38.21MB  2.98% 79.15%    38.21MB  2.98%  bytes.growSlice
   29.34MB  2.28% 81.44%    29.34MB  2.28%  github.com/ledgerwatch/erigon/p2p/rlpx.growslice
   26.44MB  2.06% 83.49%    34.16MB  2.66%  compress/flate.NewWriter
   18.66MB  1.45% 84.95%    21.16MB  1.65%  github.com/ledgerwatch/erigon/core/state.(*stateObject).GetCommittedState
   17.50MB  1.36% 86.31%    64.58MB  5.03%  github.com/ledgerwatch/erigon/core/types.(*DynamicFeeTransaction).DecodeRLP
      15MB  1.17% 87.48%    82.08MB  6.39%  github.com/ledgerwatch/erigon/core/types.UnmarshalTransactionFromBinary
   13.58MB  1.06% 88.54%    13.58MB  1.06%  github.com/ledgerwatch/erigon/turbo/stages/headerdownload.(*HeaderDownload).addHeaderAsLink
    9.55MB  0.74% 89.28%     9.55MB  0.74%  github.com/ledgerwatch/erigon/turbo/rpchelper.newChanSub[...]
(pprof) list AddPendingTxs.func1
Total: 1.25GB
ROUTINE ======================== github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddPendingTxs.func1 in github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go
  923.64MB   923.64MB (flat, cum) 71.92% of Total
         .          .    698:   ff.pendingTxsStores.DoAndStore(id, func(st [][]types.Transaction, ok bool) [][]types.Transaction {
         .          .    699:           if !ok {
         .          .    700:                   st = make([][]types.Transaction, 0)
         .          .    701:           }
         .          .    702:
         .          .    703:           // Calculate the total number of transactions in st
         .          .    704:           totalTxs := 0
         .          .    705:           for _, txBatch := range st {
         .          .    706:                   totalTxs += len(txBatch)
         .          .    707:           }
         .          .    708:
         .          .    709:           maxTxs := ff.config.RpcSubscriptionFiltersMaxTxs
         .          .    710:           // If adding the new transactions would exceed maxTxs, remove oldest transactions
         .          .    711:           if maxTxs > 0 && totalTxs+len(txs) > maxTxs {
         .          .    712:                   // Flatten st to a single slice
  918.69MB   918.69MB    713:                   flatSt := make([]types.Transaction, 0, totalTxs)
         .          .    714:                   for _, txBatch := range st {
         .          .    715:                           flatSt = append(flatSt, txBatch...)
         .          .    716:                   }
         .          .    717:
         .          .    718:                   // Remove the oldest transactions to make space for new ones
         .          .    719:                   if len(flatSt)+len(txs) > maxTxs {
         .          .    720:                           flatSt = flatSt[len(flatSt)+len(txs)-maxTxs:]
         .          .    721:                   }
         .          .    722:
         .          .    723:                   // Convert flatSt back to [][]types.Transaction with a single batch
         .          .    724:                   st = [][]types.Transaction{flatSt}
         .          .    725:           }
         .          .    726:
         .          .    727:           // Append the new transactions as a new batch
    5.95MB     4.95MB    728:           st = append(st, txs)
         .          .    729:           return st
         .          .    730:   })
         .          .    731:}
         .          .    732:
         .          .    733:// ReadPendingTxs reads pending transactions from the store associated with the given subscription ID.
```

With these changes, the memory utilization has significantly improved,
and the system is now more stable and predictable.

-----
<img width="1567" alt="image"

src="https://github.com/ledgerwatch/erigon/assets/787344/264c9757-93cf-4be8-9883-5ca3187acd73">

Blue line indicates a deployment
Sharp drop of green line is an OOM

Co-authored-by: Bret <787344+bretep@users.noreply.github.com>
…ntech#11908)

relates to erigontech#11890
cherry pick from E3 to E2:
erigontech@b760da2

----

- Simplify and enhance tests.
- Add test for invalid slice operations panic.
- Remove unused Mutex

----
### Issue
I experienced a rare panic with the new filter code.

```text
panic: runtime error: slice bounds out of range [121:100]

goroutine 25311363 [running]:
github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddPendingTxs.func1({0xc011c67020?, 0xc00a049e20?, 0xc020720b40?}, 0x48?)
github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go:720 +0x31a
github.com/ledgerwatch/erigon-lib/common/concurrent.(*SyncMap[...]).DoAndStore.func1(0x20?)
github.com/ledgerwatch/erigon-lib@v1.0.0/common/concurrent/concurrent.go:52 +0x22
github.com/ledgerwatch/erigon-lib/common/concurrent.(*SyncMap[...]).Do(0x33209a0, {0xc013409fa0, 0x20}, 0xc00a049ee0)
github.com/ledgerwatch/erigon-lib@v1.0.0/common/concurrent/concurrent.go:40 +0xff
github.com/ledgerwatch/erigon-lib/common/concurrent.(*SyncMap[...]).DoAndStore(0xc0097a3c70?, {0xc013409fa0?, 0x30?}, 0xc000bc7d40?)
github.com/ledgerwatch/erigon-lib@v1.0.0/common/concurrent/concurrent.go:51 +0x4b
github.com/ledgerwatch/erigon/turbo/rpchelper.(*Filters).AddPendingTxs(0xc010d587d0?, {0xc013409fa0?, 0xc0097de0f0?}, {0xc01239a800?, 0xc00c820500?, 0xc011beee70?})
github.com/ledgerwatch/erigon/turbo/rpchelper/filters.go:698 +0x6b
github.com/ledgerwatch/erigon/turbo/jsonrpc.(*APIImpl).NewPendingTransactionFilter.func1()
github.com/ledgerwatch/erigon/turbo/jsonrpc/eth_filters.go:24 +0x88
created by github.com/ledgerwatch/erigon/turbo/jsonrpc.(*APIImpl).NewPendingTransactionFilter
github.com/ledgerwatch/erigon/turbo/jsonrpc/eth_filters.go:22 +0xca
```

### Resolution
1. Create a unit test reproducing the panic.
2. Ensure the slicing indices are calculated correctly and do not
produce an invalid range.

### Running the new unit test on unfixed code:
```bash
$ go test
--- FAIL: TestFilters_AddPendingTxs (0.00s)
    --- FAIL: TestFilters_AddPendingTxs/TriggerPanic (0.00s)
        filters_test.go:451: AddPendingTxs caused a panic: runtime error: slice bounds out of range [10:5]
FAIL
exit status 1
FAIL    github.com/ledgerwatch/erigon/turbo/rpchelper   0.454s
```

Co-authored-by: Bret <787344+bretep@users.noreply.github.com>
…r amoy network (erigontech#11901)

[Polygon] Bor: Added Ahmedabad HF related configs and block number for
amoy network
This is the Ahmedabad block number -
[11865856](https://amoy.polygonscan.com/block/countdown/11865856)

PR in bor - [bor#1324](maticnetwork/bor#1324)
Extended pprof read API to include: goroutine, threadcreate, heap,
allocs, block, mutex
…1916)

- Changed UI serving port to 5137 as 6060 is busy with diag API
- Refactored request headers to be set in middleware
@boyuan-chen boyuan-chen merged commit 55aba34 into boba-develop Sep 13, 2024
18 checks passed
@boyuan-chen boyuan-chen deleted the release/v2.60.7 branch September 13, 2024 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.