Skip to content

Commit

Permalink
feat: support profiling block replay during abci handshake (#14953)
Browse files Browse the repository at this point in the history
## Description

by default, the signal trap is not setup during abci handshake, so you can't profile at this stage, but it's an interesting way to profile production block data with the block replay.

---

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.*

I have...

- [ ] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] added `!` to the type prefix if API or client breaking change
- [ ] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/main/CONTRIBUTING.md#pr-targeting))
- [ ] provided a link to the relevant issue or specification
- [ ] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/main/docs/docs/building-modules)
- [ ] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/main/CONTRIBUTING.md#testing)
- [ ] added a changelog entry to `CHANGELOG.md`
- [ ] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [ ] updated the relevant documentation or specification
- [ ] reviewed "Files changed" and left comments if necessary
- [ ] confirmed all CI checks have passed

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)

(cherry picked from commit 4d02519)

# Conflicts:
#	CHANGELOG.md
#	server/start.go
  • Loading branch information
yihuang authored and mergify[bot] committed Feb 8, 2023
1 parent d76ad98 commit e642af8
Show file tree
Hide file tree
Showing 2 changed files with 69 additions and 34 deletions.
13 changes: 13 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -215,7 +215,20 @@ Ref: https://keepachangelog.com/en/1.0.0/
* [#13794](https://github.com/cosmos/cosmos-sdk/pull/13794) `types/module.Manager` now supports the
`cosmossdk.io/core/appmodule.AppModule` API via the new `NewManagerFromMap` constructor.
* [#14019](https://github.com/cosmos/cosmos-sdk/issues/14019) Remove the interface casting to allow other implementations of a `CommitMultiStore`.
<<<<<<< HEAD
* [#14175](https://github.com/cosmos/cosmos-sdk/pull/14175) Add `server.DefaultBaseappOptions(appopts)` function to reduce boiler plate in root.go.
=======
* (x/gov) [#14390](https://github.com/cosmos/cosmos-sdk/pull/14390) Add title, proposer and summary to proposal struct.
* (baseapp) [#14417](https://github.com/cosmos/cosmos-sdk/pull/14417) `SetStreamingService` accepts appOptions, AppCodec and Storekeys needed to set streamers.
* Store pacakge no longer has a dependency on baseapp.
* (store) [#14438](https://github.com/cosmos/cosmos-sdk/pull/14438) Pass logger from baseapp to store.
* (store) [#14439](https://github.com/cosmos/cosmos-sdk/pull/14439) Remove global metric gatherer from store.
* By default store has a no op metric gatherer, the application developer must set another metric gatherer or us the provided one in `store/metrics`.
* [#14406](https://github.com/cosmos/cosmos-sdk/issues/14406) Migrate usage of types/store.go to store/types/..
* (x/staking) [#14590](https://github.com/cosmos/cosmos-sdk/pull/14590) Return undelegate amount in MsgUndelegateResponse
* (tools) [#14793](https://github.com/cosmos/cosmos-sdk/pull/14793) Dockerfile optimization.
* (cli) [#14953](https://github.com/cosmos/cosmos-sdk/pull/14953) Enable profiling block replay during abci handshake with `--cpu-profile`.
>>>>>>> 4d02519ec (feat: support profiling block replay during abci handshake (#14953))
### State Machine Breaking

Expand Down
90 changes: 56 additions & 34 deletions server/start.go
Original file line number Diff line number Diff line change
Expand Up @@ -138,14 +138,25 @@ is performed. Note, when enabled, gRPC will also be automatically enabled.
return err
}

<<<<<<< HEAD
withTM, _ := cmd.Flags().GetBool(flagWithTendermint)
if !withTM {
serverCtx.Logger.Info("starting ABCI without Tendermint")
return startStandAlone(serverCtx, appCreator)
=======
withCMT, _ := cmd.Flags().GetBool(flagWithComet)
if !withCMT {
serverCtx.Logger.Info("starting ABCI without CometBFT")
return wrapCPUProfile(serverCtx, func() error {
return startStandAlone(serverCtx, appCreator)
})
>>>>>>> 4d02519ec (feat: support profiling block replay during abci handshake (#14953))
}

// amino is needed here for backwards compatibility of REST routes
err = startInProcess(serverCtx, clientCtx, appCreator)
err = wrapCPUProfile(serverCtx, func() error {
return startInProcess(serverCtx, clientCtx, appCreator)
})
errCode, ok := err.(ErrorCode)
if !ok {
return err
Expand Down Expand Up @@ -257,27 +268,6 @@ func startStandAlone(ctx *Context, appCreator types.AppCreator) error {
func startInProcess(ctx *Context, clientCtx client.Context, appCreator types.AppCreator) error {
cfg := ctx.Config
home := cfg.RootDir
var cpuProfileCleanup func()

if cpuProfile := ctx.Viper.GetString(flagCPUProfile); cpuProfile != "" {
f, err := os.Create(cpuProfile)
if err != nil {
return err
}

ctx.Logger.Info("starting CPU profiler", "profile", cpuProfile)
if err := pprof.StartCPUProfile(f); err != nil {
return err
}

cpuProfileCleanup = func() {
ctx.Logger.Info("stopping CPU profiler", "profile", cpuProfile)
pprof.StopCPUProfile()
if err := f.Close(); err != nil {
ctx.Logger.Info("failed to close cpu-profile file", "profile", cpuProfile, "err", err.Error())
}
}
}

db, err := openDB(home, GetAppDBBackend(ctx.Viper))
if err != nil {
Expand All @@ -290,16 +280,11 @@ func startInProcess(ctx *Context, clientCtx client.Context, appCreator types.App
return err
}

// Clean up the traceWriter in the cpuProfileCleanup routine that is invoked
// when the server is shutting down.
fn := cpuProfileCleanup
cpuProfileCleanup = func() {
if fn != nil {
fn()
}

// if flagTraceStore is not used then traceWriter is nil
if traceWriter != nil {
// Clean up the traceWriter when the server is shutting down.
var traceWriterCleanup func()
// if flagTraceStore is not used then traceWriter is nil
if traceWriter != nil {
traceWriterCleanup = func() {
if err = traceWriter.Close(); err != nil {
ctx.Logger.Error("failed to close trace writer", "err", err)
}
Expand Down Expand Up @@ -524,8 +509,8 @@ func startInProcess(ctx *Context, clientCtx client.Context, appCreator types.App
_ = tmNode.Stop()
}

if cpuProfileCleanup != nil {
cpuProfileCleanup()
if traceWriterCleanup != nil {
traceWriterCleanup()
}

if apiSrv != nil {
Expand All @@ -545,3 +530,40 @@ func startTelemetry(cfg serverconfig.Config) (*telemetry.Metrics, error) {
}
return telemetry.New(cfg.Telemetry)
}

// wrapCPUProfile runs callback in a goroutine, then wait for quit signals.
func wrapCPUProfile(ctx *Context, callback func() error) error {
if cpuProfile := ctx.Viper.GetString(flagCPUProfile); cpuProfile != "" {
f, err := os.Create(cpuProfile)
if err != nil {
return err
}

ctx.Logger.Info("starting CPU profiler", "profile", cpuProfile)
if err := pprof.StartCPUProfile(f); err != nil {
return err
}

defer func() {
ctx.Logger.Info("stopping CPU profiler", "profile", cpuProfile)
pprof.StopCPUProfile()
if err := f.Close(); err != nil {
ctx.Logger.Info("failed to close cpu-profile file", "profile", cpuProfile, "err", err.Error())
}
}()
}

errCh := make(chan error)
go func() {
errCh <- callback()
}()

select {
case err := <-errCh:
return err

case <-time.After(types.ServerStartTime):
}

return WaitForQuitSignals()
}

0 comments on commit e642af8

Please sign in to comment.