Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Development feat structured logging #881

Merged
merged 20 commits into from
Dec 18, 2023

Conversation

sameh-farouk
Copy link
Member

@sameh-farouk sameh-farouk commented Oct 30, 2023

Description

Note: Require tfchain Runtime to include the update from another PR

The goal of this PR it to instrument the bridge source code for the sake of improving observability and perform tracing on workflows and data.

For full description and event schema see the the included doc here

Related Issues:

Checklist:

Please delete options that are not relevant.

@sameh-farouk sameh-farouk force-pushed the development-feat-structured-logging branch from c2a2538 to c9080e7 Compare October 30, 2023 03:47
@sameh-farouk sameh-farouk marked this pull request as ready for review November 5, 2023 12:22
Copy link
Member

@muhamadazmy muhamadazmy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over time this gonna be hard to maintain

  • First read the comments
  • Second i think it's gonna be really hard to maintain context of operation and error like this.
  • event_action, event_kind, category are not well defined, they don't have known types and rely on whatever the developer type in. which makes it hard to track and prone to typoes

zerolog.SetGlobalLevel(zerolog.DebugLevel)
log.Debug().Msg("debug mode enabled")
}
logger.Init_logger(debug)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Init_logger should be InitLogger

}
source_log_entry := logger.SourceCommonLogEntry{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why using snake case naming in go ? go always uses camelCase

Comment on lines +73 to +75
Str("event_action", "bridge_unexpectedly_exited").
Str("event_kind", "error").
Str("category", "availability").
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be part of the error context somehow? not printed as strings? May be return some sort of a ErrorContext that implements error then once received an ErrorContext can be downcasted, then the data extracted from it and printed.

The values of the category, event-kind, and action can then be pre-defined constants

@@ -23,18 +26,30 @@ func (bridge *Bridge) mint(ctx context.Context, senders map[string]*big.Int, tx
}

if minted {
log.Info().Str("tx_id", tx.Hash).Msg("transaction is already minted")
logger.Info().
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we should log before return every error, or just log when we receive an error. If the caller to mint function now decide to log the returned error we will have double logs.

What I do is that I always log an error if i can't bubble it up more, the error then can have all the information we need to build a detailed log

Otherwise we can endup with the same error printed multiple times

@@ -46,37 +61,35 @@ func (bridge *Bridge) mint(ctx context.Context, senders map[string]*big.Int, tx
}

if tx.Memo == "" {
log.Info().Str("tx_id", tx.Hash).Msg("transaction has empty memo, refunding now")
return bridge.refund(context.Background(), receiver, depositedAmount.Int64(), tx)
ctx = context.WithValue(ctx, "refund_reason", "no memo in transaction")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a nice way to set (and get) context. typoes and naming conflicts are a problem instead it's ususally done like this

type refundReasonKey struct {}
func WithRefundReason(ctx, reason string) Context {
    return context.WithValue(ctx, refundReasonKey{}, reason)
}

func GetRefundReason(ctx) string {
    return context.GetValue(ctx, refundReasonKey{}).(string)
}

This makes sure

  • no key conflict because a key is a type, not a value (actually it's the zero value of a type but 2 zero values of 2 different types don't equal each other)
  • Clear interface to set, get a reason and reason type is clear
  • no possible typoes or mistakes

@sameh-farouk
Copy link
Member Author

event_action, event_kind, category are not well defined, they don't have known types and rely on whatever the developer type in. which makes it hard to track and prone to typoes

@muhamadazmy
In this iteration, the majority of the effort was spent on designing the events schema and documenting it, complete with examples and use cases. So it may not be undefined as you think. While I acknowledge that it is not enforced yet, prone to errors, and needs to be refactored into an interface, I decided to leave this to the next iteration as it already took longer than expected.

Copy link
Collaborator

@renauter renauter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!!
Very complete and very well documented.
I basically read the documentation and made some formatting remarks.
Implementation seems ok, I guess @muhamadazmy is better placed to evaluate the Go code and I see he already made its comments.
I would also add ADR file since it is an important feature from bridge side.

- `event_burn_tx_ready_received`: The bridge instance has received TFChain `BurnTransactionReady` event which means all bridge validators signed a withdraw transaction.
- `event_burn_tx_expired_received`: The bridge instance has received TFChain `BurnTransactionExpired` event.
- `withdraw_skipped`: a refund request skipped by the bridge instance as it has already been refunded.
- `withdraw_proposed`: a withdraw has proposed or signed by the bridge instance.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has been proposed

- `event_burn_tx_created_received`: The bridge instance has received TFChain `BurnTransactionCreated` event.
- `event_burn_tx_ready_received`: The bridge instance has received TFChain `BurnTransactionReady` event which means all bridge validators signed a withdraw transaction.
- `event_burn_tx_expired_received`: The bridge instance has received TFChain `BurnTransactionExpired` event.
- `withdraw_skipped`: a refund request skipped by the bridge instance as it has already been refunded.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a refund request was skipped

- `event_burn_tx_expired_received`: The bridge instance has received TFChain `BurnTransactionExpired` event.
- `withdraw_skipped`: a refund request skipped by the bridge instance as it has already been refunded.
- `withdraw_proposed`: a withdraw has proposed or signed by the bridge instance.
- `withdraw_completed`: a withdraw has completed and received on the target stellar account.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has been completed

#### Business Events:
These events are for business intelligence and always tied to a user transfer, and include a `trace_id` field.

you can use `trace_id` to correlate multiple log entries and track an transfer from one chain to the other. it is simply a stellar tx ID or TFChain burn ID depend on the side the transfer initiated from.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You ... It ...


### Metrics:

This events describes a numeric measurement taken at given point in time.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These events

| version | number | yes |Log schema version |
| source | object | yes | Information about the bridge instance |
| event_action | string | yes | The action captured by the event |
| event_kind | string | yes | Can gives high-level information about what type of information the event contains, without being specific to the contents of the event. One of `Event`, `Alert`, `Error`, `metric` |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

event, alert, error, metric

Comment on lines +107 to +113
- alert: This value indicates an event such as an alert or notable event.

- event: This value is the most general and most common value for this field. It is used to represent events that indicate that something happened.

- error: This value indicates that an error occurred during the operation of the bridge.

- metric: This events describes a numeric measurement taken at given point in time.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • event: ...
  • alert: ...
  • error: ...
  • metric: ...

Comment on lines +118 to +126
- availability
- persistency
- stellar_monitor
- tfchain_monitor
- transfer
- mint
- refund
- withdraw
- vault
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • availability
  • persistency
  • stellar_monitor
  • tfchain_monitor
  • transfer
  • mint
  • refund
  • withdraw
  • vault

- vault

#### source object
the source field set contains all fields which are included in the source object, it is common across all types of events except of `bridge_init_aborted` error event.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The source field set contains all fields which are included in the source object. It is common across all types of events except for bridge_init_aborted error event.

Comment on lines +140 to +144
##### bridge_init_aborted

- kind: error

- category: availability
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use this formatting for all event sections below to improve md file readability:

bridge_init_aborted event
  • kind: error
  • category: availability

@sameh-farouk sameh-farouk merged commit 42cf314 into development Dec 18, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bridge: Improved observability for our bridges
3 participants