Integrate Tracing (derived from OpenTelemetry) #15

CMCDragonkai · 2022-07-19T16:01:07Z

Specification

OpenTelemetry is an overly complicated beast. It's far too complex to adopt into a logging system. However, the basic principles of tracing make sense. Here I'm showing how you can set up a Jaeger instance for comparison testing, so that we can derive a tracing schema and later visualise it ourselves or by passing it into an OTLP-compatible visualiser.

docker run -d --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.36

The above command runs Jaeger. Take note of port 4318, which serves the OTLP protocol over HTTP.

Visit localhost:16686 to view the Jaeger UI.

Then any example code, such as https://github.com/open-telemetry/opentelemetry-js/blob/main/examples/basic-tracer-node/index.js, can run and push traces directly to the docker container.

What is frustrating is:

OpenTelemetry code only exports to stderr as an afterthought; it's not considered first-class usage.
The stderr exporters output via console.log and produce pretty-printed results that are not actual JSON. Thus you cannot just pipe it to a relevant location.
The schema of the span data isn't clear. It seems different parts of the documentation still have old data, or maybe the JS implementation itself hasn't been updated to the new schema.

The plan:

Create our own "span" derived from OpenTelemetry and output it as regular structured JSON
Massage it to be compatible with OpenTelemetry viewers like Jaeger
Use Jaeger's port 4318 to stream the JSON and view data in the interim
Find an easier way to visualise traces, maybe something that can be used from the CLI or in a GUI
For production usage, feed to any structured log capturer, and then feed into a viewer that understands trace information
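As a rough sketch of the "massage it to be compatible" step, the following wraps a plain span object in the OTLP/JSON envelope (resourceSpans, scopeSpans, spans). The Span type, its field names, the scope name 'custom-tracer' and the function names are assumptions made up for illustration; only the envelope field names come from the OTLP JSON encoding. It does no IO, it only builds the payload.

```typescript
// Hypothetical in-house span shape; these field names are assumptions.
type Span = {
  traceId: string; // 32 hex characters
  spanId: string; // 16 hex characters
  parentSpanId?: string;
  name: string;
  startTimeMs: number;
  endTimeMs: number;
};

// OTLP/JSON carries timestamps as decimal strings of nanoseconds.
function msToNanoString(ms: number): string {
  return (BigInt(Math.trunc(ms)) * BigInt(1000000)).toString();
}

// Wrap spans in the OTLP/JSON trace envelope.
function toOtlpPayload(serviceName: string, spans: Span[]) {
  return {
    resourceSpans: [
      {
        resource: {
          attributes: [
            { key: 'service.name', value: { stringValue: serviceName } },
          ],
        },
        scopeSpans: [
          {
            scope: { name: 'custom-tracer' }, // assumed scope name
            spans: spans.map((s) => ({
              traceId: s.traceId,
              spanId: s.spanId,
              // Only include parentSpanId when the span has a parent.
              ...(s.parentSpanId != null
                ? { parentSpanId: s.parentSpanId }
                : {}),
              name: s.name,
              kind: 1, // SPAN_KIND_INTERNAL
              startTimeUnixNano: msToNanoString(s.startTimeMs),
              endTimeUnixNano: msToNanoString(s.endTimeMs),
            })),
          },
        ],
      },
    ],
  };
}
```

The result of JSON.stringify(toOtlpPayload(...)) could then be sent with an HTTP POST and Content-Type: application/json to http://localhost:4318/v1/traces on the container above. Whether Jaeger accepts every span produced this way has not been verified here.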

Additional context

Tasks

...
...
...


CMCDragonkai · 2022-07-19T16:05:56Z

It seems a lot of the complexity is due to vendor fragmentation, and they are trying to make everything compatible.

CMCDragonkai · 2022-07-19T17:04:36Z

Most tracing tools like https://nodejs.org/api/tracing.html and chrome://tracing expect a finite dataset, that is, they expect a trace to have a beginning and an end. That's why it's always been "request" driven. OpenTelemetry is just deriving from stuff that came before, like https://github.com/gaogaotiantian/viztracer https://github.com/janestreet/magic-trace https://github.com/kunalb/panopticon and more.

I'm interested in more than just request-driven tracing: live infinite traces (call it continuous tracing that shows finished and live spans at the same time) that are also correlated. I'm guessing we need zoomable levels of detail and the ability to filter out irrelevant information dynamically.

Open telemetry in particular does not appear to emit a span until it is done. I'd imagine knowing when a span started even if it did not end yet would be useful for live continuous tracing.
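To make the difference concrete, here is a minimal sketch of emitting a record when a span starts and another when it ends, then folding those records into a view that separates live spans from finished ones. All type and function names (SpanEvent, SpanState, applyEvent, liveSpans) are hypothetical and not taken from any existing library.

```typescript
// A span produces two records: one at start, one at end.
type SpanEvent =
  | { type: 'start'; spanId: string; parentId?: string; name: string; time: number }
  | { type: 'end'; spanId: string; time: number };

// Folded view of a span; `end` stays null while the span is live.
type SpanState = {
  spanId: string;
  parentId: string | null;
  name: string;
  start: number;
  end: number | null;
};

// Apply one event to the current state (mutates and returns the map).
function applyEvent(
  state: Map<string, SpanState>,
  ev: SpanEvent,
): Map<string, SpanState> {
  if (ev.type === 'start') {
    state.set(ev.spanId, {
      spanId: ev.spanId,
      parentId: ev.parentId ?? null,
      name: ev.name,
      start: ev.time,
      end: null,
    });
  } else {
    const span = state.get(ev.spanId);
    // An end event for an unknown span is ignored in this sketch.
    if (span != null) span.end = ev.time;
  }
  return state;
}

// Spans that have started but not yet ended.
function liveSpans(state: Map<string, SpanState>): SpanState[] {
  return Array.from(state.values()).filter((s) => s.end === null);
}
```

Because the start record is available immediately, a viewer can draw a span before it finishes, which an export-on-end model cannot do.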

Abby010 · 2025-02-18T03:03:33Z

Why do parent spans finish before child spans? Will this always be the case?

Abby010 · 2025-02-18T03:05:15Z

In case of our solution how should spans be structured in JSON ?

Abby010 · 2025-02-18T03:06:45Z

The tracing goes from top to bottom, and represents an 'infinite' live visualisation of what the current state of the system is. Why do we need an 'infinite' live visualisation ?

Abby010 · 2025-02-18T03:08:32Z

Are we visualising both completed and in-progress spans at the same time? If yes, how ?

Abby010 · 2025-02-18T03:09:13Z

How does forking impact trace performance ? Will it slow down the system ?

Abby010 · 2025-02-18T03:13:16Z

If we need zoomable levels of detail to filter out information dynamically, how will we support this functionality?

CMCDragonkai · 2025-02-18T03:19:34Z

Because that's how we can debug how object contexts exist in real time.

CMCDragonkai · 2025-02-18T03:20:04Z

Tracing isn't going to be super fast. It's fine, we do this for debug reasons - we can optimise this later.

CMCDragonkai · 2025-02-18T03:20:40Z

Zoomable is a UI concern. Tracing data is logged entirely.

CMCDragonkai · 2025-02-18T19:17:54Z

Are we visualising both completed and in-progress spans at the same time? If yes, how ?

Let's separate issues for collecting data vs visualising data.

Abby010 · 2025-02-18T21:52:05Z

Functional Requirements

Generate structured JSON spans - Instead of OpenTelemetry’s pretty-printed stderr output, generate structured JSON logs for better processing.
Ensure compatibility with OpenTelemetry viewers (Jaeger, Zipkin, etc.) - The generated spans should be formatted correctly so they can be visualized in Jaeger or other OTLP-compatible tools.
Stream JSON spans via Jaeger’s 4318 OTLP HTTP port - Send the structured JSON directly to Jaeger to provide immediate visualization of spans.
Allow easy visualisation of traces (CLI or GUI tool) - Provide a real-time visualisation option that allows developers to inspect spans dynamically.
Enable live & historical trace views - Ensure that completed spans are stored and retrievable, so historical traces can be analysed alongside real-time data.
Integrate with log collectors (Fluentd, Loki, Elasticsearch, etc.) - Traces should be stored in a structured log system for long-term observability and debugging.

Non Functional Requirements

Low performance overhead - The tracing system should not introduce significant latency to applications.
Scalability - The system should handle high throughput tracing data without bottlenecks.
Real-time processing - The tracing solution should emit spans immediately when they start, rather than waiting for them to finish.
Reliability - Traces must not be lost, even in case of network failures or system crashes.

CMCDragonkai · 2025-02-18T23:08:50Z

I want to avoid any IO, in or out, in the tracing library. Our library should be pure data structures first and then allow generic construction of a span. I don't like open tracing spans but we can be backwards compatible with them.

CMCDragonkai · 2025-02-18T23:09:08Z

See the 3 layer cake concept.

CMCDragonkai · 2025-02-18T23:09:33Z

Avoid Jaeger or any of the OT ecosystem. I don't like them they suck.

CMCDragonkai · 2025-02-18T23:33:56Z

Btw in-memory format should just be a POJO that can be converted to json.

CMCDragonkai · 2025-02-19T00:08:30Z

Create a span structure for beginning and end. Use react-ink to set up a CLI that visualises spans from top to bottom. Get a preview of this using asciinema, and post the video here.

CMCDragonkai · 2025-02-19T19:38:42Z

I want you to try and write a simple library right here with a new PR:

Creation of a span and ending of a span.
Forking spans.
Then in a separate src/bin directory, create a CLI script using TS that can use react-ink to visualise the spans as vertical lines starting from the top to the bottom, it should auto-scroll downwards every second. We can iterate this.
Note that react ink takes over the full screen, thus it's a TUI. A CLI app would actually just print one line at a time. We should be able to do this as well, similar to a follow function of tail. Try it.
I would like to see this prototype by the end of the week, so you can demonstrate it at the beginning of the next cycle.
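A minimal sketch of what the first two items could look like as pure data with no IO: spans are plain objects, and opening, closing and forking are functions that return new objects. The names (openSpan, closeSpan, forkSpan, toLine) and the field layout are assumptions for illustration, not the API of the eventual PR. IDs and timestamps are passed in by the caller so the functions stay deterministic.

```typescript
// A span is a plain object (POJO) that serialises directly to JSON.
type Span = {
  id: string;
  parentId: string | null;
  name: string;
  startedAt: number;
  endedAt: number | null; // null while the span is still open
};

// Create a new open span, optionally under a parent.
function openSpan(id: string, name: string, now: number, parent?: Span): Span {
  return {
    id,
    parentId: parent?.id ?? null,
    name,
    startedAt: now,
    endedAt: null,
  };
}

// Return a closed copy of the span; the input is not mutated.
function closeSpan(span: Span, now: number): Span {
  return { ...span, endedAt: now };
}

// Forking is opening a child span that points at its parent.
function forkSpan(parent: Span, id: string, name: string, now: number): Span {
  return openSpan(id, name, now, parent);
}

// One JSON document per line, suitable for tail-style output.
function toLine(span: Span): string {
  return JSON.stringify(span);
}
```

A tail-style CLI could call toLine on each new or updated span and write the result followed by a newline, leaving the choice of output stream to the caller.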

CMCDragonkai · 2025-02-19T19:39:06Z

@abhishek.mehta you need to link up your github account, at the moment assignments aren't aligned between github and linear.

Abby010 · 2025-02-19T19:47:34Z

I have already started working on this; however, as I have reached my 24 hour limit for this week, I will be able to continue next week. Thanks

valyala · 2025-02-20T19:19:22Z

Integrate with log collectors (Fluentd, Loki, Elasticsearch, etc.) Traces should be stored in a structured log system for long-term observability and debugging.

Consider also using VictoriaLogs. It is easier to set up and operate than Loki and Elasticsearch, and it usually uses less RAM, CPU and disk space compared to Loki and Elasticsearch.

CMCDragonkai · 2025-02-20T20:47:27Z

Cool @valyala but this issue is more about specific in-app tracing that shouldn't be tied to any cloud service. We want to separate collection from visualisation from storage.

CMCDragonkai · 2025-02-20T20:49:04Z

Most tracing tools like https://nodejs.org/api/tracing.html and chrome://tracing expect a finite dataset, that is, they expect a trace to have a beginning and an end. That's why it's always been "request" driven. OpenTelemetry is just deriving from stuff that came before, like https://github.com/gaogaotiantian/viztracer https://github.com/janestreet/magic-trace https://github.com/kunalb/panopticon and more.

I'm interested in more than just request-driven tracing: live infinite traces (call it continuous tracing that shows finished and live spans at the same time) that are also correlated. I'm guessing we need zoomable levels of detail and the ability to filter out irrelevant information dynamically.

Open telemetry in particular does not appear to emit a span until it is done. I'd imagine knowing when a span started even if it did not end yet would be useful for live continuous tracing.

@Abby010 the last paragraph is key.

Abby010 · 2025-02-23T23:20:20Z

Preview of the CLI using react-ink
https://asciinema.org/a/cIfBMyC4ENq1UoQ6z2kFRZqwF

CMCDragonkai · 2025-02-25T20:58:20Z

What's the status?

Abby010 · 2025-03-03T10:42:58Z

Status Update

Current Progress:

We now have a working visualization (Attaching terminal recordings for reference at the end).
The core library has been implemented, and we are now focusing on improving visualization.

Research Done:

✔ Analyzed git log --graph & tree command for box-drawing character alignment.
✔ Explored UTF-8 box characters (│ ├ └ ─) for structured branching.
✔ Reviewed time-based vs. logical event-based sampling logic for implementing switching.
✔ Studied grid-based painting algorithms to properly align and render spans in a structured format.
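One possible reading of the time-based vs. logical distinction, sketched under assumptions: logical sampling gives one output row per event regardless of timing, while time-based sampling gives one row per fixed interval, including empty rows when nothing happened. The TraceEvent type and both function names are hypothetical, and this is not necessarily the logic used in the prototype.

```typescript
type TraceEvent = { spanId: string; kind: 'start' | 'end'; time: number };

// Logical sampling: one row per event, ordered by time.
function sampleLogical(events: TraceEvent[]): TraceEvent[][] {
  return events
    .slice()
    .sort((a, b) => a.time - b.time)
    .map((e) => [e]);
}

// Time-based sampling: one row per interval; quiet intervals give empty rows.
function sampleByTime(events: TraceEvent[], intervalMs: number): TraceEvent[][] {
  if (events.length === 0) return [];
  const sorted = events.slice().sort((a, b) => a.time - b.time);
  const first = sorted[0].time;
  const last = sorted[sorted.length - 1].time;
  const rowCount = Math.floor((last - first) / intervalMs) + 1;
  const rows: TraceEvent[][] = Array.from(
    { length: rowCount },
    (): TraceEvent[] => [],
  );
  for (const e of sorted) {
    rows[Math.floor((e.time - first) / intervalMs)].push(e);
  }
  return rows;
}
```

With events at 0 ms, 100 ms and 2500 ms, logical sampling yields three rows of one event each, while a 1000 ms interval yields three rows holding two, zero and one events.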

Next Steps:

Implement box-drawing characters for structured TUI visualization (inspired by git log --graph).
Add --sample logical vs. --sample 1s switching for time-based vs. logical event ordering.
Test and refine span rendering for readability, hierarchy correctness, and terminal compatibility.
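For the box-drawing step, a small sketch in the style of the tree command: it groups spans by parent and prints one line per span with ├─, └─ and │ prefixes. The SpanNode type and renderTree are hypothetical names, and this shows a static hierarchy only, not the top-to-bottom timeline of the prototype.

```typescript
type SpanNode = {
  id: string;
  parentId: string | null;
  name: string;
  ended: boolean;
};

// Render the span hierarchy as lines with UTF-8 box-drawing characters.
function renderTree(spans: SpanNode[]): string[] {
  // Group spans by parent id; roots are grouped under null.
  const children = new Map<string | null, SpanNode[]>();
  for (const s of spans) {
    const list = children.get(s.parentId) ?? [];
    list.push(s);
    children.set(s.parentId, list);
  }
  const lines: string[] = [];
  const walk = (parentId: string | null, prefix: string): void => {
    const kids = children.get(parentId) ?? [];
    kids.forEach((s, i) => {
      const last = i === kids.length - 1;
      const status = s.ended ? 'done' : 'live';
      lines.push(`${prefix}${last ? '└─ ' : '├─ '}${s.name} [${status}]`);
      // Keep a vertical bar running while siblings remain below.
      walk(s.id, prefix + (last ? '   ' : '│  '));
    });
  };
  walk(null, '');
  return lines;
}

const demo = renderTree([
  { id: '1', parentId: null, name: 'User Request', ended: false },
  { id: '2', parentId: '1', name: 'Order Processing', ended: true },
  { id: '3', parentId: '1', name: 'Payment Processing', ended: false },
]);
// demo is:
// └─ User Request [live]
//    ├─ Order Processing [done]
//    └─ Payment Processing [live]
```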

Timeline:

🎯 Targeting completion by the next sprint meeting - 10th March, 2025

Terminal 1: React-Ink Visualization

Preview (https://asciinema.org/a/W1yuT5ZngCE8AFkG1VG9yXfVX) - start from the 10th Second
Runs cli.tsx, which uses React-Ink to display spans in a tree-like, real-time interface.
Shows each span (e.g., User Request, Order Processing, etc.) as a vertical hierarchy, updating every second with any new or completed spans.

Terminal 2: Simple Tail-Style Output

Preview (https://asciinema.org/a/dqgzPEHVjRvERp44RnA3F00kp) - start from the 20th second
Runs simple-cli.tsx, which prints raw JSON logs of the spans, similar to tail -f on a log file.
Displays span data in an array (one array entry per span) and updates every second to reflect newly created or completed spans.

Terminal 3: Test Script (`asciinemaTest.ts`)

Preview (https://asciinema.org/a/g6um0fQMdHV3co6ThtviRh6oN)
Generates spans by calling logger.info(), which under the hood calls openSpan and closeSpan.
Simulates real operations like "User Request", "Order Processing", and "Payment Processing", each with delayed completions.
Provides the tracing data that Terminals 1 and 2 observe in real-time.

CMCDragonkai · 2025-03-04T08:38:29Z

@abhishek.mehta you should start to write out the tasks in this issue. Plus your progress should be in the associated feature-branch PR.

CMCDragonkai · 2025-03-04T08:40:11Z

BTW your viz shows using ts-node, we already moved away from that, we use tsx. Actually have a look at ESM migrated repos and start writing your scripts following how we write ESM like code and scripts/executables. See benches as an example.

CMCDragonkai · 2025-03-04T08:42:11Z

Any forking should have a \ to fork out if you're using pure ASCII.
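A minimal pure-ASCII sketch of that idea, similar to how git log --graph draws a branch point: each row prints one | per active lane, and a fork row appends \ before the new lane appears on the next row. It assumes forks always come off the rightmost lane and never close, which is a simplification; renderTimeline and the step names are hypothetical.

```typescript
// Each step either advances time ('tick') or forks a new lane ('fork').
function renderTimeline(steps: Array<'tick' | 'fork'>): string[] {
  let lanes = 1;
  const rows: string[] = [];
  for (const step of steps) {
    const pipes = Array.from({ length: lanes }, () => '|').join(' ');
    if (step === 'fork') {
      // The backslash marks the fork; the new lane shows up on the next row.
      rows.push(pipes + '\\');
      lanes += 1;
    } else {
      rows.push(pipes);
    }
  }
  return rows;
}

const ascii = renderTimeline(['tick', 'fork', 'tick', 'fork', 'tick']);
// ascii is:
// |
// |\
// | |
// | |\
// | | |
```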

CMCDragonkai added the development Standard development label Jul 19, 2022

CMCDragonkai mentioned this issue Jul 19, 2022

Structured JSON Logging #14

Merged

15 tasks

CMCDragonkai removed their assignment Sep 1, 2024

Abby010 self-assigned this Feb 16, 2025
