Magnetic images for APM module
rcastley committed Nov 8, 2024
1 parent 2dd90e7 commit a0d784b
Showing 15 changed files with 43 additions and 37 deletions.
@@ -1,8 +1,8 @@
---
title: 2. APM Service Dashboard
title: 2. APM Service View
weight: 2
---
{{% notice title="Service Dashboard" style="info" %}}
{{% notice title="Service View" style="info" %}}

As a service owner you can use the service view in Splunk APM to get a complete view of your service health in a single pane of glass. The service view includes a service-level indicator (SLI) for availability, dependencies, request, error, and duration (RED) metrics, runtime metrics, infrastructure metrics, Tag Spotlight, endpoints, and logs for a selected service. You can also quickly navigate to code profiling and memory profiling for your service from the service view.

@@ -12,19 +12,16 @@ As a service owner you can use the service view in Splunk APM to get a complete

{{% notice title="Exercise" style="green" icon="running" %}}

* Check the **Time** box. You can see that the dashboards only show data relevant to the time it took for the APM trace we selected to complete (note that the charts are static).
* Check the **Time** box. You can see that the dashboards only show data relevant to the time it took for the APM trace we previously selected to complete (note that the charts are static).
* In the **Time** box change the timeframe to **-1h**.
* The Single Value charts, **Request rate**, **Request latency (p90)** and **Error rate** will start updating every 10 seconds showing that we still have a large number of errors occurring.
* These charts are very useful to quickly identify performance issues. You can use this dashboard to keep an eye on the health of your service or use it as a base for a custom one.
* We want to use some of these charts in a later exercise:
* In the **Request rate** Single Value chart (**2**), click the **...** and select **Copy**. Note that you now have a **1** before the **+** at the top right of the page (**3**), indicating that you have copied a chart to the clipboard.
* In the **Request rate** line chart (**4**), either click on the **Add to clipboard** indicator that appeared (just at the **(4)** in the screenshot) to add it to the clipboard or use the **...** and select **Add to clipboard**.
* Note that you now have **2** before the **+** on the top right of the page. (**3**)
* These charts are very useful to quickly identify performance issues. You can use this dashboard to keep an eye on the health of your service.
* Scroll down the page and expand **Infrastructure Metrics**. Here you will see the metrics for the Host and Pod.
* **Runtime Metrics** are not available as profiling data is not available for services written in Node.js.
* Now let's go back to the Explore view. You can use the back button in your browser.

{{% /notice %}}

![APM Explore](../images/apm-explore.png)
![APM Explore](../images/apm-business-workflow.png)

{{% notice title="Exercise" style="green" icon="running" %}}

24 changes: 16 additions & 8 deletions content/en/s4r/6-apm/3-apm-tag-spotlight.md
@@ -5,28 +5,36 @@ weight: 3

{{% notice title="Exercise" style="green" icon="running" %}}

* To view the tags for the **paymentservice** click on the **paymentservice** and then click on **Tag Spotlight** in the right-hand side functions pane (you may need to scroll down depending upon your screen resolution).
* Once in **Tag Spotlight** ensure the toggle **Show tags with no values** is off.
* To view the tags for the **paymentservice** click on the **paymentservice** and then click on **Tag Spotlight** in the right-hand side functions pane (you may need to scroll down depending upon your screen resolution).
* Once in **Tag Spotlight** ensure the toggle **Show tags with no values** is off.

{{% /notice %}}

![APM Tag Spotlight](../images/apm-tag-spotlight.png)

There are two views available in **Tag Spotlight**. The default is **Request/Errors** and the other is **Latency**.
The views in **Tag Spotlight** are configurable for both the chart and cards. The view defaults to **Requests & Errors**.

Request/Error charts display the total number of requests, errors, and root cause errors. The Latency charts display p50, p90, and p99 latency. These values are based on Troubleshooting MetricSets (TMS), which Splunk APM generates for every indexed span tag. These are known as RED metrics (request, error, and duration).
It is also possible to configure which tag metrics are displayed in the cards. You can select any combination of:

* Requests
* Errors
* Root cause errors
* P50 Latency
* P90 Latency
* P99 Latency
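
To make the card metrics listed above more concrete, here is a minimal Python sketch of how request, error, and latency percentile values could be derived from a set of spans that share one tag value. It is an illustration only and not how Splunk APM computes them: the span dictionaries, the field names (`duration_ms`, `error`, `root_cause`), the `percentile` and `card_metrics` helpers, and the sample data are all assumptions made for this example, and the nearest-rank percentile method is just one common choice.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def card_metrics(spans):
    """Summarise the spans that share one tag value (hypothetical span dicts)."""
    durations = [span["duration_ms"] for span in spans]
    return {
        "requests": len(spans),
        "errors": sum(1 for span in spans if span["error"]),
        "root_cause_errors": sum(1 for span in spans if span["root_cause"]),
        "p50_ms": percentile(durations, 50),
        "p90_ms": percentile(durations, 90),
        "p99_ms": percentile(durations, 99),
    }


# Made-up spans for a single tag value; not real workshop data.
sample_spans = [
    {"duration_ms": d, "error": d >= 90, "root_cause": d == 1000}
    for d in (10, 20, 30, 40, 50, 60, 70, 80, 90, 1000)
]
metrics = card_metrics(sample_spans)
```

Note how a single slow span dominates the P99 value while barely affecting P50, which is why it helps to look at more than one latency percentile on a card.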

Also ensure that the **Show tags with no values** toggle is unchecked.

{{% notice title="Exercise" style="green" icon="running" %}}

{{< tabs >}}
{{% tab title="Question" %}}
**Which chart exposes the tag that identifies what the problem is?**
**Which card exposes the tag that identifies what the problem is?**
{{% /tab %}}
{{% tab title="Answer" %}}
**The *version* chart. The number of requests against `v350.10` matches the number of errors.**
**The *version* card. The number of requests against `v350.10` matches the number of errors, i.e. the error rate is 100%.**
{{% /tab %}}
{{< /tabs >}}

* Now that we have identified the version of the **paymentservice** that is causing the issue, let's see if we can find out more information about the error. Click on **← Tag Spotlight** at the top of the page to get back to the Service Map.

{{% /notice %}}

Now that we have identified the version of the **paymentservice** that is causing the issue, let's see if we can find out more information about the error. Click on **← Tag Spotlight** at the top of the page to get back to the Service Map.
13 changes: 4 additions & 9 deletions content/en/s4r/6-apm/4-apm-service-breakdown.md
@@ -5,17 +5,18 @@ weight: 4

{{% notice title="Exercise" style="green" icon="running" %}}

* Select the **paymentservice** in the Service Map.
* In the right-hand pane click on the {{% button style="grey" %}}Breakdown{{% /button %}}.
* Select `tenant.level` in the list. This is a tag that exposes the customers' status and can be useful to see trends related to customer status.
* Back in the Service Map click on **gold** to select it.
* Select `tenant.level` in the list.
* Back in the Service Map click on **gold**.
* Click on {{% button style="grey" %}}Breakdown{{% /button %}} and select `version`. This is the tag that exposes the service version.
* Repeat this for **silver** and **bronze**.
{{< tabs >}}
{{% tab title="Question" %}}
**What can you conclude from what you are seeing?**
{{% /tab %}}
{{% tab title="Answer" %}}
**Every tenant is being impacted by `v350.10`**
**Every `tenant.level` is being impacted by `v350.10`**
{{% /tab %}}
{{< /tabs >}}

@@ -25,12 +26,6 @@ You will now see the **paymentservice** broken down into three services, **gold*

![APM Service Breakdown](../images/apm-service-breakdown.png)

{{% notice title="Exercise" style="green" icon="running" %}}

* Click on the outer main box that surrounds the 3 red circles, the box will become highlighted.

{{% /notice %}}

{{% notice title="Span Tags" style="info" %}}
Using span tags to break down services is a very powerful feature. It allows you to see how your services are performing for different customers, different versions, different regions, etc. In this exercise, we have determined that `v350.10` of the **paymentservice** is causing problems for all our customers.
{{% /notice %}}
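
The idea behind a span tag breakdown can be sketched in a few lines of Python: group spans by a tag value and compare the error rate of each group. This is a simplified illustration, not the Splunk APM implementation. The `breakdown` helper, the span dictionaries, and the sample data are assumptions for this example, and `v350.9` is a hypothetical healthy version included only for contrast.

```python
from collections import defaultdict


def breakdown(spans, tag):
    """Group spans by the value of one span tag and compute an error rate per value."""
    counts = defaultdict(lambda: {"requests": 0, "errors": 0})
    for span in spans:
        value = span["tags"].get(tag, "unknown")
        counts[value]["requests"] += 1
        counts[value]["errors"] += int(span["error"])
    return {
        value: {**c, "error_rate": c["errors"] / c["requests"]}
        for value, c in counts.items()
    }


# Made-up spans: `v350.9` is a hypothetical healthy version used only for contrast.
sample_spans = [
    {"tags": {"tenant.level": level, "version": version}, "error": version == "v350.10"}
    for level in ("gold", "silver", "bronze")
    for version in ("v350.9", "v350.10")
]

# Break down by version, then by tenant level and version together.
by_version = breakdown(sample_spans, "version")
by_tenant_then_version = {
    level: breakdown(
        [s for s in sample_spans if s["tags"]["tenant.level"] == level], "version"
    )
    for level in ("gold", "silver", "bronze")
}
```

In this made-up data set every `tenant.level` shows a 100% error rate for `v350.10`, which mirrors the conclusion of the exercise: the version, not the customer tier, is the common factor.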
12 changes: 8 additions & 4 deletions content/en/s4r/6-apm/5-apm-trace-analyzer.md
@@ -14,6 +14,8 @@ Splunk Observability Cloud provides several tools for exploring application moni

* With the outer box of the **paymentservice** selected, in the right-hand pane, click on **Traces**.
* To ensure we are using **Trace Analyzer** make sure the button {{% button %}}Switch to Classic View{{% /button %}} is showing. If it is not, click on {{% button style="blue" %}}Switch to Trace Analyzer{{% /button %}}.
* Set **Time Range** to **Last 15 minutes**.
* Ensure the **Sample Ratio** is set to `1:1` and **not** `1:10`.

{{% /notice %}}

@@ -31,9 +33,9 @@ The **Trace & error count** view shows the total traces and traces with errors i

The **Trace Duration** view shows a heatmap of traces by duration. The heatmap represents 3 dimensions of data:

1. Time on the x-axis
2. Trace duration on the y-axis
3. The traces (or requests) per second are represented by the heatmap shades
* Time on the x-axis
* Trace duration on the y-axis
* The traces (or requests) per second are represented by the heatmap shades
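
The three dimensions above can be sketched as a simple binning step: each trace is placed in a cell defined by a time bucket and a duration bucket, and the cell's rate determines its shade. The following Python is a rough illustration under stated assumptions, not how Trace Analyzer builds its heatmap. The `heatmap_cells` function, the bucket sizes, and the sample traces are all made up for this example.

```python
from collections import Counter


def heatmap_cells(traces, time_bucket_s=60, duration_bucket_ms=500):
    """Bin (start_time_s, duration_ms) pairs into heatmap cells.

    Returns {(time_bucket_start_s, duration_bucket_start_ms): traces_per_second}.
    """
    counts = Counter()
    for start_s, duration_ms in traces:
        x = int(start_s // time_bucket_s) * time_bucket_s                 # x-axis: time
        y = int(duration_ms // duration_bucket_ms) * duration_bucket_ms  # y-axis: duration
        counts[(x, y)] += 1
    # Shade: traces per second within each cell's time bucket.
    return {cell: count / time_bucket_s for cell, count in counts.items()}


# Made-up traces as (start time in seconds, duration in milliseconds).
sample_traces = [(0, 100), (10, 200), (30, 4800), (65, 120)]
cells = heatmap_cells(sample_traces)
```

Selecting an area on the heatmap then corresponds to keeping only the cells whose time and duration buckets fall inside the selected ranges.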

You can use your mouse to select an area on the heatmap, to focus on a specific time period and trace duration range.

@@ -49,7 +51,9 @@ You can use your mouse to select an area on the heatmap, to focus on a specific

{{% /notice %}}

We have now filtered down to the exact trace where you encountered a poor user experience with a very long checkout wait. A secondary benefit of viewing this trace is that it remains accessible for up to 13 months. This allows developers, for example, to come back to this issue at a later stage and still view the trace.
We have now filtered down to the exact trace where you encountered a poor user experience with a very long checkout wait.

A secondary benefit of viewing this trace is that it remains accessible for up to 13 months. This allows developers, for example, to come back to this issue at a later stage and still view the trace.

{{% notice title="Exercise" style="green" icon="running" %}}

14 changes: 8 additions & 6 deletions content/en/s4r/6-apm/6-apm-waterfall.md
@@ -15,18 +15,16 @@ Each span in Splunk APM captures a single operation. Splunk APM considers a span

{{< tabs >}}
{{% tab title="Question" %}}
**What are the error message and version being reported in the span metadata?**
**What are the error message and version being reported in the Span Details?**
{{% /tab %}}
{{% tab title="Answer" %}}
**Invalid request and `v350.10`**.
**`Invalid request` and `v350.10`**.
{{% /tab %}}
{{< /tabs >}}

{{% /notice %}}
Now that we have identified the version of the **paymentservice** that is causing the issue, let's see if we can find out more information about the error. This is where **Related Logs** come in.

![Related Logs](../images/apm-related-logs.png)

Related Content relies on specific metadata that allow APM, Infrastructure Monitoring, and Log Observer to pass filters around Observability Cloud. For related logs to work, you need to have the following metadata in your logs:

* `service.name`
@@ -37,7 +35,11 @@ Related Content relies on specific metadata that allow APM, Infrastructure Monit

{{% notice title="Exercise" style="green" icon="running" %}}

* At the very bottom of the **Trace Waterfall** click on the word **Logs (1)**. This highlights that there are **Related Logs** for this trace.
* Click on the **Logs for trace XXX** entry in the pop-up; this opens the logs for the complete trace in **Log Observer**.
* At the very bottom of the **Trace Waterfall** click on **Logs (1)**. This highlights that there are **Related Logs** for this trace.
* Click on the **Logs for trace xxx** entry in the pop-up; this opens the logs for the complete trace in **Log Observer**.

{{% /notice %}}

![Related Logs](../images/apm-related-logs.png)

Next, let's find out more about the error in the logs.
Binary file modified content/en/s4r/6-apm/images/apm-business-workflow.png
Binary file modified content/en/s4r/6-apm/images/apm-related-logs.png
Binary file modified content/en/s4r/6-apm/images/apm-service-breakdown.png
Binary file modified content/en/s4r/6-apm/images/apm-service-dashboard.png
Binary file modified content/en/s4r/6-apm/images/apm-service.png
Binary file modified content/en/s4r/6-apm/images/apm-tag-spotlight.png
Binary file modified content/en/s4r/6-apm/images/apm-trace-analyzer-heat-map.png
Binary file modified content/en/s4r/6-apm/images/apm-trace-analyzer.png
Binary file modified content/en/s4r/6-apm/images/apm-trace-by-duration.png
Binary file modified content/en/s4r/6-apm/images/apm-trace-waterfall.png