Merge branch 'main' into aswanson-nr-patch-2
zstix authored Oct 31, 2023
2 parents bfd9e80 + bb5a5bd commit 01676b0
Showing 65 changed files with 783 additions and 135 deletions.
4 changes: 2 additions & 2 deletions alert-policies/amazon-sagemaker/HighModelInvocationErrors.yml
@@ -5,7 +5,7 @@ description: |+
 type: STATIC
 nrql:
-  query: "SELECT count(`aws.sagemaker.InvocationModelErrors`) as 'Query' FROM Metric"
+  query: "SELECT sum(`aws.sagemaker.InvocationModelErrors`) as 'Query' FROM Metric"

 # Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
 valueFunction: SINGLE_VALUE
@@ -24,4 +24,4 @@ terms:

 # Duration after which a violation automatically closes
 # Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
-violationTimeLimitSeconds: 86400
+violationTimeLimitSeconds: 86400
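
The switch from count() to sum() in the query above changes what the alert measures. A minimal Python sketch of the difference, assuming count() returns how many data points were reported while sum() adds up their values; the sample numbers and function names are made up for illustration and are not part of the commit:

```python
# Hypothetical per-minute data points for aws.sagemaker.InvocationModelErrors.
# Each value is the number of errors reported in that minute (made-up data).
error_points = [0, 3, 0, 12, 1]


def count_points(points):
    """Mimic count(): how many data points arrived, whatever their value."""
    return len(points)


def sum_points(points):
    """Mimic sum(): the total of the reported values, i.e. total errors."""
    return sum(points)


print(count_points(error_points))  # 5
print(sum_points(error_points))    # 16
```

Under that assumption, a window with no errors at all would still produce a non-zero count, which is presumably why the condition now compares the sum against the threshold.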
27 changes: 27 additions & 0 deletions alert-policies/nvidia-gpu/HighMemoryUtilization.yml
@@ -0,0 +1,27 @@
name: High GPU Memory Utilization

description: |+
  This alert is triggered when the Nvidia GPU memory utilization is above 90%.
type: STATIC
nrql:
  query: "SELECT latest(utilization.memory.percent) FROM NvidiaGpuSample"

# Function used to aggregate the NRQL query value(s) for comparison to the terms.threshold (Default: SINGLE_VALUE)
valueFunction: SINGLE_VALUE

# List of Critical and Warning thresholds for the condition
terms:
  - priority: CRITICAL
    # Operator used to compare against the threshold.
    operator: ABOVE
    # Value that triggers a violation
    threshold: 90
    # Time in seconds; 120 - 3600
    thresholdDuration: 300
    # How many data points must be in violation for the duration
    thresholdOccurrences: ALL

# Duration after which a violation automatically closes
# Time in seconds; 300 - 2592000 (Default: 86400 [1 day])
violationTimeLimitSeconds: 86400
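
The CRITICAL term above combines operator ABOVE, threshold 90, thresholdDuration 300 and thresholdOccurrences ALL. The following is a simplified model of how such a term might be evaluated, not New Relic's actual evaluator. It assumes one data point per 60-second window; `Term`, `is_violating` and `interval_s` are hypothetical names introduced only for this sketch:

```python
from dataclasses import dataclass


@dataclass
class Term:
    threshold: float = 90.0          # threshold: 90
    threshold_duration_s: int = 300  # thresholdDuration: 300


def is_violating(samples, term, interval_s=60):
    """Return True when every sample in the trailing duration is above the threshold.

    samples: utilization percentages, oldest first, one per interval_s seconds.
    interval_s: assumed aggregation window; 60 seconds is an assumption here.
    """
    needed = term.threshold_duration_s // interval_s
    if needed == 0 or len(samples) < needed:
        return False  # not enough data to cover the whole duration
    window = samples[-needed:]
    # thresholdOccurrences: ALL means every point must breach; ABOVE is strict.
    return all(value > term.threshold for value in window)


print(is_violating([95, 96, 97, 98, 99], Term()))  # True
print(is_violating([95, 96, 89, 98, 99], Term()))  # False
```

In this model a single dip to 90% or below inside the five-minute window resets the condition, which is the practical effect of choosing ALL.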
Binary file not shown.
Binary file not shown.
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
190 changes: 190 additions & 0 deletions dashboards/nvidia-gpu/nvidia-gpu.json
@@ -0,0 +1,190 @@
{
"name": "Nvidia GPU Monitoring",
"description": null,
"pages": [
{
"name": "Nvidia GPU Monitoring",
"description": null,
"widgets": [
{
"visualization": {
"id": "viz.markdown"
},
"layout": {
"column": 1,
"row": 1,
"height": 1,
"width": 4
},
"title": "",
"rawConfiguration": {
"text": "[![NVIDIA SMI](https://logos-download.com/wp-content/uploads/2016/10/Nvidia_logo.png)](https://developer.nvidia.com/nvidia-system-management-interface)\n"
}
},
{
"visualization": {
"id": "viz.billboard"
},
"layout": {
"column": 5,
"row": 1,
"height": 3,
"width": 2
},
"title": "Current Clock Speeds",
"rawConfiguration": {
"dataFormatters": [],
"nrqlQueries": [
{
"accountId": 123,
"query": "FROM NvidiaGpuSample SELECT latest(clocks.current.memory.MHz) as 'Memory MHz', latest(clocks.current.graphics.MHz) as 'Graphics MHz', latest(clocks.current.video.MHz) as 'Video MHz', latest(clocks.current.sm.MHz) as 'SM MHz' "
}
],
"thresholds": []
}
},
{
"visualization": {
"id": "viz.line"
},
"layout": {
"column": 7,
"row": 1,
"height": 3,
"width": 6
},
"title": "Current Clock MHz",
"rawConfiguration": {
"legend": {
"enabled": true
},
"nrqlQueries": [
{
"accountId": 123,
"query": "FROM NvidiaGpuSample SELECT latest(clocks.current.memory.MHz) as 'Memory MHz', latest(clocks.current.graphics.MHz) as 'Graphics MHz', latest(clocks.current.video.MHz) as 'Video MHz', latest(clocks.current.sm.MHz) as 'SM MHz' TIMESERIES"
}
],
"yAxisLeft": {
"zero": true
}
}
},
{
"visualization": {
"id": "viz.bar"
},
"layout": {
"column": 1,
"row": 2,
"height": 2,
"width": 4
},
"title": "Select GPU",
"rawConfiguration": {
"facet": {
"showOtherSeries": false
},
"nrqlQueries": [
{
"accountId": 123,
"query": "FROM NvidiaGpuSample SELECT latest(name) FACET pci.device_id, hostname "
}
]
}
},
{
"visualization": {
"id": "viz.billboard"
},
"layout": {
"column": 1,
"row": 4,
"height": 3,
"width": 2
},
"title": "Temps",
"rawConfiguration": {
"dataFormatters": [],
"nrqlQueries": [
{
"accountId": 123,
"query": "FROM NvidiaGpuSample SELECT latest(temperature.gpu) as 'GPU Temp', latest(temperature.memory) as 'Memory Temp', latest(fan.speed.percent) as 'Fan speed %'"
}
],
"thresholds": []
}
},
{
"visualization": {
"id": "viz.billboard"
},
"layout": {
"column": 3,
"row": 4,
"height": 3,
"width": 2
},
"title": "Power Usage",
"rawConfiguration": {
"dataFormatters": [],
"nrqlQueries": [
{
"accountId": 123,
"query": "FROM NvidiaGpuSample SELECT latest(power.draw.watts) as 'Power Draw Watts', latest(`power.limit.watts`) as 'Power Limit Watts', latest(power.draw.watts)/latest(`power.limit.watts`) * 100 as 'Power usage %'"
}
],
"thresholds": []
}
},
{
"visualization": {
"id": "viz.billboard"
},
"layout": {
"column": 5,
"row": 4,
"height": 3,
"width": 2
},
"title": "Memory Usage",
"rawConfiguration": {
"dataFormatters": [],
"nrqlQueries": [
{
"accountId": 123,
"query": "FROM NvidiaGpuSample SELECT latest(memory.free.MiB) as 'Memory Free MiB', latest(memory.used.MiB) as 'Memory Used MiB', latest(memory.total.MiB) as 'Memory Total MiB'"
}
],
"thresholds": []
}
},
{
"visualization": {
"id": "viz.line"
},
"layout": {
"column": 7,
"row": 4,
"height": 3,
"width": 6
},
"title": "Utilization",
"rawConfiguration": {
"legend": {
"enabled": true
},
"nrqlQueries": [
{
"accountId": 123,
"query": "FROM NvidiaGpuSample SELECT latest(memory.used.MiB/memory.total.MiB) * 100 as 'Memory Used %', latest(utilization.gpu.percent) as 'GPU Utilization %', latest(power.draw.watts)/latest(`power.limit.watts`)*100 as 'Power Usage %', latest(fan.speed.percent) as 'Fan Speed %' TIMESERIES"
}
],
"yAxisLeft": {
"zero": true
}
}
}
]
}
]
}
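
The eight widgets in the dashboard above are positioned by `column`, `row`, `width` and `height`. A quick way to sanity-check such a layout is to confirm that no two widgets overlap and that each stays inside the grid. The sketch below copies the layout values from the JSON; the 12-column grid width, the `logo (markdown)` label for the untitled first widget, and the function names are assumptions made for illustration:

```python
from itertools import combinations

GRID_COLUMNS = 12  # assumed dashboard grid width

# (title, column, row, width, height), copied from the dashboard JSON above.
WIDGETS = [
    ("logo (markdown)", 1, 1, 4, 1),
    ("Current Clock Speeds", 5, 1, 2, 3),
    ("Current Clock MHz", 7, 1, 6, 3),
    ("Select GPU", 1, 2, 4, 2),
    ("Temps", 1, 4, 2, 3),
    ("Power Usage", 3, 4, 2, 3),
    ("Memory Usage", 5, 4, 2, 3),
    ("Utilization", 7, 4, 6, 3),
]


def cells(column, row, width, height):
    """Return the set of (column, row) grid cells a widget occupies."""
    return {(c, r)
            for c in range(column, column + width)
            for r in range(row, row + height)}


def layout_problems(widgets, grid_columns=GRID_COLUMNS):
    """List widgets that leave the grid and pairs of widgets that overlap."""
    problems = []
    for title, column, row, width, height in widgets:
        if column < 1 or column + width - 1 > grid_columns:
            problems.append(f"{title}: outside the {grid_columns}-column grid")
    for a, b in combinations(widgets, 2):
        if cells(*a[1:]) & cells(*b[1:]):
            problems.append(f"{a[0]} overlaps {b[0]}")
    return problems


print(layout_problems(WIDGETS))  # []
```

An empty list means the layout tiles cleanly under these assumptions: two bands of three rows each, with the line charts filling columns 7 through 12.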
Binary file added dashboards/nvidia-gpu/nvidia-gpu.png
7 changes: 5 additions & 2 deletions data-sources/dojo/config.yml
@@ -4,6 +4,9 @@ description: |
   With New Relic, monitor your Dojo solution to get full visibility into the core web vitals performance of your application or website.
 install:
   primary:
-    link:
-      url: https://docs.newrelic.com/docs/infrastructure/host-integrations/host-integrations-list/dojo-io-integration/
+    nerdlet:
+      nerdletId: marketplace.install-data-source
+      nerdletState:
+        dataSourceId: dojo
+      requiresAccount: false
 icon: logo.png
8 changes: 5 additions & 3 deletions data-sources/flutter-web/config.yml
@@ -5,9 +5,11 @@ description: |
 icon: logo.png
 install:
   primary:
-    link:
-      url: https://docs.newrelic.com/docs/infrastructure/host-integrations/host-integrations-list/browser-monitoring-integrations/flutter-web-integration/
-
+    nerdlet:
+      nerdletId: marketplace.install-data-source
+      nerdletState:
+        dataSourceId: flutter-web
+      requiresAccount: false
 keywords:
   - flutter
   - flutter web application
2 changes: 1 addition & 1 deletion data-sources/jfrog-platform/config.yml
@@ -4,5 +4,5 @@ description: Observe JFrog Platform and gain insights using New Relic Observabil
 install:
   primary:
     link:
-      url: https://github.com/jfrog/log-analytics-newrelic
+      url: https://jfrog.com/help/r/jfrog-platform-administration-documentation/new-relic
 icon: icon.svg
7 changes: 5 additions & 2 deletions data-sources/jira-errors-inbox/config.yml
@@ -3,6 +3,9 @@ displayName: Atlassian Jira for Errors Inbox
 description: Integrated errors inbox with Atlassian Jira (cloud) to easily create tickets for your errors.
 install:
   primary:
-    link:
-      url: https://docs.newrelic.com/docs/errors-inbox/errors-inbox/#jira
+    nerdlet:
+      nerdletId: marketplace.install-data-source
+      nerdletState:
+        dataSourceId: jira-errors-inbox
+      requiresAccount: false
 icon: logo.svg
7 changes: 5 additions & 2 deletions data-sources/nextcloud/config.yml
@@ -4,6 +4,9 @@ description: |
   Use our infrastructure agent and the Prometheus open metric integration to monitor the performance of the processes on your Nextcloud server.
 install:
   primary:
-    link:
-      url: https://docs.newrelic.com/docs/infrastructure/host-integrations/host-integrations-list/nextcloud-integration/
+    nerdlet:
+      nerdletId: marketplace.install-data-source
+      nerdletState:
+        dataSourceId: nextcloud
+      requiresAccount: false
 icon: logo.png
19 changes: 19 additions & 0 deletions data-sources/nvidia-gpu/config.yml
@@ -0,0 +1,19 @@
id: nvidia-gpu
displayName: Nvidia GPUs
description: |
  Monitor Nvidia GPUs based on the Nvidia SMI utility.
install:
  primary:
    link:
      url: https://docs.newrelic.com/docs/infrastructure/host-integrations/host-integrations-list/nvidia-gpu-integration/

icon: logo.png

keywords:
  - infrastructure
  - nvidia
  - gpu

categoryTerms:
  - infrastructure
Binary file added data-sources/nvidia-gpu/logo.png
17 changes: 17 additions & 0 deletions data-sources/shopify/config.yml
@@ -0,0 +1,17 @@
id: shopify
displayName: Shopify
description: |
  Shopify is a website building tool. You can use New Relic's browser agent to monitor your websites created in Shopify.
icon: logo.png
install:
  primary:
    link:
      url: https://docs.newrelic.com/docs/browser/browser-monitoring/installation/install-browser-monitoring-agent/
keywords:
  - traffic
  - browser
  - agent
  - web app
  - newrelic partner
categoryTerms:
  - newrelic partner
Binary file added data-sources/shopify/logo.png
13 changes: 0 additions & 13 deletions install/dojo/install.yml

This file was deleted.

15 changes: 0 additions & 15 deletions install/flutter-web/install.yml

This file was deleted.

2 changes: 1 addition & 1 deletion install/third-party/jfrog-platform/install.yml
@@ -11,4 +11,4 @@ target:
 install:
   mode: link
   destination:
-    url: https://github.com/jfrog/log-analytics-newrelic
+    url: https://jfrog.com/help/r/jfrog-platform-administration-documentation/new-relic
Binary file modified quickstarts/datazoom/logo.png