Issue with NestJs operator autoinstrumentation #2510

Open
f3l1x98 opened this issue Jan 12, 2024 · 9 comments

f3l1x98 commented Jan 12, 2024

Component(s)

operator, instrumentation

Describe the issue you're reporting

I hope this is the correct place for this question and that somebody can help me.

I am currently trying to set up tracing for a system that consists of four NestJS applications.
The traces are exported to a data-prepper instance, which then sends them to OpenSearch.
I have gotten the OpenTelemetry operator and autoinstrumentation to work in the sense that everything deploys, starts, and traces are collected/sent.

Issue

My issue is that only two of the four applications report traces from instrumentations other than @opentelemetry/instrumentation-fs (for example @opentelemetry/instrumentation-http).
The others only report fs traces (quite a lot of them), even though one of them is the main backend that calls the other applications.

I have also checked this with the logging exporter and got the same result, so the issue should not be in data-prepper/OpenSearch.

Used versions:

  • OpenTelemetry Operator: 0.90.0 (and later tried 0.91.0)

Configurations

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: opentelemetry-collector-sidecar-agent
spec:
  mode: sidecar
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    exporters:
      otlp/data-prepper:
        endpoint: {{ .Values.dataprepper.host }}
        tls:
          insecure: true
          insecure_skip_verify: true
    service:
      telemetry:
        logs:
          level: "debug"
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp/data-prepper]
---
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: {{ .Values.instrumentation.nodejs }}
spec:
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "1"
  nodejs:
---
apiVersion: v1
kind: Service
metadata:
  name: {{ .Chart.Name }}-service
spec:
  selector:
    tier: {{ .Chart.Name }}

  ports:
    - protocol: TCP
      port: 3001
      name: http
---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Chart.Name }}-deployment
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      tier: {{ .Chart.Name }}
  template:
    metadata:
      labels:
        app: backend
        tier: {{ .Chart.Name }}
      annotations:
        sidecar.opentelemetry.io/inject: "{{ .Values.opentelemetry.injectSidecar }}"
        instrumentation.opentelemetry.io/inject-nodejs: "{{ .Values.opentelemetry.autoInstrumentNodejs }}"
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.name }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          resources:
            requests:
              cpu: "{{ .Values.resources.requests.cpu }}"
              memory: "{{ .Values.resources.requests.memory }}"
            limits:
              cpu: "{{ .Values.resources.limits.cpu }}"
              memory: "{{ .Values.resources.limits.memory }}"

          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 3001
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 60
            successThreshold: 1
            timeoutSeconds: 10

          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /health
              port: 3001
              scheme: HTTP
            initialDelaySeconds: 60
            periodSeconds: 60
            successThreshold: 1
            timeoutSeconds: 10
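
Note on the Instrumentation resource above: the nodejs: section is left empty. As far as I understand, it accepts an env list that the operator injects into the instrumented container; a minimal sketch, assuming the standard SDK variable OTEL_LOG_LEVEL is what drives the debug output quoted further down:

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: {{ .Values.instrumentation.nodejs }}
spec:
  nodejs:
    env:
      # assumption: standard SDK diag variable, injected into the app container by the operator
      - name: OTEL_LOG_LEVEL
        value: debug
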
f3l1x98 (Author) commented Jan 12, 2024

Update

I have now tried the same thing using the SDK manually, and that works as expected (I get http, controller, sql, etc. traces).

However, I would really like to use the CRD if possible.

iblancasa (Contributor) commented:

Can this be related to #2497?

f3l1x98 (Author) commented Jan 16, 2024

> Can this be related to #2497?

I don't think so; after all, I do get some traces from the autoinstrumentation, just not nearly all of them.
To me it looks like the instrumentation is not set up correctly and some parts are initialized after the application, so the instrumentation of some modules (for example http) fails.
I have tried the CRD again with the image that has the OpenTelemetry packages installed, but got the same result.

f3l1x98 (Author) commented Jan 16, 2024

Well, I have enabled the debug logs of the Instrumentation CRD and, by the looks of it, the instrumentation is mostly initialized correctly (at least the http instrumentation succeeds); however, the resulting spans are never sent to the otc-collector.
Part of the backend logs:

@opentelemetry/api: Registered a global for diag v1.7.0.
@opentelemetry/api: Registered a global for trace v1.7.0.
@opentelemetry/api: Registered a global for context v1.7.0.
@opentelemetry/api: Registered a global for propagation v1.7.0.
@opentelemetry/api: Registered a global for metrics v1.7.0.
@opentelemetry/instrumentation-fs Applying patch for fs
@opentelemetry/instrumentation-nestjs-core Applying create patch for @nestjs/core@9.0.5
@opentelemetry/instrumentation-nestjs-core Applying create patch for @nestjs/core@9.0.5
maxQueueSize reached, dropping spans
@opentelemetry/instrumentation-nestjs-core Patching @nestjs/core@9.0.5
@opentelemetry/instrumentation-winston Applying patch for winston@3.11.0
@opentelemetry/instrumentation-http Applying patch for http@20.10.0
@opentelemetry/instrumentation-http Applying patch for https@20.10.0
@opentelemetry/instrumentation-fs Applying patch for fs/promises
Applying patch for net module
Applying patch for express@4.18.1
Accessing resource attributes before async attributes settled
{"context":"NestFactory","level":"info","message":"Starting Nest application...","span_id":"3d190c80a1fe565d","timestamp":"2024-01-16T10:45:09.812Z","trace_flags":"01","trace_id":"5c948d80e0d3a83bcd3bb2e2d7e079f4"}

[...]

@opentelemetry/instrumentation-http http instrumentation incomingRequest
wrap lookup callback function and starts span
executing wrapped lookup callback function
executing original lookup callback function
@opentelemetry/instrumentation-http http instrumentation incomingRequest
wrap lookup callback function and starts span
executing wrapped lookup callback function
executing original lookup callback function
items to be sent [
  {
    resource: Resource {
      _attributes: [Object],
      asyncAttributesPending: false,
      _syncAttributes: [Object],
      _asyncAttributesPromise: [Promise]
    },
    scopeMetrics: [ [Object] ]
  }
]
Service request {
  resourceMetrics: [
    { resource: [Object], schemaUrl: undefined, scopeMetrics: [Array] }
  ]
}
{"stack":"Error: PeriodicExportingMetricReader: metrics export failed (error Error: 12 UNIMPLEMENTED: unknown service opentelemetry.proto.collector.metrics.v1.MetricsService)\n    at doExport (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:76:23)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async PeriodicExportingMetricReader._doRun (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:84:13)\n    at async PeriodicExportingMetricReader._runOnce (/otel-auto-instrumentation-nodejs/node_modules/@opentelemetry/sdk-metrics/build/src/export/PeriodicExportingMetricReader.js:55:13)","message":"PeriodicExportingMetricReader: metrics export failed (error Error: 12 UNIMPLEMENTED: unknown service opentelemetry.proto.collector.metrics.v1.MetricsService)","name":"Error"}

The otc-collector sidecar only received 512 different spans, all from instrumentation-fs.

The only errors I can find during instrumentation init seem to be AWS-related (after all, it is not running in AWS).
If you want, I can upload the whole log (>5000 lines).
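
Side note on the metrics error above: the sidecar config in my first post only defines a traces pipeline, so the collector never exposes the OTLP MetricsService, which would explain the UNIMPLEMENTED error. A sketch of what an added metrics pipeline could look like, reusing the logging exporter I already tested with (whether data-prepper should receive metrics at all is a separate question):

receivers:
  otlp:
    protocols:
      grpc:
      http:
exporters:
  otlp/data-prepper:
    endpoint: {{ .Values.dataprepper.host }}
    tls:
      insecure: true
      insecure_skip_verify: true
  # logging exporter just as a local sink for metrics while debugging
  logging: {}
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/data-prepper]
    metrics:
      receivers: [otlp]
      exporters: [logging]

Alternatively, OTEL_METRICS_EXPORTER=none is the spec-level way to disable metrics export, though I have not checked whether the operator's Node.js image honours it. The "maxQueueSize reached, dropping spans" line is probably a separate symptom; the spec-defined OTEL_BSP_MAX_QUEUE_SIZE variable exists for tuning the batch processor, but whether that is the root cause here is only a guess.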

mderazon commented:

instrumentation-fs emits a lot of traces (possibly when the Node.js service starts up and reads the node_modules dir) and it is generally recommended to turn it off. I am not sure how that can be done through the otel operator.
See more here: #1283 (comment)

pavolloffay (Member) commented:

@mderazon can it be done by setting an env. variable?
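
A sketch of what that could look like, assuming the auto-instrumentations-node version bundled in the image is new enough to honour the OTEL_NODE_DISABLED_INSTRUMENTATIONS variable documented in its newer releases (untested):

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: {{ .Values.instrumentation.nodejs }}
spec:
  nodejs:
    env:
      # assumption: only takes effect if the bundled @opentelemetry/auto-instrumentations-node
      # reads OTEL_NODE_DISABLED_INSTRUMENTATIONS; the value is a comma-separated list of short names
      - name: OTEL_NODE_DISABLED_INSTRUMENTATIONS
        value: fs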

iblancasa (Contributor) commented:

@mderazon I think the issue you're talking about is different from the original issue created by @f3l1x98. Can you create a new issue for better tracking?

mderazon commented Jan 26, 2024

@iblancasa I wasn't sure whether I should create a new issue, because this is stated clearly in the docs:

> At the moment, there is no way to opt-in to only specific packages or disable specific packages. If you don't want to use a package included by the default image you must either supply your own image that includes only the packages you want or use manual instrumentation.

However, the general recommendation from the node-instrumentation team seems to be to disable instrumentation-fs.

I have also mentioned it in this issue: #1283 (comment). We can reopen that issue or I can create a new one; what would you prefer?

f3l1x98 (Author) commented Feb 18, 2024

Update

I have now switched to using a custom image for the NodeJS Instrumentation CRD.
This custom image is based on the official one, but with autoinstrumentation.ts replaced by the code from my manual instrumentation:

const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const {
  getNodeAutoInstrumentations,
} = require('@opentelemetry/auto-instrumentations-node');
const {
  OTLPTraceExporter,
} = require('@opentelemetry/exporter-trace-otlp-http');
const { Resource } = require('@opentelemetry/resources');
const {
  SemanticResourceAttributes,
} = require('@opentelemetry/semantic-conventions');
const { SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');

const exporter = new OTLPTraceExporter();

const provider = new NodeTracerProvider({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: process.env.OTEL_SERVICE_NAME,
  }),
});
// Export spans to opentelemetry collector
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));

provider.register();
registerInstrumentations({
  instrumentations: [
    getNodeAutoInstrumentations({
      // disable fs
      '@opentelemetry/instrumentation-fs': {
        enabled: false,
      },
    }),
  ],
});

Using this custom image, everything works, so it looks like the issue lies with the official NodeJS autoinstrumentation image.
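
For reference, the custom image is wired in through the Instrumentation CRD (the image name below is just a placeholder):

apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: {{ .Values.instrumentation.nodejs }}
spec:
  nodejs:
    # placeholder registry/tag; points at the custom build described above
    image: my-registry.example.com/autoinstrumentation-nodejs-custom:latest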
