[Bug]: Empty results custom resources detail field in K8sGPT operator deployment with LocalAI backend #433

Open
3 of 4 tasks
ghost opened this issue Apr 30, 2024 · 27 comments

@ghost

ghost commented Apr 30, 2024

Checklist

  • I've searched for similar issues and couldn't find anything matching
  • I've included steps to reproduce the behavior

Affected Components

  • K8sGPT (CLI)
  • K8sGPT Operator

K8sGPT Version

v0.1.3

Kubernetes Version

v1.25.0

Host OS and its Version

MacOS 14.4.1

Steps to reproduce

  1. Create a local K8s lab environment
  2. Install K8sGPT Operator using Helm
  3. Install LocalAI using Helm
  4. Install K8sGPT Operator custom resource
  5. Download and install a supported model such as: koala-7B.ggmlv3.q5_1.bin
  6. Deploy a broken workload

Expected behaviour

The details field of the Result custom resources should provide information or suggestions on how to fix the broken workload.

Actual behaviour

The details field of the Result custom resources is empty.

Additional Information

Results custom resource shown here:


──> k get results --all-namespaces -o yaml
apiVersion: v1
items:
- apiVersion: core.k8sgpt.ai/v1alpha1
  kind: Result
  metadata:
    creationTimestamp: "2024-04-25T18:22:05Z"
    generation: 1
    labels:
      k8sgpts.k8sgpt.ai/backend: localai
      k8sgpts.k8sgpt.ai/name: k8sgpt-localai
      k8sgpts.k8sgpt.ai/namespace: k8sgpt-operator-system
    name: defaultbrokenpod
    namespace: k8sgpt-operator-system
    resourceVersion: "18921"
    uid: 4abe4470-5937-4954-8714-f8ec88782085
  spec:
    backend: localai
    details: ""
    error:
    - text: Back-off pulling image "nginx:x.y.z
    kind: Pod
    name: default/broken-pod
    parentObject: ""
  status:
    lifecycle: historical
- apiVersion: core.k8sgpt.ai/v1alpha1
  kind: Result
  metadata:
    creationTimestamp: "2024-04-25T19:04:37Z"
    generation: 1
    labels:
      k8sgpts.k8sgpt.ai/backend: localai
      k8sgpts.k8sgpt.ai/name: k8sgpt-localai
      k8sgpts.k8sgpt.ai/namespace: k8sgpt-operator-system
    name: defaulthelloworld5
    namespace: k8sgpt-operator-system
    resourceVersion: "23901"
    uid: c3b85108-2749-4f98-ae90-4aae580cb37b
  spec:
    backend: localai
    details: ""
    error:
    - sensitive:
      - masked: UE9J
        unmasked: app
      - masked: eEVsVmtOMGFXd0o=
        unmasked: hello-world
      text: Service has no endpoints, expected label app=hello-world
    kind: Service
    name: default/hello-world-5
    parentObject: ""
  status:
    lifecycle: historical
kind: List
metadata:
  resourceVersion: ""


LocalAI pod logs shown here:


──> k logs local-ai-85cc4f5bc-j8dxw -n local-ai -c local-ai --follow
@@@@@
Skipping rebuild
@@@@@
If you are experiencing issues with the pre-compiled builds, try setting REBUILD=true
If you are still experiencing issues with the build, try setting CMAKE_ARGS and disable the instructions set as needed:
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF"
see the documentation at: https://localai.io/basics/build/index.html
Note: See also https://github.com/go-skynet/LocalAI/issues/288
@@@@@
CPU info:
CPU: no AVX    found
CPU: no AVX2   found
CPU: no AVX512 found
@@@@@
2:17PM INF Starting LocalAI using 4 threads, with models path: /models
2:17PM INF LocalAI version: v1.23.0 (688f1504636810bbe40cffa2c88fe78f0ff09dc9)
2:17PM DBG Model: gpt-3.5-turbo (config: {PredictionOptions:{Model:open-llama-13b-open-instruct.ggmlv3.q3_K_M.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0} Name:gpt-3.5-turbo StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:true NUMA:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend: TemplateConfig:{Chat:chat ChatMessage: Completion:completion Edit: Functions:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:true MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false Grammar: PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} SystemPrompt:})
2:17PM DBG Model: koala (config: {PredictionOptions:{Model:koala-7B.ggmlv3.q5_1.bin Language: N:0 TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 Echo:false Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 FrequencyPenalty:0 TFZ:0 TypicalP:0 Seed:0 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0} Name:koala StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false NUMA:false Threads:0 Debug:false Roles:map[assistant:GPT: system:SYSTEM: user:USER:] Embeddings:false Backend:llama TemplateConfig:{Chat:koala-chat ChatMessage: Completion:koala-completion Edit: Functions:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 MMap:false MMlock:false LowVRAM:false TensorSplit: MainGPU: ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptCacheRO:false Grammar: PromptStrings:[] InputStrings:[] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false NoActionFunctionName: NoActionDescriptionName:} SystemPrompt:})
2:17PM DBG Extracting backend assets files to /tmp/localai/backend_data
2:17PM DBG Config overrides map[f16:true mmap:true parameters:map[model:open-llama-13b-open-instruct.ggmlv3.q3_K_M.bin]]
2:17PM DBG Checking "open-llama-13b-open-instruct.ggmlv3.q3_K_M.bin" exists and matches SHA
2:17PM DBG File "open-llama-13b-open-instruct.ggmlv3.q3_K_M.bin" already exists. Skipping download
2:17PM DBG Prompt template "completion" written
2:17PM DBG Prompt template "chat" written
2:17PM DBG Written config file /models/gpt-3.5-turbo.yaml

 ┌───────────────────────────────────────────────────┐ 
 │                   Fiber v2.48.0                   │ 
 │               http://127.0.0.1:8080/               │ 
 │       (bound on host 0.0.0.0 and port 8080)       │ 
 │                                                   │ 
 │ Handlers ............ 32  Processes ........... 1 │ 
 │ Prefork ....... Disabled  PID ................ 20 │ 
 └───────────────────────────────────────────────────┘


K8sGPT operator controller manager pod logs shown here:


──> k logs k8sgpt-operator-controller-manager-6b87cf974f-q8jwk -n k8sgpt-operator-system -c manager --follow
2024-04-30T14:17:23Z    INFO    controller-runtime.metrics      Metrics server is starting to listen    {"addr": "127.0.0.1:8080"}
2024-04-30T14:17:23Z    INFO    setup   starting manager
2024-04-30T14:17:23Z    INFO    Starting server {"kind": "health probe", "addr": "[::]:8081"}
2024-04-30T14:17:23Z    INFO    starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
I0430 14:17:23.945181       1 leaderelection.go:250] attempting to acquire leader lease k8sgpt-operator-system/ea9c19f7.k8sgpt.ai...
I0430 14:17:38.977473       1 leaderelection.go:260] successfully acquired lease k8sgpt-operator-system/ea9c19f7.k8sgpt.ai
2024-04-30T14:17:38Z    DEBUG   events  k8sgpt-operator-controller-manager-6b87cf974f-q8jwk_fca71f3d-da5c-4f26-a258-b8e229c65a9d became leader  {"type": "Normal", "object": {"kind":"Lease","namespace":"k8sgpt-operator-system","name":"ea9c19f7.k8sgpt.ai","uid":"a9df876b-8a32-44f2-b255-3fb6674ee5fa","apiVersion":"coordination.k8s.io/v1","resourceVersion":"361803"}, "reason": "LeaderElection"}
2024-04-30T14:17:38Z    INFO    Starting EventSource    {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "source": "kind source: *v1alpha1.K8sGPT"}
2024-04-30T14:17:38Z    INFO    Starting Controller     {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT"}
2024-04-30T14:17:39Z    INFO    Starting workers        {"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "worker count": 1}
Creating new client for 10.96.236.145:8080
Connection established between 10.96.236.145:8080 and localhost with time out of 1 seconds.
Remote Address : 10.96.236.145:8080 
K8sGPT address: 10.96.236.145:8080
Checking if defaultbrokenpod is still relevant
Checking if defaulthelloworld5 is still relevant
Finished Reconciling k8sGPT
Creating new client for 10.96.236.145:8080
Connection established between 10.96.236.145:8080 and localhost with time out of 1 seconds.
Remote Address : 10.96.236.145:8080 
K8sGPT address: 10.96.236.145:8080
Checking if defaulthelloworld5 is still relevant
Checking if defaultbrokenpod is still relevant
Finished Reconciling k8sGPT
....
...

K8sGPT LocalAI pod logs shown here:


──> k logs k8sgpt-localai-84b5d6bd78-vxzp5 -n k8sgpt-operator-system --follow
{"level":"info","ts":1714486623.6705356,"caller":"server/server.go:126","msg":"binding metrics to 8081"}
{"level":"info","ts":1714486623.6716375,"caller":"server/server.go:92","msg":"binding api to 8080"}
[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
Detected at:
        >  goroutine 67 [running]:
        >  runtime/debug.Stack()
        >       /usr/local/go/src/runtime/debug/stack.go:24 +0x64
        >  sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
        >       /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/log/log.go:60 +0xf4
        >  sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithName(0x40002f9b40, {0x265a1b6, 0x14})
        >       /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/log/deleg.go:147 +0x34
        >  github.com/go-logr/logr.Logger.WithName({{0x300ae08, 0x40002f9b40}, 0x0}, {0x265a1b6?, 0x0?})
        >       /go/pkg/mod/github.com/go-logr/logr@v1.4.1/logr.go:345 +0x40
        >  sigs.k8s.io/controller-runtime/pkg/client.newClient(0x0?, {0x0, 0x0, {0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0})
        >       /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/client/client.go:122 +0xb4
        >  sigs.k8s.io/controller-runtime/pkg/client.New(0x0?, {0x0, 0x0, {0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0})
        >       /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/client/client.go:103 +0x54
        >  github.com/k8sgpt-ai/k8sgpt/pkg/kubernetes.NewClient({0x0, 0x0}, {0x0, 0x0})
        >       /workspace/pkg/kubernetes/kubernetes.go:62 +0x1b0
        >  github.com/k8sgpt-ai/k8sgpt/pkg/analysis.NewAnalysis({0x4000800618, 0x7}, {0x4000800680, 0x7}, {0x0, 0x0, 0x0}, {0x0, 0x0}, 0x0, ...)
        >       /workspace/pkg/analysis/analysis.go:86 +0xb0
        >  github.com/k8sgpt-ai/k8sgpt/pkg/server.(*handler).Analyze(0x485b150?, {0x3001ee8, 0x40008315c0}, 0x40006822d0)
        >       /workspace/pkg/server/analyze.go:23 +0xc4
        >  buf.build/gen/go/k8sgpt-ai/k8sgpt/grpc/go/schema/v1/schemav1grpc._ServerService_Analyze_Handler.func1({0x3001ee8?, 0x40008315c0?}, {0x24dd5e0?, 0x40006822d0?})
        >       /go/pkg/mod/buf.build/gen/go/k8sgpt-ai/k8sgpt/grpc/go@v1.3.0-20240213144542-6e830f3fdf19.2/schema/v1/schemav1grpc/server-service_grpc.pb.go:134 +0xd0
        >  github.com/k8sgpt-ai/k8sgpt/pkg/server.(*Config).Serve.logInterceptor.func1({0x3001ee8, 0x40008315c0}, {0x24dd5e0, 0x40006822d0}, 0x400060e940, 0x400052c6f0)
        >       /workspace/pkg/server/log.go:19 +0x70
        >  buf.build/gen/go/k8sgpt-ai/k8sgpt/grpc/go/schema/v1/schemav1grpc._ServerService_Analyze_Handler({0x22f72e0, 0x0}, {0x3001ee8, 0x40008315c0}, 0x400087a600, 0x40007e26f0)
        >       /go/pkg/mod/buf.build/gen/go/k8sgpt-ai/k8sgpt/grpc/go@v1.3.0-20240213144542-6e830f3fdf19.2/schema/v1/schemav1grpc/server-service_grpc.pb.go:136 +0x148
        >  google.golang.org/grpc.(*Server).processUnaryRPC(0x4000ae0000, {0x3001ee8, 0x4000831530}, {0x30117e0, 0x4000848340}, 0x400062e6c0, 0x4000830e10, 0x4882800, 0x0)
        >       /go/pkg/mod/google.golang.org/grpc@v1.62.0/server.go:1383 +0xb40
        >  google.golang.org/grpc.(*Server).handleStream(0x4000ae0000, {0x30117e0, 0x4000848340}, 0x400062e6c0)
        >       /go/pkg/mod/google.golang.org/grpc@v1.62.0/server.go:1794 +0xb10
        >  google.golang.org/grpc.(*Server).serveStreams.func2.1()
        >       /go/pkg/mod/google.golang.org/grpc@v1.62.0/server.go:1027 +0x8c
        >  created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 66
        >       /go/pkg/mod/google.golang.org/grpc@v1.62.0/server.go:1038 +0x13c
{"level":"info","ts":1714486659.634016,"caller":"server/log.go:50","msg":"request completed","duration_ms":20,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"localai\" anonymize:true language:\"english\" max_concurrency:10 output:\"json\"","remote_addr":"10.244.2.26:50800"}
{"level":"info","ts":1714486689.7802212,"caller":"server/log.go:50","msg":"request completed","duration_ms":15,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"localai\" anonymize:true language:\"english\" max_concurrency:10 output:\"json\"","remote_addr":"10.244.2.26:57512"}
{"level":"info","ts":1714486719.8170424,"caller":"server/log.go:50","msg":"request completed","duration_ms":10,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"localai\" anonymize:true language:\"english\" max_concurrency:10 output:\"json\"","remote_addr":"10.244.2.26:47968"}

K8sGPT Custom Resource configuration shown here (tested with AI both enabled and disabled):


apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-localai
  namespace: k8sgpt-operator-system
spec:
  ai:
    # enabled: true
    enabled: false
    # model: gpt-3.5-turbo
    # model: open-llama-13b-open-instruct.ggmlv3.q3_K_M
    # model: koala-7B.ggmlv3.q8_0.bin
    model: koala-7B.ggmlv3.q5_1.bin
    backend: localai
    baseUrl: http://local-ai.local-ai.svc.cluster.local:8080/v1 # http://local-ai.<namespace_local_ai_was_installed_in>.svc.cluster.local:8080/v1
    # anonymized: false
    # language: english
  noCache: false
  version: v0.3.29
  # filters:
  #   - Ingress
  # sink:
  #   type: slack
  #   webhook: <webhook-url>
  # extraOptions:
  #   backstage:
  #     enabled: true
  
@ghost
Author

ghost commented Apr 30, 2024

Local lab setup shown here:


──> podman container ls
CONTAINER ID  IMAGE                           COMMAND     CREATED     STATUS      PORTS                      NAMES
984b6edb6559  docker.io/kindest/node:v1.25.0              5 days ago  Up 5 days   127.0.0.1:54906->6443/tcp  sandbox-control-plane
137c77669306  docker.io/kindest/node:v1.25.0              5 days ago  Up 5 days                              sandbox-worker
85d1753d8a47  docker.io/kindest/node:v1.25.0              5 days ago  Up 5 days                              sandbox-worker2


@ghost
Author

ghost commented Apr 30, 2024

Current content of the downloaded models directory shown here:


──> k exec -it pod/local-ai-85cc4f5bc-j8dxw -n local-ai -c local-ai -- sh
# ls -l /models
total 21791480
-rw-r--r--. 1 root root        149 Apr 30 14:17 chat.tmpl
-rw-r--r--. 1 root root         11 Apr 30 14:17 completion.tmpl
-rw-r--r--. 1 root root 3785248281 Apr 25 18:09 ggml-gpt4all-j_f5d8f27287d3
-rw-r--r--. 1 root root        220 Apr 30 14:17 gpt-3.5-turbo.yaml
-rw-r--r--. 1 root root 5055128192 Apr 25 21:43 koala-7B.ggmlv3.q5_1.bin
-rw-r--r--. 1 root root 7160799872 Apr 25 20:52 koala-7B.ggmlv3.q8_0.bin
-rw-r--r--. 1 root root         42 Apr 25 22:05 koala-chat.tmpl
-rw-r--r--. 1 root root         11 Apr 25 22:05 koala-completion.tmpl
-rw-r--r--. 1 root root        259 Apr 25 22:05 koala.yaml
-rw-r--r--. 1 root root 6313255552 Apr 25 18:15 open-llama-13b-open-instruct.ggmlv3.q3_K_M.bin


@ghost
Author

ghost commented Apr 30, 2024

LocalAI image information from Helm values shown here:


──> grep -inr 'image:' example-values.yaml -A 2
example-values.yaml:5:  image: 
example-values.yaml-6-    repository: quay.io/go-skynet/local-ai
example-values.yaml-7-    tag: v1.23.0

@ghost
Author

ghost commented Apr 30, 2024

Notes:

  • Same issue with LocalAI image tag v1.30.0.

  • I'll try to build the latest ARM64 image locally and see whether the latest version tag, v2.13.0, resolves the issue.

@JuHyung-Son
Contributor

Does this work in other versions?

@atul86244

Facing the same issue with Amazon Bedrock and the latest versions of k8sgpt and the k8sgpt operator. The details field in the Result object is empty; however, I am able to get results from Amazon Bedrock using the k8sgpt CLI.

@JuHyung-Son
Contributor

@atul86244 do you have any logs from the operator?

@scaldarola

@aparandian could you share the values you used to install the localai deployment?

@atul86244

@JuHyung-Son yes, here are the logs:

% kubectl logs release-k8sgpt-operator-controller-manager-78d67f44c6-brh2g -n k8sgpt-operator-system -f
2024-05-31T19:56:25Z	INFO	controller-runtime.metrics	Metrics server is starting to listen	{"addr": "127.0.0.1:8080"}
2024-05-31T19:56:25Z	INFO	setup	starting manager
2024-05-31T19:56:25Z	INFO	Starting server	{"kind": "health probe", "addr": "[::]:8081"}
2024-05-31T19:56:25Z	INFO	starting server	{"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
I0531 19:56:25.325562       1 leaderelection.go:250] attempting to acquire leader lease k8sgpt-operator-system/ea9c19f7.k8sgpt.ai...
I0531 19:56:25.330620       1 leaderelection.go:260] successfully acquired lease k8sgpt-operator-system/ea9c19f7.k8sgpt.ai
2024-05-31T19:56:25Z	DEBUG	events	release-k8sgpt-operator-controller-manager-78d67f44c6-brh2g_ddc3f564-b160-4264-8800-647ccbb6fe7a became leader	{"type": "Normal", "object": {"kind":"Lease","namespace":"k8sgpt-operator-system","name":"ea9c19f7.k8sgpt.ai","uid":"40763948-7e35-48d7-9fa1-410a87330b9e","apiVersion":"coordination.k8s.io/v1","resourceVersion":"584"}, "reason": "LeaderElection"}
2024-05-31T19:56:25Z	INFO	Starting EventSource	{"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "source": "kind source: *v1alpha1.K8sGPT"}
2024-05-31T19:56:25Z	INFO	Starting Controller	{"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT"}
2024-05-31T19:56:25Z	INFO	Starting workers	{"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "worker count": 1}
Finished Reconciling k8sGPT
Finished Reconciling k8sGPT
Creating new client for 10.96.219.101:8080
Connection established between 10.96.219.101:8080 and localhost with time out of 1 seconds.
Remote Address : 10.96.219.101:8080
K8sGPT address: 10.96.219.101:8080
Finished Reconciling k8sGPT
Creating new client for 10.96.219.101:8080
Connection established between 10.96.219.101:8080 and localhost with time out of 1 seconds.
Remote Address : 10.96.219.101:8080
K8sGPT address: 10.96.219.101:8080
Finished Reconciling k8sGPT
Creating new client for 10.96.219.101:8080
Connection established between 10.96.219.101:8080 and localhost with time out of 1 seconds.
Remote Address : 10.96.219.101:8080
K8sGPT address: 10.96.219.101:8080
Finished Reconciling k8sGPT
Creating new client for 10.96.219.101:8080
Connection established between 10.96.219.101:8080 and localhost with time out of 1 seconds.
Remote Address : 10.96.219.101:8080
K8sGPT address: 10.96.219.101:8080
Finished Reconciling k8sGPT
Creating new client for 10.96.219.101:8080
Connection established between 10.96.219.101:8080 and localhost with time out of 1 seconds.
Remote Address : 10.96.219.101:8080
K8sGPT address: 10.96.219.101:8080
% kubectl logs k8sgpt-sample-6fbc586686-zfxcl -n k8sgpt-operator-system -f
{"level":"info","ts":1717185510.1938887,"caller":"server/server.go:126","msg":"binding metrics to 8081"}
{"level":"info","ts":1717185510.1945617,"caller":"server/server.go:92","msg":"binding api to 8080"}
{"level":"info","ts":1717185532.1869886,"caller":"server/log.go:50","msg":"request completed","duration_ms":16,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"amazonbedrock\" explain:true anonymize:true nocache:true language:\"english\" max_concurrency:10 output:\"json\"","remote_addr":"10.244.0.5:33194"}
[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
Detected at:
	>  goroutine 93 [running]:
	>  runtime/debug.Stack()
	>  	/usr/local/go/src/runtime/debug/stack.go:24 +0x64
	>  sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
	>  	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/log/log.go:60 +0xf4
	>  sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithName(0x400094c880, {0x2832f01, 0x14})
	>  	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/log/deleg.go:147 +0x34
	>  github.com/go-logr/logr.Logger.WithName({{0x30913e8, 0x400094c880}, 0x0}, {0x2832f01?, 0x0?})
	>  	/go/pkg/mod/github.com/go-logr/logr@v1.4.1/logr.go:345 +0x40
	>  sigs.k8s.io/controller-runtime/pkg/client.newClient(0x0?, {0x0, 0x0, {0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0})
	>  	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/client/client.go:122 +0xb4
	>  sigs.k8s.io/controller-runtime/pkg/client.New(0x40008b32f8?, {0x0, 0x0, {0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0})
	>  	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/client/client.go:103 +0x54
	>  github.com/k8sgpt-ai/k8sgpt/pkg/kubernetes.NewClient({0x0, 0x0}, {0x0, 0x0})
	>  	/workspace/pkg/kubernetes/kubernetes.go:62 +0x1b0
	>  github.com/k8sgpt-ai/k8sgpt/pkg/analysis.NewAnalysis({0x4000baf3e0, 0xd}, {0x4000baf3f0, 0x7}, {0x0, 0x0, 0x0}, {0x0, 0x0}, 0x1, ...)
	>  	/workspace/pkg/analysis/analysis.go:86 +0xb0
	>  github.com/k8sgpt-ai/k8sgpt/pkg/server.(*handler).Analyze(0x4a044b0?, {0x3088070, 0x4000b10c60}, 0x4000bb7680)
	>  	/workspace/pkg/server/analyze.go:23 +0xc8
	>  buf.build/gen/go/k8sgpt-ai/k8sgpt/grpc/go/schema/v1/schemav1grpc._ServerService_Analyze_Handler.func1({0x3088070?, 0x4000b10c60?}, {0x26a62c0?, 0x4000bb7680?})
	>  	/go/pkg/mod/buf.build/gen/go/k8sgpt-ai/k8sgpt/grpc/go@v1.3.0-20240406062209-1cc152efbf5c.3/schema/v1/schemav1grpc/server-service_grpc.pb.go:134 +0xd0
	>  github.com/k8sgpt-ai/k8sgpt/pkg/server.(*Config).Serve.logInterceptor.func1({0x3088070, 0x4000b10c60}, {0x26a62c0, 0x4000bb7680}, 0x4000157200, 0x4000b14678)
	>  	/workspace/pkg/server/log.go:19 +0x70
	>  buf.build/gen/go/k8sgpt-ai/k8sgpt/grpc/go/schema/v1/schemav1grpc._ServerService_Analyze_Handler({0x2490da0, 0x0}, {0x3088070, 0x4000b10c60}, 0x400050c200, 0x4000514900)
	>  	/go/pkg/mod/buf.build/gen/go/k8sgpt-ai/k8sgpt/grpc/go@v1.3.0-20240406062209-1cc152efbf5c.3/schema/v1/schemav1grpc/server-service_grpc.pb.go:136 +0x148
	>  google.golang.org/grpc.(*Server).processUnaryRPC(0x400064e400, {0x3088070, 0x4000b10bd0}, {0x30980a0, 0x40005f9520}, 0x4000b0e7e0, 0x4000c6bbc0, 0x4a2c3c0, 0x0)
	>  	/go/pkg/mod/google.golang.org/grpc@v1.62.1/server.go:1386 +0xb58
	>  google.golang.org/grpc.(*Server).handleStream(0x400064e400, {0x30980a0, 0x40005f9520}, 0x4000b0e7e0)
	>  	/go/pkg/mod/google.golang.org/grpc@v1.62.1/server.go:1797 +0xb10
	>  google.golang.org/grpc.(*Server).serveStreams.func2.1()
	>  	/go/pkg/mod/google.golang.org/grpc@v1.62.1/server.go:1027 +0x8c
	>  created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 109
	>  	/go/pkg/mod/google.golang.org/grpc@v1.62.1/server.go:1038 +0x13c
{"level":"info","ts":1717185562.333694,"caller":"server/log.go:50","msg":"request completed","duration_ms":16,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"amazonbedrock\" explain:true anonymize:true nocache:true language:\"english\" max_concurrency:10 output:\"json\"","remote_addr":"10.244.0.5:38614"}
{"level":"info","ts":1717185592.3782623,"caller":"server/log.go:50","msg":"request completed","duration_ms":12,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"amazonbedrock\" explain:true anonymize:true nocache:true language:\"english\" max_concurrency:10 output:\"json\"","remote_addr":"10.244.0.5:57486"}
{"level":"info","ts":1717185622.4107533,"caller":"server/log.go:50","msg":"request completed","duration_ms":4,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"amazonbedrock\" explain:true anonymize:true nocache:true language:\"english\" max_concurrency:10 output:\"json\"","remote_addr":"10.244.0.5:42980"}
{"level":"info","ts":1717185652.4536068,"caller":"server/log.go:50","msg":"request completed","duration_ms":12,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"amazonbedrock\" explain:true anonymize:true nocache:true language:\"english\" max_concurrency:10 output:\"json\"","remote_addr":"10.244.0.5:36968"}
{"level":"info","ts":1717185682.4772131,"caller":"server/log.go:50","msg":"request completed","duration_ms":9,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"amazonbedrock\" explain:true anonymize:true nocache:true language:\"english\" max_concurrency:10 output:\"json\"","remote_addr":"10.244.0.5:50030"}
{"level":"info","ts":1717185712.5117087,"caller":"server/log.go:50","msg":"request completed","duration_ms":11,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"amazonbedrock\" explain:true anonymize:true nocache:true language:\"english\" max_concurrency:10 output:\"json\"","remote_addr":"10.244.0.5:54928"}
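
One visible difference between the two k8sgpt pod log excerpts in this thread is the request field of the Analyze calls: the LocalAI excerpt does not contain explain:true, while the Amazon Bedrock excerpt does. The following is a minimal sketch for checking that flag in a "request completed" log line. The helper name requestHasFlag and the struct are hypothetical, and treating the flag as a sign that an AI explanation was requested is an assumption, not something confirmed in this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Two "request completed" lines copied from the log excerpts in this thread.
const localAILine = `{"level":"info","ts":1714486659.634016,"caller":"server/log.go:50","msg":"request completed","duration_ms":20,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"localai\" anonymize:true language:\"english\" max_concurrency:10 output:\"json\"","remote_addr":"10.244.2.26:50800"}`
const bedrockLine = `{"level":"info","ts":1717185532.1869886,"caller":"server/log.go:50","msg":"request completed","duration_ms":16,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"amazonbedrock\" explain:true anonymize:true nocache:true language:\"english\" max_concurrency:10 output:\"json\"","remote_addr":"10.244.0.5:33194"}`

// analyzeLog holds the fields of interest from one JSON log line.
type analyzeLog struct {
	Method  string `json:"method"`
	Request string `json:"request"`
}

// requestHasFlag (hypothetical helper) reports whether the logged request
// string contains the given space-separated token, for example "explain:true".
func requestHasFlag(line, flag string) (bool, error) {
	var entry analyzeLog
	if err := json.Unmarshal([]byte(line), &entry); err != nil {
		return false, err
	}
	for _, token := range strings.Fields(entry.Request) {
		if token == flag {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for name, line := range map[string]string{"localai": localAILine, "amazonbedrock": bedrockLine} {
		has, err := requestHasFlag(line, "explain:true")
		if err != nil {
			fmt.Println(name, "could not be parsed:", err)
			continue
		}
		fmt.Printf("%s request has explain:true = %v\n", name, has)
	}
}
```

The check only inspects what the operator sent; it says nothing about whether the backend itself answered.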

@AlexsJones
Member

Will take a look at this

@JuHyung-Son
Contributor

Can you actually call the LLM directly?

I'm not familiar with LocalAI; I got this message:

curl http://XXX:8080/v1/completions -H "Content-Type: application/json" -d '{
     "model": "phi-2",
     "prompt": "A long time ago in a galaxy far, far away",
     "temperature": 0.7
   }'
{"error":{"code":500,"message":"could not load model - all backends returned error: [llama-cpp]: could not load model: rpc error: code = Canceled desc = \n[llama-ggml]: could not load model: rpc error: code = Unknown desc = failed loading model\n[gpt4all]: could not load model: rpc error: code = Unknown desc = failed loading model\n[llama-cpp-fallback]: could not load model: rpc error: code = Canceled desc = \n[stablediffusion]: could not load model: rpc error: code = Unknown desc = stat /build/models/gpt-4: no such file or directory\n[piper]: could not load model: rpc error: code = Unknown desc = unsupported model type /build/models/gpt-4 (should end with .onnx)\n[rwkv]: could not load model: rpc error: code = Unavailable desc = error reading from server: read tcp 127.0.0.1:49400-\u003e127.0.0.1:33727: read: connection reset by peer\n[whisper]: could not load model: rpc error: code = Unknown desc = stat /build/models/gpt-4: no such file or directory\n[huggingface]: could not load model: rpc error: code = Unknown desc = no huggingface token provided\n[bert-embeddings]: could not load model: rpc error: code = Unknown desc = failed loading model\n[/build/backend/python/vllm/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vllm/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/transformers-musicgen/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers-musicgen/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/bark/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/bark/run.sh. 
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/sentencetransformers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/transformers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/transformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/diffusers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/diffusers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/exllama2/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama2/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/coqui/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/coqui/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/parler-tts/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/parler-tts/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/autogptq/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/autogptq/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/petals/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/petals/run.sh. 
some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/rerankers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/rerankers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/exllama/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/exllama/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/vall-e-x/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/vall-e-x/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/mamba/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/mamba/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/openvoice/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/openvoice/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS\n[/build/backend/python/sentencetransformers/run.sh]: grpc process not found: /tmp/localai/backend_data/backend-assets/grpc/build/backend/python/sentencetransformers/run.sh. some backends(stablediffusion, tts) require LocalAI compiled with GO_TAGS","type":""}}%                                                                                                               
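
The direct check above can also be scripted. The following is a minimal sketch, assuming the OpenAI-style /v1/completions endpoint and the same JSON body as the curl command. The names probeCompletions and newStubServer are hypothetical, and the example runs against a local stub server so that it executes without a LocalAI instance; to test a real deployment, pass the LocalAI base URL instead of the stub URL.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// completionRequest mirrors the JSON body of the curl command above.
type completionRequest struct {
	Model       string  `json:"model"`
	Prompt      string  `json:"prompt"`
	Temperature float64 `json:"temperature"`
}

// probeCompletions (hypothetical helper) posts one request to
// baseURL + "/completions" and returns the HTTP status and raw body.
func probeCompletions(baseURL string, req completionRequest) (int, string, error) {
	payload, err := json.Marshal(req)
	if err != nil {
		return 0, "", err
	}
	client := &http.Client{Timeout: 60 * time.Second}
	resp, err := client.Post(baseURL+"/completions", "application/json", bytes.NewReader(payload))
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(body), nil
}

// newStubServer stands in for LocalAI and answers POST /v1/completions
// with a fixed status and body. It is only here to keep the sketch runnable.
func newStubServer(status int, body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/v1/completions" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		fmt.Fprint(w, body)
	}))
}

func main() {
	stub := newStubServer(http.StatusInternalServerError,
		`{"error":{"code":500,"message":"could not load model"}}`)
	defer stub.Close()

	status, body, err := probeCompletions(stub.URL+"/v1", completionRequest{
		Model:       "phi-2",
		Prompt:      "A long time ago in a galaxy far, far away",
		Temperature: 0.7,
	})
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("HTTP %d: %s\n", status, body)
}
```

A non-200 status with a "could not load model" message, as in the output above, points at the model setup in LocalAI rather than at the operator.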

@AlexsJones
Member

I could not reproduce this with:

  • k8sgpt/k8sgpt-operator chart 0.1.6 (app version 0.0.26)
  • OpenAI backend

Will try with Bedrock.

@JuHyung-Son
Contributor

It seems to be related to the LocalAI backend.

@AlexsJones
Member

Facing same issue with Amazon bedrock and latest version of k8sgpt and k8sgpt operator. Details field in the results object is empty, however I am able to get results from Amazon bedrock using the k8sgpt cli.

I will test this now..

@cf250024

cf250024 commented Jun 24, 2024

As JuHyung-Son mentioned, it could be LocalAI related.
I set up LocalAI, and the k8sgpt CLI works fine with the local-ai backend.

However, the "k8sgpt-sample-localai" service for the K8sGPT CRD uses port 8080 as well. Since I already use 8080 for local-ai, I'm not sure where I can specify the port for the K8sGPT CRD so that it won't conflict.

Is it hardcoded here?

	Spec: corev1.ServiceSpec{
		Selector: map[string]string{
			"app": config.Name,
		},
		Ports: []corev1.ServicePort{
			{
				Port: 8080,
			},
		},
	},

@JuHyung-Son
Contributor

JuHyung-Son commented Jun 30, 2024

@cf250024
Nice catch!
The k8sgpt port is hardcoded. It would be better if the port were configurable.

If the local-ai server is deployed in the same namespace as k8sgpt, it should use a different port. Otherwise, the port does not matter.

@arbreezy
Member

arbreezy commented Jul 8, 2024

This would make a nice first issue and is worth creating a GitHub issue for :)

@atul86244

Hi @AlexsJones, as discussed, here is the AI spec I am using for bedrock:

spec:
  ai:
    anonymized: true
    backOff:
      enabled: true
      maxRetries: 5
    backend: amazonbedrock
    enabled: true
    language: english
    model: anthropic.claude-v2
    region: us-west-2
    secret:
      name: bedrock-secret

@AlexsJones
Member

AlexsJones commented Jul 12, 2024

I wanted to report back my own experience.

Setup

Kind local k8s
Client Version: v1.29.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.2

K8sGPT object

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"core.k8sgpt.ai/v1alpha1","kind":"K8sGPT","metadata":{"annotations":{},"name":"k8sgpt-sample","namespace":"k8sgpt-operator-system"},"spec":{"ai":{"backend":"amazonbedrock","enabled":true,"model":"anthropic.claude-v2","region":"eu-central-1","secret":{"name":"bedrock-sample-secret"}},"noCache":false,"repository":"ghcr.io/k8sgpt-ai/k8sgpt","version":"v0.3.29"}}
  creationTimestamp: "2024-07-12T08:31:35Z"
  finalizers:
  - k8sgpt.ai/finalizer
  generation: 3
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
  resourceVersion: "24634"
  uid: dd813b2e-c4f9-4f40-85f1-0b162799d395
spec:
  ai:
    anonymized: true
    backOff:
      enabled: true
      maxRetries: 5
    backend: amazonbedrock
    enabled: true
    language: english
    model: anthropic.claude-v2
    region: eu-central-1
    secret:
      name: bedrock-sample-secret
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.29

This was with the following operator version:

    helm.sh/chart: k8sgpt-operator-0.1.6
    app.kubernetes.io/name: k8sgpt-operator
    app.kubernetes.io/instance: release
    app.kubernetes.io/version: "0.0.26"

I installed the operator into Kind, then followed the README.md on getting Bedrock installed.

e.g.

kubectl create secret generic bedrock-sample-secret --from-literal=AWS_ACCESS_KEY_ID="$(echo $AWS_ACCESS_KEY_ID)" --from-literal=AWS_SECRET_ACCESS_KEY="$(echo $AWS_SECRET_ACCESS_KEY)" -n k8sgpt-operator-system
kubectl apply -f - << EOF
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    secret:
      name: bedrock-sample-secret
    model: anthropic.claude-v2
    region: eu-central-1
    backend: amazonbedrock
  noCache: false
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.29
EOF

I was then able to see populated results

apiVersion: v1
items:
- apiVersion: core.k8sgpt.ai/v1alpha1
  kind: Result
  metadata:
    creationTimestamp: "2024-07-12T08:32:18Z"
    generation: 1
    labels:
      k8sgpts.k8sgpt.ai/backend: amazonbedrock
      k8sgpts.k8sgpt.ai/name: k8sgpt-sample
      k8sgpts.k8sgpt.ai/namespace: k8sgpt-operator-system
    name: argocdargocdapplicationcontroller
    namespace: k8sgpt-operator-system
    resourceVersion: "24810"
    uid: fddefa55-596f-48b8-8e43-19b0c286a198
  spec:
    backend: amazonbedrock
    details: |2-
       Error: The StatefulSet is trying to use a service called osz17/osz17-application-controller that does not exist.
      Solution: 1. Check if the service osz17/osz17-application-controller exists with kubectl get service. 2. If it does not exist, create it with kubectl create service. 3. If it does exist, check that the name and namespace match what the StatefulSet expects.
    error:
    - sensitive:
      - masked: LG9zZzE3
        unmasked: argocd
      - masked: bD56W3JpPnA4TC0lQ1E5TWpTSkJYNCxnNXZIVmE=
        unmasked: argocd-application-controller
      text: StatefulSet uses the service argocd/argocd-application-controller which
        does not exist.
    kind: StatefulSet
    name: argocd/argocd-application-controller
    parentObject: ""
  status:
    lifecycle: historical
kind: List
metadata:
  resourceVersion: ""

[Screenshots omitted: three images captured 2024-07-12 at 09:33]

I'd recommend trying with settings similar to mine:

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    secret:
      name: bedrock-sample-secret
    model: anthropic.claude-v2
    region: eu-central-1
    backend: amazonbedrock
  noCache: false
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.29

@atul86244

@AlexsJones, I think the issue here is that my org provides temporary AWS credentials, so passing AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the secret does not work. I tried adding AWS_SESSION_TOKEN to the secret, but that didn't help either.

Errors from the logs below:

Finished Reconciling k8sGPT with error: failed to call Analyze RPC: rpc error: code = Unknown desc = failed while calling AI provider amazonbedrock: UnrecognizedClientException: The security token included in the request is invalid.
	status code: 403, request id: a94edbbc-035f-4ff6-9949-1c20d410decf
2024-07-16T20:48:49Z	ERROR	Reconciler error	{"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"k8sgpt-operator-system"}, "namespace": "k8sgpt-operator-system", "name": "k8sgpt-sample", "reconcileID": "8aa095dd-6ce4-4fdc-ac8b-1b6dbbe563bb", "error": "failed to call Analyze RPC: rpc error: code = Unknown desc = failed while calling AI provider amazonbedrock: UnrecognizedClientException: The security token included in the request is invalid.\n\tstatus code: 403, request id: a94edbbc-035f-4ff6-9949-1c20d410decf"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226
Creating new client for 10.96.210.22:8080
Connection established between 10.96.210.22:8080 and localhost with time out of 1 seconds.
Remote Address : 10.96.210.22:8080
K8sGPT address: 10.96.210.22:8080
2024-07-16T20:48:49Z	ERROR	Reconciler error	{"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"k8sgpt-operator-system"}, "namespace": "k8sgpt-operator-system", "name": "k8sgpt-sample", "reconcileID": "cb7b032e-b160-45eb-98b8-121004bde8c6", "error": "failed to call Analyze RPC: rpc error: code = Unknown desc = failed while calling AI provider amazonbedrock: UnrecognizedClientException: The security token included in the request is invalid.\n\tstatus code: 403, request id: 3a5d5bce-6b33-4eaa-aac8-fc572c22d2a5"}

I tried IRSA too but couldn't find the right config for K8sGPT. I tried the settings below with IRSA, but they don't work:

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    #secret:
     #name: bedrock-sample-secret
    model: anthropic.claude-v2
    region: us-west-2
    backend: amazonbedrock
  noCache: false
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.29

Finished Reconciling k8sGPT with error: secret is required for amazonbedrock backend
2024-07-16T20:17:27Z	ERROR	Reconciler error	{"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"k8sgpt-operator-system"}, "namespace": "k8sgpt-operator-system", "name": "k8sgpt-sample", "reconcileID": "226ecdda-cfd1-4430-9061-3035025595ff", "error": "secret is required for amazonbedrock backend"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226
Finished Reconciling k8sGPT with error: secret is required for amazonbedrock backend
2024-07-16T20:17:27Z	ERROR	Reconciler error	{"controller": "k8sgpt", "controllerGroup": "core.k8sgpt.ai", "controllerKind": "K8sGPT", "K8sGPT": {"name":"k8sgpt-sample","namespace":"k8sgpt-operator-system"}, "namespace": "k8sgpt-operator-system", "name": "k8sgpt-sample", "reconcileID": "fed7b585-ee10-49a2-aabd-8187c1d72b1b", "error": "secret is required for amazonbedrock backend"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226
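
When the same reconcile error repeats with different reconcile IDs, as in the logs above, it can help to collapse them into distinct messages. The following is only a sketch: the function name `reconciler_errors` is hypothetical, it assumes the operator log fields are tab-separated (timestamp, level, message, JSON payload) as they appear in the pasted output, and the sample payload is trimmed from the lines above.

```python
import json
from collections import Counter


def reconciler_errors(log_text: str) -> Counter:
    """Count distinct 'error' messages across tab-separated ERROR log lines."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split("\t", 3)
        # Assumed layout: timestamp, level, message, JSON payload.
        # Stack-trace and plain-text lines do not match and are skipped.
        if len(parts) != 4 or parts[1] != "ERROR":
            continue
        try:
            payload = json.loads(parts[3])
        except json.JSONDecodeError:
            continue
        counts[payload.get("error", "<no error field>")] += 1
    return counts


# Payload trimmed from the log lines above; only a few fields are kept.
payload = (
    '{"controller": "k8sgpt", '
    '"reconcileID": "226ecdda-cfd1-4430-9061-3035025595ff", '
    '"error": "secret is required for amazonbedrock backend"}'
)
sample = "\n".join([
    "Finished Reconciling k8sGPT with error: secret is required for amazonbedrock backend",
    "\t".join(["2024-07-16T20:17:27Z", "ERROR", "Reconciler error", payload]),
    "sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler",
    "\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324",
    "\t".join(["2024-07-16T20:17:27Z", "ERROR", "Reconciler error", payload]),
])

for message, count in reconciler_errors(sample).items():
    print(f"{count}x {message}")
```

Fed the output of `kubectl logs` for the operator pod, this would show at a glance whether the failure is the missing secret, the invalid security token, or something else.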

Do we have any specific k8sgpt setting which should be used with IRSA?

@AlexsJones
Member

The AWS Go SDK supports a few different modes of identity management. For AWS_SESSION_TOKEN to work, you would need to set it in the k8sgpt deployment YAML, as a secret mounted as an environment variable. Is that what you tried?

@atul86244

Yes, this is what I tried. The changes made to the k8sgpt deployment YAML keep getting overridden by the K8sGPT object, so it never picks them up. I'm not sure whether support for AWS_SESSION_TOKEN needs to be added here: https://github.com/k8sgpt-ai/k8sgpt-operator/blob/main/pkg/resources/k8sgpt.go#L297:L304
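
One way to confirm which AWS credential variables the generated Deployment actually wires into the container is to inspect its JSON. This is a minimal sketch, not the operator's behavior: the helper name `missing_aws_env` and the sample manifest (including the container name) are hypothetical. It only looks at `env` entries under `spec.template.spec.containers`, so variables injected through `envFrom` would not be seen.

```python
import json

AWS_CREDENTIAL_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
)


def missing_aws_env(deployment_json: str) -> dict[str, list[str]]:
    """Map each container name to the AWS credential variables it does not define."""
    deployment = json.loads(deployment_json)
    containers = deployment["spec"]["template"]["spec"]["containers"]
    report = {}
    for container in containers:
        defined = {env["name"] for env in (container.get("env") or [])}
        report[container["name"]] = [
            var for var in AWS_CREDENTIAL_VARS if var not in defined
        ]
    return report


# Hypothetical manifest for illustration; not captured from a real cluster.
sample = json.dumps({
    "kind": "Deployment",
    "metadata": {"name": "hypothetical-k8sgpt-deployment"},
    "spec": {"template": {"spec": {"containers": [{
        "name": "k8sgpt",
        "env": [
            {"name": "AWS_ACCESS_KEY_ID",
             "valueFrom": {"secretKeyRef": {"name": "bedrock-sample-secret",
                                            "key": "AWS_ACCESS_KEY_ID"}}},
            {"name": "AWS_SECRET_ACCESS_KEY",
             "valueFrom": {"secretKeyRef": {"name": "bedrock-sample-secret",
                                            "key": "AWS_SECRET_ACCESS_KEY"}}},
        ],
    }]}}},
})

print(missing_aws_env(sample))
```

In practice the input would come from `kubectl get deployment <name> -n k8sgpt-operator-system -o json`. If AWS_SESSION_TOKEN shows up as missing even though it is in the secret, that would be consistent with the suspicion above that the operator only passes the two long-lived keys through.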

@AlexsJones
Member

The downside of this approach is that, on average, the session key would need rotation every twelve hours.

@AlexsJones
Member

I will open a branch to provide support for IRSA. You can test it out for me and let me know if it helps!

@angelaaaaaaaw

I tested the operator with EKS Pod Identity implemented, using Bedrock as the backend; however, the results are still empty.

Setup

EKS 1.28
Client Version: v1.28.11-eks-1552ad0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.11-eks-db838b0

Latest operator, 0.1.7, in which EKS Pod Identity is supported

k8sgpt bedrock config

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-bedrock
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    model: anthropic.claude-v2
    region: us-west-2
    backend: amazonbedrock
    language: english
  noCache: false
  repository: ghcr.io/k8sgpt-ai/k8sgpt
  version: v0.3.40

ServiceAccount

k get sa -n k8sgpt-operator-system k8sgpt  -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/k8sgpt-bedrock
    meta.helm.sh/release-name: release
    meta.helm.sh/release-namespace: k8sgpt-operator-system
  creationTimestamp: "2024-08-16T12:57:10Z"
  labels:
    app.kubernetes.io/component: rbac
    app.kubernetes.io/created-by: k8sgpt-operator
    app.kubernetes.io/instance: release
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: k8sgpt-operator
    app.kubernetes.io/part-of: k8sgpt-operator
    app.kubernetes.io/version: 0.0.26
    helm.sh/chart: k8sgpt-operator-0.1.7
  name: k8sgpt
  namespace: k8sgpt-operator-system

log from k8sgpt bedrock backend

kubectl -n k8sgpt-operator-system logs -l app=k8sgpt-bedrock  -f
{"level":"info","ts":1723981298.5366411,"caller":"server/server.go:126","msg":"binding metrics to 8081"}
{"level":"info","ts":1723981298.537016,"caller":"server/server.go:92","msg":"binding api to 8080"}





{"level":"info","ts":1723981344.4827738,"caller":"server/log.go:50","msg":"request completed","duration_ms":23022,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"amazonbedrock\"  anonymize:true  language:\"english\"  max_concurrency:10  output:\"json\"","remote_addr":"192.168.114.216:54102"}
[controller-runtime] log.SetLogger(...) was never called; logs will not be displayed.
Detected at:
	>  goroutine 196 [running]:
	>  runtime/debug.Stack()
	>  	/usr/local/go/src/runtime/debug/stack.go:24 +0x5e
	>  sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
	>  	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/log/log.go:60 +0xcd
	>  sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithName(0xc00086b340, {0x2faa847, 0x14})
	>  	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/log/deleg.go:147 +0x3e
	>  github.com/go-logr/logr.Logger.WithName({{0x382fb60, 0xc00086b340}, 0x0}, {0x2faa847?, 0x0?})
	>  	/go/pkg/mod/github.com/go-logr/logr@v1.4.2/logr.go:345 +0x36
	>  sigs.k8s.io/controller-runtime/pkg/client.newClient(0x0?, {0x0, 0x0, {0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0})
	>  	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/client/client.go:129 +0xf1
	>  sigs.k8s.io/controller-runtime/pkg/client.New(0xa?, {0x0, 0x0, {0x0, 0x0}, 0x0, {0x0, 0x0}, 0x0})
	>  	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.18.4/pkg/client/client.go:110 +0x7d
	>  github.com/k8sgpt-ai/k8sgpt/pkg/kubernetes.NewClient({0x0, 0x0}, {0x0, 0x0})
	>  	/workspace/pkg/kubernetes/kubernetes.go:62 +0x1d8
	>  github.com/k8sgpt-ai/k8sgpt/pkg/analysis.NewAnalysis({0xc000fa5dd0, 0xd}, {0xc000fa5dc8, 0x7}, {0x0, 0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...)
	>  	/workspace/pkg/analysis/analysis.go:89 +0xc5
	>  github.com/k8sgpt-ai/k8sgpt/pkg/server.(*handler).Analyze(0x5322080?, {0x38265b8, 0xc0012c7c20}, 0xc0006f8820)
	>  	/workspace/pkg/server/analyze.go:22 +0x13c
	>  buf.build/gen/go/k8sgpt-ai/k8sgpt/grpc/go/schema/v1/schemav1grpc._ServerService_Analyze_Handler.func1({0x38265b8?, 0xc0012c7c20?}, {0x2e1b800?, 0xc0006f8820?})
	>  	/go/pkg/mod/buf.build/gen/go/k8sgpt-ai/k8sgpt/grpc/go@v1.4.0-20240720172138-1b9bcd834f17.2/schema/v1/schemav1grpc/server-service_grpc.pb.go:138 +0xcb
	>  github.com/k8sgpt-ai/k8sgpt/pkg/server.(*Config).Serve.logInterceptor.func1({0x38265b8, 0xc0012c7c20}, {0x2e1b800, 0xc0006f8820}, 0xc000bebba0, 0xc000e227f8)
	>  	/workspace/pkg/server/log.go:19 +0xa3
	>  buf.build/gen/go/k8sgpt-ai/k8sgpt/grpc/go/schema/v1/schemav1grpc._ServerService_Analyze_Handler({0x2bcbf80, 0x0}, {0x38265b8, 0xc0012c7c20}, 0xc00087e480, 0xc0008f18f0)
	>  	/go/pkg/mod/buf.build/gen/go/k8sgpt-ai/k8sgpt/grpc/go@v1.4.0-20240720172138-1b9bcd834f17.2/schema/v1/schemav1grpc/server-service_grpc.pb.go:140 +0x143
	>  google.golang.org/grpc.(*Server).processUnaryRPC(0xc0001af400, {0x38265b8, 0xc0012c7b90}, {0x3836e40, 0xc000e7af00}, 0xc000e93d40, 0xc000b46840, 0x534b240, 0x0)
	>  	/go/pkg/mod/google.golang.org/grpc@v1.64.1/server.go:1379 +0xdf8
	>  google.golang.org/grpc.(*Server).handleStream(0xc0001af400, {0x3836e40, 0xc000e7af00}, 0xc000e93d40)
	>  	/go/pkg/mod/google.golang.org/grpc@v1.64.1/server.go:1790 +0xe8b
	>  google.golang.org/grpc.(*Server).serveStreams.func2.1()
	>  	/go/pkg/mod/google.golang.org/grpc@v1.64.1/server.go:1029 +0x8b
	>  created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 218
	>  	/go/pkg/mod/google.golang.org/grpc@v1.64.1/server.go:1040 +0x125





{"level":"info","ts":1723981397.697785,"caller":"server/log.go:50","msg":"request completed","duration_ms":23013,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"amazonbedrock\"  anonymize:true  language:\"english\"  max_concurrency:10  output:\"json\"","remote_addr":"192.168.114.216:51674"}
{"level":"info","ts":1723981450.9457319,"caller":"server/log.go:50","msg":"request completed","duration_ms":23009,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"amazonbedrock\"  anonymize:true  language:\"english\"  max_concurrency:10  output:\"json\"","remote_addr":"192.168.114.216:39094"}
{"level":"info","ts":1723981504.265283,"caller":"server/log.go:50","msg":"request completed","duration_ms":23010,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"amazonbedrock\"  anonymize:true  language:\"english\"  max_concurrency:10  output:\"json\"","remote_addr":"192.168.114.216:43154"}

No details are shown in the results:

kubectl get results -n k8sgpt-operator-system -o json | grep details 
                "details": "",
                "details": "",
                "details": "",
                "details": "",
                "details": "",
                "details": "",

Detailed JSON output of one result:

kubectl get results -n k8sgpt-operator-system defaulttest -o json 
{
    "apiVersion": "core.k8sgpt.ai/v1alpha1",
    "kind": "Result",
    "metadata": {
        "creationTimestamp": "2024-08-18T11:52:10Z",
        "generation": 1,
        "labels": {
            "k8sgpts.k8sgpt.ai/backend": "amazonbedrock",
            "k8sgpts.k8sgpt.ai/name": "k8sgpt-bedrock",
            "k8sgpts.k8sgpt.ai/namespace": "k8sgpt-operator-system"
        },
        "name": "defaulttest",
        "namespace": "k8sgpt-operator-system",
        "resourceVersion": "119423775",
        "uid": "0c74c9de-6d62-44db-9590-d9fb8b13eef7"
    },
    "spec": {
        "backend": "amazonbedrock",
        "details": "",
        "error": [
            {
                "text": "Back-off pulling image \"ngixn2\""
            }
        ],
        "kind": "Pod",
        "name": "default/test",
        "parentObject": ""
    },
    "status": {
        "lifecycle": "historical"
    }
}
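
The grep above shows that `details` is empty but not which results are affected. As a rough sketch (the helper name `empty_detail_results` is hypothetical, and the sample input is only shaped like `kubectl get results -o json` output, trimmed to fields shown in this comment), the following lists the Result objects whose `spec.details` is empty:

```python
import json


def empty_detail_results(results_json: str) -> list[str]:
    """Return the names of Result items whose spec.details is empty or missing."""
    doc = json.loads(results_json)
    # A list query returns a List object with "items"; a single object has none.
    items = doc.get("items", [doc])
    return [
        item["metadata"]["name"]
        for item in items
        if not (item.get("spec", {}).get("details") or "").strip()
    ]


# First item is trimmed from the output above; the second is a hypothetical
# populated result added only for contrast.
sample = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "defaulttest"},
         "spec": {"backend": "amazonbedrock", "details": "",
                  "kind": "Pod", "name": "default/test"}},
        {"metadata": {"name": "hypothetical-populated"},
         "spec": {"backend": "amazonbedrock", "details": "Error: ...",
                  "kind": "Pod", "name": "default/other"}},
    ],
})

print(empty_detail_results(sample))
```

To run it against a cluster, the JSON from `kubectl get results -n k8sgpt-operator-system -o json` could be read from standard input and passed to the function instead of `sample`.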

@JuHyung-Son
Contributor

@angelaaaaaaaw thanks for the detailed setup. Can you add the k8sgpt operator logs too?

@amsuggs37

amsuggs37 commented Sep 6, 2024

I've also seen this very problem using local-ai as the backend.

I noticed that @AlexsJones reported everything was working as expected using k8sgpt-operator version 0.1.6.
I was using k8sgpt-operator version 0.1.7 and when I downgraded to 0.1.6 things started working for me as well.

I noticed that the logs from version 0.1.6 are different from 0.1.7. Below are my logs using k8sgpt-operator version 0.1.6:

{"level":"info","ts":1725629382.1934505,"caller":"server/log.go:50","msg":"request completed","duration_ms":13330,"method":"/schema.v1.ServerService/Analyze","request":"backend: \"localai\"  explain:true nocache:true  language:\"english\"  max_concurrency:10  output:\"json\"","remote_addr":"10.42.1.42:59100"}

Notice how in the log above, the explain:true parameter is present. That is NOT the case with the latest version 0.1.7.
It seems that something changed between versions 0.1.6 and 0.1.7 so that explain:true is no longer sent when "spec.ai.enabled" is set to true on the K8sGPT Custom Resource.

My K8sGPT Custom Resource and LocalAI setup both remained constant between my two tests. The only thing that changed, and fixed my issue, was downgrading the operator from 0.1.7 to 0.1.6.
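
The difference described above can be checked mechanically in the k8sgpt server logs. This is a small sketch under stated assumptions: the function name `request_has_explain` is hypothetical, and it relies only on the "request completed" log format visible in this thread, where the request is logged as a space-separated string of `key:value` fields.

```python
import json


def request_has_explain(log_line: str) -> bool:
    """Return True if a 'request completed' log line carries explain:true."""
    entry = json.loads(log_line)
    # The request is logged as a flat string, e.g. 'backend:"localai"  explain:true ...'
    fields = entry.get("request", "").split()
    return "explain:true" in fields


# Log line from this comment (operator 0.1.6, LocalAI backend).
line_016 = r'{"level":"info","ts":1725629382.1934505,"caller":"server/log.go:50","msg":"request completed","duration_ms":13330,"method":"/schema.v1.ServerService/Analyze","request":"backend: \"localai\"  explain:true nocache:true  language:\"english\"  max_concurrency:10  output:\"json\"","remote_addr":"10.42.1.42:59100"}'

# Log line from the earlier comment (operator 0.1.7, Bedrock backend).
line_017 = r'{"level":"info","ts":1723981397.697785,"caller":"server/log.go:50","msg":"request completed","duration_ms":23013,"method":"/schema.v1.ServerService/Analyze","request":"backend:\"amazonbedrock\"  anonymize:true  language:\"english\"  max_concurrency:10  output:\"json\"","remote_addr":"192.168.114.216:51674"}'

print("0.1.6:", request_has_explain(line_016))
print("0.1.7:", request_has_explain(line_017))
```

A request without `explain:true` would be consistent with an empty `details` field, since no AI explanation is requested for the analysis in that case.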
