Memory leaks #26130

Closed
RealFatCat opened this issue Aug 4, 2022 · 3 comments
Labels
provider Pertains to the provider itself, rather than any interaction with AWS. service/ec2 Issues and PRs that pertain to the ec2 service. service/vpc Issues and PRs that pertain to the vpc service. stale Old or inactive issues managed by automation, if no further action taken these will get closed.

Comments

@RealFatCat

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or other comments that do not add relevant new information or questions; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform CLI and Terraform AWS Provider Version

$ terraform --version
Terraform v1.2.6
on linux_amd64
terraform-provider-aws_v4.24.0_x5

Affected Resource(s)

We create lots of resources in order to create EC2 instances, EKS clusters, etc.
So, for an EC2 instance, for example:

  • aws_instance
  • aws_internet_gateway
  • aws_network_interface
  • aws_route
  • aws_route_table
  • aws_route_table_association
  • aws_security_group
  • aws_security_group_rule
  • aws_subnet
  • aws_vpc

Expected Behavior

No memory leaks.

Actual Behavior

Memory leaks.

Steps to Reproduce

The provider should be running.
Constantly run commands like:
terraform apply -refresh-only -auto-approve -input=false -lock=false -json
terraform plan -refresh=false -input=false -lock=false -json
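
For illustration, a minimal Go driver that keeps re-running these commands against the already-running provider could look like the sketch below. This is a hypothetical stand-in for crossplane's reconciler, not what we actually run; the workspace path and the 30-second pause are assumptions.

```go
// repro.go - hypothetical driver that mimics a reconciler by re-running the
// same Terraform commands in a loop against an already-running provider.
package main

import (
	"log"
	"os/exec"
	"time"
)

func main() {
	commands := [][]string{
		{"terraform", "apply", "-refresh-only", "-auto-approve", "-input=false", "-lock=false", "-json"},
		{"terraform", "plan", "-refresh=false", "-input=false", "-lock=false", "-json"},
	}

	for {
		for _, args := range commands {
			cmd := exec.Command(args[0], args[1:]...)
			cmd.Dir = "./workspace" // assumed: path to a Terraform working directory
			if out, err := cmd.CombinedOutput(); err != nil {
				log.Printf("%v failed: %v\n%s", args, err, out)
			}
		}
		time.Sleep(30 * time.Second) // arbitrary pause between iterations
	}
}
```

With the provider kept alive across iterations like this, its memory usage can then be watched via /proc/<pid>/status as shown further down.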

Important Factoids

First of all, I'm not quite sure that this is the right place for this issue, but I'll try to explain what we are doing.

We run the crossplane provider-jet-aws in a k8s cluster.
tl;dr: the crossplane-provider transforms k8s manifests of AWS resources into Terraform configs and applies them.

In the pod, the crossplane-provider starts terraform-provider-aws.
After that, the crossplane-provider just runs Terraform CLI commands, like terraform init, terraform plan, terraform apply, etc.

So, terraform-provider-aws is constantly running.

After some time (4-5 hours), the pod is killed with OOM, and the cause is terraform-provider-aws.

For example, here is the current ps aux in the pod:

$ ps aux
PID   USER     TIME  COMMAND
    1 1001     16:25 crossplane-provider -d -s 30s --terraform-version 1.2.6 --terraform-provider-version 4.24.0 --terraform-provider-source hashicorp/aws --max-reconcile-rate 200 --leader-election
15793 1001      0:00 sh
26435 1001      0:00 sh
26522 1001      0:00 sh
26696 1001      0:01 terraform apply -refresh-only -auto-approve -input=false -lock=false -json
26708 1001      0:01 terraform apply -refresh-only -auto-approve -input=false -lock=false -json
26721 1001      0:01 terraform apply -refresh-only -auto-approve -input=false -lock=false -json
26726 1001      0:01 terraform apply -refresh-only -auto-approve -input=false -lock=false -json
26744 1001      0:00 terraform plan -refresh=false -input=false -lock=false -json
26758 1001      0:00 terraform plan -refresh=false -input=false -lock=false -json
26768 1001      0:00 terraform apply -refresh-only -auto-approve -input=false -lock=false -json
26782 1001      0:00 terraform apply -refresh-only -auto-approve -input=false -lock=false -json
26790 1001      0:00 terraform apply -refresh-only -auto-approve -input=false -lock=false -json
26808 1001      0:00 terraform apply -refresh-only -auto-approve -input=false -lock=false -json
26820 1001      0:00 ps aux
28712 1001      2h25 /terraform/provider-mirror/registry.terraform.io/hashicorp/aws/4.24.0/linux_amd64/terraform-provider-aws_v4.24.0_x5

Judging by the PID of terraform-provider-aws, something has already gone wrong and the process has been restarted by the crossplane-provider.

Anyway, here is the RssAnon of terraform-provider-aws:

$ cat /proc/28712/status | grep -i rssanon
RssAnon:	 6974484 kB

Also, I've managed to run terraform-provider-aws with pprof, so here are some files.

Attached files: profile001, heap.gz
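
For reference, a saved heap profile like heap.gz can be opened with go tool pprof heap.gz. The sketch below shows the standard net/http/pprof way of exposing such profiles from a long-running Go process; it only illustrates the mechanism and is not the actual patch used to instrument terraform-provider-aws (the port is arbitrary).

```go
// pprof_example.go - minimal sketch of exposing Go's built-in pprof endpoints
// in a long-running process.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Heap profiles can then be pulled with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	select {} // stand-in for the process's real work
}
```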

I've found some issues about memory leaks in the grpc-go repo, but they are all closed, so I decided to post here.

Thanks in advance for any help.

@github-actions github-actions bot added needs-triage Waiting for first response or review from a maintainer. service/ec2 Issues and PRs that pertain to the ec2 service. service/vpc Issues and PRs that pertain to the vpc service. labels Aug 4, 2022
@justinretzolk justinretzolk added provider Pertains to the provider itself, rather than any interaction with AWS. and removed needs-triage Waiting for first response or review from a maintainer. labels Aug 30, 2022
@RealFatCat
Author

OK, it seems I understand where these leaks come from.

Crossplane runs terraform-provider-aws in test mode to keep it running.
This means that c.process is not set and its value is nil here:
https://github.com/hashicorp/go-plugin/blob/master/client.go#L860

When client.Kill() runs, it almost immediately checks c.process and, in our case, just returns:
https://github.com/hashicorp/go-plugin/blob/master/client.go#L414

So, connections are not closed => memory leaks in the provider.
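
Paraphrasing the linked go-plugin code, the shape of the problem is roughly the following. This is a simplified, hypothetical sketch; the field and method names are illustrative, not copied from hashicorp/go-plugin.

```go
package main

import (
	"io"
	"os"
)

// Simplified sketch of the behaviour described above.
type client struct {
	process *os.Process // nil when the plugin runs in debug/test mode
	conn    io.Closer   // connection to the plugin
}

func (c *client) Kill() {
	if c.process == nil {
		// Debug/test mode: there is no child process to kill, so Kill()
		// returns here and the connection below is never closed, so it
		// accumulates on the provider side.
		return
	}
	_ = c.conn.Close()
	_ = c.process.Kill()
}

func main() {
	c := &client{process: nil}
	c.Kill() // returns immediately; no teardown happens
}
```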

The easy way to reproduce is to run terraform-provider-aws --debug and constantly run terraform plan against the provider in debug mode.

github-actions bot commented

Marking this issue as stale due to inactivity. This helps our maintainers find and focus on the active issues. If this issue receives no comments in the next 30 days it will automatically be closed. Maintainers can also remove the stale label.

If this issue was automatically closed and you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thank you!

@github-actions github-actions bot added the stale Old or inactive issues managed by automation, if no further action taken these will get closed. label Sep 29, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Nov 3, 2024

github-actions bot commented Dec 5, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 5, 2024