Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CloudWatchAgent on Windows fails with "imds retry client will retry 1 times" #871

Closed
YoungLee9853 opened this issue Sep 29, 2023 · 13 comments
Labels
os/windows Windows

Comments

@YoungLee9853
Copy link

Describe the bug
In EC2 UserData, I am trying to execute "& 'C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1' -a fetch-config -m ec2 -s -c file:C:\tmp\cloud-watch-agent-config.json" to install the cloud watch agent configuration on Windows Server 2022 host on start up. The UserData fails with "PS>TerminatingError(config-downloader.exe): "The running command stopped because the preference variable "ErrorActionPreference" or common parameter is set to Stop: 2023/09/29 01:07:56 I! imds retry client will retry 1 times" message.

Steps to reproduce

  1. Launch new Windows Server EC2 host with UserData including "& 'C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1' -a fetch-config -m ec2 -s -c file:C:\tmp\cloud-watch-agent-config.json""
  2. Wait for the user data script to fail...
  • To see the exact failure message, you will need to trap the exception and keep the transcript.

What did you expect to see?
AmazonCloudWatchAgent boots up with the provided configuration.

What did you see instead?
User data script fails.

What version did you use?
Version: (e.g., v1.247350.0, etc)

What config did you use?
Config: (e.g. the agent json config file)

Environment
Windows Server 2022

Additional context
Add any other context about the problem here.

@YoungLee9853
Copy link
Author

YoungLee9853 commented Sep 29, 2023

Looking at previous issues, it seems like the issue is related to #516 .

Related - https://github.com/aws/amazon-cloudwatch-agent/blob/main/internal/retryer/imdsretryer.go#L30

@sethAmazon
Copy link
Contributor

Can you post your ami. We have tests for win-2022 where the agent does start.

@AllanBenson001
Copy link

Same here. This worked last week.

AMI ID - ami-00c896faf296575ab

$config = @"
{
    "logs": {
        "logs_collected": {
            "windows_events": {
                "collect_list": [
                    {
                        "event_format": "xml",
                        "event_levels": [
                            "VERBOSE",
                            "INFORMATION",
                            "WARNING",
                            "ERROR",
                            "CRITICAL"
                        ],
                        "event_name": "Application",
                        "log_group_name": "/my/logs",
                        "log_stream_name": "{instance_id}/event-logs/application"
                    }
                ]
            }
        }
    }
}
"@

$installDirectory = "c:\temp\cw"
$downloadDirectory = $installDirectory 
$logsDirectory = $installDirectory 
    
New-Item -ItemType "directory" -Path $installDirectory

Set-Location -Path $installDirectory

$config | Set-Content -Path "$installDirectory/config.json"

Write-host "Installing Cloudwatch Agent"
$cwAgentInstaller = "$downloadDirectory\amazon-cloudwatch-agent.msi"
$cwAgentInstallPath = "C:\Program Files\Amazon\AmazonCloudWatchAgent"
Invoke-WebRequest "https://s3.amazonaws.com/amazoncloudwatch-agent/windows/amd64/latest/amazon-cloudwatch-agent.msi" -OutFile $cwAgentInstaller
Start-Process -FilePath msiexec -Args "/i $cwAgentInstaller /l*v $logsDirectory\installCWAgentLog.log /qn" -Verb RunAs -Wait

Write-host "Load config"
& "$cwAgentInstallPath\amazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m ec2 -s -c file:"$installDirectory/config.json"

Output:

Load config
****** processing amazon-cloudwatch-agent ******
I! Trying to detect region from ec2
config-downloader.exe : 2023/10/02 11:52:44 I! imds retry client will retry 1 times
At C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1:304 char:9
+         & $CWAProgramFiles\config-downloader.exe --output-dir "${JSON ...
+         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (2023/10/02 11:5...l retry 1 times:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

@4wuyan
Copy link

4wuyan commented Oct 3, 2023

I am also seeing the same.

I noticed:

  1. C:\'Program Files'\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1 -a fetch-config -m ec2 -c file:\temp\cloudwatch-agent-config.json works fine when I later connect to the EC2 and run this command manually. But it will fail when running as a part of user data.
  2. It only starts to happen this week. The same AMI that used to be fine last week is not ok this week. When I inspect the good EC2 from last week, I find the cloudwatch agent downloaded from https://s3.amazonaws.com/amazoncloudwatch-agent/windows/amd64/latest/amazon-cloudwatch-agent.msi in my user data script is a different version.

Last week, https://s3.amazonaws.com/amazoncloudwatch-agent/windows/amd64/latest/amazon-cloudwatch-agent.msi returns Amazon CloudWatch Agent 1.300026.3 (2023-08-21), while this week, it's Amazon CloudWatch Agent 1.300028.1 (2023-09-18). Hence, I suspect it's an issue related to the latest version(s) of cloudwatch agent.

PS C:\Program Files\Amazon\AmazonCloudWatchAgent> cat .\CWAGENT_VERSION
1.300026.3b189

PS C:\Program Files\Amazon\AmazonCloudWatchAgent> cat .\RELEASE_NOTES -head 20
========================================================================
Amazon CloudWatch Agent 1.300026.3 (2023-08-21)
========================================================================

Bug fixes:
* Fix credential chain for new components
* Fix metric renaming for Windows Performance Counters
* Fix log stream name translation for EMF on ECS
* Reduce RPM installation time

========================================================================
Amazon CloudWatch Agent 1.300026.2 (2023-08-10)
========================================================================

Bug fixes:
* Fix EMF log corruption when multiple clients are sending concurrently
* Drop invalid EMF logs
* Allow environment variables in OTEL config
* Revert credential chain when running as a service to prioritize instance role
PS C:\Program Files\Amazon\AmazonCloudWatchAgent> cat .\CWAGENT_VERSION
1.300028.1b210

PS C:\Program Files\Amazon\AmazonCloudWatchAgent> cat .\RELEASE_NOTES -head 20
========================================================================
Amazon CloudWatch Agent 1.300028.1 (2023-09-18)
========================================================================

Bug fixes:
* Fix windows event logs to start only once

========================================================================
Amazon CloudWatch Agent 1.300028.0 (2023-09-11)
========================================================================

Bug fixes:
* Fix file pattern matching to support glob wildcard characters (!{})
* Use LogStreamName instead of ServiceName in token replacement for Prometheus
* Add fallback shared config files for credential ordering to maintain previous AWS SDK behavior
* Drop unsupported NaN, Inf, and out of range values

Enhancements:
* Try using IMDSv2 only first before using client with fallback
* Add support for configurable IMDS retries in the common-config.toml

References:

@ryanwilliams83
Copy link

ryanwilliams83 commented Oct 3, 2023

I put this together to demonstrate the problem and assist with troubleshooting.

https://github.com/ryanwilliams83/CloudWatchAgent-871

image

MSI Package Version 1.4.37882 (https://github.com/ryanwilliams83/CloudWatchAgent-871/raw/main/assets/amazon-cloudwatch-agent-1.4.37882.msi)
image

MSI Package Version 1.4.37884 + latest
image

@sethAmazon
Copy link
Contributor

@AllanBenson001

Is this issue only when running this command in user data or when running in powershell. I ran this command in powershell after agent started and it worked. I want to make this issue only happens when starting with user data.

@sethAmazon
Copy link
Contributor

Okay so I was able to reproduce and fix the issue. Can you please take this version of the agent until we finish the patch release.

@sethAmazon
Copy link
Contributor

wget https://cloudwatch-agent-integration-bucket.s3.us-west-2.amazonaws.com/integration-test/packaging/98f45c9376c43f03d4968ee57d1b464b884f2303/amazon-cloudwatch-agent.msi

@okankoAMZ okankoAMZ added the os/windows Windows label Oct 4, 2023
@okankoAMZ
Copy link
Contributor

Closing the issue since root cause is found, the issue is mitigated.

@4wuyan
Copy link

4wuyan commented Oct 5, 2023

Thanks for the effort! Glad to see it's fixed.

BTW, for those who are interested in why it happens only in user data execution, but not in interactive powershell:

I believe it's a long lasting issue when powershell redirects error stream.

User data is executed in this way, with both 1> and 2>:

C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe -Command {
  $env:EC2Launch_Execution_Mode = 'attached';
  . 'C:\Windows\system32\config\systemprofile\AppData\Local\Temp\EC2Launch123\UserScript.ps1' 1> 'C:\Windows\system32\config\systemprofile\AppData\Local\Temp\EC2Launch123\output.tmp' 2> 'C:\Windows\system32\config\systemprofile\AppData\Local\Temp\EC2Launch123\err.tmp';
  exit $LASTEXITCODE
}

And I do find if I manually run it in powershell with 2>, it fails too.

PS C:\temp> C:\'Program Files'\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1 -a fetch-config -m ec2 -c file:\temp\cloudwatch-agent-config.json 2>tmp
****** processing amazon-cloudwatch-agent ******
I! Trying to detect region from ec2
D! [EC2] Found active network interface
Successfully fetched the config and saved in C:\ProgramData\Amazon\AmazonCloudWatchAgent\Configs\file_cloudwatch-agent-config.json.tmp
config-downloader.exe : 2023/10/05 12:41:18 I! imds retry client will retry 1 times
At C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1:304 char:9
+         & $CWAProgramFiles\config-downloader.exe --output-dir "${JSON ...
+         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (2023/10/05 12:4...l retry 1 times:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

@AllanBenson001
Copy link

When will this fix be released so that I can start using the latest version again?

@ymtaye
Copy link
Contributor

ymtaye commented Oct 11, 2023

The fix for this issue is currently released in the latest CloudWatch Agent, please try retrieving it by using the link below. Thanks!
https://amazoncloudwatch-agent.s3.amazonaws.com/windows/amd64/latest/amazon-cloudwatch-agent.msi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
os/windows Windows
Projects
None yet
Development

No branches or pull requests

7 participants