[Bug]: Terraform apply command freezes during AWS provider initialization #39523

Open
law opened this issue Sep 27, 2024 · 31 comments
Labels
bug Addresses a defect in current functionality. go Pull requests that update Go code needs-triage Waiting for first response or review from a maintainer. prioritized Part of the maintainer teams immediate focus. To be addressed within the current quarter. regression Pertains to a degraded workflow resulting from an upstream patch or internal enhancement.

Comments

@law

law commented Sep 27, 2024

Terraform Core Version

1.5.7

AWS Provider Version

5.69.0

Affected Resource(s)

n/a

Expected Behavior

`terraform apply` should continue and ask me for confirmation before applying changes.

Actual Behavior

When running `terraform apply`, the process freezes during initialization of the AWS provider. The command never completes and has to be terminated manually.

Relevant Error/Panic Output Snippet

2024-09-27T13:46:40.034-0600 [DEBUG] provider.terraform-provider-aws_v5.69.0_x5: assertion failed [arm_interval().contains(address)]: code fragment does not contain the given arm address
2024-09-27T13:46:40.034-0600 [DEBUG] provider.terraform-provider-aws_v5.69.0_x5: (CodeFragmentMetadata.cpp:48 instruction_extents_for_arm_address)

Terraform Configuration Files

https://gist.github.com/law/62b9c75214c18a015c37f16285a13ba4

Steps to Reproduce

  1. Run TF_LOG=debug terraform apply (or terragrunt init) when using the above provider
  2. Observe that the .terraform/providers/registry.terraform.io/hashicorp/aws/5.69.0/darwin_amd64/terraform-provider-aws_v5.69.0_x5 process hangs indefinitely
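To confirm step 2 without watching Activity Monitor, the bash sketch below lists running AWS provider plugin processes along with their elapsed time. `aws_provider_procs` is a hypothetical helper, not part of Terraform, and it assumes a `ps` that accepts `-e -o pid=,etime=,command=`, which both macOS and procps on Linux do.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Terraform): keep only AWS provider plugin
# lines from `ps -e -o pid=,etime=,command=` output read on stdin.
aws_provider_procs() {
  # Provider binaries are named like terraform-provider-aws_v5.69.0_x5.
  # `|| true` keeps the exit status at 0 when nothing matches.
  grep -E 'terraform-provider-aws_v[0-9]+\.[0-9]+\.[0-9]+_x[0-9]+' || true
}

# When run directly (not sourced), inspect the live process table.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  ps -e -o pid=,etime=,command= | aws_provider_procs
fi
```

A provider whose elapsed time keeps growing after `terraform` itself has exited is the symptom described here; `pkill -f 'terraform-provider-aws_v5.69.0_x5'` terminates such leftovers.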

Debug Output

https://gist.github.com/law/5271d0e0cd052d438a194eb50c11da63

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

@law law added the bug Addresses a defect in current functionality. label Sep 27, 2024

Community Note

Voting for Prioritization

  • Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
  • Please see our prioritization guide for information on how we prioritize.
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

  • If you are interested in working on this issue, please leave a comment.
  • If this would be your first contribution, please review the contribution guide.

@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Sep 27, 2024
@law
Author

law commented Sep 27, 2024

I forgot to add, reverting to provider 5.68.0 works like a champ.
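Since pinning back to 5.68.0 is the workaround, it can help to confirm which AWS provider version a working directory actually resolved. A minimal bash sketch, assuming the `+ provider registry.terraform.io/hashicorp/aws vX.Y.Z` line format shown in the `terraform version` output later in this thread; `aws_version_from_tf_version` and `is_reported_bad` are hypothetical helper names, and the bad-version list only covers versions reported in this issue. The pin itself would be a `version = "5.68.0"` constraint in the `required_providers` block, followed by `terraform init -upgrade`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hashicorp/aws version from `terraform version`
# output read on stdin. Lines look like:
#   + provider registry.terraform.io/hashicorp/aws v5.68.0
aws_version_from_tf_version() {
  awk '$3 == "registry.terraform.io/hashicorp/aws" { sub(/^v/, "", $4); print $4 }'
}

# Hypothetical helper: exit 0 if the version is one reported in this issue
# as hanging on Apple Silicon (assumption: only 5.69.0 and 5.70.0 so far).
is_reported_bad() {
  case "$1" in
    5.69.0|5.70.0) return 0 ;;
    *) return 1 ;;
  esac
}
```

Typical use: `v=$(terraform version | aws_version_from_tf_version); is_reported_bad "$v" && echo "aws $v is affected"`.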

@alexpodr

alexpodr commented Sep 27, 2024

I have the same issue with v5.69.0; with v5.68.0 everything is OK.

Update: chip is Apple M1 Pro.

@elaigor

elaigor commented Sep 30, 2024

What our team found is that only the x86 provider misbehaves. Whether the Mac is Intel or ARM doesn't matter: if you use the 5.69.0 x86 build (under Rosetta on ARM), you will run into problems.

@jotasixto

jotasixto commented Sep 30, 2024

I have the same issue with v5.69.0 (hashicorp/aws/5.69.0/darwin_amd64/terraform-provider-aws_v5.69.0_x5) on an Apple M3 chip with Rosetta 2.

@JnMik

JnMik commented Sep 30, 2024

Same issue on Apple M3

@LozanoMatheus

LozanoMatheus commented Sep 30, 2024

It seems to be an issue only with Apple M chips, the version v5.69.0 works fine on my Linux/amd64.

Also, it works fine when I use the same version in the CDKTF. Quite odd.

@cailen

cailen commented Sep 30, 2024

I spent 3 hours today trying to figure out what this was. I was convinced it was a provider issue, and here we are. Thanks to whoever reported this.

@cailen

cailen commented Oct 1, 2024

It looks like the darwin_arm64 version is not being properly downloaded?
[Screenshot: 2024-10-01 at 7:28:07 AM]

@cailen

cailen commented Oct 1, 2024

This led me down quite the rabbit hole: why was I only getting amd64 builds on an Apple M2? It turns out that at some point (probably during one of our many upgrades from the 0.11/0.12 versions, which didn't support darwin_arm64) I had set the architecture to amd64, so tfenv downloaded amd64 builds not only of the old versions but also of the version we upgraded to (1.5.7).

% export TFENV_ARCH=arm64

% tfenv install 1.5.7
Terraform v1.5.7 is already installed

% terraform version      
Terraform v1.5.7
on darwin_amd64
+ provider registry.terraform.io/datadog/datadog v3.44.1
+ provider registry.terraform.io/hashicorp/aws v5.68.0
+ provider registry.terraform.io/hashicorp/http v3.4.5
+ provider registry.terraform.io/hashicorp/null v3.2.3
+ provider registry.terraform.io/hashicorp/random v3.6.3
+ provider registry.terraform.io/hashicorp/time v0.12.1
+ provider registry.terraform.io/mongodb/mongodbatlas v1.20.0

Your version of Terraform is out of date! The latest version
is 1.9.5. You can update by downloading from https://www.terraform.io/downloads.html

% tfenv uninstall 1.5.7
Uninstall Terraform v1.5.7
Terraform v1.5.7 is successfully uninstalled

% terraform version    
version '1.5.7' is not installed (set by /repo/.terraform-version). Installing now as TFENV_AUTO_INSTALL==true
Installing Terraform v1.5.7
Downloading release tarball from https://releases.hashicorp.com/terraform/1.5.7/terraform_1.5.7_darwin_arm64.zip
######################################################################################################################## 100.0%
Downloading SHA hash file from https://releases.hashicorp.com/terraform/1.5.7/terraform_1.5.7_SHA256SUMS
Not instructed to use Local PGP (/opt/homebrew/Cellar/tfenv/3.0.0/use-{gpgv,gnupg}) & No keybase install found, skipping OpenPGP signature verification
Archive:  /var/folders/6w/zd8qsqzn1r7g4b00h3rk79v80000gp/T/tfenv_download.XXXXXX.9YGFyexnCK/terraform_1.5.7_darwin_arm64.zip
  inflating: /opt/homebrew/Cellar/tfenv/3.0.0/versions/1.5.7/terraform  
Installation of terraform v1.5.7 successful. To make this your default version, run 'tfenv use 1.5.7'
Terraform v1.5.7
on darwin_arm64
+ provider registry.terraform.io/hashicorp/aws v5.68.0


Your version of Terraform is out of date! The latest version
is 1.9.5. You can update by downloading from https://www.terraform.io/downloads.html

% terraform init -upgrade

Initializing the backend...
Upgrading modules...
...

Initializing provider plugins...
- Finding hashicorp/aws versions matching ">= 2.23.0, >= 3.35.0, >= 4.0.0, >= 4.10.0, ~> 5.0"...

Terraform has been successfully initialized!

And now a darwin_arm64 copy shows up:

[Screenshot: 2024-10-01 at 7:43:09 AM]

When using the amd64 version, I've always had issues that, like the ones in the other PRs I've linked, required setting GODEBUG=asyncpreemptoff=1.
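The tfenv pitfall above can be caught before a provider ever hangs. A minimal bash sketch, assuming `terraform version` prints an `on <os>_<arch>` line as in the transcript; `tf_platform` and `check_arch` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the platform (e.g. darwin_amd64) from the
# "on <platform>" line of `terraform version` output read on stdin.
tf_platform() {
  awk '$1 == "on" { print $2; exit }'
}

# Hypothetical helper: warn when an arm64 host runs an amd64 Terraform,
# which also makes `terraform init` fetch darwin_amd64 providers.
# $1: host architecture (e.g. from `uname -m`), $2: Terraform platform.
check_arch() {
  local host="$1" platform="$2"
  if [[ "$host" == "arm64" && "$platform" == "darwin_amd64" ]]; then
    echo "mismatch: arm64 host running $platform Terraform (via Rosetta 2)"
    return 1
  fi
  echo "ok: $platform"
}
```

Typical use: `check_arch "$(uname -m)" "$(terraform version | tf_platform)"`. If the shell itself runs under Rosetta 2, `uname -m` reports `x86_64`, so this check will not flag the mismatch; the fix in that case is the same `TFENV_ARCH=arm64` reinstall shown above.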

@jotasixto

jotasixto commented Oct 1, 2024

I'm using an Apple M3 with tfenv set to the amd64 architecture through the TFENV_ARCH environment variable. Not everyone on my team uses an Apple M3; most of them are on Linux, so to stay aligned with them I always configure my Terraform binary as amd64, which means it runs under Rosetta 2. This issue only started with the latest AWS provider version (5.69.0). With 5.68.0 everything works perfectly; I only get the error @law mentioned with 5.69.0.

@cailen I tried running the command with the environment variable you suggested (GODEBUG=asyncpreemptoff=1) along with TF_LOG="debug", as shown below:

TF_LOG="debug" GODEBUG=asyncpreemptoff=1 terraform plan

While this reduced the number of log lines, the issue persists and fails at the same elements. Below is an excerpt from the logs:

...
2024-10-01T14:33:27.861+0200 [DEBUG] created provider logger: level=debug
2024-10-01T14:33:27.861+0200 [INFO]  provider: configuring client automatic mTLS
2024-10-01T14:33:27.869+0200 [DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.69.0/darwin_amd64/terraform-provider-aws_v5.69.0_x5 args=[.terraform/providers/registry.terraform.io/hashicorp/aws/5.69.0/darwin_amd64/terraform-provider-aws_v5.69.0_x5]
2024-10-01T14:33:27.876+0200 [DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.69.0/darwin_amd64/terraform-provider-aws_v5.69.0_x5 pid=51108
2024-10-01T14:33:27.876+0200 [DEBUG] provider: waiting for RPC address: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.69.0/darwin_amd64/terraform-provider-aws_v5.69.0_x5
2024-10-01T14:34:27.881+0200 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/aws/5.69.0/darwin_amd64/terraform-provider-aws_v5.69.0_x5 pid=51108 error="signal: killed"
...
2024-10-01T14:34:33.100+0200 [DEBUG] created provider logger: level=debug
2024-10-01T14:34:33.100+0200 [INFO]  provider: configuring client automatic mTLS
2024-10-01T14:34:33.108+0200 [DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/hashicorp/template/2.2.0/darwin_amd64/terraform-provider-template_v2.2.0_x4 args=[.terraform/providers/registry.terraform.io/hashicorp/template/2.2.0/darwin_amd64/terraform-provider-template_v2.2.0_x4]
2024-10-01T14:34:33.113+0200 [DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/hashicorp/template/2.2.0/darwin_amd64/terraform-provider-template_v2.2.0_x4 pid=52965
2024-10-01T14:34:33.113+0200 [DEBUG] provider: waiting for RPC address: path=.terraform/providers/registry.terraform.io/hashicorp/template/2.2.0/darwin_amd64/terraform-provider-template_v2.2.0_x4
2024-10-01T14:34:34.598+0200 [INFO]  provider.terraform-provider-template_v2.2.0_x4: configuring server automatic mTLS: timestamp=2024-10-01T14:34:34.598+0200
2024-10-01T14:34:34.638+0200 [DEBUG] provider.terraform-provider-template_v2.2.0_x4: plugin address: address=/var/folders/s4/sp8pl52s6ynbzwm57pq5gmkh0000gn/T/plugin458334256 network=unix timestamp=2024-10-01T14:34:34.638+0200
2024-10-01T14:34:34.638+0200 [DEBUG] provider: using plugin: version=5
2024-10-01T14:34:34.692+0200 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unimplemented desc = unknown service plugin.GRPCStdio"
2024-10-01T14:34:34.692+0200 [DEBUG] No provider meta schema returned
2024-10-01T14:34:34.696+0200 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/template/2.2.0/darwin_amd64/terraform-provider-template_v2.2.0_x4 pid=52965
2024-10-01T14:34:34.696+0200 [DEBUG] provider: plugin exited
...
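In the excerpt above, the AWS plugin starts at 14:33:27, waits for its RPC address, and is killed at 14:34:27, one minute later, while the template provider reports its plugin address in about 1.5 seconds. A minimal bash sketch for pulling such killed plugins out of a full debug log; `killed_plugins` is a hypothetical helper, and it assumes the `plugin process exited: path=... error="signal: killed"` line format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list provider binaries that Terraform killed, reading
# a TF_LOG=debug log on stdin. Matches lines like:
#   ... plugin process exited: path=<binary> pid=51108 error="signal: killed"
killed_plugins() {
  grep 'plugin process exited' \
    | grep 'error="signal: killed"' \
    | sed -E 's/.*path=([^ ]+).*/\1/' \
    | sort -u
}
```

For example, run `TF_LOG=debug TF_LOG_PATH=tf-debug.log GODEBUG=asyncpreemptoff=1 terraform plan`, then `killed_plugins < tf-debug.log`, to compare runs with and without the GODEBUG workaround.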

@cailen

cailen commented Oct 1, 2024

@jotasixto The binaries are all built from the same code. I'd think you could use the native build (darwin_arm64) to run anything locally, while other users use darwin_amd64 or whatever other flavor they need without issue. We do this where I work: some people are still on older Intel Macs, and we also run things via GitHub Actions on Linux amd64. I'm not at all discounting that it's broken, but you may be better off using the native version for your system unless there is no compatible one (as with very old versions of Terraform).

@jotasixto

@cailen Unfortunately, some of our legacy projects (which we are currently updating) use version 3 of the AWS provider, which doesn't have a compiled ARM build to download, so I can't work on them locally. This is why I also have the Terraform binary configured to use darwin_amd64.

@cailen

cailen commented Oct 1, 2024

@jotasixto makes sense then! Are you sure they don't have ARM copies though? From 3.30.0 there are ARM versions for Darwin. https://releases.hashicorp.com/terraform-provider-aws/3.30.0/. Maybe you are stuck using a version less than 3.30.0, but it may be worth trying to upgrade to the latest 3.x version if you can. The ARM version of Terraform also runs a lot faster.
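Checking whether a given provider release ships a darwin_arm64 build can be scripted against releases.hashicorp.com. A minimal bash sketch, assuming each release directory contains a `terraform-provider-<name>_<version>_SHA256SUMS` file that lists one zip per platform (mirroring the `terraform_1.5.7_SHA256SUMS` file in the transcript above); both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a SHA256SUMS listing read on stdin
# ("<hash>  <filename>" per line) includes a darwin_arm64 zip.
sums_have_darwin_arm64() {
  grep -q '_darwin_arm64\.zip$'
}

# Hypothetical helper: fetch the SHA256SUMS file for an official HashiCorp
# provider release. Assumes the URL layout described in the lead-in.
has_darwin_arm64() {
  local name="$1" version="$2"
  curl -fsSL "https://releases.hashicorp.com/terraform-provider-${name}/${version}/terraform-provider-${name}_${version}_SHA256SUMS" \
    | sums_have_darwin_arm64
}
```

Typical use: `has_darwin_arm64 aws 3.30.0 && echo "darwin_arm64 available"`. Going by this thread, that should succeed for aws 3.30.0 and fail for template 2.2.0.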

@jotasixto

@cailen I apologize for the confusion earlier. I was replying from my phone and recalling from memory, since it had been a few months since I last worked on this. Now that I've checked again on my laptop, I can confirm the problem is actually with other HashiCorp providers (template, local, and null), not the AWS provider:

╷
│ Error: Incompatible provider version
│
│ Provider registry.terraform.io/hashicorp/template v2.2.0 does not have a package available for your current platform, darwin_arm64.
│
│ Provider releases are separate from Terraform CLI releases, so not all providers are available for all platforms. Other versions of this provider may have different platforms supported.
╵

╷
│ Error: Incompatible provider version
│
│ Provider registry.terraform.io/hashicorp/local v1.4.0 does not have a package available for your current platform, darwin_arm64.
│
│ Provider releases are separate from Terraform CLI releases, so not all providers are available for all platforms. Other versions of this provider may have different platforms supported.
╵

╷
│ Error: Incompatible provider version
│
│ Provider registry.terraform.io/hashicorp/null v2.1.2 does not have a package available for your current platform, darwin_arm64.
│
│ Provider releases are separate from Terraform CLI releases, so not all providers are available for all platforms. Other versions of this provider may have different platforms supported.
╵

I should have verified this before responding.

However, this still blocks the legacy projects in a similar way, because I can't remove those providers without going through the proper migration process. Using the darwin_arm64 binary is therefore not a viable way for me to work on migrating these projects.

Thank you for your understanding, and I appreciate your suggestion!

@claytonolley

Since Terraform Cloud runners are all x86-64, I lock the architecture to amd64 on my M-series Macs anyway; it helps with the lock file hashes as well. Maybe this is overkill, but until this issue it had been working well for me.
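If the main reason for locking M-series Macs to amd64 is keeping `.terraform.lock.hcl` hashes consistent, `terraform providers lock` can record hashes for several platforms at once, so native darwin_arm64 runs and x86-64 runners would verify against the same lock file. This does not help with providers that publish no darwin_arm64 build at all, as described above. A minimal bash sketch; the platform list is an assumption to adjust for your team.

```shell
#!/usr/bin/env bash
# Platforms the team uses (assumption: adjust to your own machines/runners).
PLATFORMS=(darwin_arm64 darwin_amd64 linux_amd64)

# Build one -platform=<os_arch> flag per platform.
lock_args=()
for p in "${PLATFORMS[@]}"; do
  lock_args+=("-platform=$p")
done

# When run directly inside a Terraform working directory, record the hashes
# for every listed platform in .terraform.lock.hcl.
if [[ "${BASH_SOURCE[0]}" == "$0" ]] && command -v terraform >/dev/null; then
  terraform providers lock "${lock_args[@]}"
fi
```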

@LozanoMatheus

Looks like these projects have had similar errors recently:

If I'm not mistaken, Pulumi used Terraform providers under the hood in the past; I'm not sure whether it still does for cases like this. If so, that could be the connection. I'm not sure about stt 🤔

Regarding my earlier comment that v5.69.0 worked fine in CDKTF: I double-checked, and that CDKTF project was actually using v5.59.0. I guess I was a bit too sleepy at the time 😅. I get the same issue with v5.69.0.

@cailen

cailen commented Oct 2, 2024

I just unplugged my laptop for the day and noticed how hot my MacBook was. It turns out the hung AWS provider processes from this morning were still running!
[Screenshot: 2024-10-01 at 8:15:36 PM]

@mmadrono

mmadrono commented Oct 4, 2024

I think I have the same problem with provider v5.70.0 and earlier. I'm on macOS 15.0.1 with an Apple M2 chip, using Terraform 0.14.11 and 1.9.7, and there is no way to run Terraform without problems:
[Five screenshots, not preserved]
I get this error all the time. When I don't, the process hangs while its CPU and memory usage keep growing, and I have to kill it. That leaves the state lock held in DynamoDB, which makes it impossible to deploy infrastructure to AWS, and that is fundamental to my work.
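Killing a hung run can leave the DynamoDB state lock held. A minimal bash sketch for recovering the lock ID from the error Terraform prints on the next run, assuming the message contains a `Lock Info:` section with an `ID:` line (possibly prefixed by the `│` box-drawing border); `extract_lock_id` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Terraform's "Error acquiring the state lock"
# message on stdin and print the lock ID from its "Lock Info:" section.
extract_lock_id() {
  awk '
    /Lock Info:/ { in_info = 1; next }
    in_info {
      # Scan fields so a leading border character does not matter.
      for (i = 1; i < NF; i++)
        if ($i == "ID:") { print $(i + 1); exit }
    }'
}
```

After stopping any leftover provider processes and making sure no other run genuinely holds the lock, release it with `terraform force-unlock <LOCK_ID>`.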

@rmayore

rmayore commented Oct 6, 2024

Same issue
Macbook Pro M1
Terraform v1.5.7
Provider v5.70

Downgrading to v5.68 fixed the issue. I also want to echo the comment above about how hard the problematic v5.70 provider hits CPU and memory.

@cailen

cailen commented Oct 7, 2024

I find it crazy that this has not been triaged yet. What is going on, maintainers? @marcosentino @breathingdust @justinretzolk

@mars64

mars64 commented Oct 7, 2024

Just chiming in: back-pinning to v5.68 also seems to have worked for me.

Macbook Pro M3
Terraform v1.8.5
Provider v5.70

And the 5.70 provider binary was indeed running in the background at 90%+ CPU usage.

@ewbankkit ewbankkit added regression Pertains to a degraded workflow resulting from an upstream patch or internal enhancement. go Pull requests that update Go code labels Oct 7, 2024
@terraform-aws-provider terraform-aws-provider bot added the prioritized Part of the maintainer teams immediate focus. To be addressed within the current quarter. label Oct 7, 2024
@ewbankkit
Contributor

Could anyone who is experiencing this with v5.69.0 or v5.70.0 see if they have the same problem with v5.65.0 or v5.66.0?
These are all versions compiled with Go 1.23 whereas v5.67.0 and v5.68.0 were compiled with Go 1.22.
Thanks.
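To check the Go 1.22 vs 1.23 observation against the binaries in your own `.terraform` directory, `go version <file>` reports the toolchain a Go binary was built with. A minimal bash sketch; `go_version_of` and `built_with_go123_or_later` are hypothetical helpers, it assumes a local Go toolchain, and the glob assumes the `.terraform/providers/...` layout seen in the logs above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn `go version <file>` output ("<file>: go1.23.1")
# read on stdin into just the toolchain version ("go1.23.1").
go_version_of() {
  sed -E 's/^.*: (go[0-9]+\.[0-9]+(\.[0-9]+)?).*$/\1/'
}

# Hypothetical helper: exit 0 when the toolchain is Go 1.23 or newer.
built_with_go123_or_later() {
  local minor
  minor=$(printf '%s\n' "$1" | sed -E 's/^go1\.([0-9]+).*/\1/')
  [[ "$minor" =~ ^[0-9]+$ ]] && (( minor >= 23 ))
}

# When run directly, report every cached AWS provider binary.
if [[ "${BASH_SOURCE[0]}" == "$0" ]] && command -v go >/dev/null; then
  for bin in .terraform/providers/registry.terraform.io/hashicorp/aws/*/*/terraform-provider-aws_*; do
    [[ -f "$bin" ]] || continue
    printf '%s %s\n' "$bin" "$(go version "$bin" | go_version_of)"
  done
fi
```

A toolchain match alone does not confirm the cause: in the replies, 5.65.0 and 5.66.0, also built with Go 1.23, are reported to work.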

@mars64

mars64 commented Oct 7, 2024

In my case (M3, Terraform 1.8.5), provider v5.66 and v5.65 both appear to work as expected -- I can successfully initialize the provider during plan and apply, and the binary appears to exit as expected.

@LozanoMatheus

I got the same result as mars64 (M3, Terraform 1.9.7, 1.8.5, 17.5) with both provider versions, and everything seems fine, but I found one interesting thing.

When I ran multiple projects in parallel (4x), I did get behaviour similar to v5.69.0 and v5.70.0: init and plan froze for a while (~1 min), and sometimes failed. However, no processes were left running once the commands finished.

My guess is that some sort of lock prevents Terraform from spawning too many new processes (maybe ~5). Since v5.69.0 and v5.70.0 keep the provider process running after the command finishes, Terraform can't spawn new ones, which causes the freeze. I also noticed that these zombie processes took a while to exit after running kill <PID>.

@chmurray-cisco

Also impacted me. Running on M1 MacBook. AWS provider 5.69 and 5.70 would never work and were taking 80%+ CPU constantly. Dropping back to 5.68 worked, but another workaround worked too - use the amd64 provider (5.70) and set GODEBUG=asyncpreemptoff=1.

Seems that 5.69+ is currently not good on Apple Silicon.

@lirlirlirlir

Same here. M3, terraform 1.7.3, AWS providers 5.69.0 and 5.70.0 both hang during plan and apply (noticed multiple terraform-provider-aws_v5.70.0_x5 processes keep running) as described above. Downgrading to 5.68.0 works well for me. I have the same outcome as @mars64 described with 5.65.0 and 5.66.0.
Using the ARM version of Terraform is not an option for me either, for the same reason @jotasixto mentioned: we depend on other providers that do not support darwin_arm64.

@StvnWthrsp

Same here on M1. Terraform 1.9.2, though it doesn't seem related to Terraform version. Both 5.69.0 and 5.70.0 version of the AWS provider will intermittently result in a hang during the plan. Sometimes it returns a timeout, sometimes the entire shell will hang, and sometimes the plan will successfully run. Pinning 5.68.0 or lower does resolve the issue, but there is specific functionality I am looking to use in version 5.69.0+.

@dim13

dim13 commented Oct 8, 2024

Also a confirmation on M2 machine with TF 1.9.5: using >=5.69.0 results in timeout. Downgrade to 5.68.0 "resolves" the issue.

Meanwhile, on a different M1 machine, TF 1.9.7 with 5.70.0 works just as expected.

@ewbankkit
Contributor

Relates golang/go#68485.
Relates hashicorp/terraform#27350.
