Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(aws-appsync) Upgrading from v1.70.0 to 1.85.0 resulted in AppSync GraphQL endpoints not being wired-up correctly #12635

Closed
asnaseer-resilient opened this issue Jan 21, 2021 · 19 comments · Fixed by #12973
Assignees
Labels
@aws-cdk/aws-appsync Related to AWS AppSync bug This issue is a bug. effort/small Small work item – less than a day of effort p2

Comments

@asnaseer-resilient
Copy link

We had an existing, working AppSync GraphQL service that was deployed using CDK v1.70.0.

I reviewed all the release notes from v1.70.0 to v1.85.0 yesterday and did not spot anything that would cause a breaking change to this deployment. I therefore upgraded to CDK v1.85.0 yesterday but found that it had not wired-up any of the Query or Mutation Resolvers to the Lambda Data Source that we use.

I then reverted to v1.70.0 but found that it had not fixed the issue I spotted. I finally undeployed and then re-deployed to get the service back to normality.

Could this be a bug in the CDK or have I missed something in the release notes that I am supposed to follow first?

Reproduction Steps

Described above.

What did you expect to happen?

My AppSync GraphQL service to be deployed correctly.

What actually happened?

None of our Query or Mutation Resolvers to the Lambda Data Source that we use were wired-up.

Environment

  • CDK CLI Version : v1.85.0
  • Framework Version: v1.85.0
  • Node.js Version: 12.14.1
  • OS : Run in CircleCI using image circleci/node:12.14.1
  • Language (Version): TypeScript (4.1.3)

Other


This is 🐛 Bug Report

@asnaseer-resilient asnaseer-resilient added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 21, 2021
@NGL321 NGL321 changed the title Upgrading from v1.70.0 to 1.85.0 resulted in AppSync GraphQL endpoints not being wired-up correctly (aws-appsync) Upgrading from v1.70.0 to 1.85.0 resulted in AppSync GraphQL endpoints not being wired-up correctly Jan 25, 2021
@github-actions github-actions bot added the @aws-cdk/aws-appsync Related to AWS AppSync label Jan 25, 2021
@pierre-vr
Copy link

pierre-vr commented Jan 26, 2021

We encountered the same issue when upgrading from 1.74.0 to 1.83.0. Our resolvers where not attached anymore (as described here) using BaseDataSource.createResolver().

Simply downgrading the version did not solve the issue. So far we had to rewire manually in the AWS console.

Hence, the bug may have been introduced by a version > 1.74.0 and <= 1.83.0.

@BryanPan342
Copy link
Contributor

Hm.. I think this is probably a bug that was introduced in my PR #10111 to allow for appsync functions

To be completely honest, I wrote this code a while before it was merged, but I'm guessing the problem is coming from the fact that the CfnResolverProps.dataSourceName is not getting set properly

interestingly, I'm not getting errors on the tests we have currently.. in fact, the only change is in the name of the Cfn resource because we are creating the resolvers from the GraphqlApi instead of the DataSource itself.

can I ask if dataSourceName is getting set to the right name or if it's undefined (you can check by looking at the CfnTemplate)

if it's undefined there is a problem with the props between DataSource -> GraphqlAPI

if it's not the right name, the problem is with DataSource


also a sample code snippet of how you are creating resolvers would be greatly appreciated!

@asnaseer-resilient
Copy link
Author

@BryanPan342 Unfortunately I do not have the original outputs as I had to rapidly revert to get our service back up and running. A snippet of code that is used for this is shown below:

import * as cdk from '@aws-cdk/core';
import * as cdkAppsync from '@aws-cdk/aws-appsync';
    ...
    const lambdaApiResolverDataSource = graphQlApi.addLambdaDataSource('LambdaApiResolverDataSource', lambdaApiResolverFunction, {
      name: 'DataSource',
      description: `Resolves GraphQL requests for ${product}`
    });

    lambdaApiResolverDataSource.createResolver(
    {
      typeName: 'Mutation',
      fieldName: 'createCallFlow',
      requestMappingTemplate: cdkAppsync.MappingTemplate.lambdaRequest(
        `{
          "parent": $util.toJson($ctx.source),
          "field": $util.toJson($ctx.info.fieldName),
          "origin": "$ctx.request.headers.origin",
          "accessToken": "$ctx.request.headers.authorization",
          "userCustomerId": $util.toJson($ctx.identity.claims["custom:acu_cust_id"]),
          "userEmail": $util.toJson($ctx.identity.claims["email"]),
          "userIdentity": $util.toJson($ctx.identity),
          "arguments": $util.toJson($ctx.arguments)
        }`
      ),
      responseMappingTemplate: cdkAppsync.MappingTemplate.lambdaResult()
    });

@BryanPan342
Copy link
Contributor

@asnaseer-resilient thanks for the code snippet

Ah i think i see the issue.. how many data sources do you have that are named "DataSource"?

@asnaseer-resilient
Copy link
Author

asnaseer-resilient commented Feb 4, 2021

@BryanPan342 Just the one. We have several AppSync GraphQL services deployed. In each of these, there is a single lambda function (named DataSource as above) that resolves all the queries and mutations.

So each CloudFormation stack has a similar setup.

In the "real" code we have actually extracted some common CDK patterns into a helper, so the "real" code looks more like this:

    const lambdaApiResolverDataSource = graphQlApi.addLambdaDataSource('LambdaApiResolverDataSource', lambdaApiResolverFunction, {
      name: 'DataSource',
      description: `Resolves GraphQL requests for ${tags!.product}`
    });

    lambdaApiResolverDataSource.createResolver(CDK_UTILS.graphQlResolver('Mutation', 'createCallFlow'));
    lambdaApiResolverDataSource.createResolver(CDK_UTILS.graphQlResolver('Query', 'getCallFlow'));
    lambdaApiResolverDataSource.createResolver(CDK_UTILS.graphQlResolver('Query', 'listCallFlows'));
    lambdaApiResolverDataSource.createResolver(CDK_UTILS.graphQlResolver('Mutation', 'updateCallFlow'));
    lambdaApiResolverDataSource.createResolver(CDK_UTILS.graphQlResolver('Mutation', 'assignSmartnumbersToCallFlow'));
    lambdaApiResolverDataSource.createResolver(CDK_UTILS.graphQlResolver('Query', 'hasJobCompleted'));
    lambdaApiResolverDataSource.createResolver(CDK_UTILS.graphQlResolver('Mutation', 'deleteCallFlow'));

@BryanPan342
Copy link
Contributor

Hm.. okay the only thing that I can think of that might be messing with them being attached is the fact that the Cfn id is now changed.. these id's are based on the order in which CDK generates them.. similar to like a tree

So because resolvers were generated from DataSources before, the path looked like [Api][DataSource][Resolver] but now since we have made Resolvers stem directly from GraphqlApi, it will mutate the id to be [Api][Resolver]. I'll test out a couple of things and see if there are any changes!

@BryanPan342
Copy link
Contributor

@asnaseer-resilient okay, after some tests i found that because the id's of the components are the same, I actually ran into the following problem:

6:33:38 PM | CREATE_FAILED        | AWS::AppSync::Resolver      | ApiQuerygetTestsResolver025B8E
0A
Only one resolver is allowed per field. (Service: AWSAppSync; Status Code: 400; Error Code: BadR
equestException; Request ID: 8217847d-98c4-428d-ae6d-7370d07708e9; Proxy: null)

Seems to me like the issue lies with the way CloudFormation handles resource creation.

@MrArnoldPalmer what do you think is the best way to combat this? I remember discussing the reasoning behind having the resolvers be descended from GraphqlApi. But we are also at a point where, if we revert back to the old way, we will screw over all the new people who are using cdk w/ appsync.

@asnaseer-resilient
Copy link
Author

@BryanPan342 Could you add a feature flag to cdk.context.json to allow old-style behaviour?

@p0wl
Copy link

p0wl commented Feb 8, 2021

We are also running into this and we need to upgrade to 1.88.0 (from 1.76.0) because of another bug. This bug is blocking us from doing the upgrade right now.

So after reading this, the problem seems to be that Cloudformation wants to attach the new resolvers (=with new name) before the old ones are beeing detached (as they will be deleted in a different step).

So to fix this, I would need to detach all resolvers manually before the deploy, which results in a downtime. I really would want to avoid that. Also, downgrading resulted in some vtl resolves not being attached any longer, so that breaks as well :/

(thanks for your work on resolver functions @BryanPan342 , I was trying to do a pipeline resolver a month back and it was horrible, your change looks great!)

@BryanPan342
Copy link
Contributor

BryanPan342 commented Feb 9, 2021

@asnaseer-resilient we can try it and post to see if the team will take it!

this only problem i see with having a feature flag is that the code shouldn't just live there forever

essentially you are asking for something like this right?

  /**
   * creates a new resolver for this datasource and API using the given properties
   */
  public createResolver(props: BaseResolverProps): Resolver {
    /**
     * @deprecated 
     */
    if (this.node.tryGetContext('appsyncResolverVersion')) {
      return new Resolver(this, `${props.typeName}${props.fieldName}Resolver`, {
        api: this.api,
        dataSource: this,
        ...props,
      });
    }
    return this.api.createResolver({ dataSource: this, ...props });
  }

the more I think about this the more i think we should just return to the old version. Resolvers stem directly from data sources so in terms of hierarchy it should be created from a data source.


i know this might not be the solution for everyone, but a workaround would simply be deleting the CDK stack and redeploying.. this should remove the cloudformation generation issues.

@asnaseer-resilient
Copy link
Author

@BryanPan342 I would obviously prefer it if you reverted to previous behaviour.

However, when you say "...but a workaround would simply be deleting the CDK stack and redeploying..." I think this may not be as simple a workaround as we might think for everyone using the CDK. There may be others who have deployed to live (like us) and cannot afford any downtime caused by having to redeploy their services.

@BryanPan342
Copy link
Contributor

However, when you say "...but a workaround would simply be deleting the CDK stack and redeploying..." I think this may not be as simple a workaround as we might think for everyone using the CDK. There may be others who have deployed to live (like us) and cannot afford any downtime caused by having to redeploy their services.

Apologies if my suggestion sounded condescending! I only added it there primarily for people looking for an immediate solution (for example, people playing with CDK, or not concerned about down)

@MrArnoldPalmer
Copy link
Contributor

MrArnoldPalmer commented Feb 19, 2021

@BryanPan342 I agree with your assessment that at this point we should revert to the old behavior. Resolvers do always reference a single data source so this feels right hierarchy wise as you say.

To sum this up, if we ever replace existing resolvers with new ones (via new IDs), because cloudformation attempts to create these new resolvers before deleting the old ones, AppSync is trying to attach two resolvers to the same query and therefore doesn't attach the new resolver. Then the old ones are deleted as part of the CFN deploy. If that is the case, are users on current versions going to have the same issue if we revert and they upgrade? IE their resolver IDs have changed and they get deployed not attached to anything?

Just as a reminder, the AppSync module is experimental, there will likely be more breaking changes in the future. We will do whatever we can to make minimize outages caused by resource replacement, but we can't guarantee that they will never happen at this point.

@MrArnoldPalmer MrArnoldPalmer added effort/small Small work item – less than a day of effort p2 and removed needs-triage This issue or PR still needs to be triaged. labels Feb 20, 2021
@BryanPan342
Copy link
Contributor

@MrArnoldPalmer almost! pipeline resolvers stem from the GraphQL API. They don't have a base data source. So in order to combat that, I added a function to the GraphqlApi base class to create a resolver from the Graphql API itself.

So for new users there are two cases:

// Resolver creation from data source
ds.createResolver(...) 

// Resolver creation from API
api.createResolver(...)

If the resolver was created from the data source, then yes the user would have to restart their service.

If the resolver was created from the API, they would be fine!

@mergify mergify bot closed this as completed in #12973 Feb 22, 2021
mergify bot pushed a commit that referenced this issue Feb 22, 2021
)

* Revert to old behavior of allowing a data source to create a resolver as UNIT resolvers stem from data sources.
* Maintaining new behavior of having `GraphqlApi` create resolvers as PIPELINE resolvers stem from the `GraphqlApi` itself

Fixes #12635
Fixes #11522

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@p0wl
Copy link

p0wl commented Feb 22, 2021

Thank you all! Looking forward to the next release 🎉

eladb pushed a commit that referenced this issue Feb 22, 2021
)

* Revert to old behavior of allowing a data source to create a resolver as UNIT resolvers stem from data sources.
* Maintaining new behavior of having `GraphqlApi` create resolvers as PIPELINE resolvers stem from the `GraphqlApi` itself

Fixes #12635
Fixes #11522

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@asnaseer-resilient
Copy link
Author

I upgraded to v1.91.0 and can confirm that the issues have been resolved. Thank you all 👍

@eladb
Copy link
Contributor

eladb commented Mar 1, 2021

@asnaseer-resilient thanks for letting us know

@p0wl
Copy link

p0wl commented Mar 1, 2021

Same here! Works and upgrade was flawless. Now finally migrating to NodeJsLambda 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/aws-appsync Related to AWS AppSync bug This issue is a bug. effort/small Small work item – less than a day of effort p2
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants