Fix involuntary cassandra upserts #3513

Merged (25 commits), Aug 21, 2023

@fisx (Contributor) commented Aug 16, 2023

https://wearezeta.atlassian.net/browse/WPB-3915

Checklist

  • Re-visit all changed UPDATE commands and check whether the upsert was intentional after all. If so, remove the IF EXISTS.
  • Change the types of commands that still contain IF EXISTS so they return Row, and fix the resulting type errors.
  • Add a new entry in an appropriate subdirectory of changelog.d
  • Read and follow the PR guidelines
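For context on the checklist, here is a minimal sketch of the two behaviours involved. It uses a toy in-memory table instead of a real Cassandra connection. A plain CQL `UPDATE` writes its cells even when no row exists, which makes it an upsert. `UPDATE ... IF EXISTS` is a lightweight transaction: it only writes to an existing row and reports the outcome in the `[applied]` column of a result row, which is presumably why those commands have to return `Row`. The names `Cells`, `Table`, `update` and `updateIfExists` are hypothetical and not part of wire-server or cql-io.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Hypothetical in-memory stand-in for a Cassandra table:
-- primary key -> (column name -> value).
type Cells = Map String String
type Table = Map String Cells

-- Plain UPDATE: Cassandra writes the given cells whether or not the row
-- exists, so for a missing key it effectively inserts a new, possibly
-- partial, row (an upsert).
update :: String -> Cells -> Table -> Table
update key cells = Map.insertWith Map.union key cells

-- UPDATE ... IF EXISTS: a lightweight transaction that only writes when the
-- row is already there and reports whether it did (Cassandra returns this
-- as the [applied] column of a result row).
updateIfExists :: String -> Cells -> Table -> (Bool, Table)
updateIfExists key cells tbl
  | Map.member key tbl = (True, Map.adjust (Map.union cells) key tbl)
  | otherwise = (False, tbl)

main :: IO ()
main = do
  let empty = Map.empty :: Table
      cells = Map.fromList [("status", "enabled")]
  print (update "team-1" cells empty)         -- a new row appears
  print (updateIfExists "team-1" cells empty) -- nothing is written
```

Running `main` prints a table containing a fresh `team-1` row for the plain update, and `(False,fromList [])` for the conditional one.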

@zebot zebot added the ok-to-test Approved for running tests in CI, overrides not-ok-to-test if both labels exist label Aug 16, 2023
The calling side fails with not-found if the client doesn't exist yet.
Claims have been working without problems since the beginning of time,
and I didn't find any real problems here, but I think
`deleteClaim` actually has update, not upsert, semantics: if we delete
a claim, we don't want that to result in a row with `claims = -1`.

Checking the (only) call site suggests both work fine, though.
@fisx fisx marked this pull request as ready for review August 20, 2023 20:48
@fisx fisx requested a review from battermann August 20, 2023 20:48
@fisx (Contributor Author) commented Aug 21, 2023

lots of integration test failures, but i think they are unrelated?

@battermann (Contributor)

I think there are a lot of places where we rely on the upsert semantics (primarily in clients, but also in our integration tests).

So I wouldn't say that the upserts are necessarily all involuntary.

Even if we fix our integration tests accordingly, we can't verify that these changes don't break clients.

I don't think we should proceed with this. Should we maybe rename all functions from updateXY to upsertXY instead?

@fisx (Contributor Author) commented Aug 21, 2023

> I don't think we should proceed with this. Should we maybe rename all functions from updateXY to upsertXY?

good idea!

> So I wouldn't say that the upserts are necessarily all involuntary.

i've worked under the assumption that if the update would create an obviously broken row in case of insert, then it's probably a proper update, not an upsert. Is that rule wrong, or did I not apply it carefully enough, or are the test failures unrelated?

(note that i did spot a few upserts, and fixed them in the last few commits.)

@battermann (Contributor)

> i've worked under the assumption that if the update would create an obviously broken row in case of insert, then it's probably a proper update, not an upsert. Is that rule wrong, or did I not apply it carefully enough, or are the test failures unrelated?

Yes, that makes sense. If an insert would lead to corrupted state, that should be the right place to check for existence. The test failures I saw were indeed unrelated; they made me more concerned than necessary.

@battermann (Contributor) left a comment

I made some comments where I am not sure; please review these. Otherwise it looks good!

services/brig/src/Brig/API/OAuth.hs (outdated, resolved)
services/brig/src/Brig/Data/Connection.hs (outdated, resolved)
services/brig/src/Brig/Provider/DB.hs (outdated, resolved)
@@ -42,7 +42,7 @@ import UnliftIO qualified
 updateClient :: Bool -> UserId -> ClientId -> Client ()
 updateClient add usr cls = do
   let q = if add then Cql.addMemberClient else Cql.rmMemberClient
-  retry x5 $ write (q cls) (params LocalQuorum (Identity usr))
+  retry x5 . void $ trans (q cls) (params LocalQuorum (Identity usr))
Contributor

Not sure about addMemberClient here. Is it possible that a user has no clients, so that this really is an upsert?
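To make the question concrete, here is a toy model. It assumes (the query text is not shown in this thread) that `Cql.addMemberClient` appends a client id to a set-typed column keyed by user id. Under that assumption, adding the very first client for a user is what creates the row, so an `IF EXISTS` guard would silently drop it. `MemberClients`, `addClient` and `addClientIfExists` are illustrative names only.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Map.Strict (Map)
import Data.Set (Set)

-- Hypothetical model of a member/client table: user id -> set of client ids.
type MemberClients = Map String (Set String)

-- Plain "UPDATE ... SET clients = clients + {c} WHERE user = ?" (assumed
-- shape): the row is created the first time a client is added, so this
-- is a genuine upsert.
addClient :: String -> String -> MemberClients -> MemberClients
addClient usr c = Map.insertWith Set.union usr (Set.singleton c)

-- The same statement guarded by IF EXISTS: a user without a row yet never
-- gets their first client recorded; the Bool mirrors Cassandra's [applied].
addClientIfExists :: String -> String -> MemberClients -> (Bool, MemberClients)
addClientIfExists usr c m
  | Map.member usr m = (True, Map.adjust (Set.insert c) usr m)
  | otherwise = (False, m)

main :: IO ()
main = do
  print (addClient "alice" "c1" Map.empty)         -- row created
  print (addClientIfExists "alice" "c1" Map.empty) -- (False,fromList [])
```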

@fisx (Contributor Author), Aug 21, 2023

not sure, you may be right. 52d4470

Contributor Author

sorry, i missed a spot: ea8595e

services/galley/src/Galley/Cassandra/Queries.hs (outdated, resolved)
@@ -51,7 +51,7 @@ getCustomBackend domain =

 setCustomBackend :: MonadClient m => Domain -> CustomBackend -> m ()
 setCustomBackend domain CustomBackend {..} = do
-  retry x5 $ write Cql.updateCustomBackend (params LocalQuorum (backendConfigJsonUrl, backendWebappWelcomeUrl, domain))
+  retry x5 . void $ trans Cql.updateCustomBackend (params LocalQuorum (backendConfigJsonUrl, backendWebappWelcomeUrl, domain))
Contributor

Not sure about this one either. I can't recall exactly what this does, but AFAICT this update has everything it needs to be a proper insert.
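The rule of thumb from the discussion above (if the insert case of an UPDATE would leave an obviously broken row, it is probably meant as a real update) can be written as a small check. This is a sketch that assumes we know which columns a well-formed row needs. The column names in `main` are guesses loosely based on the parameters `setCustomBackend` passes, not the actual schema, and `Verdict` and `judge` are hypothetical names.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- | Outcome of the review heuristic (hypothetical names).
data Verdict
  = UpsertPlausible -- ^ inserting would still yield a complete row
  | ProbablyUpdate  -- ^ inserting would leave a partial, broken row
  deriving (Eq, Show)

-- | @judge required keyCols setCols@: if this UPDATE hit a missing row,
-- would the key columns from its WHERE clause plus the columns it sets
-- cover every column a well-formed row needs?
judge :: Set String -> Set String -> Set String -> Verdict
judge required keyCols setCols
  | required `Set.isSubsetOf` Set.union keyCols setCols = UpsertPlausible
  | otherwise = ProbablyUpdate

main :: IO ()
main = do
  -- Assumed column names, loosely based on setCustomBackend's parameters.
  let required = Set.fromList ["domain", "config_json_url", "webapp_welcome_url"]
      key = Set.fromList ["domain"]
  print (judge required key (Set.fromList ["config_json_url", "webapp_welcome_url"]))
  print (judge required key (Set.fromList ["config_json_url"]))
```

The first call prints `UpsertPlausible`, matching the observation that this update already carries everything a proper insert needs. Dropping one column from the SET clause flips the result to `ProbablyUpdate`.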

Contributor Author

you're right. damn! 0494cd3

services/gundeck/src/Gundeck/Push/Data.hs (outdated, resolved)
retry x5 $ write updateSSOTeamConfig (params LocalQuorum (ssoTeamConfigStatus, tid))
updateSSOTeamConfig :: PrepQuery W (FeatureStatus, TeamId) ()
updateSSOTeamConfig = "update team_features set sso_status = ? where team_id = ?"
retry x5 . void $ trans updateSSOTeamConfig $ params LocalQuorum (ssoTeamConfigStatus, tid)
Contributor

Not sure here either?

Contributor Author

I'm pretty sure we don't want to configure deleted or never-created teams, so I think this is fine. But if you don't believe me, I'll change it back; this is almost-dead code anyway. :)

@fisx fisx requested a review from battermann August 21, 2023 12:17
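# Move the trailing "IF EXISTS is too expensive" remark in front of each query string and reword it.
find tools services libs integration -name '*.hs' \
  -exec perl -i -pe \
  's/(".*") -- `IF EXISTS`, but that is too expensive/{- `IF EXISTS`, but that needs benchmarking -} $1/g' {} +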
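For readers less familiar with Perl, here is a rough Haskell rendering of what the substitution does to each line: it finds a string literal followed by the trailing `IF EXISTS` remark, and moves a reworded block comment in front of the literal. This is a sketch that mimics the greedy `(".*")` match, which spans from the first double quote on the line to the last double quote directly followed by the remark. `rewriteLine` and `splitAtLast` are hypothetical names.

```haskell
import Data.List (isPrefixOf)

-- Trailing remark the one-liner looks for (preceded by the closing quote).
oldNote :: String
oldNote = " -- `IF EXISTS`, but that is too expensive"

-- Reworded comment that gets placed in front of the string literal.
newNote :: String
newNote = "{- `IF EXISTS`, but that needs benchmarking -} "

-- | Split @s@ at the last occurrence of @pat@ starting at index 1 or later.
-- Index 0 is excluded because the opening quote of the literal sits there
-- and cannot double as its closing quote.
splitAtLast :: String -> String -> Maybe (String, String)
splitAtLast pat s =
  case [i | i <- [1 .. length s - 1], pat `isPrefixOf` drop i s] of
    [] -> Nothing
    is -> let i = last is in Just (take i s, drop (i + length pat) s)

-- | Rewrite one source line the way the perl substitution does.
rewriteLine :: String -> String
rewriteLine line =
  case break (== '"') line of
    (_, []) -> line -- no double quote on this line: leave it alone
    (before, lit) ->
      case splitAtLast ('"' : oldNote) lit of
        Nothing -> line
        -- body keeps the opening quote; the closing quote was consumed
        -- as part of the pattern, so add it back after the literal.
        Just (body, after) -> before ++ newNote ++ body ++ "\"" ++ after

main :: IO ()
main =
  mapM_ (putStrLn . rewriteLine)
    [ "  \"update t set x = ? where id = ?\" -- `IF EXISTS`, but that is too expensive"
    , "  \"select x from t where id = ?\""
    ]
```

Lines without the remark pass through unchanged, just as they do with `perl -p`.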
@fisx (Contributor Author) commented Aug 21, 2023

got caught by @akshaymankar and @pcapriotti: 'if exists' is dangerous for cassandra performance!

so this PR is less exciting than i had hoped, but at least we have more docs in the code now, so this effort is unlikely to be repeated in vain in the future.

@fisx fisx merged commit 0cf66be into develop Aug 21, 2023
7 of 9 checks passed
@fisx fisx deleted the WPB-3915-fix-involuntary-upserts branch August 21, 2023 20:35
pcapriotti added a commit that referenced this pull request Sep 27, 2023
* Fix bug: federatorInternal host not set for background-worker (#3516)

* WPB-3916: Filtering out duplicate members when sending defederation notifications (#3515)

* integration: Add test to verify behaviour with offline backends (#3501)

* background-worker: Make push backoff times configurable

* brig/getFederationStatus: Always return NonConnectedBackends as empty when fed policy is AllowAll

* integration: Use separate vHosts for backendA and B.

* integration/RunServices: Add hack to make federation work

* integration: Add test to verify behaviour with offline backends

* helm-var-integration: Workaround bug with federation

* integration-test.sh: Run new integration test suite first

---------

Co-authored-by: Marko Dimjašević <marko.dimjasevic@wire.com>

* Distinguish between update and upsert cassandra commands (#3513)

* Remove billing-team-member-backfill tool (#3520)

* dockerephemeral: Increase nofile ulimits for ES and Fake DynamoDB (#3521)

* [WPB 3842] Federation completeness check (#3514)

* WPB-3842: Improving checks for adding users to a conversation.

Added a check to `ensureAllowed` that checks for full federation
connections for domains in a conversation, including the domains for new
users.

* WPB-3842: Adding the changelog

* WPB-3842: Moving where the extra domain checks are being performed.

Updating integration tests to reflect the updated conversation join
semantics. Many of them weren't expecting errors
relating to unreachable domains, and had to be updated to reflect this.

* Fix asserted domains in an integration test

* Integration test: assert on non-federating domains

* WPB-3842: Changing parallel testing to sequential testing

---------

Co-authored-by: Marko Dimjašević <marko.dimjasevic@wire.com>

* WPB-3798 incorrect json field names (#3518)

* WPB-3798: Updating code and tests after renaming fields

* WPB-3798: More updates to names after finding more JSON prefix mangling

* WPB-3798: Fixing schema instances for SAML data

* WPB-3798: Fixing instances that had errors, found by tests

* WPB-3798: Adding changelogs

* WPB-3798: PR feedback.

* WPB-3798: Fixing an error with a field called `data'`

The trailing ' would end up in the JSON representation. I've changed it
to use a leading `_` like other structures, and wrote a newtype to
handle the minimal prefix stripping.

Also cleaning up the diff in regards to imports.

* WPB-3798: Cleaning up imports to minimise the diff

* nit-picks (#3519)

* Remove unneeded -Wwarn (re-enabling -Werror in those modules).

* Makefile: fix hspec_options overloading in .envrc.local.

* integration: Fix testAddingUserNonFullyConnectedFederation and testNotificationsForOfflineBackends (#3529)

* integration: Fix testAddingUserNonFullyConnectedFederation

* integration: Don't allow adding users to conv when one of the participating backends is down

* integration: Add retries to get around problem of federation domain sync threads

* Introduce API v5 (#3527)

* Introduce development version 5

* Specialise API to a specific version

* Use versioned swagger for galley

* Use version swagger for all other services

* Collect all service Swaggers into a typeclass

* Fix swagger integration tests

* Revert any changes to API versions before 5

* Remove promotion of isDevelopmentVersion

* Add CHANGELOG entry

* stern: Optimize RAM usage of /i/users/meta-info (#3522)

* stern: Fetch only the notifications that are needed

* stern: Fetch only the conversations that are needed

* Integration tests: use static ports (#3536)

* [WPB-3799] cannot fetch conversation details after connection request (#3538)

* brig-integration: Fix flaky tests for API.Federation (#3539)

* brig-integration: Don't assume only 1 result in search by display name

Display names are random strings of 2 to 128 characters. If a 2-character name gets generated, it is likely to match some name generated in another test.

* brig-integration: Mark test not flaky

It didn't fail after running it 1000 times.

* Integration suite: Fix bug in local setup: wrong port for nginz http2 (#3543)

* [WPB-662] servantify brig provider bot api (#3540)

* Fix broken "we are hiring" link (#3549)

* Multi-ingress guest links (#3546)

* Check validity of notification IDs (#3550)

* Check validity of notification IDs

* Add CHANGELOG entry

* fixup! Add CHANGELOG entry

* fixup! fixup! Add CHANGELOG entry

* WPB-633 Servantify Brig/Provider.Service API (#3554)

* WPB-1214: Servantify Brig/Provider.Service API

- Moving the routes over to servant, and removing the old routing code.
- Adding new instances to types that needed them for servant.

* WPB-663: Removing a redundant TODO comment, adding changelog

* Fix ES migration script. (#3558)

* Revert "WPB-633 Servantify Brig/Provider.Service API (#3554)"

This reverts commit 3653d56.

* Integration tests: delete all rabbitmq queues during dynamic backends setup phase (#3523)

* [WPB-4406] federator improve logging (#3556)

* Makefile: Avoid executing the hint (#3564)

Backticks execute the command even when they are in quotes.

* Finalise v4 (#3545)

* Remove MLS endpoints from the API

They will be reintroduced when merging the mls branch. These endpoints
are not currently functional on develop, so removing them from here will
reduce the amount of conflicts.

* Finalise v4

* Add CHANGELOG entry

* Add pregenerated swagger for v4

* Delete MLS tests in brig

* Remove more MLS endpoints from v4

* Set default API version to 5 in integration tests

* Update the documentation on API versioning

---------

Co-authored-by: Marko Dimjašević <marko.dimjasevic@wire.com>

* Fix: SCIM user lookup after changing IdP issuer ID (#3473)

* doc: document webapp configuration for multi-ingress environments (#3569)


---------

Co-authored-by: Sven Tennie <sven.tennie@gmail.com>

* [WPB-4361] upgrade jwt-tools (#3559)

* cassandra: Add column and table names in parsing error messages (#3555)

* s/CORS/CSP/ as mentioned by Sven in WPB-2912

* Replace broken integrations with links

see WPB-3599

* replace all instances of example.com with wire.example as per wpb-2621, in charts only

* change back from wire.example to example.com as this was mistakenly committed to develop instead of to the proper branch

* add documentation on creating a first user

* reverting previous commit as it was sent to the wrong branch

* Update sftd docs: include uri scheme in allowOrigin (#3584)

* Update sftd docs: include uri scheme in allowOrigin

* fixup

* WPB-4629 impossible to add users to a conversation if one of the members is from an offline backend (#3585)

* fake-aws-s3 chart: Upgrade to minio 5.0.13 (#3565)

* Disable de-federation to avoid running into a scalability issue (#3582)

https://wearezeta.atlassian.net/browse/WPB-4668

Co-authored-by: Akshay Mankar <akshay@wire.com>

* [WPB-3664] Bug fix: Notify remote backends of their users removed from conversation when reachable again (#3537)

* Formatting

* Test utilities for changing a conv name

* Add a test confirming the bug report

* An action to enqueue notifications concurrently

* Enqueue member removal notification for remotes

* Add a changelog

* Test case formatting

* Migrate test roleUpdateWithRemotesUnavailable

* Migrate test putReceiptModeWithRemotesOk

* Migrate test putReceiptModeWithRemotesUnavailable

* Migrate test testRoleUpdateWithRemotesOk

* Migrate test roleUpdateRemoteMember

* Migrate test putQualifiedConvRenameWithRemotesUnavailable

This one is already covered by testSynchroniseUserRemovalNotification

* Migrate test putQualifiedConvRenameWithRemotesOk

* Migrate test deleteLocalMemberConvLocalQualifiedOk

* Migrate test deleteRemoteMemberConvLocalQualifiedOk

* Migrate test deleteUnavailableRemoteMemberConvLocalQualifiedOk

* Add the copyright header to a test module

* Move a test utility (allPreds)

* Test utility: create a team with members

* Migrate test testAccessUpdateGuestRemoved

* Migrate test messageTimerChangeWithRemotes

* Migrate test messageTimerUnavailableRemotes

* Migrate test testAccessUpdateGuestRemovedRemotesUnavailable

* Migrate test accessUpdateWithRemotes

* Migrate test testAddRemoteMember

* Migrate test testDeleteTeamConversationWithRemoteMembers

* Migrate test testDeleteTeamConversationWithUnavailableRemoteMembers

* Move a test utility (assertLeaveNotification)

* Migrate test "POST /federation/leave-conversation : Success"

* Migrate test "POST /federation/on-user-deleted-conversations : Remove deleted remote user from local conversations"

* Migrate test updateConversationByRemoteAdmin

* Tests: support giving a role when adding

* Use cannon API for notifications when possible

* Use startDynamicBackends when possible

* Fix assertion

* Migrate test testAddRemoteUsersToLocalConv

* Test add member endpoint at version 1

* Add return value to enqueueNotification

* Use cannon assertions in offline backends test

* Check that remote notifications are received

* Test removal of users from unreachable backends

* Use correct domains for default backends

Taking the domains in the `backendA` and `backendB` resources only works
locally.

* fixup! Use cannon assertions in offline backends test

---------

Co-authored-by: Paolo Capriotti <paolo@capriotti.io>
Co-authored-by: Akshay Mankar <akshay@wire.com>

* WPB-4240: Migrate from swagger2 to openapi3 (#3570)


---------

Co-authored-by: Igor Ranieri Elland <54423+elland@users.noreply.github.com>
Co-authored-by: Igor Ranieri <igor@elland.me>

* Remove mocked MLS member add test

* Resolve conflict in pregenerated swagger

* Remove MLS end2end tests

---------

Co-authored-by: Stefan Matting <smatting@users.noreply.github.com>
Co-authored-by: Owen Harvey <owenlharvey@gmail.com>
Co-authored-by: Akshay Mankar <akshay@wire.com>
Co-authored-by: Marko Dimjašević <marko.dimjasevic@wire.com>
Co-authored-by: fisx <mf@zerobuzz.net>
Co-authored-by: Igor Ranieri Elland <54423+elland@users.noreply.github.com>
Co-authored-by: Leif Battermann <leif.battermann@wire.com>
Co-authored-by: Jappie Klooster <jappieklooster@hotmail.com>
Co-authored-by: Leif Battermann <leifbattermann@gmail.com>
Co-authored-by: Thomas Belin <thomasbelin4@gmail.com>
Co-authored-by: Sven Tennie <sven.tennie@gmail.com>
Co-authored-by: Arthur Wolf <wolf.arthur@gmail.com>
Co-authored-by: Igor Ranieri <igor@elland.me>
3 participants