Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better warning message on cluster sharding registration #24906

Merged

Conversation

richardimaoka
Copy link
Contributor

Closes #24295

[info] [WARN] [04/14/2018 05:23:39.373] [ShardingSystem-akka.actor.default-dispatcher-19] [akka.tcp://ShardingSystem@127.0.0.1:2554/system/sharding/Device] No coordinator found to register. Probably, no seed-nodes configured and manual cluster join not performed? Total [44] buffered messages.

@akka-ci akka-ci added validating PR is currently being validated by Jenkins tested PR that was successfully built and tested by Jenkins and removed validating PR is currently being validated by Jenkins labels Apr 13, 2018
@akka-ci
Copy link

akka-ci commented Apr 13, 2018

Test PASSed.

coordinatorSelection, shardBuffers.totalSize)
actorSelection, shardBuffers.totalSize)
case None ⇒ log.warning(
"No coordinator found to register. Probably, no seed-nodes configured and manual cluster join not performed? Total [{}] buffered messages.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also happen when members are joining but not yet Up - perhaps we could find a simple way to word this that makes that more obvious?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below cases are possible for coordinatorSelection being None ?

  1. seed-node not configured and manual configuration is not performed
  2. members are joining but not yet Up
  3. general network problem

I think the original issue #24295 intended to make 1. more obvious, and distinguished from the case where coordinatorSelection is Some(). However, I'm not sure how to put these 1 ~ 3 cases into a single warning message with reasonable length... so, what I think could do are:

  • put a lengthy message saying 1, 2, and 3 are possible
  • or, link to a certain section of the doc explaining 1, 2 and 3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this distinction between None and Some makes it much more clear what the reason might be. Not Up yet should be rare, since coordinator is on the oldest.

One additional hint to the log message of the Some case could be to include something if it's unreachable:

val coordinatorIsUnreachable = cluster.state.unreachable(membersByAge.head)

Copy link
Member

@patriknw patriknw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggesting one addition

coordinatorSelection, shardBuffers.totalSize)
actorSelection, shardBuffers.totalSize)
case None ⇒ log.warning(
"No coordinator found to register. Probably, no seed-nodes configured and manual cluster join not performed? Total [{}] buffered messages.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this distinction between None and Some makes it much more clear what the reason might be. Not Up yet should be rare, since coordinator is on the oldest.

One additional hint to the log message of the Some case could be to include something if it's unreachable:

val coordinatorIsUnreachable = cluster.state.unreachable(membersByAge.head)

@richardimaoka
Copy link
Contributor Author

Hi @patriknw should that be something like this .. f1ff17e? Let me know if I didn't get this correctly.

@akka-ci akka-ci added validating PR is currently being validated by Jenkins tested PR that was successfully built and tested by Jenkins and removed tested PR that was successfully built and tested by Jenkins validating PR is currently being validated by Jenkins labels Apr 29, 2018
@akka-ci
Copy link

akka-ci commented Apr 29, 2018

Test PASSed.

@akka-ci akka-ci added validating PR is currently being validated by Jenkins and removed tested PR that was successfully built and tested by Jenkins labels May 9, 2018
@richardimaoka richardimaoka force-pushed the wip-24295-cluster-sharding-error-richard branch from d45cc70 to 077ed2f Compare May 9, 2018 21:33
@richardimaoka
Copy link
Contributor Author

Added a new commit to this - @patriknw does this look ok?

@akka-ci akka-ci added validating PR is currently being validated by Jenkins needs-attention Indicates a PR validation failure (set by CI infrastructure) and removed validating PR is currently being validated by Jenkins labels May 9, 2018
@akka-ci
Copy link

akka-ci commented May 9, 2018

Test FAILed.

@richardimaoka
Copy link
Contributor Author

argh, will look into the test failure

@akka-ci akka-ci added tested PR that was successfully built and tested by Jenkins and removed needs-attention Indicates a PR validation failure (set by CI infrastructure) labels May 9, 2018
@akka-ci
Copy link

akka-ci commented May 9, 2018

Test PASSed.

@richardimaoka
Copy link
Contributor Author

Test passed, and ready for review now.

The earlier test failure was due to a wrong character I accidentally inserted in a comment, which I removed by git commit --amend.

Copy link
Member

@patriknw patriknw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@ktoso ktoso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@ktoso ktoso merged commit 989a80d into akka:master May 21, 2018
zbynek001 added a commit to zbynek001/akka.net that referenced this pull request Jun 24, 2018
Aaronontheweb pushed a commit to akkadotnet/akka.net that referenced this pull request Jun 25, 2018
* Provide access to known shard types
akka/akka#23912

* Separate sharding regions and proxies
akka/akka#23472

Fix lookup of coordinator for sharding proxies
akka/akka#23995

* Fix race in ClusterShardingFailureSpec

AFAICT there was nothing ensuring the order of messages when sent to the
shard and the region so first checkthat the passivation has happened
before sending another add in the test
akka/akka#24013

* Better warning message on cluster sharding registration
akka/akka#24906

* entityId => Behavior in ClusterSharding API
mixture of
akka/akka#24053
akka/akka#21809
akka/akka#24470

* sharding tests updated

* headers fixed, docs updated

* ClusterSharding: automatically choose start or startProxy by a node role
akka/akka#23934
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tested PR that was successfully built and tested by Jenkins
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants