Add Stats to DatastreamTaskImpl #855
Conversation
Just curious, why do we want to save these stats in ZK instead of simply reporting them as a task-level metric? Is it because in Ingraphs it's hard/not possible to identify a specific task's metrics?
There is a limit on the number of metrics that we can emit from the container. Also, if we want to build a diagnostics command to collect information from large clusters and analyze the data, that is difficult with metrics. In addition, these metrics are emitted only by the leader, and on a leader switch they will not be emitted until the datastream is restarted.
@@ -681,6 +682,11 @@ private void addTaskNodes(String instance, DatastreamTaskImpl task) {
        KeyBuilder.datastreamTaskState(_cluster, task.getConnectorType(), task.getDatastreamTaskName());
    _zkclient.ensurePath(taskStatePath);

    // save the task stats.
Update the task node's directory structure in the method description above. This is a new subdirectory "stats" under the task, correct?
No, the "stats" directory will be inside the "state" directory, and it will be conditional.
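A minimal sketch of the layout described here: the "stats" node is nested under the task's "state" node rather than sitting directly under the task. The exact path shape produced by `KeyBuilder.datastreamTaskState` is an assumption for illustration; only the state/stats nesting comes from this thread.

```java
// Hedged sketch: illustrative stand-in for KeyBuilder.datastreamTaskState(...).
// The /<cluster>/<connectorType>/<taskName>/state shape is assumed, not
// Brooklin's verified layout.
public class TaskStatsPath {
    static String datastreamTaskState(String cluster, String connectorType, String taskName) {
        return "/" + cluster + "/" + connectorType + "/" + taskName + "/state";
    }

    static String datastreamTaskStats(String cluster, String connectorType, String taskName) {
        // "stats" lives inside "state", not directly under the task node
        return datastreamTaskState(cluster, connectorType, taskName) + "/stats";
    }

    public static void main(String[] args) {
        System.out.println(datastreamTaskStats("brooklinCluster", "kafkaConnector", "task-0"));
        // prints "/brooklinCluster/kafkaConnector/task-0/state/stats"
    }
}
```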
                                              DatastreamTaskImpl newTask) {
    PartitionAssignmentStatPerTask stat = PartitionAssignmentStatPerTask.fromJson(((DatastreamTaskImpl) task).getStats());
    if (partitionInfoMap.isEmpty()) {
      stat.isThroughputRateLatest = false;
Does it make sense to have a timestamp field here instead of having the latest flag, so that we get a sense of the last partition throughput distribution more accurately?
Yes, we can add a timestamp. We still need the latest flag, because not all partition assignments will use throughput-based balancing.
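A sketch of what the per-task stat could look like with both fields from this discussion: the existing `isThroughputRateLatest` flag plus the proposed timestamp. The class name mirrors `PartitionAssignmentStatPerTask`, but the timestamp field name and hand-rolled JSON shape are illustrative assumptions, not Brooklin's actual schema.

```java
// Hedged sketch of a per-task stat holder. Field names beyond
// isThroughputRateLatest, and the JSON format, are assumptions.
public class PartitionAssignmentStatSketch {
    boolean isThroughputRateLatest;
    long throughputRateUpdatedAtMs; // proposed: when the throughput rate was last refreshed

    String toJson() {
        // Minimal hand-rolled serialization for illustration only;
        // the real class would use a proper JSON codec.
        return String.format("{\"isThroughputRateLatest\":%b,\"throughputRateUpdatedAtMs\":%d}",
                isThroughputRateLatest, throughputRateUpdatedAtMs);
    }

    public static void main(String[] args) {
        PartitionAssignmentStatSketch stat = new PartitionAssignmentStatSketch();
        stat.isThroughputRateLatest = true;
        stat.throughputRateUpdatedAtMs = 1700000000000L;
        System.out.println(stat.toJson());
    }
}
```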
I will address it separately.
gotcha! thanks
...rver/src/main/java/com/linkedin/datastream/server/assignment/LoadBasedPartitionAssigner.java (outdated; resolved)
.../src/test/java/com/linkedin/datastream/server/assignment/TestLoadBasedPartitionAssigner.java (outdated; resolved)
We frequently hear the need to retrieve some task-level metrics for diagnostics through the brooklin-service endpoint.

LoadBasedPartitionAssignmentStrategy distributes partitions evenly based on load. To be able to debug and validate the distribution, it is important to pull out these metrics at the task level and perform offline analytics on the data.

This PR exposes a new knob, stats, that can be used to save the task-level stats to ZooKeeper and retrieve them similarly to other endpoints.
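Since the thread above notes that saving stats is conditional, the knob could plausibly be a boolean config flag. A minimal sketch, assuming a hypothetical property key (the actual config name is not stated in this thread):

```java
import java.util.Properties;

// Hedged sketch: a boolean "knob" read from server properties that gates
// whether task stats are written to ZooKeeper. The property key below is
// illustrative, not Brooklin's actual config name.
public class StatsKnobSketch {
    static final String CONFIG_SAVE_TASK_STATS = "brooklin.server.coordinator.saveTaskStatsToZk";

    static boolean shouldSaveStats(Properties props) {
        // defaults to false so existing deployments are unaffected
        return Boolean.parseBoolean(props.getProperty(CONFIG_SAVE_TASK_STATS, "false"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(CONFIG_SAVE_TASK_STATS, "true");
        System.out.println(shouldSaveStats(props)); // prints "true"
    }
}
```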