Add Stats to DatastreamTaskImpl #855

vmaheshw · 2021-10-19T07:50:27Z

We frequently hear requests for task-level metrics that can be retrieved through the brooklin-service endpoint for diagnostics.

LoadBasedPartitionAssignmentStrategy distributes the partitions evenly based on the load. To debug and validate the distribution, it is important to be able to pull out the metrics at the task level and perform offline analytics on the data.

This PR exposes a new knob, stats, that can be used to save task-level stats in ZooKeeper; the stats can then be retrieved in the same way as other endpoints.

surajkn

Just curious, why do we want to save these stats in ZK instead of simply reporting them as a task-level metric? Is it because in Ingraphs it's hard or not possible to identify a specific task's metrics?

vmaheshw · 2021-10-20T17:28:05Z

> Just curious, why do we want to save these stats in ZK instead of simply reporting them as a task-level metric? Is it because in Ingraphs it's hard or not possible to identify a specific task's metrics?

There is a limit on the number of metrics that we can emit from the container. Also, if we want to build a diagnostics command that collects information from large clusters and analyzes the data, that is difficult to do with metrics. In addition, these metrics are emitted only by the leader, and after a leader switch they will not be emitted until the datastream is restarted.

surajkn · 2021-10-21T17:33:32Z

datastream-server/src/main/java/com/linkedin/datastream/server/zk/ZkAdapter.java

@@ -681,6 +682,11 @@ private void addTaskNodes(String instance, DatastreamTaskImpl task) {
        KeyBuilder.datastreamTaskState(_cluster, task.getConnectorType(), task.getDatastreamTaskName());
    _zkclient.ensurePath(taskStatePath);

+    // save the task stats.


Update the task node's directory structure in the method description above. This is a new subdirectory "stats" under the task, correct?

No, the "stats" directory will be inside the "state" directory and will be conditional.
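To make the layout in this reply concrete, here is a minimal sketch. It is not the PR's code: `TaskStatsLayoutSketch` and its methods are hypothetical names, the `/{cluster}/connectors/{connectorType}/{taskName}/state` layout is an assumption about what `KeyBuilder.datastreamTaskState` returns, and "conditional" is assumed to mean "written only when the task has non-empty stats". A sorted map stands in for ZooKeeper so the example stays self-contained.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: these helpers are hypothetical and are not Brooklin's KeyBuilder or ZkAdapter API.
class TaskStatsLayoutSketch {

  // Assumed location of a task's "state" node.
  static String statePath(String cluster, String connectorType, String taskName) {
    return "/" + cluster + "/connectors/" + connectorType + "/" + taskName + "/state";
  }

  // "stats" sits inside "state", as described in the reply above.
  static String statsPath(String cluster, String connectorType, String taskName) {
    return statePath(cluster, connectorType, taskName) + "/stats";
  }

  // Returns the nodes that would be created, as path -> data.
  static Map<String, String> addTaskNodes(String cluster, String connectorType, String taskName,
      String statsJson) {
    Map<String, String> nodes = new TreeMap<>();
    nodes.put(statePath(cluster, connectorType, taskName), "");
    // Assumed condition: only write "stats" when the task carries non-empty stats.
    if (statsJson != null && !statsJson.isEmpty()) {
      nodes.put(statsPath(cluster, connectorType, taskName), statsJson);
    }
    return nodes;
  }
}
```

With this sketch, a task without stats produces only the "state" node, so existing tasks keep the directory structure they had before.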

shrinandthakkar · 2021-10-21T20:24:06Z

...rver/src/main/java/com/linkedin/datastream/server/assignment/LoadBasedPartitionAssigner.java

+      DatastreamTaskImpl newTask) {
+    PartitionAssignmentStatPerTask stat = PartitionAssignmentStatPerTask.fromJson(((DatastreamTaskImpl) task).getStats());
+    if (partitionInfoMap.isEmpty()) {
+      stat.isThroughputRateLatest = false;


Does it make sense to have a timestamp field here instead of the latest flag, so that we get a more accurate sense of when the partition throughput distribution was last computed?

Yes, we can add a timestamp. We still need the latest flag, because not all partition assignments will use throughput-based balancing.

I will address it separately.

gotcha! thanks
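As a rough sketch of the outcome of this thread (keep the flag, add a timestamp later), the class below carries both. It is a simplified stand-in, not the PR's `PartitionAssignmentStatPerTask`: only `isThroughputRateLatest` comes from the diff, while `throughputRateUpdatedAtMs`, `throughputRate`, and `update` are hypothetical names chosen for illustration.

```java
import java.util.Map;

// Illustration only: a simplified stand-in, not the PR's PartitionAssignmentStatPerTask class.
class TaskThroughputStatSketch {
  boolean isThroughputRateLatest;        // flag name taken from the diff above
  long throughputRateUpdatedAtMs = -1L;  // hypothetical timestamp field; -1 means never updated
  long throughputRate;                   // hypothetical: summed throughput of the task's partitions

  // partitionInfoMap: partition name -> throughput; empty when no throughput data was available.
  void update(Map<String, Integer> partitionInfoMap, long nowMs) {
    if (partitionInfoMap.isEmpty()) {
      // No fresh data: keep the old rate and timestamp, but mark them as stale.
      isThroughputRateLatest = false;
      return;
    }
    long total = 0;
    for (int rate : partitionInfoMap.values()) {
      total += rate;
    }
    throughputRate = total;
    throughputRateUpdatedAtMs = nowMs;
    isThroughputRateLatest = true;
  }
}
```

The flag answers "did this assignment use throughput data at all?", while the timestamp would answer "how old is the last throughput-based value?", which is why the two are not interchangeable.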


vmaheshw and others added 3 commits November 18, 2019 12:06

Merge pull request #1 from linkedin/master

c31cd4a

Pull latest

Merge branch 'master' of github.com:vmaheshw/Brooklin

306b90a

Add Stats to DatastreamTaskImpl

59bc9a2

vmaheshw marked this pull request as draft October 19, 2021 07:50

Add tests

77d4635

vmaheshw requested review from jzakaryan, shrinandthakkar, atoomula and surajkn October 19, 2021 08:32

vmaheshw marked this pull request as ready for review October 19, 2021 17:54

surajkn reviewed Oct 19, 2021

vmaheshw requested a review from surajkn October 20, 2021 21:33

surajkn previously approved these changes Oct 21, 2021

shrinandthakkar reviewed Oct 21, 2021

jzakaryan previously approved these changes Oct 21, 2021

...rver/src/main/java/com/linkedin/datastream/server/assignment/LoadBasedPartitionAssigner.java (outdated, resolved)

.../src/test/java/com/linkedin/datastream/server/assignment/TestLoadBasedPartitionAssigner.java (outdated, resolved)

Address comments

3e9f174

vmaheshw dismissed stale reviews from jzakaryan and surajkn via 3e9f174 October 21, 2021 23:01

vmaheshw requested review from shrinandthakkar, surajkn and jzakaryan October 21, 2021 23:01

shrinandthakkar approved these changes Oct 25, 2021

surajkn approved these changes Oct 25, 2021

vmaheshw merged commit 7c0aa1d into linkedin:master Oct 25, 2021
