Add "cost" hook in tap base #348

MeltyBot · 2022-03-20T17:19:25Z

Migrated from GitLab: https://gitlab.com/meltano/sdk/-/issues/350

Originally created by @laurentS on 2022-03-20 17:19:25

Summary

As a user of a tap, I would like to know how much API "cost" was caused by a tap run.
For instance, the GitHub API has per-hour usage limits: a REST API call costs 1 unit, while a GraphQL API call has a cost that depends on the number of nodes returned.
Other APIs might charge per call.

At the end of a tap run, I would like to know how much of the resource the tap consumed.

Proposed benefits

With tap-github running in production, we have found it hard to track the reasons behind sudden surges in "quota exceeded" errors. Being able to identify which runs use what would help in understanding the cause of such issues.

For billable APIs/resources, such a feature could also help track actual dollar costs.

Making these values retrievable would allow tracking them in monitoring systems and similar tools.

Proposal details

As it's not really possible to define how this cost is calculated at the SDK level, it would be great if the SDK provided some method that a tap could override to calculate and accumulate said cost, something like:

class Tap:
    def calculate_request_cost(self, request, response) -> int:
        # Return whatever cost we want to track, in arbitrary units.
        # The default tracks nothing; taps override this method.
        return 0

This could be called by the SDK after each request returns (as the cost might depend on the content of the response). The SDK would simply keep a sum of all these results, and at the end of the run, the tap would at a minimum log a line like:

Total cost for this run: NNN

or possibly export this value in the state as another metric?

Again, the final method could simply be a no-op by default that each tap can override to implement appropriate behaviour.
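
A minimal sketch of the flow proposed above, under stated assumptions: `CostTrackingTap`, `_record_request` and `log_total_cost` are hypothetical names chosen for illustration, not SDK APIs, and `_record_request` stands in for wherever the SDK would call the hook after a request returns.

```python
import logging

logger = logging.getLogger(__name__)


class CostTrackingTap:
    """Hypothetical stand-in for the SDK's tap base class."""

    def __init__(self) -> None:
        self.total_cost = 0

    def calculate_request_cost(self, request, response) -> int:
        # Default: track nothing. Individual taps override this.
        return 0

    def _record_request(self, request, response) -> None:
        # Assumed call site: invoked once after each request returns.
        self.total_cost += self.calculate_request_cost(request, response)

    def log_total_cost(self) -> str:
        # Emit the end-of-run summary line and return it for convenience.
        line = f"Total cost for this run: {self.total_cost}"
        logger.info(line)
        return line


class RestCallCountingTap(CostTrackingTap):
    """Example override: every REST call costs one unit."""

    def calculate_request_cost(self, request, response) -> int:
        return 1
```

With this shape, a tap that does not override the hook reports a total of 0, so existing taps would behave as before.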

Best reasons not to build

By default, these methods can simply do a pass, and therefore have only negligible performance impact. There is no behaviour change related to this, unless the result is exported in state, which might cause issues downstream depending on the target used.

MeltyBot · 2022-05-30T00:11:07Z

View 4 previous comments from the original issue on GitLab

laurentS · 2022-06-06T08:41:48Z

Just noting that I'm going to work on this

aaronsteers · 2022-06-06T20:53:32Z

@laurentS - Sounds great! Are you okay with a simple log line printed, or were you thinking about emitting it in another, machine-readable manner?

In theory, Singer metrics could be used (in addition to or instead of the proposed human-readable print) and could be more machine-readable. https://hub.meltano.com/singer/spec#metrics

Another approach is to add to STATE, but since STATE is cumulative of all executions, I'm not sure that's a great fit. (Happy to discuss though.)

aaronsteers · 2022-06-06T20:56:49Z

@laurentS - The one spec consideration is to perhaps introduce a domain label for the cost and to make this support any number of cost domains.

So, there could be a graphql_quota separate from rest_calls quota, and/or estimated_dollar_cost. I'm not tied to any specific implementation but I do think it would be good to consider supporting a variety of cost factors and let the tap developer support a combination of factors if desired.

laurentS · 2022-06-07T05:58:03Z

@aaronsteers about cost, I indeed started with a cost function that can return cost along any arbitrary dimensions ("domains", to reuse your words), in the form of a dict that each tap developer can define. For instance tap-github could return something like {"rest": 0, "graphql": 42, "search": 0} for a single request (42 being provided in the API response). The SDK simply adds up these dicts key by key, so the tap developer just needs to calculate the cost associated with a single request/response, but they can decide what cost metrics are relevant for their tap.
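
The key-by-key addition described here could look roughly like the following. This is a sketch only: `add_costs` is a hypothetical helper name, and the per-request values are made up for illustration.

```python
from typing import Dict


def add_costs(total: Dict[str, int], request_cost: Dict[str, int]) -> Dict[str, int]:
    """Add one request's cost to the running total, key by key (in place)."""
    for domain, amount in request_cost.items():
        total[domain] = total.get(domain, 0) + amount
    return total


# Illustrative per-request costs, in the shape tap-github might return them.
request_costs = [
    {"rest": 0, "graphql": 42, "search": 0},
    {"rest": 1, "graphql": 0, "search": 0},
    {"rest": 1, "graphql": 0, "search": 1},
]

run_total: Dict[str, int] = {}
for cost in request_costs:
    add_costs(run_total, cost)
```

Because unknown keys start at zero, a tap can introduce a new cost domain partway through a run without any registration step.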

About logging the result, I haven't figured out the best way to do it yet. My thinking is that a single summary value at the end of the tap run would be more useful than one line per API call, ideally broken down per stream. I'd love to print out the result like a metric, in JSON format, so that users can optionally parse this automatically if they want to.
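
As a rough illustration of a single JSON summary broken down per stream, one option is sketched below. The `format_cost_summary` helper, the `"cost_summary"` label, the field names and the stream names are all assumptions made for this example; this is not the Singer metrics format.

```python
import json
from typing import Dict


def format_cost_summary(costs_by_stream: Dict[str, Dict[str, int]]) -> str:
    """Render per-stream costs as one JSON line that log parsers can pick up."""
    payload = {
        "type": "cost_summary",  # hypothetical label, not a Singer metric type
        "streams": costs_by_stream,
    }
    # sort_keys keeps the output stable between runs, which helps when diffing logs.
    return json.dumps(payload, sort_keys=True)


line = format_cost_summary(
    {
        "issues": {"rest": 12, "graphql": 0},
        "repositories": {"rest": 3, "graphql": 42},
    }
)
```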

Singer metrics look like a nice way to achieve this, but maybe a human readable log line is enough to get started, and we can always improve on this if there's a need for it?

aaronsteers · 2022-06-07T11:56:00Z

@laurentS - sounds good to me! 👍

edgarrmondragon · 2022-06-21T18:36:03Z

Closed by #704

laurentS mentioned this issue Jun 8, 2022

feat(taps): Add api costs hook #704

Merged

edgarrmondragon closed this as completed Jun 21, 2022
