feat: Add opentelemetry-instrumentation-nats package #667

Closed
wants to merge 6 commits into main from feat-add-nats

Conversation

ekosz

@ekosz ekosz commented Sep 19, 2021

Which problem is this PR solving?

There is no instrumentation for nats.

Short description of the changes

This adds a new instrumentation package for the NATS messaging system, specifically for its nats.js library.

Work In Progress

Currently this PR is a work in progress. This is my first time contributing an OpenTelemetry package and I wanted to get the PR out early in a draft state to make sure I'm on the right path before moving further along.

TODOS

  • Add to the examples
  • More unit tests
  • Finish up the readme
  • What else have I forgotten?

Open Questions

  • How does one deal with wrapping generators? The nats.js library exposes its subscriptions via async iterators. This poses a challenge with the ContextAPI#with function: I'm not sure how to combine #with with a yield call (see the sketch after this list). This is currently causing a test to fail, as the context does not propagate properly without it.
  • Am I setting the correct messaging attributes? This seems to be the first contrib package that adds instrumentation to a messaging library. I tried to make sure I followed the messaging standard, but I would love confirmation on that.
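
To make the first question concrete, here is a rough sketch of what I would like a subscription wrapper to do. The tracedMessages name is mine; only the @opentelemetry/api calls are real:

const { context, trace } = require('@opentelemetry/api');

const tracer = trace.getTracer('nats-sketch');

// A nats.js Subscription is an async iterable, so consumers write
// `for await (const msg of sub) { ... }`. Ideally each iteration of that
// loop would run with the message's span active.
async function* tracedMessages(sub) {
  for await (const msg of sub) {
    const span = tracer.startSpan(`${msg.subject} process`);
    const msgContext = trace.setSpan(context.active(), span);
    // context.with(msgContext, callback) only keeps msgContext active while
    // `callback` runs, but `yield` is a syntax error inside a nested
    // callback, and the consumer's loop body runs while this generator is
    // suspended -- so there is nothing to hand to context.with() here.
    yield msg; // consumer code runs now, still under the old context
    span.end();
  }
}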

@linux-foundation-easycla

linux-foundation-easycla bot commented Sep 19, 2021

CLA Signed

The committers are authorized under a signed CLA.

@codecov

codecov bot commented Sep 19, 2021

Codecov Report

Merging #667 (d34cba0) into main (648ffb7) will decrease coverage by 0.13%.
The diff coverage is n/a.

❗ Current head d34cba0 differs from pull request most recent head aaa58dc. Consider uploading reports for the commit aaa58dc to get more accurate results

@@            Coverage Diff             @@
##             main     #667      +/-   ##
==========================================
- Coverage   96.82%   96.68%   -0.14%     
==========================================
  Files           9       13       +4     
  Lines         630      634       +4     
  Branches      124      124              
==========================================
+ Hits          610      613       +3     
- Misses         20       21       +1     
Impacted Files Coverage Δ
...metry-instrumentation-document-load/src/version.ts 100.00% <0.00%> (ø)
...ry-instrumentation-user-interaction/src/version.ts 100.00% <0.00%> (ø)
...apackages/auto-instrumentations-web/src/version.ts 0.00% <0.00%> (ø)
...web/opentelemetry-plugin-react-load/src/version.ts 100.00% <0.00%> (ø)

@ekosz ekosz force-pushed the feat-add-nats branch 3 times, most recently from d1970e5 to 0bce19a on September 19, 2021 at 21:02
@aricart

aricart commented Sep 22, 2021

First, let me say that I appreciate the OpenTelemetry integration for nats.js! This looks like it’ll work!

As is, this will provide some great information, but bear in mind that it will come at the expense of additional effort maintaining this codebase. NATS releases that change internals of the nats.js library (which is shared by different runtimes, such as nats.js, nats.deno and nats.ws) could require changes here. If the module dependency is tightly pinned and you use the public APIs wherever possible, this shouldn’t be a significant issue.

You’ll also want to be aware of potential performance implications. Because the wrapped library effectively performs callouts to outside code, depending on the implementation of the tracing it could interrupt the normal flow of the client and create hard-to-diagnose issues, such as slow consumers and/or memory growth, that wouldn’t appear in the standard client. You may want to mention this in any corresponding documentation.

There are a couple of things you could consider instead. It would seem reasonable to add a generator that captures simple metrics to be gathered as needed, such as the number of messages sent or received per subject. Note this metric is only counts.

As for observability, with the NATS ecosystem, you have the ability to do some incredible things by creating a simple subscription that looks at all the messages being published and correlating them. This means that observability can be inserted into the system while respecting the data privacy between clients. This has the benefit that the work required for capturing and transmitting this information is not placed on the client being observed but on a separate process, which can then record, report, or summarize whatever metrics are required.
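
For example (untested sketch; the server URL is a placeholder), the observer process can be as small as a wildcard subscription that keeps per-subject counts:

const { connect } = require('nats');

async function main() {
  // A separate process: subscribe to every subject this client is allowed to see.
  const nc = await connect({ servers: 'nats://localhost:4222' });
  const counts = new Map();
  const sub = nc.subscribe('>');

  // Periodically report whatever summary is required.
  setInterval(() => console.log(Object.fromEntries(counts)), 10_000);

  for await (const msg of sub) {
    counts.set(msg.subject, (counts.get(msg.subject) ?? 0) + 1);
  }
}

main();

The publishing clients do no extra work here; the cost of counting and reporting lives entirely in this separate process.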

The one drawback is that identifying the publishing client will be difficult, as the entire point of NATS messaging is to de-couple producers from consumers. With that said, service clients at least can be identified easily if their reply inbox is coordinated with a well-known client name. Using some other internal metric generated by the client (which could be broadcast using NATS messages), you could relate specific load back to specific clients.

All in all, this looks like it’ll do the job and we really appreciate the contribution and hard work here!

@ekosz
Author

ekosz commented Sep 25, 2021

Hi @aricart. Thanks for taking a look at this PR! I understand your points completely, and I will definitely update the docs of this library to call out how it may have an effect on performance and/or memory growth.

But the main thing I was attempting to do with this addition was not necessarily metric emission (to your point, you can do that with a sidecar subscription) but distributed tracing: the ability to see a message emitted in one system, follow it to each subscriber, and then follow any additional messages those subscribers emit in turn. I think tracing is a critical tool for any large distributed system.

As for reaching into the private libraries/files of nats, you're again absolutely right that this approach is much less stable. But I don't think I could add the tracing I need with just the public API. AFAIK, the public API currently does not export the actual class that has the subscribe/publish/request functions; it only exports the connect function that creates an instance of that class. But for OT to instrument those calls it needs access to the class so it can patch its prototype. If we wanted to use only public APIs, I think we would need the nats team to expose more publicly, maybe some function hooks that libraries like OT can use to add additional instrumentation. Until then I think this will need to patch those private APIs. Unless there is a public API I'm missing? Or another idea you may have?
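
For reference, the usual pattern in these instrumentation packages looks roughly like the sketch below. NatsConnectionImpl is just a stand-in for the internal class I would have to reach for; it is not part of the public exports, which is exactly the problem:

const {
  InstrumentationBase,
  InstrumentationNodeModuleDefinition,
} = require('@opentelemetry/instrumentation');

class NatsInstrumentationSketch extends InstrumentationBase {
  constructor() {
    super('opentelemetry-instrumentation-nats-sketch', '0.0.1', {});
  }

  init() {
    return new InstrumentationNodeModuleDefinition('nats', ['>=2.0.0'], (moduleExports) => {
      // This is the step that needs the (non-public) class: wrap a method on
      // its prototype so every instance created by connect() is covered.
      this._wrap(
        moduleExports.NatsConnectionImpl.prototype,
        'publish',
        (original) =>
          function patchedPublish(...args) {
            // start a span, inject trace headers, then call through
            return original.apply(this, args);
          }
      );
      return moduleExports;
    });
  }
}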

Thank you again for the detailed response!

@ekosz
Author

ekosz commented Sep 25, 2021

@aricart Actually, thinking about it, I may only need the connect function after all. If I patch that function, I can return Proxy objects that hook into those subscribe/publish/request calls. I could also proxy the Message#respond calls, so I would no longer need to reach into the ProtocalImpl class.

I'll try that out next over the coming days and let you know how it goes.
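
Roughly what I have in mind (untested sketch; only connect/publish/subscribe/request come from the public nats.js API, everything else is made up):

const nats = require('nats');
const { trace } = require('@opentelemetry/api');

const tracer = trace.getTracer('nats-sketch');
const originalConnect = nats.connect;

async function patchedConnect(opts) {
  const nc = await originalConnect(opts);
  return new Proxy(nc, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      if (prop === 'publish') {
        return (subject, data, options) => {
          const span = tracer.startSpan(`${subject} send`);
          try {
            return value.call(target, subject, data, options);
          } finally {
            span.end();
          }
        };
      }
      // subscribe and request would get the same treatment, and the
      // Subscription / Msg objects they hand back would be proxied too so
      // that Msg#respond is also traced.
      return typeof value === 'function' ? value.bind(target) : value;
    },
  });
}

The instrumentation would then swap connect for patchedConnect when the module is loaded.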

@ekosz
Author

ekosz commented Sep 26, 2021

@aricart I've switched to using just the public APIs + Proxy objects. Thank you for pushing me in that direction. This should be much more stable now.

@ekosz
Author

ekosz commented Sep 26, 2021

@blumamir I still haven't figured out a great way to get around the issue with context.with and yield calls in generators. Do you / anyone on the OpenTelemetry team have a suggestion on what I may try next?

@blumamir
Member

@blumamir I still haven't figured out a great way to get around the issue with context.with and yield calls in generators. Do you / anyone on the OpenTelemetry team have a suggestion on what I may try next?

Hi @ekosz, thanks for taking the time to add this instrumentation. I previously tried to patch generators for the SQS messaging system in the aws-sdk instrumentation, and unfortunately had no luck finding a method that worked for all cases. This is currently an open issue for SQS which I hope we can solve one day.

I'll take a second look to try and think about it again.

Am I setting the correct messaging attributes? This seems to be the first contrib package that adds instrumentation to a messaging library. I tried to make sure I followed the messaging standard, but I would love confirmation on that.

There are instrumentations for the kafkajs, amqplib (RabbitMQ), and AWS SQS messaging systems hosted here which you can examine for reference. Also, it's worth mentioning that there is a SIG working on a new stable messaging specification, which will probably be ready in the coming months. The current specification is in alpha and will probably change soon.

@vmarchaud
Member

I still haven't figured out a great way to get around the issue with context.with and yield calls in generators. Do you / anyone on the OpenTelemetry team have a suggestion on what I may try next?

There is an effort to add manual context propagation using attach/detach but it's still WIP: open-telemetry/opentelemetry-js-api#123. Might be worth trying this after it's implemented inside the SDK.

@blumamir
Member

blumamir commented Sep 28, 2021

There is an effort to add manual context propagation using attach/detach but it's still WIP: open-telemetry/opentelemetry-js-api#123. Might be worth trying this after it's implemented inside the SDK.

The issue I encountered with that approach was that I was not able to guarantee that the detach function would be called to restore the context in the case of an exception or a premature return statement:

function * numbers() {
    yield 1;
    yield 2;
    yield 3;
}

const iterateOnNumbers = () => {
    try {
        for (const num of numbers()) {
            if (num === 2) {
                throw Error('throws without fully consuming the iterator till done');
            }
            console.log(`process num ${num}`);
        }
    } catch {
        console.log('exception handled');
    }
    console.log('what do we expect the context here to be?');
}

const f = () => {
    iterateOnNumbers();
    console.log('what do we expect the context here to be?');
}

f();

If we patch the generator iterator's next function and restore the context only when the iterator reports done, the above code will leak the second iteration's context into the code executed after the for loop.
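
To make the leak concrete, here is a toy stand-in for an attach/detach style context API (nothing below is the real OpenTelemetry API), with that next-patching approach applied to the example above:

// Toy context manager standing in for a hypothetical attach/detach API.
let activeContext = 'ROOT';
const attach = (ctx) => { const prev = activeContext; activeContext = ctx; return prev; };
const detach = (prev) => { activeContext = prev; };

// Patch an iterator so every next() call activates a fresh per-iteration
// context, restoring the previous one only when the iterator reports done.
function instrumentIterator(iter) {
  const originalNext = iter.next.bind(iter);
  let restore;
  let i = 0;
  iter.next = (...args) => {
    if (restore !== undefined) detach(restore);
    restore = attach(`iteration-${++i}`);
    const result = originalNext(...args);
    if (result.done) { detach(restore); restore = undefined; }
    return result;
  };
  return iter;
}

function* numbers() { yield 1; yield 2; yield 3; }

try {
  for (const num of instrumentIterator(numbers())) {
    if (num === 2) throw Error('throws without fully consuming the iterator');
    console.log(activeContext, num); // "iteration-1 1"
  }
} catch { /* exception handled */ }

// The loop exited through the throw, so the iterator's return() ran instead
// of the patched next() -- nothing ever detached the second context.
console.log(activeContext); // "iteration-2", not "ROOT"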

@vmarchaud
Member

The issue I encountered with that approach was that I was not able to guarantee that the detach function would be called to restore the context in the case of an exception or a premature return statement:

I never experimented with generators so I can't say for sure, but do they create a new async context for each function call? If so, this should be tracked by async hooks.

@blumamir
Member

I never experimented with generators so I can't say for sure, but do they create a new async context for each function call? If so, this should be tracked by async hooks.

Tested it now and unfortunately it's not creating an async hooks context.
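
A quick sketch that demonstrates this with the sync generator from the example above:

const async_hooks = require('async_hooks');
const fs = require('fs');

// Log every async resource creation. fs.writeSync is used because
// console.log itself creates async resources inside an init hook.
async_hooks.createHook({
  init(asyncId, type) { fs.writeSync(1, `init ${type}\n`); },
}).enable();

function* numbers() { yield 1; yield 2; yield 3; }

for (const num of numbers()) {
  fs.writeSync(1, `got ${num} in execution ${async_hooks.executionAsyncId()}\n`);
}
// No init lines are printed for the generator calls, and executionAsyncId()
// stays the same for every iteration -- the generator creates no async
// resource that async_hooks (or AsyncLocalStorage) could key a context to.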

@vmarchaud
Member

Tested it now and unfortunately it's not creating an async hooks context.

Well, if it doesn't create either an async context or a promise context (with promise_hooks), I don't think we can support it :/

@github-actions
Contributor

This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the stale label Nov 29, 2021
@github-actions
Contributor

This PR was closed because it has been stale for 14 days with no activity.

@neeraj-ec

Any progress on this one? We are using NATS for inter-service communication in our NestJS microservices and need this for distributed tracing.
It seems this one was closed due to inactivity.
