feat: Add opentelemetry-instrumentation-nats package #667
Codecov Report
@@            Coverage Diff             @@
##             main     #667      +/-   ##
==========================================
- Coverage   96.82%   96.68%   -0.14%
==========================================
  Files           9       13       +4
  Lines         630      634       +4
  Branches      124      124
==========================================
+ Hits          610      613       +3
- Misses         20       21       +1
Firstly, let me begin by saying that I appreciate the open-telemetry integration for nats.js! This looks like it’ll work!
As is, this will provide some great information, but bear in mind that it will come at the expense of additional effort maintaining this codebase. NATS releases that change internals of the nats.js library (which is shared by different runtimes - such as nats.js, nats.deno and nats.ws) could require changes here. If the module dependency is tightly clamped and you use the public APIs wherever possible, this shouldn’t be a significant issue.
You’ll want to be aware of potential performance implications. Because the wrapped library effectively performs callouts to outside code, depending on the implementation of the tracing, it could interrupt the normal flow of the client and create hard-to-diagnose issues such as slow consumers and/or memory growth that wouldn’t appear in the standard client. You may want to mention this in any corresponding documentation.
I would like to propose a couple of things that you can do. It would seem reasonable to add a generator to capture simple metrics that can be gathered as needed, such as the number of messages sent or received per specific subject. Note this metric is only “counts”.
As for observability, within the NATS ecosystem you have the ability to do some incredible things by creating a simple subscription that looks at all the messages being published and correlates them. This means that observability can be inserted into the system while respecting the data privacy between clients. This has the benefit that the work required for capturing and transmitting this information is not placed on the client being observed but on a separate process, which can then record, report, or summarize whatever metrics are required.
The one drawback is that actual identification of the publishing client will be difficult, as the entire point of NATS messaging is to de-couple producers from consumers. With that said, at least for service clients, you can easily identify them if their reply inbox is coordinated with a well-known client name. Using some other internal metric generated by the client (which could be broadcast using NATS messages), you could relate specific load to the clients.
All in all, this looks like it’ll do the job and we really appreciate the contribution and hard work here! |
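For illustration, the "counts only" metric suggested above could look roughly like the sketch below. The `SubjectCounter` class and the sample subjects are hypothetical and not part of nats.js or this PR; a plain array stands in for the messages an observer subscription would receive.

```javascript
// Hypothetical helper (not part of nats.js or OpenTelemetry):
// keeps a running count of messages seen per subject.
class SubjectCounter {
  constructor() {
    this.counts = new Map();
  }

  // Record one message for the given subject.
  record(subject) {
    this.counts.set(subject, (this.counts.get(subject) ?? 0) + 1);
  }

  // Return a plain-object copy of the current counts.
  snapshot() {
    return Object.fromEntries(this.counts);
  }
}

// A plain array stands in for messages delivered to an observer
// subscription; the subjects are made up for the example.
const observed = [
  { subject: 'orders.created' },
  { subject: 'orders.created' },
  { subject: 'orders.shipped' },
];

const counter = new SubjectCounter();
for (const msg of observed) {
  counter.record(msg.subject);
}

const subjectCounts = counter.snapshot();
console.log(subjectCounts);
```

Because the counter only needs the subject of each message, it can live in a separate observer process and never touch message payloads.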
Hi @aricart. Thanks for taking a look at this PR! I understand your points completely, and I will definitely update the docs of this library to call out how this may have an effect on performance and/or memory growth. But the main thing I was attempting to do with this addition was not necessarily to add metric emission (to your point, you can do that with a sidecar subscription), but instead distributed tracing: the ability to follow a message being emitted in one system, then follow it as it goes to each subscriber, and see how each of those subscriptions may emit additional messages you can trace/follow. I think tracing is a critical tool required of any large distributed system. As for reaching into the private libraries/files of nats, you're again absolutely right that this approach is much less stable. But I don't think I could add the tracing I need with just the public API. AFAIK, the public API currently does not export the actual class that has the Thank you again for the detailed response! |
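To make the tracing goal concrete, here is a rough sketch of how a trace could follow a message from publisher to subscriber, assuming the trace context travels in a message header using the W3C `traceparent` format. The `inject` and `extract` helpers are hand-rolled for the example (a real instrumentation would normally go through OpenTelemetry's propagation API), and a plain `Map` stands in for message headers.

```javascript
const TRACEPARENT = 'traceparent';

// Publisher side: serialize the current span context into the headers.
// Format: version-traceId-spanId-flags, all lowercase hex.
function inject(headers, spanContext) {
  const flags = spanContext.sampled ? '01' : '00';
  headers.set(
    TRACEPARENT,
    `00-${spanContext.traceId}-${spanContext.spanId}-${flags}`
  );
}

// Subscriber side: parse the header back into a parent span context.
// Returns undefined when the header is missing or malformed.
function extract(headers) {
  const value = headers.get(TRACEPARENT) ?? '';
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(value);
  if (!match) return undefined;
  return {
    traceId: match[1],
    spanId: match[2],
    sampled: (parseInt(match[3], 16) & 1) === 1,
  };
}

// A Map stands in for the headers attached to a published message.
const outgoing = new Map();
inject(outgoing, {
  traceId: '0af7651916cd43dd8448eb211c80319c',
  spanId: 'b7ad6b7169203331',
  sampled: true,
});

// The subscriber recovers the publisher's span as its parent.
const parentOnSubscriber = extract(outgoing);
console.log(outgoing.get(TRACEPARENT), parentOnSubscriber);
```

Any span the subscriber starts with `parentOnSubscriber` as its parent shares the publisher's trace ID, which is what lets a single trace follow the message across systems.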
@aricart Actually, thinking about it, I may only need the I'll try that out next over the coming days and let you know how it goes. |
@aricart I've switched to just using the public APIs + proxy objects. Thank you for pushing me in that direction. This should be much more stable now. |
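As a rough illustration of the "public APIs + proxy objects" idea (not the code in this PR), a `Proxy` can intercept a method on a client object without touching the library's internals. The `instrumentClient` helper and the fake client below are hypothetical and only mimic a `publish(subject, data)` call.

```javascript
// Hypothetical helper: wraps a client in a Proxy that reports each
// publish() call to a hook before delegating to the real method.
function instrumentClient(client, onPublish) {
  return new Proxy(client, {
    get(target, prop) {
      const value = target[prop];
      if (prop === 'publish' && typeof value === 'function') {
        return function (...args) {
          onPublish(args[0]); // args[0] is the subject
          return value.apply(target, args);
        };
      }
      return value;
    },
  });
}

// A fake client standing in for a real connection object.
const fakeClient = {
  sent: [],
  publish(subject, data) {
    this.sent.push([subject, data]);
  },
};

const seenSubjects = [];
const wrapped = instrumentClient(fakeClient, (subject) => seenSubjects.push(subject));
wrapped.publish('orders.created', 'payload');

console.log(seenSubjects, fakeClient.sent);
```

The original object is left unmodified, so only code that goes through the proxy is instrumented.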
@blumamir I still haven't figured out a great way to get around the issue with |
Hi @ekosz , thanks for taking the time to add this instrumentation. I tried before to patch generators for the SQS messaging system in I'll take a second look to try and think about it again.
There are instrumentations for |
There is an effort to add manual context propagation using |
The issue I encountered with the approach was that I was not able to properly guarantee that the
function* numbers() {
  yield 1;
  yield 2;
  yield 3;
}
const iterateOnNumbers = () => {
  try {
    for (const num of numbers()) {
      if (num === 2) {
        throw new Error('throws without fully consuming the iterator till done');
      }
      console.log(`process num ${num}`);
    }
  } catch {
    console.log('exception handled');
  }
  console.log('what do we expect the context here to be?');
};
const f = () => {
  iterateOnNumbers();
  console.log('what do we expect the context here to be?');
};
f();
If patching the generator's iterator |
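One language detail that bears on the example above: when a `for...of` loop exits early (here via `throw`), it does not run the iterator to `done`; it calls the iterator's `return()` method instead. The sketch below records which iterator methods are called. The `traceIterator` wrapper is a hypothetical name for the illustration, not something from this PR.

```javascript
function* numbers() {
  yield 1;
  yield 2;
  yield 3;
}

// Hypothetical wrapper: logs every next()/return() call made on the
// underlying iterator into the `events` array.
function traceIterator(iterable, events) {
  const it = iterable[Symbol.iterator]();
  return {
    [Symbol.iterator]() {
      return this;
    },
    next(value) {
      const result = it.next(value);
      events.push(result.done ? 'next:done' : `next:${result.value}`);
      return result;
    },
    return(value) {
      events.push('return');
      return it.return(value);
    },
  };
}

const events = [];
try {
  for (const num of traceIterator(numbers(), events)) {
    if (num === 2) {
      throw new Error('stop before the iterator is done');
    }
  }
} catch {
  events.push('caught');
}

console.log(events);
// → [ 'next:1', 'next:2', 'return', 'caught' ]
```

A patch that only restores the previous context once `next()` reports `done` would never run its cleanup in this case, so any such patch would also need to handle `return()`.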
I never experimented with generators so I can't say for sure, but do they create a new async context for each function call? If so, this should be tracked by async hooks |
Tested it now and unfortunately it's not creating async hooks context |
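A small sketch of the consequence, using Node's `AsyncLocalStorage` as a stand-in for the context manager (the `withContext` wrapper is hypothetical): a context activated around each `next()` call is visible inside the generator body, but not in the consumer's loop body that runs between yields.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');

const als = new AsyncLocalStorage();
const seenInGenerator = [];
const seenInLoopBody = [];

function* numbers() {
  seenInGenerator.push(als.getStore());
  yield 1;
  seenInGenerator.push(als.getStore());
  yield 2;
}

// Hypothetical wrapper: runs every next() call inside the given store.
function withContext(iterable, store) {
  const it = iterable[Symbol.iterator]();
  return {
    [Symbol.iterator]() {
      return this;
    },
    next(value) {
      return als.run(store, () => it.next(value));
    },
  };
}

for (const num of withContext(numbers(), 'span-ctx')) {
  // This code runs after next() has returned, so als.run() has
  // already restored the previous (empty) store.
  seenInLoopBody.push(als.getStore());
}

console.log(seenInGenerator); // → [ 'span-ctx', 'span-ctx' ]
console.log(seenInLoopBody);  // → [ undefined, undefined ]
```

The store is scoped to the callback passed to `run()`, which ends as soon as the generator yields; that is why a callback-scoped `with` call cannot simply be wrapped around a `yield`.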
Well, if it creates neither an async context nor a promise context (with |
This PR is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. |
This PR was closed because it has been stale for 14 days with no activity. |
Any progress on this one? We are using |
Which problem is this PR solving?
There is no instrumentation for nats.
Short description of the changes
This adds a new instrumentation package for the nats messaging system. Specifically for its nats.js library.
Work In Progress
Currently this PR is a work in progress. This is my first time contributing an OpenTelemetry package and I wanted to get the PR out early in a draft state to make sure I'm on the right path before moving further along.
TODOS
Open Questions
ContextAPI#with function. I'm not sure how to combine #with with a yield call. This is causing a test to currently fail as the context is not propagating properly without it.