Rework the trace pipeline towards statelessness #121
Comments
The most important point of this issue is that some symbols are retrieved and reported only once per agent lifetime. This can be problematic even with a stateful backend if data is removed, either manually or through automatic data retention policies. With a stateless protocol such as the OTEL protocol, the issue becomes even more pronounced. The agent core was developed with a stateful protocol and backend in mind, so the switch to the stateless OTEL protocol requires changes to caching (mostly of symbols). Possibly the most important change is to move the caching of symbols out of the agent core into the Consequently, the
Possible solutions
@fabled is working on a PoC PR to implement point 3, for further discussion and for benchmarking.
Additional required work
Due to legacy reasons, each interpreter kept their own state of which dynamic metadata should be sent to the reporter. Several of these caches would never expire, causing caching issues in the otlp reporter module.

This removes the caching state from all interpreters and pushes it to the reporter module. A new reporter API call FrameNeeded is added to query if a specific Frame is in the cache or not. Not all interpreter modules use the call as all the information might be available with little overhead. FrameMetadata is also updated to use the FrameID type for symmetry.

Improved are:
- reduced memory overhead as per-interpreter caches are removed
- reporter module can now control which frames need resolving
- fixes otlp to get the frames re-symbolized if its internal lru already forgot about the earlier symbolization information

ref open-telemetry#121
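To make the FrameNeeded idea from the commit message concrete, here is a minimal sketch of a reporter-owned cache that a stateless interpreter queries before symbolizing. Only the names FrameNeeded, FrameMetadata and FrameID come from the commit message; their signatures, the `reporter` and `interpreter` types, the FIFO eviction (a simple stand-in for an LRU) and the placeholder function names are assumptions for illustration and do not reflect the actual agent code.

```go
package main

import "fmt"

// FrameID is a hypothetical stand-in for the agent's frame identifier.
type FrameID struct {
	FileID     uint64
	AddrOrLine uint64
}

// reporter owns the only cache of symbolized frames. The cache is bounded
// and uses FIFO eviction here to keep the sketch short.
type reporter struct {
	capacity int
	order    []FrameID          // insertion order, oldest first
	frames   map[FrameID]string // FrameID -> function name
}

func newReporter(capacity int) *reporter {
	return &reporter{capacity: capacity, frames: make(map[FrameID]string)}
}

// FrameNeeded reports whether the reporter currently lacks metadata for id.
// Signature is an assumption.
func (r *reporter) FrameNeeded(id FrameID) bool {
	_, ok := r.frames[id]
	return !ok
}

// FrameMetadata stores metadata for id, evicting the oldest entry when the
// cache is full. Signature is an assumption.
func (r *reporter) FrameMetadata(id FrameID, functionName string) {
	if _, ok := r.frames[id]; !ok {
		if len(r.order) >= r.capacity {
			oldest := r.order[0]
			r.order = r.order[1:]
			delete(r.frames, oldest)
		}
		r.order = append(r.order, id)
	}
	r.frames[id] = functionName
}

// interpreter keeps no "already sent" state of its own; it asks the reporter.
type interpreter struct {
	rep            *reporter
	symbolizations int // counts how often the expensive path was taken
}

func (i *interpreter) symbolize(id FrameID) {
	if !i.rep.FrameNeeded(id) {
		return // reporter still has the metadata, skip the expensive work
	}
	i.symbolizations++
	// Placeholder for the real symbolization work.
	i.rep.FrameMetadata(id, fmt.Sprintf("func_%d", id.AddrOrLine))
}

func main() {
	rep := newReporter(2)
	interp := &interpreter{rep: rep}
	a := FrameID{FileID: 1, AddrOrLine: 10}
	b := FrameID{FileID: 1, AddrOrLine: 20}
	c := FrameID{FileID: 1, AddrOrLine: 30}

	// The last "a" arrives after the reporter evicted it, so it is
	// symbolized again instead of being lost until agent restart.
	for _, id := range []FrameID{a, a, b, c, a} {
		interp.symbolize(id)
	}
	fmt.Println("symbolizations:", interp.symbolizations)
}
```

With the decision held in the reporter, a frame that fell out of the reporter's cache is simply reported as needed again on its next occurrence, which is the re-symbolization behaviour the commit message describes.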
Problem
Our trace processing pipeline is currently engineered towards a backend that keeps the information it receives around forever in a bunch of places. Once information has been sent, it often won't be sent again until the agent restarts. This is problematic for two reasons:
Affected information
The following information is currently prone to falling out of LRU without a chance of it ever being resent:
Rough outline of a solution
We need to rework the whole trace pipeline to ensure that all of this information is available all the time. There are two possible paths that we can pursue here:
We can probably get rid of tracehandler entirely. The caches that it maintains will likely go away and the remaining few lines can be merged directly into Tracer.
Sub-issues