Tags: DataDog/dd-trace-dotnet
[OTEL] Vendoring OtlpGrpcExportClient And Enabling OTLP Metrics gRPC Tests (#7666)

## Summary of changes

Added gRPC protocol support to the OTLP metrics exporter by vendoring OpenTelemetry's gRPC transport client. The exporter now supports the `grpc` protocol, which was also made the default protocol value.

## Reason for change

Customers need gRPC support for OTLP metrics export to maintain compatibility with OpenTelemetry ecosystem standards and with existing infrastructure that may require or prefer gRPC transport.

## Implementation details

**Vendored dependencies:**

- Vendored the `OpenTelemetry.Exporter.OpenTelemetryProtocol` v1.13.1 (the latest at the time of writing, with the relevant fixes included) gRPC transport client (`OtlpGrpcExportClient` and related helpers)
- Created `OpenTelemetryStubs.cs` with minimal stub implementations to avoid vendoring the entire OpenTelemetry SDK:
  - `Guard` and `UriExtensions` from `OpenTelemetry.Internal` (their deletion needs to be discarded after running the vendoring command)
  - `OtlpExporterOptions` (minimal configuration class)
  - `OtlpExportProtocol` enum (kept in the stub due to namespace issues, the original living in the parent namespace `OpenTelemetry.Exporter`)
  - `OpenTelemetryProtocolExporterEventSource` (stub EventSource for logging)

**Core changes:**

- Updated `OtlpExporter.cs` to instantiate and use the vendored `OtlpGrpcExportClient` when the protocol is `Grpc`
- Added the 5-byte gRPC message frame prefix (compression flag + message length) to protobuf payloads for gRPC transport
- Modified `OtlpMetricsSerializer.cs` to optionally reserve bytes at the start of the buffer for the gRPC frame header

**Vendoring infrastructure:**

- Added an `AddOpenTelemetryUsings` transform to inject the common `using` directives needed by vendored OTel files
- Configured exclusions to vendor only the gRPC transport client, not the full OTLP exporter

## Test coverage

- Updated `OpenTelemetrySdkTests.SubmitsOtlpMetrics` to test both the `http/protobuf` and `grpc` protocols
- gRPC tests use the `dd-apm-test-agent` container
- All tests validate that metrics are correctly exported and received by the respective agents

## DLL File Size Difference

After building both the `master` branch and this branch locally and comparing file sizes:

| Target Framework | Master (bytes) | Feature (bytes) | Difference (bytes) | Difference (%) | Impact |
|------------------|----------------|-----------------|--------------------|----------------|--------|
| net6.0 | 8,252,416 | 8,270,336 | **+17,920** | +0.22% | +17.5 KB |
| netcoreapp3.1 | 8,173,056 | 8,190,976 | **+17,920** | +0.22% | +17.5 KB |

## Other Details

[APMAPI-1679](https://datadoghq.atlassian.net/browse/APMAPI-1679)
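The 5-byte gRPC message frame mentioned in the core changes is part of the gRPC-over-HTTP/2 wire format: one compression flag byte followed by a 4-byte big-endian message length. As an illustrative sketch (in Python rather than the C# used in the tracer, with a hypothetical function name), prefixing a serialized protobuf payload looks like this:

```python
import struct

def add_grpc_frame(payload: bytes) -> bytes:
    """Prefix a serialized protobuf payload with the 5-byte gRPC frame header:
    1 byte compression flag (0 = uncompressed) + 4-byte big-endian length."""
    return struct.pack(">BI", 0, len(payload)) + payload

framed = add_grpc_frame(b"\x0a\x03abc")  # header precedes the protobuf bytes
```

Reserving these 5 bytes at the start of the serialization buffer (as `OtlpMetricsSerializer.cs` now optionally does) avoids copying the payload just to prepend the header.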
Fix for sending duplicate logs when using Agentless Logging in Azure Function host (#7383)

## Summary of changes

Disables agentless logging for the Azure Functions host process if we detect that we are instrumenting one. Previously this caused us to duplicate every log sent from the worker process. This bug fix can be reverted by setting the new configuration key `DD_LOGS_DIRECT_SUBMISSION_AZURE_FUNCTIONS_HOST_ENABLED` to `true`.

## Reason for change

If customers are running with the isolated Azure Functions model, we instrument two applications: the function host process and the worker process. If they have direct log submission enabled, the function host ends up duplicating the logs from the function process, which results in us shipping two nearly identical logs. This behavior isn't ideal, as the duplicate log is likely not valuable, so we've disabled agentless logging in the Azure Functions host process.

## Implementation details

Added `IsRunningInAzureFunctionsHost()` to `EnvironmentHelpers.cs`, which allows a rough detection of whether we are running in the function host using the following logic:

- Is `FUNCTIONS_WORKER_RUNTIME` present AND set to `dotnet-isolated`?
- Is neither `--functions-worker-id` nor `--workerId` present on the command line?

If both are true, we treat that scenario as running in the function host; otherwise, we are likely the worker process.

I wasn't able to find a more robust way of checking, but in the various log output I looked at, `--functions-worker-id` or `--workerId` always seemed to be passed by the function host.
```
[2025-10-03T16:01:42.901Z] Reading functions metadata (Worker)
[2025-10-03T16:01:47.176Z] {
[2025-10-03T16:01:47.177Z]   "ProcessId": 71080,
[2025-10-03T16:01:47.178Z]   "RuntimeIdentifier": "win-x64",
[2025-10-03T16:01:47.179Z]   "WorkerVersion": "2.0.0.0",
[2025-10-03T16:01:47.180Z]   "ProductVersion": "2.0.0\u002Bd8b5fe998a8c92819b8ee41d2569d2525413e9c5",
[2025-10-03T16:01:47.181Z]   "FrameworkDescription": ".NET 9.0.9",
[2025-10-03T16:01:47.182Z]   "OSDescription": "Microsoft Windows 10.0.26100",
[2025-10-03T16:01:47.183Z]   "OSArchitecture": "X64",
[2025-10-03T16:01:47.184Z]   "CommandLine": "C:\\Users\\steven.bouwkamp\\source\\repos\\dd-trace-dotnet\\artifacts\\bin\\Samples.AzureFunctions.V4Isolated.AspNetCore\\debug_net9.0\\Samples.AzureFunctions.V4Isolated.AspNetCore.dll --host 127.0.0.1 --port 65401 --workerId e94d23fd-cd3c-4780-a3e3-4980d7b0f644 --requestId 6dba68ac-1954-466a-aeb4-9570cc9b12c2 --grpcMaxMessageLength 2147483647 --functions-uri http://127.0.0.1:65401/ --functions-worker-id e94d23fd-cd3c-4780-a3e3-4980d7b0f644 --functions-request-id 6dba68ac-1954-466a-aeb4-9570cc9b12c2 --functions-grpc-max-message-length 2147483647"
[2025-10-03T16:01:47.185Z] }
```

- Added `DD_LOGS_DIRECT_SUBMISSION_AZURE_FUNCTIONS_HOST_ENABLED`, which defaults to `false`, to disable the duplicate logs from being sent.

## Test coverage

- Added a new test project and tests for having host logs enabled / disabled.
- The reason I added a new test project instead of re-using an existing one is that when I re-ran the function application multiple times in our tests, `func.exe` would fail to obtain a lock and would need to wait some period of time for recovery after each subsequent run of the same application. I think this is because we end the `func.exe` process with a `Kill()`. Making a new project wasn't ideal, but it was a quick and simple workaround.

## Other details

Fixes SLES-2364
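The two-part detection heuristic above can be sketched as follows. This is an illustrative Python analogue of the C# helper, not the actual `EnvironmentHelpers` implementation; the function signature and parameter names are hypothetical:

```python
import os
import sys

def is_running_in_azure_functions_host(args=None, env=None) -> bool:
    """Rough heuristic: the isolated worker runtime is configured, but the
    worker-id flags that the host passes to worker processes are absent."""
    args = sys.argv if args is None else args
    env = dict(os.environ) if env is None else env
    # Condition 1: FUNCTIONS_WORKER_RUNTIME present AND set to dotnet-isolated
    if env.get("FUNCTIONS_WORKER_RUNTIME") != "dotnet-isolated":
        return False
    # Condition 2: neither --functions-worker-id nor --workerId on the command line
    return not any(a in ("--functions-worker-id", "--workerId") for a in args)
```

A worker process launched by the host (as in the log output above) carries both `--workerId` and `--functions-worker-id`, so it is correctly classified as a worker rather than the host.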
Fix the `verify_app_trimming_changes_are_persisted` job (#7542)

## Summary of changes

Fixes the `verify_app_trimming_changes_are_persisted` job.

## Reason for change

The trimming job is currently trying to build the native code, but [we're getting this error](https://github.com/DataDog/dd-trace-dotnet/actions/runs/17909695789/job/50921858072?pr=7287):

```
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\Microsoft.Cpp.WindowsSDK.targets(46,5): error MSB8036: The Windows SDK version 10.0.19041.0 was not found. Install the required version of Windows SDK or change the SDK version in the project property pages or by right-clicking the solution and selecting "Retarget solution".
```

We only need to build the managed code to regenerate the trimming file, so this PR fixes that.

## Implementation details

`BuildTracerHome` -> `BuildManagedTracerHome`

## Test coverage

If this PR passes, we're good.

## Other details

Blocking CI in general.
Hotfix v3.26.3 - Update CI with Windows signing remediations (#7527)

## Summary of changes

This cherry-picks the commits related to resolving the issues we had with not correctly signing Windows artifacts.

## Reason for change

The remediations were only on `master`, which would have caused the hotfix to again be released unsigned. This resolves that.

## Implementation details

`git cherry-pick` the three commits.

## Test coverage

If the build works and reports that everything is correctly signed again, then 👍

Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com>
Co-authored-by: Zach Montoya <zach.montoya@datadoghq.com>
Co-authored-by: NachoEchevarria <53266532+NachoEchevarria@users.noreply.github.com>
Co-authored-by: Lucas Pimentel <lucas.pimentel@datadoghq.com>
Fix GRPC IAST tests (#7485)

## Summary of changes

In a previous PR, we updated the [GRPC sample](https://github.com/DataDog/dd-trace-dotnet/pull/7457/files#diff-a584760853efe2efa5b346a11c7a95486f1c9aa7700bcdfb97729eba34e23135). This affected the IAST tests, which use it. Since code ownership was not set to ASM, the previous PR passed. This PR:

* Updates the snapshots of the IAST tests. The location of the vulnerability has changed (the line number, and with it the hash of the vulnerability). This particular sample has debug information, so line numbers are taken into account.
* Changes the ownership of the sample to include security.
* Deletes the debug info in the test.
[Tracer] fix: Re-use runtime metrics writer resources to limit memory growth (#7434)

## Summary of changes

Updates `TracerManagerFactory.CreateTracerManager` to pass in and re-use the previous `RuntimeMetricsWriter` if runtime metrics is enabled for the new `TracerManager`. We still create a new `IDogStatsd` client every time, so any updated Agent settings are adopted by the DogStatsd client, but this could be further optimized at a later time to re-use the client if none of its configuration (e.g. host/port/tags) has changed.

## Reason for change

We've observed a scenario where the number of `RuntimeEventListener` instances continues to grow, consuming more and more memory. This happens whenever new Dynamic Configuration settings are received by the tracer and runtime metrics are enabled. This PR resolves the issue.

## Implementation details

When creating the new `TracerManager`, pass in the previous `RuntimeMetricsWriter` instance and only update the `IDogStatsd` object with new settings. This ensures that we maintain only one `RuntimeMetricsWriter` instance while getting up-to-date DogStatsD settings throughout the application lifetime.

## Test coverage

Adds a small unit test to confirm that the previous `RuntimeMetricsWriter` is re-used. Additionally, local testing was done to confirm that the number of `Datadog.Trace.RuntimeMetrics.RuntimeEventListener` objects does not grow when Dynamic Configuration is updated.
### Without the fix

After some number of Dynamic Configuration settings were made in the Datadog UI, a dump was taken with the following analysis:

```
> dumpheap -type Datadog.Trace.RuntimeMetrics.RuntimeEventListener
Address          MT               Size
0159c3c7fe88     7ffe4e801810     88
0159c442ed28     7ffe4e801810     88
0159c44eac60     7ffe4e801810     88
0159c4500f00     7ffe4e801810     88
0159c454ea68     7ffe4e801810     88

Statistics:
MT               Count    TotalSize    Class Name
7ffe4e801810     5        440          Datadog.Trace.RuntimeMetrics.RuntimeEventListener
Total 5 objects, 440 bytes
```

Then after one additional Dynamic Configuration update was made in the Datadog UI, a dump was taken with the following analysis:

```
> dumpheap -type Datadog.Trace.RuntimeMetrics.RuntimeEventListener
Address          MT               Size
0159c3c7fe88     7ffe4e801810     88
0159c442ed28     7ffe4e801810     88
0159c44eac60     7ffe4e801810     88
0159c4500f00     7ffe4e801810     88
0159c454ea68     7ffe4e801810     88
0159c61fef48     7ffe4e801810     88

Statistics:
MT               Count    TotalSize    Class Name
7ffe4e801810     6        528          Datadog.Trace.RuntimeMetrics.RuntimeEventListener
Total 6 objects, 528 bytes
```

### With the fix

After some number of Dynamic Configuration settings were made in the Datadog UI, a dump was taken with the following analysis:

```
> dumpheap -type Datadog.Trace.RuntimeMetrics.RuntimeEventListener
Address          MT               Size
01527107faf0     7ffe4e7acc00     88

Statistics:
MT               Count    TotalSize    Class Name
7ffe4e7acc00     1        88           Datadog.Trace.RuntimeMetrics.RuntimeEventListener
Total 1 objects, 88 bytes
```

Then after one additional Dynamic Configuration update was made in the Datadog UI, a dump was taken with the following analysis:

```
> dumpheap -type Datadog.Trace.RuntimeMetrics.RuntimeEventListener
Address          MT               Size
01527107faf0     7ffe4e7acc00     88

Statistics:
MT               Count    TotalSize    Class Name
7ffe4e7acc00     1        88           Datadog.Trace.RuntimeMetrics.RuntimeEventListener
Total 1 objects, 88 bytes
```

Co-authored-by: Andrew Lock <andrew.lock@datadoghq.com>
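The reuse pattern described in the implementation details can be sketched as follows. This is an illustrative Python analogue, not the tracer's C# code; the class and function names mirror the PR's description but are hypothetical stand-ins:

```python
def make_statsd_client(settings):
    """Stand-in for creating a fresh IDogStatsd client from current settings."""
    return ("statsd", settings.get("host"))

class RuntimeMetricsWriter:
    """Stand-in for the tracer's runtime metrics writer; creating one
    registers an event listener, which is why instances must not pile up."""
    def __init__(self, statsd):
        self.statsd = statsd

def create_tracer_manager(settings, previous_writer=None):
    """Keep a single writer instance alive across reconfigurations,
    swapping only the statsd client so new Agent settings are adopted."""
    statsd = make_statsd_client(settings)  # new client picks up new settings
    if previous_writer is not None:
        previous_writer.statsd = statsd    # re-use the existing writer
        return previous_writer
    return RuntimeMetricsWriter(statsd)
```

Each Dynamic Configuration update calls `create_tracer_manager` again, but only the first call ever constructs a writer, which is why the heap dumps above show a single `RuntimeEventListener` with the fix.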
Removes exception throwing during shutdown of dynamic instrumentation (#7375)

## Summary of changes

Removes exception throwing in the "success" path, even when DI is disabled.

## Reason for change

#7304 introduced a bunch of changes, but the use of `CancellationTokenSource` resulted in exceptions being thrown in the "happy" shutdown path, which can cause crashes in buggy versions of .NET (i.e. all of them, currently).

## Implementation details

Replace usages of `CancellationTokenSource` with `TaskCompletionSource<bool>`.

## Test coverage

Covered by existing tests - checked the execution tests to make sure the exception count has gone back down.

## Other details

Discovered some additional issues that I think need to be addressed. Most importantly, `SafeDisposal` looks like a band-aid for unclear lifetime management. We should refactor the code to not require it, i.e. `Dispose`-ing types should be safe and should not throw (regardless of whether we catch the exception). Additionally, this looks like it does quite a lot of work even when DI is disabled; I would suggest refactoring it so that it doesn't do a bunch of work in cases where it's never enabled.
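The distinction that motivates the change above: cancelling a token makes every waiter unwind through an exception even on a successful shutdown, whereas completing a "completion source" lets waiters resume normally with a value. As an illustrative Python analogue (using `concurrent.futures.Future` as a stand-in for `TaskCompletionSource<bool>`, not the actual tracer code):

```python
import concurrent.futures

# Completion-source style: signalling shutdown completes a future, so
# waiters receive a value instead of unwinding through an exception
# (the CancellationTokenSource-based approach raised on the happy path).
shutdown_signal = concurrent.futures.Future()

def wait_for_shutdown(signal):
    # Waiters simply read the completion value; no try/except needed.
    return signal.result(timeout=5)

shutdown_signal.set_result(True)  # happy-path shutdown: no exception raised
```

This keeps the happy path exception-free, which matters when the runtime itself can crash while unwinding exceptions during shutdown.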