Skip to content

fix: Use ArrayPool instead of ThreadStatic field for serialization #2202

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

harnel-tngn
Copy link

There are situations where performing serialization or deserialization in property getter can be useful. However, since MessagePackSerializer uses thread-local byte[] array, invoking another serialization during serialization can lead to serialization bugs. This PR fixes bug by replacing thread-local field to rented byte[] array from ArrayPool. This change should not have any significant impact on other behaviours or any performances.

The following tables show the results of the SerializeBenchmark benchmark. AnswerDeserialize benchmarks are omitted, and only benchmarks using current version of MessagePack (extern aliased as newmsgpack) are included, and iteration count and warmup count modified to 10 and 5. The first table presents results before the replacement, and the second table shows results after the change, and table shows performance difference is negligible.

// * Summary *

BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5965/22H2/2022Update)
Intel Core i7-14700, 1 CPU, 28 logical and 20 physical cores
.NET SDK 9.0.103
  [Host]   : .NET 8.0.17 (8.0.1725.26602), X64 RyuJIT AVX2
  ShortRun : .NET 8.0.17 (8.0.1725.26602), X64 RyuJIT AVX2

Job=ShortRun  EnvironmentVariables=DOTNET_TieredPGO=0  Jit=RyuJit
Platform=X64  Runtime=.NET 8.0  IterationCount=10
LaunchCount=1  WarmupCount=5

| Method            | Serializer         | Mean     | Error     | StdDev    | DataSize | Gen0   | Allocated |
|------------------ |------------------- |---------:|----------:|----------:|---------:|-------:|----------:|
| AnswerSerialize   | MessagePack_v2     | 1.430 us | 0.1087 us | 0.0719 us |     469B | 0.0286 |     496 B |
| AnswerSerialize   | MessagePackLz4_v2  | 1.963 us | 0.0748 us | 0.0495 us |     451B | 0.0267 |     480 B |
| AnswerSerialize   | MsgPack_v2_opt     | 1.304 us | 0.0270 us | 0.0161 us |     433B | 0.0267 |     464 B |
| AnswerSerialize   | MsgPack_v2_str_lz4 | 2.952 us | 0.1566 us | 0.1036 us |     868B | 0.0496 |     896 B |
| AnswerSerialize   | MsgPack_v2_string  | 1.680 us | 0.0651 us | 0.0431 us |    1264B | 0.0744 |    1288 B |

| Method            | Serializer         | Mean     | Error     | StdDev    | DataSize | Gen0   | Allocated |
|------------------ |------------------- |---------:|----------:|----------:|---------:|-------:|----------:|
| AnswerSerialize   | MessagePack_v2     | 1.441 us | 0.0273 us | 0.0180 us |     471B | 0.0286 |     496 B |
| AnswerSerialize   | MessagePackLz4_v2  | 1.979 us | 0.0458 us | 0.0273 us |     453B | 0.0267 |     480 B |
| AnswerSerialize   | MsgPack_v2_opt     | 1.381 us | 0.0735 us | 0.0437 us |     435B | 0.0267 |     464 B |
| AnswerSerialize   | MsgPack_v2_str_lz4 | 2.935 us | 0.0678 us | 0.0448 us |     862B | 0.0496 |     896 B |
| AnswerSerialize   | MsgPack_v2_string  | 1.664 us | 0.0229 us | 0.0120 us |    1262B | 0.0744 |    1288 B |

@harnel-tngn
Copy link
Author

@dotnet-policy-service agree

@AArnott
Copy link
Collaborator

AArnott commented Jun 18, 2025

Thank you for identifying this problem and sharing a solution. However, I'm concerned about making it even more possible to nest serializations, because that violates other security measures that this library makes. It's important that serializations not be nested, and in your example that you added in the test, that is a case where a class should not be deserializing its own data, it should be using a converter instead.

What is the symptom of what you saw before this fix? Maybe we can make that symptom far more obvious so people understand the problem and fix it properly.

@neuecc
Copy link
Member

neuecc commented Jun 18, 2025

Thanks.
In principle, I believe we should not use nesting, but it's common to accidentally create nested structures.
I feel like this unintentionally happens quite often, especially when implementing custom formatters.
I tend to make such mistakes carelessly...

I also thought ThreadStatic should be removed eventually.
Recently, the pattern of processing scratch-buffers with small stackalloc Span is widely seen in dotnet/runtime, so following that approach might not be bad.
Since it's bytes, allocating around 1024 should serve the scratch functionality well.
Since SequencePool handles ArrayPool, I don't think there's much need to take large sizes from the pool for scratch purposes.
However, the current BufferWriter is not designed that way (it cannot receive Span as scratch), so some refactoring would be necessary.

@harnel-tngn
Copy link
Author

Symptom of nested serialization follows. In the test scenario NestedSerializationTest.SerializeAndDeserialize, the content of testData.SerializedKey is as follows:

AA // string (Length 12)
74 65 73 74 73 74 72 69 6E 67

The serializer for the SerializeOnSerializeClass class

  • Writes an array header with size 1
  • Retrieve testData.SerializedKey property
  • Write byte array from retrieved testData.SerializedKey property

in serial. Therefore, the expected serialized content of testData is

91 // firarray (length 1)

C4 // bin 8 (testData.SerializedKey)
0B

AA // string (Length 12)
74 65 73 74 73 74 72 69 6E 67

However, while retrieving testData.SerializedKey property, invoking MessagePackSerializer.Serialize starts overwriting the scratchArray buffer. This results buffer just after testData.SerializedKey property retrieved:

AA // string (Length 12), overwritten by second MessagePackWriter during retrieving testData.SerializedKey
   // The first MessagePackWriter is pointing here, just after this (0xAA)

74 65 73 74 73 74 72 69 6E 67

When the serialization completes, final contents of buffer is:

AA // string (Length 12), overwritten by second MessagePackWriter during retrieving testData.SerializedKey

C4 // bin 8 (testData.SerializedKey), re-overwritten by the first MessagePackWriter
0B

AA // string (Length 12)
74 65 73 74 73 74 72 69 6E 67

This leads data malformation, unable to deserialize data.

If nested serialization is a security concern, I believe MessagePack should raise a proper exception. Using thread-local variable to track whether current thread is already serializing/deserializing should be sufficient to detect and prevent unintended nested serialization/deserialization.

Copy link
Collaborator

@AArnott AArnott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a minimum it seems like the thread local store should be cleared of the array while it is in use so that it doesn't get overwritten like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants