Add nvarchar (SQL_C_WCHAR) Support to .NET Core C# Language Extension #67
Pull request overview
This PR adds comprehensive support for Unicode string data types (nvarchar and nchar, SQL_C_WCHAR) to the .NET Core C# language extension for SQL Server. Previously, the extension only supported ANSI character types (varchar/char). The implementation enables full Unicode data exchange through input columns, output columns, and input/output parameters.
Key Changes:
- Added SQL_C_WCHAR data type support with UTF-16 encoding/decoding throughout the data pipeline
- Implemented UTF-8 byte-level string splitting to fix multi-byte character handling bugs
- Updated memory pinning strategy from fixed statements to GCHandle.Alloc with GCHandleType.Pinned for proper lifetime management
- Added comprehensive test coverage for nvarchar parameters with Unicode characters (Chinese, Cyrillic)
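For readers unfamiliar with SQL_C_WCHAR, the UTF-16 round trip named in the first bullet can be illustrated as follows. This is a sketch under assumptions, not the extension's code: `Utf16PtrToString` is a hypothetical stand-in for the PR's `UTF16PtrToStr()` overloads, and lengths are assumed to be reported in bytes, so they are halved to get a count of UTF-16 code units.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical stand-in for a UTF-16 pointer decoder: the length arrives in
// bytes, while Marshal.PtrToStringUni expects a count of 2-byte code units.
static string Utf16PtrToString(IntPtr ptr, int byteCount)
{
    return Marshal.PtrToStringUni(ptr, byteCount / sizeof(char));
}

string original = "\u4F60\u597D, \u043C\u0438\u0440"; // Chinese and Cyrillic text

// Encoding.Unicode is UTF-16 little-endian, the layout .NET strings use in memory.
byte[] utf16 = Encoding.Unicode.GetBytes(original);
int lengthInBytes = utf16.Length; // 7 code units * 2 bytes = 14

// Copy the bytes into unmanaged memory to imitate a buffer handed over by native code.
IntPtr native = Marshal.AllocHGlobal(lengthInBytes);
string decoded;
try
{
    Marshal.Copy(utf16, 0, native, lengthInBytes);
    decoded = Utf16PtrToString(native, lengthInBytes);
}
finally
{
    Marshal.FreeHGlobal(native);
}

Console.WriteLine(decoded == original); // True
```

Passing the byte length straight through as a character count would read twice as much memory as the buffer holds, which is why the conversion between bytes and code units has to happen at every boundary.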
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| Sql.cs | Added DotNetWChar enum value and MinUtf16CharSize constant to DataTypeSize dictionary |
| InteropUtils.cs | Added UTF16PtrToStr() overloads for converting unmanaged UTF-16 pointers to managed strings |
| DataSetUtils.cs | Added UTF8ByteSplitToArray() method to correctly split UTF-8 byte buffers without character/byte offset issues |
| CSharpInputDataSet.cs | Added DotNetWChar case for reading UTF-16 input data with proper byte-to-character conversion; updated DotNetChar to use byte-level splitting |
| CSharpOutputDataSet.cs | Modified ExtractColumns() to accept input column metadata; added DotNetWChar case with GetUnicodeStringArray(); updated GC pinning from fixed to GCHandle |
| CSharpParamContainer.cs | Added DotNetWChar cases in AddParamValue() and ReplaceParamValue(); added ReplaceUnicodeStringParam() helper method |
| CSharpSession.cs | Updated to pass input column metadata to ExtractColumns() for type preservation |
| CSharpDataSet.cs | Exposed Columns property to enable metadata propagation |
| README.md | Updated supported data types list to include SQL_C_WCHAR and nvarchar(n) |
| CSharpExtensionApiTests.h | Added GetWStringOutputParam() test helper declaration |
| CSharpTestExecutor.cs | Added CSharpTestExecutorWStringParam class for Unicode output parameter testing |
| CSharpInitParamTests.cpp | Expanded InitWStringParamTest with comprehensive nchar/nvarchar test scenarios including Unicode characters |
| CSharpGetOutputParamTests.cpp | Added GetWStringOutputParamTest and GetWStringOutputParam() helper for validating Unicode output parameters |
What will happen in the case of a very large string? What was happening before this change, and what happens after it?
How did you find the following issue? Do we have a test to cover it now?
monamaki left a comment
Please see inline comments.
The previous implementation converted the entire UTF-8 byte buffer to a single .NET string, then tried to chunk it by character positions. This caused two issues:
- Multi-byte character corruption: The values in strLenOrNullMap are UTF-8 byte lengths, but after converting to a .NET string (UTF-16), character positions no longer aligned with byte boundaries. For example, "€" is 3 bytes in UTF-8 but 1 character in .NET, so the chunking logic would split at the wrong positions.
- Memory inefficiency for large strings: A 1GB UTF-8 buffer would create a ~2GB intermediate string (since .NET strings are UTF-16), causing ~3GB peak memory usage before the string was even chunked.
The fix processes each row's bytes individually using the exact byte length from strLenOrNullMap[i], which correctly handles multi-byte characters and avoids creating a large intermediate string.
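The mismatch is easy to reproduce in isolation. The sketch below is not the PR's `UTF8ByteSplitToArray()`, whose signature is not shown here; `SplitUtf8Rows` is a hypothetical helper that assumes the rows sit back to back in one buffer and that a length of -1 (ODBC's SQL_NULL_DATA) marks a NULL row.

```csharp
#nullable enable
using System;
using System.Text;

// Hypothetical helper (not the PR's code): decode each row from its own byte
// range instead of decoding the whole buffer and slicing by character index.
static string?[] SplitUtf8Rows(byte[] buffer, int[] strLenOrNullMap)
{
    const int SqlNullData = -1; // ODBC SQL_NULL_DATA
    var result = new string?[strLenOrNullMap.Length];
    int offset = 0; // running position in bytes, never in characters
    for (int i = 0; i < result.Length; i++)
    {
        int byteLen = strLenOrNullMap[i];
        if (byteLen == SqlNullData)
        {
            continue; // NULL row: consumes no bytes and stays null
        }
        result[i] = Encoding.UTF8.GetString(buffer, offset, byteLen);
        offset += byteLen;
    }
    return result;
}

// Three non-NULL rows and one NULL row, packed contiguously as UTF-8:
// "€5" (3 + 1 bytes), "añb" (1 + 2 + 1 bytes), NULL, "ok" (2 bytes).
byte[] packed = Encoding.UTF8.GetBytes("\u20AC5" + "a\u00F1b" + "ok");
int[] lengths = { 4, 4, -1, 2 };

string?[] rows = SplitUtf8Rows(packed, lengths);

// The old approach, for comparison: 4 is a byte count, but Substring counts chars.
string naiveFirst = Encoding.UTF8.GetString(packed).Substring(0, 4);

Console.WriteLine(naiveFirst == rows[0]); // False: the naive slice ran into the next row
```

Decoding row by row also keeps each intermediate string no larger than one row, which is the memory point made above.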
Aniruddh25 left a comment
Waiting on fixing some copilot suggestions.
0f3f1c3 to 9f79549
9f79549 to 0ac9041
Changed void*[] and int*[] arrays to IntPtr[] arrays because pointer arrays are not blittable and cannot be pinned with GCHandleType.Pinned on .NET 6. IntPtr is blittable and works correctly across all .NET versions.
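A small sketch of the pattern this describes, assuming the column buffers are primitive arrays: each buffer is pinned with `GCHandle.Alloc`, its address is stored as an `IntPtr`, and the `IntPtr[]` table is then pinned as well. `WithPinnedTable` and the sample columns are hypothetical and do not come from the PR; the reads through `Marshal` stand in for the native call.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical helper (not the PR's code): pin every column, pin the table of
// their addresses, run the "native" call, then release all handles.
static T WithPinnedTable<T>(Array[] columns, Func<IntPtr, T> nativeCall)
{
    var handles = new List<GCHandle>();
    try
    {
        var addresses = new IntPtr[columns.Length];
        for (int i = 0; i < columns.Length; i++)
        {
            GCHandle columnHandle = GCHandle.Alloc(columns[i], GCHandleType.Pinned);
            handles.Add(columnHandle);
            addresses[i] = columnHandle.AddrOfPinnedObject();
        }

        // IntPtr[] is blittable, so the table itself can be pinned too.
        GCHandle tableHandle = GCHandle.Alloc(addresses, GCHandleType.Pinned);
        handles.Add(tableHandle);

        return nativeCall(tableHandle.AddrOfPinnedObject());
    }
    finally
    {
        // Handles are freed only after the call returns, unlike a fixed block
        // whose scope may end before native code is done with the pointers.
        foreach (GCHandle handle in handles)
        {
            handle.Free();
        }
    }
}

int[] ids = { 1, 2, 3 };
long[] totals = { 10L, 20L };

// Read ids[1] the way native code would: table -> column pointer -> element.
int second = WithPinnedTable(
    new Array[] { ids, totals },
    table => Marshal.ReadInt32(Marshal.ReadIntPtr(table, 0), sizeof(int)));

Console.WriteLine(second); // 2
```

Collecting the handles in one list and freeing them in `finally` keeps the pin lifetime explicit and releases everything even if pinning a later column throws.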
Add nvarchar (SQL_C_WCHAR) Support to .NET Core C# Language Extension
Summary
This PR adds full support for nvarchar(n) and nchar(n) SQL data types (SQL_C_WCHAR) to the .NET Core C# language extension. Previously, the extension only supported ANSI character types (varchar/char). With this change, Unicode string data can now be passed to and returned from C# external scripts via sp_execute_external_script.
Why This Change Is Needed
The C# language extension previously lacked support for Unicode character types, which limited its ability to handle multilingual data. Many SQL Server applications use nvarchar columns to store Unicode text (e.g., Chinese, Arabic, Cyrillic characters). This change enables full Unicode support for:
- Input columns (@input_data_1)
- Output columns
- Input/output parameters (@params)
What Changed
Core Data Type Support
- src/managed/utils/Sql.cs: Added DotNetWChar to DataTypeSize dictionary with MinUtf16CharSize (2 bytes)
- src/managed/utils/InteropUtils.cs: Added UTF16PtrToStr() overloads for converting unmanaged UTF-16 strings to managed strings
- src/managed/utils/DataSetUtils.cs: Added UTF8ByteSplitToArray() method for proper multi-byte UTF-8 character handling
Input Data Handling
- src/managed/CSharpInputDataSet.cs: Added a SqlDataType.DotNetWChar case that reads UTF-16 encoded data and correctly handles byte-to-character length conversion
- src/managed/CSharpDataSet.cs: Exposed Columns property to allow metadata propagation to output dataset
Output Data Handling
- src/managed/CSharpOutputDataSet.cs:
  - Modified ExtractColumns() to accept input column metadata for preserving nvarchar types
  - Added DotNetWChar case in ExtractColumn() to emit UTF-16 data
  - Added GetUnicodeStringArray() method for building UTF-16 output buffers
  - Updated GetStrLenNullMap() to report correct byte lengths for UTF-16 strings
  - Fixed pointer pinning to use GCHandle.Alloc() with GCHandleType.Pinned instead of fixed statements
- src/managed/CSharpSession.cs: Updated to pass input column metadata to ExtractColumns() to preserve data type information
Parameter Handling
- src/managed/CSharpParamContainer.cs:
  - Added DotNetWChar case in AddParamValue() for reading Unicode input parameters
  - Added DotNetWChar case in ReplaceParamValue() for writing Unicode output parameters
  - Added ReplaceUnicodeStringParam() method for UTF-16 byte conversion
Documentation
- README.md: Updated supported data types list to include SQL_C_WCHAR and nvarchar(n)
Test Coverage
- test/include/CSharpExtensionApiTests.h: Added GetWStringOutputParam() test helper declaration
- test/src/managed/CSharpTestExecutor.cs: Added CSharpTestExecutorWStringParam class for Unicode output parameter testing
- test/src/native/CSharpInitParamTests.cpp: Expanded InitWStringParamTest with comprehensive nchar/nvarchar test cases including Unicode characters (Chinese, Cyrillic)
- test/src/native/CSharpGetOutputParamTests.cpp: Added GetWStringOutputParamTest and GetWStringOutputParam() helper for testing Unicode output parameters
Bug Fixes Included
UTF-8 Multi-byte Character Handling
Fixed a bug in CSharpInputDataSet.cs where multi-byte UTF-8 characters (e.g., Euro symbol € = 3 bytes) were incorrectly split. The previous implementation:
- Converted the entire UTF-8 byte buffer to a single .NET string
- Split that string using Substring() with byte lengths
This failed because multi-byte UTF-8 characters become single characters in .NET strings, causing byte-based offsets to be wrong.
Fix: Added UTF8ByteSplitToArray() in DataSetUtils.cs that processes raw UTF-8 bytes directly, splitting by byte offsets first and then decoding each segment independently.
Pointer Pinning Memory Safety
Fixed potential memory corruption in CSharpOutputDataSet.RetrieveColumns() by replacing fixed statements with proper GCHandle.Alloc(..., GCHandleType.Pinned) to ensure arrays remain pinned for the lifetime of the native call.
Testing
Unit Tests (Native C++)
All 54 native unit tests pass, including new nvarchar-specific tests.
Key tests:
- InitWStringParamTest - Tests nchar/nvarchar input parameters with various sizes and Unicode characters
- GetWStringOutputParamTest - Tests nvarchar output parameters with truncation and null handling
- GetStringResultsTest - Validates UTF-8 string handling (includes multi-byte character fix)
E2E Tests (TestShell)
All 6 E2E tests pass.
The NvarcharPassthrough test specifically validates end-to-end nvarchar column handling.
Data Flow Diagram
Breaking Changes
None. This is a backwards-compatible addition. Existing varchar/char columns continue to work as before.
Dependencies
No new dependencies added.
How to Test Manually
Checklist