Add nvarchar (SQL_C_WCHAR) Support to .NET Core C# Language Extension #67

SicongLiu2000 · 2025-12-15T20:09:05Z

Summary

This PR adds full support for nvarchar(n) and nchar(n) SQL data types (SQL_C_WCHAR) to the .NET Core C# language extension. Previously, the extension only supported ANSI character types (varchar/char). With this change, Unicode string data can now be passed to and returned from C# external scripts via sp_execute_external_script.

Why This Change Is Needed

The C# language extension previously lacked support for Unicode character types, which limited its ability to handle multilingual data. Many SQL Server applications use nvarchar columns to store Unicode text (e.g., Chinese, Arabic, Cyrillic characters). This change enables full Unicode support for:

Input data columns (@input_data_1)
Output data columns (result sets)
Input/output parameters (@params)

What Changed

Core Data Type Support

File	Changes
`src/managed/utils/Sql.cs`	Added `DotNetWChar` to `DataTypeSize` dictionary with `MinUtf16CharSize` (2 bytes)
`src/managed/utils/InteropUtils.cs`	Added `UTF16PtrToStr()` overloads for converting unmanaged UTF-16 strings to managed strings
`src/managed/utils/DataSetUtils.cs`	Added `UTF8ByteSplitToArray()` method for proper multi-byte UTF-8 character handling

Input Data Handling

File	Changes
`src/managed/CSharpInputDataSet.cs`	Added case for `SqlDataType.DotNetWChar` that reads UTF-16 encoded data and correctly handles byte-to-character length conversion
`src/managed/CSharpDataSet.cs`	Exposed `Columns` property to allow metadata propagation to output dataset
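The byte-to-character conversion mentioned for `DotNetWChar` input can be sketched roughly as follows. This is a language-neutral Python illustration, not the extension's C# code: it assumes the column arrives as one contiguous UTF-16LE buffer plus a per-row length map in bytes, with the ODBC `SQL_NULL_DATA` indicator (-1) marking NULL rows. The function name `split_utf16_column` is hypothetical.

```python
from typing import List, Optional

SQL_NULL_DATA = -1  # ODBC indicator value for a NULL cell


def split_utf16_column(buffer: bytes, str_len_or_null: List[int]) -> List[Optional[str]]:
    """Split a contiguous UTF-16LE buffer into per-row strings.

    Assumption: each non-NULL entry in str_len_or_null is a length in BYTES,
    so a row holds byte_len // 2 UTF-16 code units.
    """
    rows: List[Optional[str]] = []
    offset = 0
    for byte_len in str_len_or_null:
        if byte_len == SQL_NULL_DATA:
            rows.append(None)  # NULL rows occupy no bytes in this sketch
            continue
        rows.append(buffer[offset:offset + byte_len].decode("utf-16-le"))
        offset += byte_len
    return rows


# Example data: two Unicode rows with a NULL in between.
values = ["中文", None, "Привет"]
buf = b"".join(v.encode("utf-16-le") for v in values if v is not None)
lengths = [SQL_NULL_DATA if v is None else len(v.encode("utf-16-le")) for v in values]
```

Here `lengths` is `[4, -1, 12]`, and `split_utf16_column(buf, lengths)` returns the original three values.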

Output Data Handling

File	Changes
`src/managed/CSharpOutputDataSet.cs`	Modified `ExtractColumns()` to accept input column metadata for preserving nvarchar types; added `DotNetWChar` case in `ExtractColumn()` to emit UTF-16 data; added `GetUnicodeStringArray()` method for building UTF-16 output buffers; updated `GetStrLenNullMap()` to report correct byte lengths for UTF-16 strings; fixed pointer pinning to use `GCHandle.Alloc()` with `GCHandleType.Pinned` instead of `fixed` statements
`src/managed/CSharpSession.cs`	Pass input column metadata to `ExtractColumns()` to preserve data type information
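The output side is roughly the inverse: strings are encoded to UTF-16 and the length map reports bytes, not characters. The sketch below is a Python illustration under the same assumptions (contiguous UTF-16LE buffer, -1 for NULL). `build_utf16_column` is a hypothetical name and does not correspond to `GetUnicodeStringArray()` or `GetStrLenNullMap()` line for line.

```python
from typing import List, Optional, Tuple

SQL_NULL_DATA = -1  # ODBC indicator value for a NULL cell


def build_utf16_column(values: List[Optional[str]]) -> Tuple[bytes, List[int]]:
    """Encode rows as one UTF-16LE buffer plus a byte-length/NULL map."""
    chunks: List[bytes] = []
    str_len_or_null: List[int] = []
    for value in values:
        if value is None:
            str_len_or_null.append(SQL_NULL_DATA)
            continue
        encoded = value.encode("utf-16-le")
        chunks.append(encoded)
        # Report the length in bytes: two bytes per UTF-16 code unit.
        str_len_or_null.append(len(encoded))
    return b"".join(chunks), str_len_or_null


out_buf, out_lens = build_utf16_column(["€", None, "ab"])
```

In this example `out_lens` is `[2, -1, 4]`: the Euro sign is a single UTF-16 code unit (2 bytes), whereas it takes 3 bytes in UTF-8.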

Parameter Handling

File	Changes
`src/managed/CSharpParamContainer.cs`	Added `DotNetWChar` case in `AddParamValue()` for reading Unicode input parameters; added `DotNetWChar` case in `ReplaceParamValue()` for writing Unicode output parameters; added `ReplaceUnicodeStringParam()` method for UTF-16 byte conversion

Documentation

File	Changes
`README.md`	Updated supported data types list to include `SQL_C_WCHAR` and `nvarchar(n)`

Test Coverage

File	Changes
`test/include/CSharpExtensionApiTests.h`	Added `GetWStringOutputParam()` test helper declaration
`test/src/managed/CSharpTestExecutor.cs`	Added `CSharpTestExecutorWStringParam` class for Unicode output parameter testing
`test/src/native/CSharpInitParamTests.cpp`	Expanded `InitWStringParamTest` with comprehensive nchar/nvarchar test cases including Unicode characters (Chinese, Cyrillic)
`test/src/native/CSharpGetOutputParamTests.cpp`	Added `GetWStringOutputParamTest` and `GetWStringOutputParam()` helper for testing Unicode output parameters

Bug Fixes Included

UTF-8 Multi-byte Character Handling

Fixed a bug in CSharpInputDataSet.cs where multi-byte UTF-8 characters (e.g., Euro symbol € = 3 bytes) were incorrectly split. The previous implementation:

Converted the entire UTF-8 byte buffer to a single .NET string
Called Substring() with the per-row byte lengths as if they were character offsets

This failed because multi-byte UTF-8 characters become single characters in .NET strings, causing byte-based offsets to be wrong.

Fix: Added UTF8ByteSplitToArray() in DataSetUtils.cs that processes raw UTF-8 bytes directly, splitting by byte offsets first and then decoding each segment independently.
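The difference between the two approaches can be reproduced in a few lines. The following is a Python illustration of the idea only; the function names are hypothetical and the real `UTF8ByteSplitToArray()` is C#. Python indexes strings by code point while .NET indexes by UTF-16 code unit, but for a character such as € the two agree.

```python
from typing import List


def split_decode_first(buffer: bytes, byte_lens: List[int]) -> List[str]:
    """Buggy approach: decode everything, then slice using byte lengths
    as if they were character counts."""
    text = buffer.decode("utf-8")
    rows, pos = [], 0
    for n in byte_lens:
        rows.append(text[pos:pos + n])
        pos += n
    return rows


def split_bytes_first(buffer: bytes, byte_lens: List[int]) -> List[str]:
    """Fixed approach: slice the raw bytes by byte offset, then decode
    each segment independently."""
    rows, pos = [], 0
    for n in byte_lens:
        rows.append(buffer[pos:pos + n].decode("utf-8"))
        pos += n
    return rows


rows_in = ["€5", "ok"]
utf8_buf = "".join(rows_in).encode("utf-8")          # 6 bytes: € takes 3
utf8_lens = [len(r.encode("utf-8")) for r in rows_in]  # [4, 2]
```

With this input, `split_decode_first(utf8_buf, utf8_lens)` returns `["€5ok", ""]` because the decoded text has only 4 characters, while `split_bytes_first(utf8_buf, utf8_lens)` returns `["€5", "ok"]`.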

Pointer Pinning Memory Safety

Fixed potential memory corruption in CSharpOutputDataSet.RetrieveColumns() by replacing fixed statements with proper GCHandle.Alloc(..., GCHandleType.Pinned) to ensure arrays remain pinned for the lifetime of the native call.

Testing

Unit Tests (Native C++)

All 54 native unit tests pass, including new nvarchar-specific tests:

[==========] 54 tests from 1 test suite ran.
[  PASSED  ] 54 tests.

Key tests:

InitWStringParamTest - Tests nchar/nvarchar input parameters with various sizes and Unicode characters
GetWStringOutputParamTest - Tests nvarchar output parameters with truncation and null handling
GetStringResultsTest - Validates UTF-8 string handling (includes multi-byte character fix)
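As an illustration of what truncating an nvarchar output parameter involves, here is a minimal Python sketch. It assumes the limit is expressed in UTF-16 code units (as for `nvarchar(n)`), and it chooses to drop a trailing high surrogate rather than emit half of a surrogate pair. That choice and the name `truncate_utf16` are assumptions of this sketch; the PR description does not say how the extension treats a pair split by truncation.

```python
from typing import Optional


def truncate_utf16(value: Optional[str], max_code_units: int) -> Optional[str]:
    """Truncate a string to at most max_code_units UTF-16 code units."""
    if value is None:
        return None  # NULL parameters pass through unchanged
    cut = value.encode("utf-16-le")[: max_code_units * 2]
    if len(cut) >= 2:
        last_unit = int.from_bytes(cut[-2:], "little")
        # A high surrogate (0xD800-0xDBFF) at the end means the cut fell
        # inside a surrogate pair; drop it so the result stays valid.
        if 0xD800 <= last_unit <= 0xDBFF:
            cut = cut[:-2]
    return cut.decode("utf-16-le")
```

For example, `truncate_utf16("Привет", 3)` yields `"При"`, and `truncate_utf16("a😀", 2)` yields `"a"` because the emoji needs two code units and only one would fit.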

E2E Tests (TestShell)

All 6 E2E tests pass:

✓ EmptyInputDataWithPassThroughScript
✓ EmptyPayLoad
✓ InvalidScript
✓ NullOutputData
✓ EmptyStringPayload
✓ NvarcharPassthrough

The NvarcharPassthrough test specifically validates end-to-end nvarchar column handling.

Data Flow Diagram

SQL Server                    C# Extension
-----------                   ------------
nvarchar column    -->    UTF-16 bytes (SQL_C_WCHAR)
                               |
                               v
                          CSharpInputDataSet.AddColumns()
                               |
                               v
                          Interop.UTF16PtrToStr()
                               |
                               v
                          StringDataFrameColumn (managed string)
                               |
                               v
                          User DataFrame processing
                               |
                               v
                          CSharpOutputDataSet.ExtractColumns()
                               |
                               v
                          GetUnicodeStringArray() --> UTF-16 char[]
                               |
                               v
                          UTF-16 bytes    -->    nvarchar result

Breaking Changes

None. This is a backwards-compatible addition. Existing varchar/char columns continue to work as before.

Dependencies

No new dependencies added.

How to Test Manually

-- Create the external language (if not already created)
CREATE EXTERNAL LANGUAGE Dotnet
FROM (
    CONTENT = N'<path-to>\dotnet-core-CSharp-lang-extension.zip',
    FILE_NAME = 'nativecsharpextension.dll'
);

-- Test nvarchar pass-through
EXEC sp_execute_external_script
    @language = N'Dotnet',
    @script = N'YourLibrary.YourExecutorClass',
    @input_data_1 = N'SELECT N''Hello Unicode: 中文'' AS TextValue',
    @output_data_1_name = N'OutputDataSet'
WITH RESULT SETS ((TextValue NVARCHAR(100)));

Checklist

Code compiles without warnings
All existing unit tests pass
All existing E2E tests pass
New unit tests added for nvarchar parameter handling
Documentation updated (README.md)
No breaking changes to existing functionality

Copilot

Pull request overview

This PR adds comprehensive support for Unicode string data types (nvarchar and nchar, SQL_C_WCHAR) to the .NET Core C# language extension for SQL Server. Previously, the extension only supported ANSI character types (varchar/char). The implementation enables full Unicode data exchange through input columns, output columns, and input/output parameters.

Key Changes:

Added SQL_C_WCHAR data type support with UTF-16 encoding/decoding throughout the data pipeline
Implemented UTF-8 byte-level string splitting to fix multi-byte character handling bugs
Updated memory pinning strategy from fixed statements to GCHandle.Alloc with GCHandleType.Pinned for proper lifetime management
Added comprehensive test coverage for nvarchar parameters with Unicode characters (Chinese, Cyrillic)

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 12 comments.

File	Description
`Sql.cs`	Added DotNetWChar enum value and MinUtf16CharSize constant to DataTypeSize dictionary
`InteropUtils.cs`	Added UTF16PtrToStr() overloads for converting unmanaged UTF-16 pointers to managed strings
`DataSetUtils.cs`	Added UTF8ByteSplitToArray() method to correctly split UTF-8 byte buffers without character/byte offset issues
`CSharpInputDataSet.cs`	Added DotNetWChar case for reading UTF-16 input data with proper byte-to-character conversion; updated DotNetChar to use byte-level splitting
`CSharpOutputDataSet.cs`	Modified ExtractColumns() to accept input column metadata; added DotNetWChar case with GetUnicodeStringArray(); updated GC pinning from fixed to GCHandle
`CSharpParamContainer.cs`	Added DotNetWChar cases in AddParamValue() and ReplaceParamValue(); added ReplaceUnicodeStringParam() helper method
`CSharpSession.cs`	Updated to pass input column metadata to ExtractColumns() for type preservation
`CSharpDataSet.cs`	Exposed Columns property to enable metadata propagation
`README.md`	Updated supported data types list to include SQL_C_WCHAR and nvarchar(n)
`CSharpExtensionApiTests.h`	Added GetWStringOutputParam() test helper declaration
`CSharpTestExecutor.cs`	Added CSharpTestExecutorWStringParam class for Unicode output parameter testing
`CSharpInitParamTests.cpp`	Expanded InitWStringParamTest with comprehensive nchar/nvarchar test scenarios including Unicode characters
`CSharpGetOutputParamTests.cpp`	Added GetWStringOutputParamTest and GetWStringOutputParam() helper for validating Unicode output parameters

monamaki · 2025-12-16T18:12:53Z

> Fixed a bug in CSharpInputDataSet.cs where multi-byte UTF-8 characters (e.g., Euro symbol € = 3 bytes) were incorrectly split. The previous implementation: Converted the entire UTF-8 byte buffer to a .NET string

What will happen in case of a very large string? What was happening in the past, and what happens after this change?

monamaki · 2025-12-16T23:01:26Z

How did you find the following issue?

> Fixed a bug in CSharpInputDataSet.cs where multi-byte UTF-8 characters (e.g., Euro symbol € = 3 bytes) were incorrectly split.

Do we have a test to cover it now?

monamaki

Please see inline comments.

SicongLiu2000 · 2025-12-17T22:34:14Z

> > Fixed a bug in CSharpInputDataSet.cs where multi-byte UTF-8 characters (e.g., Euro symbol € = 3 bytes) were incorrectly split. The previous implementation: Converted the entire UTF-8 byte buffer to a .NET string

> What will happen in case of a very large string? What was happening in the past, and what happens after this change?

The previous implementation converted the entire UTF-8 byte buffer to a single .NET string, then tried to chunk it by character positions. This caused two issues:

1. Multi-byte character corruption: The lengths in strLenOrNullMap are UTF-8 byte lengths, but after conversion to a .NET string (UTF-16), character positions no longer align with byte offsets. For example, "€" is 3 bytes in UTF-8 but 1 character in .NET, so the chunking logic split at the wrong positions.

2. Memory inefficiency for large strings: A 1GB UTF-8 buffer would create an intermediate string of up to ~2GB (since .NET strings are UTF-16), for up to ~3GB peak memory usage before the string was even chunked.

The fix decodes each row's bytes individually using the exact byte length from strLenOrNullMap[i], which handles multi-byte characters correctly and avoids creating a large intermediate string.
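The size estimate above can be checked on small inputs. The Python sketch below compares UTF-8 and UTF-16 byte counts for a few strings; `utf16_expansion` is a hypothetical helper used only for illustration. It shows that the 2x growth applies to ASCII-heavy data, while text dominated by 3-byte UTF-8 characters actually shrinks when re-encoded as UTF-16.

```python
from typing import Tuple


def utf16_expansion(text: str) -> Tuple[int, int, float]:
    """Return (UTF-8 bytes, UTF-16 bytes, UTF-16/UTF-8 ratio) for text."""
    utf8_bytes = len(text.encode("utf-8"))
    utf16_bytes = len(text.encode("utf-16-le"))
    return utf8_bytes, utf16_bytes, utf16_bytes / utf8_bytes


ascii_case = utf16_expansion("hello")  # 5 UTF-8 bytes, 10 UTF-16 bytes
cjk_case = utf16_expansion("中文")      # 6 UTF-8 bytes, 4 UTF-16 bytes
euro_case = utf16_expansion("€")       # 3 UTF-8 bytes, 2 UTF-16 bytes
```

So ~2GB is the worst case for a 1GB buffer, reached when the data is almost entirely ASCII.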

Aniruddh25

Waiting on fixes for some of the Copilot suggestions.

Nvarchar support

d094909

SicongLiu2000 requested review from Copilot and monamaki and removed request for Copilot December 15, 2025 20:09

Copilot started reviewing on behalf of SicongLiu2000 December 15, 2025 20:09

SicongLiu2000 requested review from Aniruddh25 and Copilot December 15, 2025 20:10

Copilot started reviewing on behalf of SicongLiu2000 December 15, 2025 20:11

Copilot AI reviewed Dec 15, 2025

monamaki requested review from beccadaniel and yaelh December 16, 2025 17:58

microsoft deleted a comment from hiteshbhatia-microsoft Dec 16, 2025

monamaki reviewed Dec 17, 2025

monamaki requested changes Dec 17, 2025

Aniruddh25 reviewed Dec 18, 2025

language-extensions/dotnet-core-CSharp/src/managed/CSharpInputDataSet.cs (resolved)

Aniruddh25 reviewed Dec 18, 2025

language-extensions/dotnet-core-CSharp/src/managed/CSharpOutputDataSet.cs (resolved)

Aniruddh25 reviewed Dec 18, 2025

language-extensions/dotnet-core-CSharp/src/managed/CSharpSession.cs (outdated, resolved)

Aniruddh25 reviewed Dec 19, 2025

language-extensions/dotnet-core-CSharp/src/managed/CSharpOutputDataSet.cs (outdated, resolved)

Aniruddh25 reviewed Dec 19, 2025

language-extensions/dotnet-core-CSharp/README.md (outdated, resolved)

Aniruddh25 reviewed Dec 19, 2025

language-extensions/dotnet-core-CSharp/src/managed/CSharpParamContainer.cs (resolved)

Aniruddh25 reviewed Dec 19, 2025

language-extensions/dotnet-core-CSharp/test/include/CSharpExtensionApiTests.h (resolved)

Aniruddh25 requested changes Dec 19, 2025

SicongLiu2000 force-pushed the dev/sicongliu/nvarchar branch from 0f3f1c3 to 9f79549 December 22, 2025 20:14

Resolve comments

0ac9041

SicongLiu2000 force-pushed the dev/sicongliu/nvarchar branch from 9f79549 to 0ac9041 December 22, 2025 21:16

Fix GCHandle.Alloc blittability issue by using IntPtr arrays

a340661

Changed void*[] and int*[] arrays to IntPtr[] arrays because pointer arrays are not blittable and cannot be pinned with GCHandleType.Pinned on .NET 6. IntPtr is blittable and works correctly across all .NET versions.

4 participants