Skip to content

Fix broken WebSocket defragmentation#383

Merged
mathieucarbou merged 1 commit intomainfrom
issue-353
Feb 8, 2026
Merged

Fix broken WebSocket defragmentation#383
mathieucarbou merged 1 commit intomainfrom
issue-353

Conversation

@mathieucarbou
Copy link
Member

Fix #382
WebSocket defragmentation was broken after a fix for Safari in PR #353

This PR refactors several parts:

  • The defragmentation handling of the mask (Safari can send it fragmented)
  • The defragmentation of the WS frames which was broken since Deal with safari fragmented pong data #353
  • The handling of Safari sending invalid disconnect frames
  • The logging - which was left but in verbose mode to facilitate debug from users as well

How to test this PR:

With WebSocket.ino example, compile in verbose mode.

Complete frame

# (\n at the end)
> echo hello | websocat ws://192.168.4.1/ws
ws connect
[310144][V][AsyncWebSocket.cpp:517] _onData(): WS[2]: _onData: plen=12, _pstate=0, _status=1
[310153][V][AsyncWebSocket.cpp:544] _onData(): WS[2]: _pinfo: index: 0, final: 1, opcode: 1, masked: 1, len: 6
[310163][V][AsyncWebSocket.cpp:584] _onData(): WS[2]: mask read complete
[310170][V][AsyncWebSocket.cpp:617] _onData(): WS[2]: processing final fragment index=0, len=6
[310178][V][AsyncWebSocket.cpp:654] _onData(): WS[2]: processing data frame num=0
index: 0, len: 6, final: 1, opcode: 1, framelen: 6
ws text: hello

[310186][V][AsyncWebSocket.cpp:517] _onData(): WS[2]: _onData: plen=6, _pstate=0, _status=1
[310199][V][AsyncWebSocket.cpp:544] _onData(): WS[2]: _pinfo: index: 0, final: 1, opcode: 8, masked: 1, len: 0
[310209][V][AsyncWebSocket.cpp:584] _onData(): WS[2]: mask read complete
[310215][V][AsyncWebSocket.cpp:617] _onData(): WS[2]: processing final fragment index=0, len=0
[310224][V][AsyncWebSocket.cpp:620] _onData(): WS[2]: processing disconnect
[310235][V][AsyncWebSocket.cpp:517] _onData(): WS[2]: _onData: plen=6, _pstate=0, _status=2
[310244][V][AsyncWebSocket.cpp:544] _onData(): WS[2]: _pinfo: index: 0, final: 1, opcode: 10, masked: 1, len: 0
[310254][V][AsyncWebSocket.cpp:584] _onData(): WS[2]: mask read complete
[310260][V][AsyncWebSocket.cpp:617] _onData(): WS[2]: processing final fragment index=0, len=0
[310268][V][AsyncWebSocket.cpp:648] _onData(): WS[2]: processing pong
ws pong
ws disconnect
Connected clients: 1 / 1 total
Free heap: 192416

Fragmentation test:

# Generates 2001 characters (\n at the end) to cause a fragmentation (over TCP MSS)
> openssl rand -hex 1000 | websocat ws://192.168.4.1/ws
ws connect
[492418][V][AsyncWebSocket.cpp:517] _onData(): WS[3]: _onData: plen=1436, _pstate=0, _status=1
[492427][V][AsyncWebSocket.cpp:544] _onData(): WS[3]: _pinfo: index: 0, final: 1, opcode: 1, masked: 1, len: 2001
[492437][V][AsyncWebSocket.cpp:584] _onData(): WS[3]: mask read complete
[492444][V][AsyncWebSocket.cpp:599] _onData(): WS[3]: processing next fragment index=0, len=1428
index: 0, len: 2001, final: 1, opcode: 1, framelen: 1428
ws[/ws][3] [0] MSG START text
ws[/ws][3] [0] FRAME START len=2001
ws[/ws][3] [0] FRAME text, index=0, len=1428]: 7f2c96ab959f566887af03f6d7166c42d25eb8328.........7da971d4b4eedd
[492586][V][AsyncWebSocket.cpp:517] _onData(): WS[3]: _onData: plen=573, _pstate=2, _status=1
[492599][V][AsyncWebSocket.cpp:544] _onData(): WS[3]: _pinfo: index: 1428, final: 1, opcode: 1, masked: 1, len: 2001
[492609][V][AsyncWebSocket.cpp:617] _onData(): WS[3]: processing final fragment index=1428, len=573
[492618][V][AsyncWebSocket.cpp:654] _onData(): WS[3]: processing data frame num=0
index: 1428, len: 2001, final: 1, opcode: 1, framelen: 573
ws[/ws][3] [0] FRAME text, index=1428, len=573]: 3e75adcc607e5236a55c50d9f40ba7aa05a21b3566.........2fbafdff7583cf5225a9252cdac69

ws[/ws][3] [0] FRAME END
ws[/ws][3] [0] MSG END
[492688][V][AsyncWebSocket.cpp:517] _onData(): WS[3]: _onData: plen=6, _pstate=0, _status=1
[492696][V][AsyncWebSocket.cpp:544] _onData(): WS[3]: _pinfo: index: 0, final: 1, opcode: 8, masked: 1, len: 0
[492706][V][AsyncWebSocket.cpp:584] _onData(): WS[3]: mask read complete
[492712][V][AsyncWebSocket.cpp:617] _onData(): WS[3]: processing final fragment index=0, len=0
[492721][V][AsyncWebSocket.cpp:620] _onData(): WS[3]: processing disconnect
[492730][V][AsyncWebSocket.cpp:517] _onData(): WS[3]: _onData: plen=6, _pstate=0, _status=2
[492738][V][AsyncWebSocket.cpp:544] _onData(): WS[3]: _pinfo: index: 0, final: 1, opcode: 10, masked: 1, len: 0
[492748][V][AsyncWebSocket.cpp:584] _onData(): WS[3]: mask read complete
[492755][V][AsyncWebSocket.cpp:617] _onData(): WS[3]: processing final fragment index=0, len=0
[492763][V][AsyncWebSocket.cpp:648] _onData(): WS[3]: processing pong
ws pong
ws disconnect

Safari fragmented packet test

Hit multiple time CMD+R on Safari => look for waiting for more mask data: read=0/4

ws connect
ws disconnect
[574838][V][AsyncWebSocket.cpp:517] _onData(): WS[5]: _onData: plen=2, _pstate=0, _status=1
[574847][V][AsyncWebSocket.cpp:544] _onData(): WS[5]: _pinfo: index: 0, final: 1, opcode: 10, masked: 1, len: 0
[574857][V][AsyncWebSocket.cpp:568] _onData(): WS[5]: waiting for more mask data: read=0/4
[574865][V][AsyncWebSocket.cpp:517] _onData(): WS[5]: _onData: plen=4, _pstate=1, _status=1
[574873][V][AsyncWebSocket.cpp:544] _onData(): WS[5]: _pinfo: index: 0, final: 1, opcode: 10, masked: 1, len: 0
[574883][V][AsyncWebSocket.cpp:584] _onData(): WS[5]: mask read complete
[574889][V][AsyncWebSocket.cpp:617] _onData(): WS[5]: processing final fragment index=0, len=0
[574898][V][AsyncWebSocket.cpp:648] _onData(): WS[5]: processing pong
ws pong
[575062][V][AsyncWebSocket.cpp:517] _onData(): WS[5]: _onData: plen=8, _pstate=0, _status=1
[575071][V][AsyncWebSocket.cpp:544] _onData(): WS[5]: _pinfo: index: 0, final: 1, opcode: 8, masked: 1, len: 2
[575080][V][AsyncWebSocket.cpp:584] _onData(): WS[5]: mask read complete
[575087][V][AsyncWebSocket.cpp:617] _onData(): WS[5]: processing final fragment index=0, len=2
[575095][V][AsyncWebSocket.cpp:620] _onData(): WS[5]: processing disconnect
ws disconnect
ws connect
[575155][V][AsyncWebSocket.cpp:517] _onData(): WS[6]: _onData: plen=2, _pstate=0, _status=1
[575164][V][AsyncWebSocket.cpp:544] _onData(): WS[6]: _pinfo: index: 0, final: 1, opcode: 10, masked: 1, len: 0
[575174][V][AsyncWebSocket.cpp:568] _onData(): WS[6]: waiting for more mask data: read=0/4
[575182][V][AsyncWebSocket.cpp:517] _onData(): WS[6]: _onData: plen=4, _pstate=1, _status=1
[575190][V][AsyncWebSocket.cpp:544] _onData(): WS[6]: _pinfo: index: 0, final: 1, opcode: 10, masked: 1, len: 0
[575200][V][AsyncWebSocket.cpp:584] _onData(): WS[6]: mask read complete
[575206][V][AsyncWebSocket.cpp:617] _onData(): WS[6]: processing final fragment index=0, len=0
[575215][V][AsyncWebSocket.cpp:648] _onData(): WS[6]: processing pong
ws pong

Comment on lines 28 to 30
#define SATE_FRAME_START 0
#define SATE_FRAME_MASK 1
#define SATE_FRAME_DATA 2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using _pstate as suggested by @willmmiles here to keep track of the reassembling for the mask

uint8_t *data = (uint8_t *)pbuf;

while (plen > 0) {
async_ws_log_v("WS[%" PRIu32 "]: _onData: plen=%" PRIu32 ", _pstate=%" PRIu8 ", _status=%" PRIu8, _clientId, plen, _pstate, static_cast<uint8_t>(_status));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kept all logs async_ws_log_v to still be able to easily debug

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug in WebSocket frame defragmentation that was introduced in PR #353 when addressing Safari's fragmented pong data. The fix refactors the state machine for handling WebSocket frames, particularly improving mask byte accumulation across packet boundaries and restoring proper frame reassembly.

Changes:

  • Introduces explicit state constants for the WebSocket parsing state machine
  • Refactors mask byte reading to properly handle Safari's behavior of splitting the 4-byte mask across multiple TCP packets
  • Fixes frame defragmentation logic that was broken since PR #353
  • Adds comprehensive verbose logging to aid debugging
  • Updates the WebSocket example to demonstrate both complete and fragmented frame handling

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
src/AsyncWebSocket.cpp Core fix: adds state constants, refactors mask reading logic to use _pinfo.masked as a counter (1-5), improves frame defragmentation handling, adds verbose logging, and enhances error handling
examples/WebSocket/WebSocket.ino Updates example to demonstrate fragmented frame handling with detailed logging and increases polling interval from 100ms to 500ms

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

// _pinfo.masked is 1 if we need to start reading mask bytes
// _pinfo.masked is 2, 3, or 4 if we have partially read the mask
// _pinfo.masked is 5 if the mask is complete
while (_pinfo.masked && _pstate <= SATE_FRAME_MASK && _pinfo.masked < 5) {
Copy link
Member Author

@mathieucarbou mathieucarbou Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reassemble de mask using _pinfo.masked to track byte count but resets _pinfo.masked to 1 at the end for backward compat'

Comment on lines 560 to +561
_pinfo.index = 0;
} else {
// Wait for more data
// async_ws_log_w("WS[%" PRIu32 "]: waiting for more mask data: read=%zu/4", _clientId, mask_offset);
_pinfo.index = mask_offset; // Save progress
_pstate = 1;
return;
_pinfo.len = 0;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To go into the disconnect use case below

}

const size_t datalen = std::min((size_t)(_pinfo.len - _pinfo.index), plen);
const auto datalast = datalen ? data[datalen] : 0;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really hate this thing... users have to add \0 and overflow the data array (see WS examples). That's not normal. I don't know if there is a way to refactor that in a backward compatible way one day.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This hack is outright wrong. Writing off the end of the pbuf pointer is just not safe in a preemptive system. I've seen crashes because the onEvent handler wrote off the end of the buffer, corrupting the heap, and then got pre-empted by another task that tried to do heap allocation before AsyncWebSocket could "undo" the damage. When I fixed the client, I still saw a crash because of the attempt to restore the byte; same setup - preempted during the onEvent callback - but the (now fixed) client didn't overwrite the value; so the heap manager wrote something new to that location, which then had one byte "restored" to an old value when AsyncWebSocket got the CPU back.

Ultimately I had to remove this from the WLED fork and we fixed all our clients to respect the buffer boundaries. Users don't have to write off the end of the array. It is true that when data arrives as a Pascal-style string (pointer+length), it cannot be used as a C-style string without a copy. But that's only necessary if you want to interact with C functions; modern tools like std::string_view can be used directly. (And if you must use a C API, std::string has a convenient pointer+length constructor for making a quick copy.)

That said: given the current example code, if we consider "onEvent handlers are permitted to write off the end of the buffer by one byte" as part of our backwards compatibilty API contract, then we really have no safe choice but to copy the data into a bigger buffer before calling the callback if the frame ends at the packet boundary.

Copy link
Member Author

@mathieucarbou mathieucarbou Feb 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said: given the current example code, if we consider "onEvent handlers are permitted to write off the end of the buffer by one byte" as part of our backwards compatibilty API contract, then we really have no safe choice but to copy the data into a bigger buffer before calling the callback if the frame ends at the packet boundary.

What I thought too which is very ugly.
Let’s merge this PR I will open an issue for that. What we can do for now is to create a copy buffer but only for opcode TEXT with a \0 at the end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created issue here: #384

Comment on lines +669 to +673
_status = WS_DISCONNECTING;
if (_client) {
_client->ackLater();
}
_queueControl(WS_DISCONNECT, data, datalen);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we receive an invalid frame, nothing were done so the connection would be left in an unknown state, including frame defragmentation... So let's force a disconnect to restore the connection.

it could also be a rogue client...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call.

Fix #382
WebSocket defragmentation was broken after a fix for Safari in PR #353
Copy link

@willmmiles willmmiles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it's taken me so long to get through this; I don't have any Apple devices, so I hacked websocat to let me generate arbitrary frame boundaries.

The code seems to be working well for me for the test cases as defined. I've noted some other issues with the WS header framing code but they are out-of-scope for this PR. me-no-dev#953 on the old repo has a general fix for header fragmentation; this is what I pulled in the WLED fork. To improve here, we could adopt a similar approach of building up the header in a temporary buffer; or expand the state machine to cover the complete header assembly. That can be taken on later.

// );

data += 2;
plen -= 2;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Availability of these bytes is not checked - in theory plen could be 1 here. This is problematic throughout the header analysis. Not a new problem for this PR but I'd like to note it nonetheless.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I saw that, but we have to look at the WS spec also because these are the first 2 bytes received for the first frame: there shouldn't be any fragmentation there ?
The only case where fragmentation is happening within the assembly of the frame header is for the case of Safari which is sending the mask bytes in 2 shots.

if (_pinfo.len == 126 && plen >= 2) {
_pinfo.len = fdata[3] | (uint16_t)(fdata[2]) << 8;
data += 2;
plen -= 2;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Availability of these bytes is checked, but if it spans the next TCP packet, our deserializer gets confused and we end up out of step. (Again out of scope here but worth noting)

Comment on lines +669 to +673
_status = WS_DISCONNECTING;
if (_client) {
_client->ackLater();
}
_queueControl(WS_DISCONNECT, data, datalen);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call.

@mathieucarbou
Copy link
Member Author

Thank you for your review @willmmiles !

@mathieucarbou mathieucarbou merged commit 4138937 into main Feb 8, 2026
49 of 52 checks passed
@mathieucarbou mathieucarbou deleted the issue-353 branch February 8, 2026 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Pending Merge Type: Bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Websocket multipart message does not work as expected

2 participants