fix(filters): handle sync-flush terminated zlib streams in FlateDecode #20

Merged: Mythie merged 4 commits into main from fix/flate-sync-flush on Feb 8, 2026
Mythie (Contributor) commented Feb 8, 2026

Summary

Fixes #16, a TypeError: Cannot read properties of undefined (reading 'length') raised when extracting text from PDFs generated by PDFium.

  • Root cause: pako.inflate() silently returns undefined instead of throwing when a zlib stream ends with a sync-flush marker (00 00 FF FF) rather than a proper final block and Adler-32 checksum. Some PDF generators (notably PDFium) produce these streams. The undefined value propagated through FilterPipeline into concatenateChunks(), which crashed on .length.
  • Fix: FlateFilter now detects when pako.inflate() returns undefined or throws, and falls back to recovering partial output from pako's internal Inflate state. Truly unrecoverable streams return an empty Uint8Array rather than throwing, consistent with our lenient approach to malformed PDFs.
  • Tests: Added 5 unit tests using the exact byte sequences from the issue report.
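
The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the merged FlateFilter code: `decodeLenient`, `InflateFn`, and `RecoverFn` are hypothetical names, and the two callbacks stand in for pako.inflate() and for whatever reads partial output from pako's internal Inflate state.

```typescript
// Hypothetical stand-ins: `inflate` models a one-shot inflate call that may
// return undefined or throw; `recoverPartial` models reading whatever output
// the decompressor produced before the stream ended early.
type InflateFn = (data: Uint8Array) => Uint8Array | undefined;
type RecoverFn = (data: Uint8Array) => Uint8Array | undefined;

function decodeLenient(
  data: Uint8Array,
  inflate: InflateFn,
  recoverPartial: RecoverFn,
): Uint8Array {
  try {
    const result = inflate(data);
    // Treat undefined as a failure even though nothing was thrown.
    if (result !== undefined) {
      return result;
    }
  } catch {
    // Fall through to the recovery path.
  }

  try {
    const partial = recoverPartial(data);
    if (partial !== undefined && partial.length > 0) {
      return partial;
    }
  } catch {
    // Nothing recoverable; fall through to the empty result.
  }

  // Lenient default: an empty buffer instead of an exception.
  return new Uint8Array(0);
}
```

Passing the decompressor in as a parameter is only a convenience for the sketch; it keeps the control flow testable without depending on pako's internals.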

pako.inflate() returns undefined (rather than throwing) for zlib streams
terminated with a sync-flush marker (00 00 FF FF) instead of a proper
final block. Some PDF generators like PDFium produce these streams.

The undefined result propagated through FilterPipeline and caused a
TypeError in downstream code trying to access .length on undefined
chunks.
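
As a rough illustration of that failure mode, the sketch below shows how a single undefined chunk breaks a length-summing concatenation, along with a check for the sync-flush trailer bytes. Both functions are assumptions written for this example: this `concatenateChunks` is not the library's implementation, and `endsWithSyncFlush` is a hypothetical helper.

```typescript
// Illustrative only: a typical length-summing concatenation.
function concatenateChunks(chunks: Uint8Array[]): Uint8Array {
  let total = 0;
  for (const chunk of chunks) {
    // Throws a TypeError if an undefined value slipped into the array.
    total += chunk.length;
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Hypothetical helper: true when the data ends with the sync-flush
// marker bytes 00 00 FF FF.
function endsWithSyncFlush(data: Uint8Array): boolean {
  const n = data.length;
  return (
    n >= 4 &&
    data[n - 4] === 0x00 &&
    data[n - 3] === 0x00 &&
    data[n - 2] === 0xff &&
    data[n - 1] === 0xff
  );
}
```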

FlateFilter.decode() now detects this case and recovers the
decompressed data from pako's internal output buffer. Truly corrupt
streams return an empty result rather than throwing, consistent with
the library's lenient approach to malformed PDFs.

Closes #16

Mythie merged commit 90f123f into main on Feb 8, 2026
4 of 5 checks passed
Development

Successfully merging this pull request may close these issues.

Exception due to undefined chunks when extracting text