refactor: Split read benchmarks and add addParquetScanCases helper #3407

Open

andygrove wants to merge 4 commits into apache:main from andygrove:parquet-read-bench

Conversation

@andygrove
Member

Summary

  • Extract iceberg benchmarks from CometReadBaseBenchmark into new CometIcebergReadBenchmark object
  • Add addParquetScanCases helper to CometBenchmarkBase that encapsulates the repeated 3-case pattern (Spark / native_datafusion / native_iceberg_compat) used by all parquet benchmarks (see the sketch below this list)
  • Refactor CometReadBaseBenchmark and CometPartitionColumnBenchmark to use the new helper, eliminating ~270 lines of duplicated boilerplate
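
A rough Scala sketch of what such a helper might look like inside CometBenchmarkBase. The method name comes from this PR, but the signature, case labels, config keys, and the spark / withSQLConf / noop() utilities are assumptions, not the actual patch:

```scala
// Hypothetical sketch only: assumed to live in CometBenchmarkBase, where a
// SparkSession (spark), withSQLConf, and the noop() benchmark sink are in scope.
import org.apache.comet.CometConf
import org.apache.spark.benchmark.Benchmark

protected def addParquetScanCases(benchmark: Benchmark, query: String): Unit = {
  // Baseline: plain Spark Parquet scan.
  benchmark.addCase("SQL Parquet - Spark") { _ =>
    spark.sql(query).noop()
  }
  // Comet scan via the native_datafusion implementation.
  benchmark.addCase("SQL Parquet - Comet (native_datafusion)") { _ =>
    withSQLConf(
      CometConf.COMET_ENABLED.key -> "true",
      CometConf.COMET_NATIVE_SCAN_IMPL.key -> "native_datafusion") {
      spark.sql(query).noop()
    }
  }
  // Comet scan via the native_iceberg_compat implementation.
  benchmark.addCase("SQL Parquet - Comet (native_iceberg_compat)") { _ =>
    withSQLConf(
      CometConf.COMET_ENABLED.key -> "true",
      CometConf.COMET_NATIVE_SCAN_IMPL.key -> "native_iceberg_compat") {
      spark.sql(query).noop()
    }
  }
}
```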

Test plan

  • ./mvnw compile test-compile -pl spark -DskipTests passes
  • ./mvnw spotless:check -pl spark passes
  • Run CometReadBenchmark to verify parquet benchmarks work
  • Run CometIcebergReadBenchmark to verify iceberg benchmarks work
  • Run CometPartitionColumnBenchmark to verify partition benchmarks work

🤖 Generated with Claude Code

andygrove and others added 2 commits February 5, 2026 08:56
Extract iceberg benchmarks into CometIcebergReadBenchmark and add
addParquetScanCases helper to CometBenchmarkBase to eliminate the
repeated 3-case pattern (Spark / native_datafusion / native_iceberg_compat)
across all parquet benchmarks.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
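
For illustration, each refactored benchmark could then collapse its three hand-written cases into a single helper call; the benchmark name, query, and variables below are hypothetical, not taken from the patch:

```scala
// Hypothetical usage inside one of the refactored benchmarks. Previously each
// benchmark spelled out the three addCase blocks by hand.
val sqlBenchmark = new Benchmark("SQL Single Numeric Column Scan", values, output = output)
addParquetScanCases(sqlBenchmark, "SELECT sum(id) FROM parquetTable")
sqlBenchmark.run()
```
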
andygrove marked this pull request as ready for review February 5, 2026 16:44
andygrove and others added 2 commits February 5, 2026 10:44
Eliminate the JVM data round trip for data columns in native_iceberg_compat
scans. Data columns are read directly from the native BatchContext via
zero-copy Arc::clone, while only partition columns cross the JVM boundary
via Arrow FFI.

Previously, data made a wasteful round trip:
  Rust ParquetSource → per-column JNI export to JVM → JVM wraps as
  CometVector → JVM exports ALL cols back to Rust via Arrow FFI →
  Rust ScanExec deep-copies every column

Now in passthrough mode:
  Rust ParquetSource → batch stays in native BatchContext →
  Rust ScanExec reads data cols directly (zero-copy) →
  Only partition cols imported from JVM FFI (small, constant)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
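
For the JVM side of that flow, a minimal Scala sketch of the idea: only the partition columns are selected for export over Arrow FFI, while the data columns stay in the native BatchContext. The function name and the assumption that partition columns occupy the trailing slots of the batch are hypothetical, not Comet's actual API or column layout:

```scala
import org.apache.spark.sql.vectorized.{ColumnVector, ColumnarBatch}

// Hypothetical helper: pick out just the partition columns so that only these
// small, constant-size vectors cross the JVM boundary via Arrow FFI; in
// passthrough mode the data columns never leave the native BatchContext.
def partitionColumnsToExport(batch: ColumnarBatch, numDataColumns: Int): Seq[ColumnVector] =
  (numDataColumns until batch.numCols()).map(i => batch.column(i))
```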