
feat: support map_contains_key expression#3369

Open
peterxcli wants to merge 4 commits into apache:main from peterxcli:feat/map_contains_key

Conversation

@peterxcli
Member

Which issue does this PR close?

Closes: #3164

Rationale for this change

Comet does not currently support the Spark map_contains_key function, causing queries using this function to fall back to Spark's JVM execution instead of running natively on DataFusion.

The MapContainsKey expression checks whether a given key exists in a map. It is implemented as a runtime-replaceable expression that internally uses ArrayContains on the map's keys to perform the lookup.

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.
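As a sketch of that rewrite (with hypothetical literal values; Spark's array_contains plays the role that DataFusion's array_has does in the native plan), the following two queries should be equivalent:

```sql
-- map_contains_key(m, k) is rewritten to a membership test over the map's keys
SELECT map_contains_key(map(1, 'a', 2, 'b'), 2);
-- internally equivalent to:
SELECT array_contains(map_keys(map(1, 'a', 2, 'b')), 2);
```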

What changes are included in this PR?

How are these changes tested?

  • Generated map-type data and used it to verify the correctness of map_contains_key
  • Empty-map tests

Empty-map tests in Spark:


@peterxcli changed the title from "[Feature] Support Spark expression: map_contains_key" to "feat: support map_contains_key expression" on Feb 3, 2026
@codecov-commenter

codecov-commenter commented Feb 3, 2026

Codecov Report

❌ Patch coverage is 37.50000% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.98%. Comparing base (f09f8af) to head (105cadd).
⚠️ Report is 918 commits behind head on main.

Files with missing lines Patch % Lines
...k/src/main/scala/org/apache/comet/serde/maps.scala 16.66% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3369      +/-   ##
============================================
+ Coverage     56.12%   59.98%   +3.85%     
- Complexity      976     1475     +499     
============================================
  Files           119      175      +56     
  Lines         11743    16172    +4429     
  Branches       2251     2681     +430     
============================================
+ Hits           6591     9701    +3110     
- Misses         4012     5119    +1107     
- Partials       1140     1352     +212     



val mapKeysExpr = scalarFunctionExprToProto("map_keys", mapExpr)

val mapContainsKeyExpr = scalarFunctionExprToProto("array_has", mapKeysExpr, keyExpr)
Contributor


👍

Member Author


I was trying to add the Spark reference link as a comment, but the Scala style check kept failing even after I ran make format, so I'll just leave it in the PR description:

https://github.com/apache/spark/blob/branch-4.0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L233

Contributor

@coderfender coderfender left a comment


Left test-related comments.

@mbutrovich
Contributor

Thanks @peterxcli! Can we add/migrate the tests to the new framework?

https://datafusion.apache.org/comet/contributor-guide/sql-file-tests.html


-- ConfigMatrix: parquet.enable.dictionary=false,true

-- TODO: replace map_from_arrays with map whenever map is supported in Comet
Member Author


select map_contains_key(map_from_arrays(array(1, 2), array('a', 'b')), 1)

-- Decimal type coercion tests
-- TODO: requires map cast to be supported in Comet
Member Author


select map_contains_key(map_from_arrays(array(1.0, 2), array('a', 'b')), 1)

-- Empty map tests
-- TODO: requires casting from NullType to be supported in Comet
Member Author


@peterxcli
Member Author

> Thanks @peterxcli! Can we add/migrate the tests to the new framework?

Thanks @mbutrovich for the heads-up. I've updated them.

Member

@andygrove andygrove left a comment


Thanks for implementing this, @peterxcli. The approach of rewriting map_contains_key to array_has(map_keys(map), key) follows Spark's internal implementation nicely, and I appreciate that you migrated the tests to the new SQL file test framework as requested.

I have one concern about null key handling that I wanted to raise. In Spark, ArrayContains (which underlies MapContainsKey) follows SQL three-valued logic: if the array contains null elements and no match is found, it returns null rather than false. This is because the result is indeterminate - the value might or might not match the null element.

DataFusion's array_has function historically followed DuckDB semantics instead, returning false in this case. There's an open Comet issue discussing a similar inconsistency with arrays_overlap (#2036).

Could you verify how this behaves when a map has a null key? For example:

SELECT map_contains_key(map(1, 'a', null, 'b'), 5)
-- Spark should return NULL (key not found, but null key exists = indeterminate)
-- DataFusion might return false

If there is a difference, it might be worth either adding a note about the incompatibility or investigating whether there's a Spark-compatible implementation path in the spark-expr crate.

That said, I noticed all CI checks are passing, which suggests the test suite didn't catch any compatibility issues. The test coverage looks good for the core functionality - empty maps, null maps, different key types, and missing keys are all covered. It might be worth adding a test specifically for maps with null keys to document the expected behavior and catch any regressions if the underlying DataFusion behavior changes.


This review was generated with AI assistance.


Development

Successfully merging this pull request may close these issues.

[Feature] Support Spark expression: map_contains_key
