
feat: Support Spark expression years #3405

Open
YutaLin wants to merge 3 commits into apache:main from YutaLin:feat/spark_expression_years

Conversation

@YutaLin commented Feb 5, 2026

Which issue does this PR close?

Closes #3131

Rationale for this change

Comet does not currently support the Spark `years` expression, so queries that use it fall back to Spark's JVM execution instead of running natively on DataFusion.

This patch adds serialization support for Years, allowing Comet to recognize the expression and execute it natively.

What changes are included in this PR?

  • Map the Spark Years expression to DataFusion's `datepart('year', child)` and cast the result to integer
  • Implement getSupportLevel to restrict support to DateType, TimestampType, and TimestampNTZType (see the sketch after this list)
  • Register the new handler in QueryPlanSerde
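
For illustration, here is a minimal sketch of what the support check could look like. This is an assumption-heavy sketch, not the actual Comet code: `SupportLevel`, `Compatible`, and `Unsupported` stand in for Comet's real support-level types, and the protobuf serialization of the `datepart`/cast rewrite is omitted.

```scala
import org.apache.spark.sql.catalyst.expressions.Years
import org.apache.spark.sql.types.{DateType, TimestampNTZType, TimestampType}

// Stand-ins for Comet's support-level types; real names and signatures may differ.
sealed trait SupportLevel
case object Compatible extends SupportLevel
case class Unsupported(reason: String) extends SupportLevel

object YearsSupport {
  // years(child) is serialized as a cast of datepart('year', child) to integer on the
  // DataFusion side, so only temporal input types are accepted here.
  def getSupportLevel(expr: Years): SupportLevel = expr.child.dataType match {
    case DateType | TimestampType | TimestampNTZType => Compatible
    case other => Unsupported(s"years does not support input type $other")
  }
}
```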

How are these changes tested?

  • Add a unit test for the CometExpressionSerde handler to verify that the Years expression is identified correctly and serialized into a Protobuf message.
  • Add a unit test to verify that getSupportLevel returns Compatible for the supported temporal types (a rough shape of this check is sketched below).
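
As a rough illustration only (the real tests live in Comet's own test suites and use its helpers), a support-level check could be exercised like this, reusing the hypothetical `YearsSupport` object sketched above:

```scala
import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Years}
import org.apache.spark.sql.types.{StringType, TimestampType}

// Temporal input: expected to be reported as Compatible.
val tsCol = AttributeReference("ts", TimestampType)()
assert(YearsSupport.getSupportLevel(Years(tsCol)) == Compatible)

// Non-temporal input: expected to be rejected.
val strCol = AttributeReference("s", StringType)()
assert(YearsSupport.getSupportLevel(Years(strCol)) != Compatible)
```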

@mbutrovich (Contributor)

Thank you for the contribution @YutaLin! Could you add/migrate the expression tests to the new SQL test framework? https://datafusion.apache.org/comet/contributor-guide/sql-file-tests.html

@YutaLin force-pushed the feat/spark_expression_years branch from cb61473 to 379180e on February 6, 2026 at 03:57
@YutaLin (Author) commented Feb 6, 2026

Hi @mbutrovich, thanks for the review!
I investigated adding a SQL test, but found that Years is a PartitionTransformExpression that extends Unevaluable in Spark, which means Spark itself cannot execute Years at run time; evaluating it throws `[INTERNAL_ERROR] Cannot generate code for expression`. I cannot write `SELECT years(col)` or `WHERE years(col) = 2024`, since the expression is only used as partition metadata (see the sketch below). Could you guide me on the best way to add a solid test in the SQL test framework to verify the partitioning?
I've added tests to verify serialization and support level.
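
To make the constraint concrete, here is a small sketch of where `years` can legally appear. It assumes a DataSource V2 catalog that accepts partition transforms (for example Iceberg); the catalog, table, and column names are made up:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, years}

val spark = SparkSession.builder().getOrCreate()
val df = spark.range(10).selectExpr("id", "timestamp'2024-01-01 00:00:00' as ts")

// Valid: years(ts) is recorded as a partition transform on the target table.
df.writeTo("demo.db.events").partitionedBy(years(col("ts"))).create()

// Not valid: Years is Unevaluable, so projecting it directly fails,
// which is why a plain SELECT-based SQL test does not work here.
// df.select(years(col("ts")))
```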

