feat: Enable native Parquet writer in Spark SQL tests#3351
feat: Enable native Parquet writer in Spark SQL tests#3351coderfender wants to merge 9 commits intoapache:mainfrom
Conversation
b5cea4f to
951b394
Compare
.github/workflows/spark_sql_test.yml
Outdated
| cd apache-spark | ||
| rm -rf /root/.m2/repository/org/apache/parquet # somehow parquet cache requires cleanups | ||
| ENABLE_COMET=true ENABLE_COMET_ONHEAP=true ${{ matrix.config.scan-env }} ENABLE_COMET_LOG_FALLBACK_REASONS=${{ github.event.inputs.collect-fallback-logs || 'false' }} \ | ||
| ENABLE_COMET=true ENABLE_COMET_ONHEAP=true ENABLE_COMET_WRITER=true ${{ matrix.config.scan-env }} ENABLE_COMET_LOG_FALLBACK_REASONS=${{ github.event.inputs.collect-fallback-logs || 'false' }} \ |
There was a problem hiding this comment.
we also need to enable
--conf spark.comet.operator.DataWritingCommandExec.allowIncompatible=true
There was a problem hiding this comment.
We currently dont have the config available in spark tests through ENABLE_COMET_WRITER=true . Perhaps you meant enabling them in the benchmark suite TPCH / TPCDS as well ?
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3351 +/- ##
============================================
+ Coverage 56.12% 59.99% +3.86%
- Complexity 976 1464 +488
============================================
Files 119 175 +56
Lines 11743 16165 +4422
Branches 2251 2681 +430
============================================
+ Hits 6591 9698 +3107
- Misses 4012 5114 +1102
- Partials 1140 1353 +213 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@comphead , We have bunch of failures largely due to hitting edge cases with comet writer (CTAS , Insert Overwrite which is likely a bug) . I am thinking of two possible options here
|
|
@comphead please kickoff CI whenever you get a chance |
|
My diff files for Spark versions 3.4 and 3.5 missed disabling comet writer for CTAS statements (while it was set for Spark 4) causing above test failures . However, there are only 2 failed tests down from 13 |
|
Seems like there were issues applying diff file for Spark 4 |
|
@comphead all the tests passed .Please take a look whenever you get a chance |
|
All tests passed and the following tests are ignored : |
dev/diffs/3.4.3.diff
Outdated
|
|
||
| - test("SPARK-33901: ctas should should not change table's schema") { | ||
| + test("SPARK-33901: ctas should should not change table's schema", | ||
| + IgnoreComet("comet native writer does not support empty dir / table creation yet")) { |
There was a problem hiding this comment.
Could you link to the specific GitHub issues in the IgnoreComet annotations?
|
Fixing merge conflicts for the diff files :) |
6c6a142 to
b60ea7e
Compare
Which issue does this PR close?
Closes: #3209
I added new environment variable
ENABLE_COMET_WRITERwhich enables native comet writer and runs spark tests3 kinds of tests have been ignored:
charvarcha.sqlfilesRationale for this change
Now that we have a native comet writer, this PR enables spark tests while ignoring the ones which fail
What changes are included in this PR?
Diff file changes to enable comet native writer on spark tests
How are these changes tested?