-
Notifications
You must be signed in to change notification settings - Fork 2.5k
calcite-file: Support custom CSV separator in CsvTableFactory #4782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
calcite-file: Support custom CSV separator in CsvTableFactory #4782
Conversation
|
Thank you for your contribution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for custom field separators (delimiters) in the CSV adapter, addressing a limitation where the adapter previously hardcoded comma as the only separator. The change enables users to specify alternative delimiters like pipes or tabs through the model JSON configuration.
Changes:
- Added separator parameter parsing in CsvTableFactory with comma as default
- Propagated separator character through CsvTable, CsvTranslatableTable, CsvEnumerator, and CsvStreamReader
- Maintained backward compatibility via constructor overloading with default separator values
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| CsvTableFactory.java | Parses optional separator from operand map, defaults to comma |
| CsvTable.java | Adds separator field and constructor overload, passes to CsvEnumerator.deduceRowType |
| CsvTranslatableTable.java | Adds constructor overload accepting separator, passes to CsvEnumerator in project method |
| CsvEnumerator.java | Adds separator parameter to constructors and deduceRowType/openCsv methods |
| CsvStreamReader.java | Updates constructor to accept and use custom separator |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| String separatorString = (String) operand.get("separator"); | ||
| char separator = separatorString != null && separatorString.length() > 0 | ||
| ? separatorString.charAt(0) : ','; |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new separator feature lacks test coverage. Given that the repository has comprehensive test coverage for CSV functionality (as seen in FileAdapterTest.java with tests like testCsvSalesEmps, testCsvSalesDepts, etc.), this new feature should include tests that verify custom separators work correctly. Consider adding tests that use different separators like pipe (|) or tab (\t).
|
|
||
| import au.com.bytecode.opencsv.CSVParser; | ||
|
|
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import for au.com.bytecode.opencsv.CSVParser is placed incorrectly according to the import ordering convention seen in other files in this package. It should appear after all org.apache.calcite imports but before org.checkerframework imports. In CsvTable.java (lines 19-28), the correct ordering is demonstrated: org.apache.calcite imports first, then au.com.bytecode, then org.checkerframework, then java imports.
| String separatorString = (String) operand.get("separator"); | ||
| char separator = separatorString != null && separatorString.length() > 0 | ||
| ? separatorString.charAt(0) : ','; |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The separator parsing does not validate or handle special characters properly. Users might want to specify escape sequences like "\t" for tabs or "\n" for newlines, but the current implementation will treat these as two-character strings and only take the backslash. Consider adding support for common escape sequences or documenting that only single literal characters are supported.
| char separator = separatorString != null && separatorString.length() > 0 | ||
| ? separatorString.charAt(0) : ','; |
Copilot
AI
Jan 30, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The separator validation is insufficient. If a user provides an empty string as the separator, the code will default to a comma due to the length check. However, this silent fallback could mask configuration errors. Consider adding validation to reject empty strings explicitly or documenting this behavior clearly. Additionally, multi-character separators will be silently truncated to the first character, which could be confusing.
| char separator = separatorString != null && separatorString.length() > 0 | |
| ? separatorString.charAt(0) : ','; | |
| final char separator; | |
| if (separatorString == null) { | |
| separator = ','; | |
| } else if (separatorString.length() == 1) { | |
| separator = separatorString.charAt(0); | |
| } else { | |
| throw new IllegalArgumentException( | |
| "Invalid separator '" + separatorString | |
| + "'. Separator must be a single character."); | |
| } |
Title: [FILE] Support custom separator in CSV adapter
Description
This PR adds support for custom field delimiters (separators) in the CSV adapter. previously, the adapter hardcoded the comma (
,) as the separator, preventing the use of other common delimiters like pipes (|) or tabs (\t).Changes
separatorproperty from the operand map.CsvStreamReaderto accept a custom separator in its constructor.,) if no separator is specified, preserving backward compatibility.Example Configuration
Users can now specify a
separatorin their model JSON:{ "version": "1.0", "defaultSchema": "TEST", "schemas": [ { "name": "TEST", "type": "custom", "factory": "org.apache.calcite.adapter.file.CsvTableFactory", "operand": { "file": "data.csv", "separator": "|" } } ] }