Skip to content

Conversation

@Diveyam-Mishra
Copy link

Title: [FILE] Support custom separator in CSV adapter

Description

This PR adds support for custom field delimiters (separators) in the CSV adapter. previously, the adapter hardcoded the comma (,) as the separator, preventing the use of other common delimiters like pipes (|) or tabs (\t).

Changes

  • Modified [CsvTableFactory] to parse an optional separator property from the operand map.
  • Updated [CsvTable], [CsvTranslatableTable], and [CsvEnumerator] to propagate the separator character down to the reader.
  • Updated CsvStreamReader to accept a custom separator in its constructor.
  • Defaults to comma (,) if no separator is specified, preserving backward compatibility.

Example Configuration

Users can now specify a separator in their model JSON:

{
  "version": "1.0",
  "defaultSchema": "TEST",
  "schemas": [
    {
      "name": "TEST",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.file.CsvTableFactory",
      "operand": {
        "file": "data.csv",
        "separator": "|"
      }
    }
  ]
}

Copilot AI review requested due to automatic review settings January 30, 2026 18:18
@mihaibudiu
Copy link
Contributor

Thank you for your contribution.
For most substantive contributions to Calcite the workflow requires you to first file an issue using the Calcite JIRA: https://issues.apache.org/jira/projects/CALCITE/issues; then you can submit a PR.
Can you browse the existing issues? There may be one already for this feature.
After you have filed an issue your commit message and PR have to refer to the issue that is being solved: see this documentation https://calcite.apache.org/develop/

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for custom field separators (delimiters) in the CSV adapter, addressing a limitation where the adapter previously hardcoded comma as the only separator. The change enables users to specify alternative delimiters like pipes or tabs through the model JSON configuration.

Changes:

  • Added separator parameter parsing in CsvTableFactory with comma as default
  • Propagated separator character through CsvTable, CsvTranslatableTable, CsvEnumerator, and CsvStreamReader
  • Maintained backward compatibility via constructor overloading with default separator values

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
CsvTableFactory.java Parses optional separator from operand map, defaults to comma
CsvTable.java Adds separator field and constructor overload, passes to CsvEnumerator.deduceRowType
CsvTranslatableTable.java Adds constructor overload accepting separator, passes to CsvEnumerator in project method
CsvEnumerator.java Adds separator parameter to constructors and deduceRowType/openCsv methods
CsvStreamReader.java Updates constructor to accept and use custom separator

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +51 to +53
String separatorString = (String) operand.get("separator");
char separator = separatorString != null && separatorString.length() > 0
? separatorString.charAt(0) : ',';
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new separator feature lacks test coverage. Given that the repository has comprehensive test coverage for CSV functionality (as seen in FileAdapterTest.java with tests like testCsvSalesEmps, testCsvSalesDepts, etc.), this new feature should include tests that verify custom separators work correctly. Consider adding tests that use different separators like pipe (|) or tab (\t).

Copilot uses AI. Check for mistakes.
Comment on lines +28 to +30

import au.com.bytecode.opencsv.CSVParser;

Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import for au.com.bytecode.opencsv.CSVParser is placed incorrectly according to the import ordering convention seen in other files in this package. It should appear after all org.apache.calcite imports but before org.checkerframework imports. In CsvTable.java (lines 19-28), the correct ordering is demonstrated: org.apache.calcite imports first, then au.com.bytecode, then org.checkerframework, then java imports.

Copilot uses AI. Check for mistakes.
Comment on lines +51 to +53
String separatorString = (String) operand.get("separator");
char separator = separatorString != null && separatorString.length() > 0
? separatorString.charAt(0) : ',';
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The separator parsing does not validate or handle special characters properly. Users might want to specify escape sequences like "\t" for tabs or "\n" for newlines, but the current implementation will treat these as two-character strings and only take the backslash. Consider adding support for common escape sequences or documenting that only single literal characters are supported.

Copilot uses AI. Check for mistakes.
Comment on lines +52 to +53
char separator = separatorString != null && separatorString.length() > 0
? separatorString.charAt(0) : ',';
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The separator validation is insufficient. If a user provides an empty string as the separator, the code will default to a comma due to the length check. However, this silent fallback could mask configuration errors. Consider adding validation to reject empty strings explicitly or documenting this behavior clearly. Additionally, multi-character separators will be silently truncated to the first character, which could be confusing.

Suggested change
char separator = separatorString != null && separatorString.length() > 0
? separatorString.charAt(0) : ',';
final char separator;
if (separatorString == null) {
separator = ',';
} else if (separatorString.length() == 1) {
separator = separatorString.charAt(0);
} else {
throw new IllegalArgumentException(
"Invalid separator '" + separatorString
+ "'. Separator must be a single character.");
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants