
devanshu0987 commented Jan 4, 2026

Which issue does this PR close?

  • Closes #.

Rationale for this change

In the Postgres implementation of translate, duplicate characters in the from argument are ignored: the first occurrence wins.

DuckDB implements the same rule: if a character is already in the mapping, its index is not updated:
https://github.com/duckdb/duckdb/blob/4b7a6b7bd0f8c968bfecab08e801cdc1f0a5cdfd/extension/core_functions/scalar/string/translate.cpp#L45
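A minimal Rust sketch of the first-occurrence-wins rule, for illustration only. The function name translate and the char-keyed HashMap are assumptions, not DataFusion's actual code. It works on Rust chars (Unicode scalar values) rather than grapheme clusters to stay within the standard library. It also deletes characters of from that have no counterpart in to, which is what Postgres documents for a from string longer than to.

```rust
use std::collections::HashMap;

/// Hypothetical sketch of Postgres-style `translate` (not DataFusion's code):
/// - a character repeated in `from` keeps the mapping of its FIRST occurrence;
/// - a `from` character with no counterpart in `to` is deleted;
/// - all other characters pass through unchanged.
/// Works on `char`s (Unicode scalar values), not grapheme clusters.
fn translate(input: &str, from: &str, to: &str) -> String {
    let to_chars: Vec<char> = to.chars().collect();

    // Map each `from` character to the index of its first occurrence.
    let mut from_map: HashMap<char, usize> = HashMap::new();
    for (i, c) in from.chars().enumerate() {
        // `or_insert` only writes when the key is absent,
        // so later duplicates never update the mapping.
        from_map.entry(c).or_insert(i);
    }

    input
        .chars()
        .filter_map(|c| match from_map.get(&c) {
            // Same position in `to`; `None` (to is shorter) drops the char.
            Some(&i) => to_chars.get(i).copied(),
            None => Some(c),
        })
        .collect()
}

fn main() {
    // 'a' appears twice in "aa"; only its first index (0 -> 'd') is used.
    println!("{}", translate("abcabc", "aa", "de"));
}
```

With this rule, translate('abcabc', 'aa', 'de') yields dbcdbc, matching the DuckDB and Postgres output for this query. The entry(...).or_insert(...) call is what "the mapping is not updated" amounts to in this sketch.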

Before the change, DataFusion returns:

> SELECT translate('abcabc', 'aa', 'de');
+-------------------------------------------------+
| translate(Utf8("abcabc"),Utf8("aa"),Utf8("de")) |
+-------------------------------------------------+
| ebcebc                                          |
+-------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.001 seconds.
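The ebcebc result is consistent with a lookup table in which later duplicates overwrite earlier ones. The following hypothetical sketch assumes the table is built by collecting (char, index) pairs into a std HashMap. Collecting into a HashMap replaces the value for a key that repeats, so the last index survives. This does not claim that DataFusion's code is written this way.

```rust
use std::collections::HashMap;

/// Hypothetical last-occurrence-wins variant, for comparison only.
fn translate_last_wins(input: &str, from: &str, to: &str) -> String {
    let to_chars: Vec<char> = to.chars().collect();
    // `collect` inserts pairs in order; a repeated key is overwritten,
    // so each character ends up mapped to its LAST index in `from`.
    let from_map: HashMap<char, usize> = from
        .chars()
        .enumerate()
        .map(|(i, c)| (c, i))
        .collect();
    input
        .chars()
        .filter_map(|c| match from_map.get(&c) {
            Some(&i) => to_chars.get(i).copied(),
            None => Some(c),
        })
        .collect()
}

fn main() {
    // 'a' is at indices 0 and 1 in "aa"; index 1 survives, so 'a' -> 'e'.
    println!("{}", translate_last_wins("abcabc", "aa", "de"));
}
```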

DuckDB returns:

D SELECT translate('abcabc', 'aa', 'de');
┌─────────────────────────────────┐
│ translate('abcabc', 'aa', 'de') │
│             varchar             │
├─────────────────────────────────┤
│ dbcdbc                          │
└─────────────────────────────────┘

Postgres returns:

SELECT translate('abcabc', 'aa', 'de')

Output:

 translate 
-----------
 dbcdbc
(1 row)

What changes are included in this PR?

  • If there are duplicate characters in from, the first occurrence now wins.
  • SLT tests updated. The existing tests used foo as the from value. It contains a duplicate o, so o was mapped to r instead of the Postgres-compatible o -> a.
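To see why the SLT expectations change, this sketch builds the character mapping for from = foo, to = bar under both rules. The names build_mapping and first_wins are illustrative and hypothetical, not DataFusion APIs.

```rust
use std::collections::HashMap;

/// Hypothetical helper: map each `from` char to its replacement.
/// `None` means "delete" (the char has no counterpart in `to`).
fn build_mapping(from: &str, to: &str, first_wins: bool) -> HashMap<char, Option<char>> {
    let to_chars: Vec<char> = to.chars().collect();
    let mut map = HashMap::new();
    for (i, c) in from.chars().enumerate() {
        let target = to_chars.get(i).copied();
        if first_wins {
            // Keep the entry from the first occurrence of `c`.
            map.entry(c).or_insert(target);
        } else {
            // Overwrite on every occurrence, so the last one wins.
            map.insert(c, target);
        }
    }
    map
}

fn main() {
    let first = build_mapping("foo", "bar", true);
    let last = build_mapping("foo", "bar", false);
    println!("first-wins: o -> {:?}", first[&'o']);
    println!("last-wins:  o -> {:?}", last[&'o']);
}
```

Under first-wins, o keeps the target of its first position (index 1, a). Under last-wins, the second o at index 2 overwrites it with r. The f -> b mapping is the same either way.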

Are these changes tested?

  • New unit tests are added that cover this behaviour.

Are there any user-facing changes?

This is arguably a contract change: anyone who depends on the current behaviour (the last occurrence in from wins) will see different results. I am unsure how to categorize this and would appreciate help.

github-actions bot added the functions label (Changes to functions implementation) Jan 4, 2026
devanshu0987 changed the title from "fix: Make translate function postgres compatible" to "fix(functions): Make translate function postgres compatible" Jan 4, 2026
Jefffrey (Contributor) commented Jan 4, 2026

Looks like this affects some existing tests which are now failing; would you be able to double check these against DuckDB & Postgres too?

github-actions bot added the sqllogictest label (SQL Logic Tests, .slt) Jan 4, 2026
devanshu0987 (Author)

  • I ran the following script on Postgres 18.
  • I copied the data from the SLT tests and ran the same queries that the tests run.
  • The results match the updated SLT expected output.
  • The logic change: the existing tests used foo as the from value. It contains a duplicate o, so o was mapped to r instead of the Postgres-compatible o -> a.
CREATE TABLE test_source (
  ascii_1 TEXT,
  ascii_2 TEXT,
  unicode_1 TEXT,
  unicode_2 TEXT
);
select * from test_source;
INSERT INTO test_source values ('Andrew', 'X', 'datafusion📊🔥', '🔥');
INSERT INTO test_source values ('Xiangpeng', 'Xiangpeng', 'datafusion数据融合', 'datafusion数据融合');
INSERT INTO test_source values ('Raphael', 'R', 'datafusionДатаФусион', 'аФус');
INSERT INTO test_source values ('under_score', 'un_____core', 'un iść core', 'chrząszcz na łące w 東京都');
INSERT INTO test_source values ('percent', 'p%t', 'pan Tadeusz ma iść w kąt', 'Pan Tadeusz ma frunąć stąd w kąt');
INSERT INTO test_source values ('', '%', '', '');
INSERT INTO test_source values ('', '%%', '', '');
INSERT INTO test_source values ('%', '\%', '', '');
INSERT INTO test_source values ('_', '\_', '', '');
INSERT INTO test_source values (NULL, '%', NULL, NULL);
INSERT INTO test_source values (NULL, 'R', NULL, '🔥');

SELECT
  TRANSLATE(ascii_1, 'foo', 'bar') as c
FROM test_source;

c          |
-----------+
Andrew     |
Xiangpeng  |
Raphael    |
under_scare| ---> under_sc[o]re changed to under_sc[a]re
percent    |
           |
           |
%          |
_          |
           |
           |

SELECT
  TRANSLATE(unicode_1, 'foo', 'bar') as c
FROM test_source;

c                       |
------------------------+
databusian📊🔥          | ---> datafusi[o]n changed to databusi[a]n; everything else remains the same
databusian数据融合          |
databusianДатаФусион    |
un iść care             | ---> un iść c[o]re changed to un iść c[a]re
pan Tadeusz ma iść w kąt|
                        |
                        |
                        |
                        |
                        |
                        |
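The Postgres rows above can be cross-checked offline. This sketch reuses a hypothetical first-occurrence-wins translate over chars (not DataFusion's implementation) and asserts expected values copied from the tables for TRANSLATE(col, 'foo', 'bar'). NULL inputs are modeled as None and stay None.

```rust
use std::collections::HashMap;

/// Hypothetical first-occurrence-wins translate over `char`s.
fn translate(input: &str, from: &str, to: &str) -> String {
    let to_chars: Vec<char> = to.chars().collect();
    let mut from_map: HashMap<char, usize> = HashMap::new();
    for (i, c) in from.chars().enumerate() {
        // Ignore later duplicates: keep the first index seen for `c`.
        from_map.entry(c).or_insert(i);
    }
    input
        .chars()
        .filter_map(|c| match from_map.get(&c) {
            Some(&i) => to_chars.get(i).copied(), // None here deletes `c`
            None => Some(c),
        })
        .collect()
}

fn main() {
    // (input, expected) pairs taken from the Postgres output above.
    let cases: Vec<(Option<&str>, Option<&str>)> = vec![
        (Some("Andrew"), Some("Andrew")),
        (Some("Xiangpeng"), Some("Xiangpeng")),
        (Some("Raphael"), Some("Raphael")),
        (Some("under_score"), Some("under_scare")),
        (Some("percent"), Some("percent")),
        (Some(""), Some("")),
        (Some("%"), Some("%")),
        (Some("_"), Some("_")),
        (None, None),
        (Some("datafusion📊🔥"), Some("databusian📊🔥")),
        (Some("datafusion数据融合"), Some("databusian数据融合")),
        // Cyrillic letters are distinct code points from ASCII 'o'/'f'.
        (Some("datafusionДатаФусион"), Some("databusianДатаФусион")),
        (Some("un iść core"), Some("un iść care")),
        (Some("pan Tadeusz ma iść w kąt"), Some("pan Tadeusz ma iść w kąt")),
    ];
    for &(input, expected) in cases.iter() {
        let actual = input.map(|s| translate(s, "foo", "bar"));
        assert_eq!(actual.as_deref(), expected, "input: {:?}", input);
    }
    println!("{} cases match the Postgres output", cases.len());
}
```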

devanshu0987 (Author)

  • Results from DuckDB:
CREATE TABLE test_source (
  ascii_1   TEXT,
  ascii_2   TEXT,
  unicode_1 TEXT,
  unicode_2 TEXT
);

INSERT INTO test_source VALUES
  ('Andrew', 'X', 'datafusion📊🔥', '🔥'),
  ('Xiangpeng', 'Xiangpeng', 'datafusion数据融合', 'datafusion数据融合'),
  ('Raphael', 'R', 'datafusionДатаФусион', 'аФус'),
  ('under_score', 'un_____core', 'un iść core', 'chrząszcz na łące w 東京都'),
  ('percent', 'p%t', 'pan Tadeusz ma iść w kąt', 'Pan Tadeusz ma frunąć stąd w kąt'),
  ('', '%', '', ''),
  ('', '%%', '', ''),
  ('%', '\%', '', ''),
  ('_', '\_', '', ''),
  (NULL, '%', NULL, NULL),
  (NULL, 'R', NULL, '🔥');

SELECT * FROM test_source;

SELECT
  TRANSLATE(ascii_1, 'foo', 'bar') AS c
FROM test_source;

c          |
-----------+
Andrew     |
Xiangpeng  |
Raphael    |
under_scare|
percent    |
           |
           |
%          |
_          |
           |
           |

SELECT
  TRANSLATE(unicode_1, 'foo', 'bar') AS c
FROM test_source;

c                       |
------------------------+
databusian📊🔥          |
databusian数据融合          |
databusianДатаФусион    |
un iść care             |
pan Tadeusz ma iść w kąt|
                        |
                        |
                        |
                        |
                        |
                        |

devanshu0987 reopened this Jan 4, 2026
devanshu0987 (Author)

> Looks like this affects some existing tests which are now failing; would you be able to double check these against DuckDB & Postgres too?

Hi @Jefffrey, I double-checked this against DuckDB and Postgres, and the results match.
Could you take another look at the PR, please?
Thanks.

devanshu0987 force-pushed the make_translate_postgres_compatible branch from 1fccd03 to e2c3643 January 4, 2026 05:13

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

2 participants