
BUG: specifying unit in to_datetime when parsing strings causes different error behaviour #63472

@jorisvandenbossche

Description

Consider the case of "mixed" ISO strings. Nowadays this raises an error, but you can then pass format="ISO8601" to indicate that you know all the strings are ISO 8601:

>>> import pandas as pd
>>> pd.to_datetime(["2025-01-01T00:00:00.000000", "2025-01-01T01:00:00"])
...
ValueError: time data "2025-01-01T01:00:00" doesn't match format "%Y-%m-%dT%H:%M:%S.%f". You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

>>> pd.to_datetime(["2025-01-01T00:00:00.000000", "2025-01-01T01:00:00"], format="ISO8601")
DatetimeIndex(['2025-01-01 00:00:00', '2025-01-01 01:00:00'], dtype='datetime64[us]', freq=None)
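The first call fails because a single format is guessed from the first element and then applied to every element. The following is a minimal standard-library sketch of that mismatch, not pandas' own parser: datetime.strptime and datetime.fromisoformat stand in for it, and the helper name matches is hypothetical.

```python
from datetime import datetime

strings = ["2025-01-01T00:00:00.000000", "2025-01-01T01:00:00"]

# The format reported in the error message, i.e. the one guessed from the first element.
inferred = "%Y-%m-%dT%H:%M:%S.%f"


def matches(s, fmt):
    """Hypothetical helper: return True if s parses with the strptime-style format fmt."""
    try:
        datetime.strptime(s, fmt)
    except ValueError:
        return False
    return True


per_element = [matches(s, inferred) for s in strings]
print(per_element)  # [True, False]: the second string has no fractional seconds

# Both strings are still valid ISO 8601, which is the property format="ISO8601" relies on.
parsed = [datetime.fromisoformat(s) for s in strings]
print(parsed)
```

Checking each element against ISO 8601 as a whole, rather than against one concrete strptime pattern, is what lets both strings through.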

Strangely, when passing a unit (which is generally ignored when the input is not numeric), no error is raised: the mixed input is parsed fine by default, but the result has ns instead of us resolution:

>>> pd.to_datetime(["2025-01-01T00:00:00.000000", "2025-01-01T01:00:00"], unit="s")
DatetimeIndex(['2025-01-01 00:00:00', '2025-01-01 01:00:00'], dtype='datetime64[ns]', freq=None)
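A minimal sketch for checking the discrepancy on a given pandas version, assuming pandas is installed. The helper name outcome is hypothetical, and the results depend on the installed version, so no output is shown. It also assumes the expected behaviour is that unit has no effect on string input, so both calls should agree.

```python
import pandas as pd

# The two ISO 8601 strings from the report: same date, different precision.
STRINGS = ["2025-01-01T00:00:00.000000", "2025-01-01T01:00:00"]


def outcome(**kwargs):
    """Hypothetical helper: call pd.to_datetime on STRINGS and summarize the result.

    Returns ("error", <exception class name>) if the call raised,
    otherwise ("ok", <dtype of the result as a string>).
    """
    try:
        result = pd.to_datetime(STRINGS, **kwargs)
    except Exception as err:  # broad on purpose: we only record which error occurred
        return ("error", type(err).__name__)
    return ("ok", str(result.dtype))


default = outcome()
with_unit = outcome(unit="s")
print("default:  ", default)
print("unit='s': ", with_unit)

# If unit is ignored for string input, both calls should behave identically.
if default != with_unit:
    print("mismatch: passing unit changes the result for string input")
```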

cc @jbrockmendel

Metadata

Labels: Bug, Non-Nano (datetime64/timedelta64 with non-nanosecond resolution), Timestamp (pd.Timestamp and associated methods)