
@fangchenli fangchenli commented Dec 23, 2025

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
  • If I used AI to develop this pull request, I prompted it to follow AGENTS.md.

| Accessor | Before (ms) | After (ms) | Speedup |
| --- | --- | --- | --- |
| .dt.days | 303.669 | 0.319 | 952.6x |
| .dt.seconds | 310.434 | 0.257 | 1205.8x |
| .dt.microseconds | 314.680 | 0.602 | 522.7x |
| .dt.nanoseconds | 330.143 | 0.576 | 573.5x |
| .dt.components | 1149.894 | 4.113 | 279.5x |

These benchmarks were run on an M1 MacBook Pro. Could someone verify the improvement on a more powerful machine?

Copilot AI left a comment

Pull request overview

This PR optimizes duration component calculations on Arrow-backed timedelta arrays. Instead of converting to NumPy arrays and back, the new implementation uses native PyArrow compute functions to extract duration components directly, giving speedups of roughly 280x to 1200x in the benchmarks reported in the PR description.

Key changes:

  • Implemented Arrow-native duration component extraction using PyArrow compute kernels
  • Added floor division logic to correctly handle negative durations matching pandas semantics
  • Introduced caching for the day remainder calculation with proper invalidation
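
The floor-division point in the list above can be illustrated with a minimal pure-Python sketch. It is not the PR's code: the helper name `split_duration_ns` and the constants are hypothetical, and it works on plain integers rather than Arrow arrays. It only shows why floor division (rather than truncating division) gives the normalized form for negative durations, where only the `days` component is negative.

```python
from datetime import timedelta

# Hypothetical constants for this sketch (not taken from the PR).
NS_PER_US = 1_000
NS_PER_SEC = 1_000_000_000
NS_PER_DAY = 86_400 * NS_PER_SEC


def split_duration_ns(total_ns: int) -> tuple[int, int, int, int]:
    """Split a nanosecond count into (days, seconds, microseconds, nanoseconds).

    Python's divmod floors toward negative infinity, so every remainder is
    non-negative and only `days` can be negative.
    """
    days, day_remainder = divmod(total_ns, NS_PER_DAY)
    seconds, sub_second = divmod(day_remainder, NS_PER_SEC)
    microseconds, nanoseconds = divmod(sub_second, NS_PER_US)
    return days, seconds, microseconds, nanoseconds


# Minus one microsecond, expressed in nanoseconds.
parts = split_duration_ns(-1_000)

# The standard library's timedelta normalizes negative values the same way.
reference = timedelta(microseconds=-1)
print(parts[:3] == (reference.days, reference.seconds, reference.microseconds))
```

Truncating division would instead yield zero days and a negative remainder for the same input, which is why a backend whose integer division rounds toward zero needs an explicit adjustment for negative values.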

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| pandas/core/arrays/arrow/array.py | Implements Arrow-native duration component extraction methods with floor division for negative values, cached day remainder calculation, and optimized _dt_components method |
| pandas/core/indexes/accessors.py | Updates the components property to use the new _dt_components method when available for Arrow arrays |
| pandas/tests/extension/test_arrow.py | Adds comprehensive tests covering positive/negative durations, null handling, component extraction, Python timedelta semantics, and cache invalidation |
| asv_bench/benchmarks/timedelta.py | Adds benchmarks comparing NumPy and PyArrow backends for duration component operations |
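
The "cached day remainder calculation" and "cache invalidation" mentioned in the file descriptions might look roughly like the sketch below. This is an assumption about the general pattern, not the PR's implementation: the class `DurationColumn`, its attributes, and the use of Python lists in place of Arrow arrays are all hypothetical.

```python
NS_PER_SEC = 1_000_000_000
NS_PER_DAY = 86_400 * NS_PER_SEC


class DurationColumn:
    """Hypothetical stand-in for an array of nanosecond durations (None = null)."""

    def __init__(self, values_ns):
        self._values = list(values_ns)
        self._day_remainder = None  # cache, filled lazily
        self.recomputations = 0     # counter used only to observe the cache

    def _get_day_remainder(self):
        # Compute the within-day remainder once and reuse it across accessors.
        if self._day_remainder is None:
            self.recomputations += 1
            self._day_remainder = [
                None if v is None else v % NS_PER_DAY for v in self._values
            ]
        return self._day_remainder

    def seconds(self):
        return [
            None if r is None else r // NS_PER_SEC
            for r in self._get_day_remainder()
        ]

    def microseconds(self):
        return [
            None if r is None else (r % NS_PER_SEC) // 1_000
            for r in self._get_day_remainder()
        ]

    def __setitem__(self, index, value):
        # Any mutation makes the cached remainder stale, so drop it.
        self._values[index] = value
        self._day_remainder = None


col = DurationColumn([-1_000, None, 90_000 * NS_PER_SEC])
col.seconds()
col.microseconds()   # reuses the cached remainder
col[1] = 1_500_000   # invalidates the cache
col.seconds()        # recomputes once
```

The design point is that `seconds`, `microseconds`, and similar accessors share one intermediate result, while any write resets it so stale components are never returned.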

@fangchenli fangchenli marked this pull request as ready for review December 23, 2025 23:13