Conversation
📝 Walkthrough

Walkthrough: Added an optional

Sequence Diagram(s): omitted — changes are a small data-shape addition plus test coverage; no multi-component control flow requiring visualization.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 3 passed
Yes, this was there. Also, one more thing: make sure that in `/backend/app/crud/evaluations/langfuse.py`, in the `create_langfuse_dataset_run` function, it should not be — it should be.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/app/crud/evaluations/score.py (1)
20-27: Fix the `question_id` type contract: the `int` annotation conflicts with `""` (str) runtime values.

`TraceData.question_id` is typed as `int` but is initialized and retrieved as the empty string `""` throughout the codebase:
- Initialized as empty string in langfuse.py line 426
- Retrieved with empty string default in langfuse.py line 449
- Checked explicitly for empty string in core.py line 378
Additionally, the grouping logic in core.py line 391 (`sorted(groups.keys())`) will fail at runtime with a `TypeError` when trace data contains mixed int and str values for `question_id`, as the dict key type is annotated as `int` but receives str values from fetched traces.

Align the type to match runtime behavior: change `question_id: int` to `question_id: int | None` or `question_id: str | int`, or enforce conversion to int in the fetcher and initialize to `None` instead of `""`.

Example fix (int | None only)
```diff
 class TraceData(TypedDict):
     """Data for a single trace including Q&A and scores."""

     trace_id: str
     question: str
     llm_answer: str
-    question_id: int
+    question_id: int | None
     ground_truth_answer: str
     scores: list[TraceScore]
```
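The mixed-key failure described above can be reproduced in isolation. This is a minimal sketch — the `groups` dict shape is assumed for illustration, not taken from `core.py`:

```python
# Minimal reproduction of the failure mode: in Python 3, sorting dict keys
# that mix int and str raises TypeError, since int and str are not ordered
# relative to each other.
groups: dict[object, list[str]] = {1: ["trace_a"], "": ["trace_b"]}

try:
    ordered = sorted(groups.keys())
    mixed_key_sort_failed = False
except TypeError:
    mixed_key_sort_failed = True

print(mixed_key_sort_failed)  # mixed int/str keys cannot be ordered
```

This is why widening the annotation alone is not enough: whichever type is chosen, the fetcher must produce keys of a single comparable type before `sorted()` is called.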
🤖 Fix all issues with AI agents
In `@backend/app/tests/crud/evaluations/test_langfuse.py`:
- Around line 712-742: Update the tests to match the declared
TraceData.question_id contract (int | None) and the normalization performed in
fetch_trace_scores_from_langfuse: when a trace has no question_id, assert
trace["question_id"] is None (not ""), and for non-int question_id values assert
they are converted to int (or None on failure) consistent with TraceData; adjust
expectations in test_fetch_trace_scores_without_question_id and
test_fetch_trace_scores_mixed_question_id_types to reference this behavior and
the fetch_trace_scores_from_langfuse normalization logic.
```python
def test_fetch_trace_scores_without_question_id(self) -> None:
    """Test fetching traces without question_id (backwards compatibility)."""
    mock_langfuse = MagicMock()

    # Mock dataset run
    mock_run_item = MagicMock()
    mock_run_item.trace_id = "trace_1"
    mock_dataset_run = MagicMock()
    mock_dataset_run.dataset_run_items = [mock_run_item]
    mock_langfuse.api.datasets.get_run.return_value = mock_dataset_run

    # Mock trace without question_id in metadata
    mock_trace = MagicMock()
    mock_trace.input = {"question": "What is 2+2?"}
    mock_trace.output = {"answer": "4"}
    mock_trace.metadata = {"ground_truth": "4"}  # No question_id
    mock_trace.scores = []

    mock_langfuse.api.trace.get.return_value = mock_trace

    result = fetch_trace_scores_from_langfuse(
        langfuse=mock_langfuse,
        dataset_name="test_dataset",
        run_name="test_run",
    )

    # Verify trace has empty string for question_id
    assert len(result["traces"]) == 1
    trace = result["traces"][0]
    assert trace["question_id"] == ""
    assert trace["trace_id"] == "trace_1"
```
Tests allow non-int question_id, conflicting with the declared contract.
`test_fetch_trace_scores_without_question_id` expects an empty string, and `test_fetch_trace_scores_mixed_question_id_types` expects a string. This conflicts with `TraceData.question_id: int` and the PR objective. Align the tests with the final contract (e.g., `int | None` with normalization) or widen the type if strings are supported.
Also applies to: 979-1018
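Under the suggested `int | None` contract, the test assertions might be adjusted along these lines. This is a sketch only: `check_question_id_contract` is a hypothetical helper, and the dict literals stand in for real fetched traces:

```python
def check_question_id_contract(trace: dict) -> None:
    """Assert a trace satisfies the proposed int | None contract."""
    qid = trace["question_id"]
    assert qid is None or isinstance(qid, int), f"unexpected type: {type(qid)}"


# A trace whose metadata lacked question_id would normalize to None...
check_question_id_contract({"trace_id": "trace_1", "question_id": None})
# ...while a numeric string like "7" would be coerced to an int by the fetcher.
check_question_id_contract({"trace_id": "trace_2", "question_id": 7})
```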
Earlier, `question_id` was not in the response for the GET `evaluations/{evaluation_id}` endpoint. In `app/crud/evaluations/score.py`, this Pydantic model did not have `question_id` earlier, hence the `fetch_trace_scores_from_langfuse` function, where the model was used, omitted the concerned key.

Summary by CodeRabbit
New Features
Tests