Conversation

@AkhileshNegi AkhileshNegi commented Jan 22, 2026

Summary

Target issue is #544

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, it is covered by test cases.

Notes

Please add any other information the reviewer may need here.

Summary by CodeRabbit

  • New Features

    • Added question ID tracking across evaluation runs and dataset uploads to group duplicates and maintain consistent per-question identifiers.
    • Evaluation parsing now propagates question IDs into result items when present.
  • Bug Fixes / Backwards Compatibility

    • Missing question IDs are handled gracefully (remain absent/null) to preserve compatibility.
  • Tests

    • Added tests covering question ID propagation, sequencing, duplicate grouping, and missing-ID cases.


coderabbitai bot commented Jan 22, 2026

📝 Walkthrough

Walkthrough

Propagate a per-question question_id through evaluation parsing, Langfuse dataset item metadata, trace creation, and trace/score retrieval. IDs are generated sequentially (1-based) per original input and applied consistently to duplicates; parsing and uploads were extended to accept and carry question_id.
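
The ID scheme described above (sequential, 1-based, shared across duplicates) can be sketched roughly as follows; `assign_question_ids` and the `"question"` key are hypothetical names for illustration, not the PR's actual API:

```python
def assign_question_ids(items: list[dict[str, str]]) -> list[int]:
    """Assign sequential 1-based IDs, reusing an ID for duplicate questions."""
    seen: dict[str, int] = {}  # question text -> assigned question_id
    ids: list[int] = []
    for item in items:
        key = item["question"]
        if key not in seen:
            seen[key] = len(seen) + 1  # 1-based, in order of first appearance
        ids.append(seen[key])
    return ids

items = [
    {"question": "What is RAG?"},
    {"question": "What is RAG?"},  # duplicate: gets the same ID as above
    {"question": "What is Langfuse?"},
]
print(assign_question_ids(items))  # → [1, 1, 2]
```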

Changes

  • Langfuse Integration Core (backend/app/crud/evaluations/langfuse.py):
    Extended the upload_item signature to (..., question_id) and included "question_id" in item metadata and trace.metadata; the create/upload tasks now include question_id; fetch_trace_scores_from_langfuse initializes and reads question_id from traces. Lines changed: +18/-9.
  • Evaluation Processing (backend/app/crud/evaluations/processing.py):
    parse_evaluation_output now extracts question_id from dataset_item.metadata.question_id and includes it in each result object. Lines changed: +4/-0.
  • Tests (backend/app/tests/crud/evaluations/test_langfuse.py, backend/app/tests/crud/evaluations/test_processing.py):
    Added tests validating metadata propagation of question_id when present/absent, sequential 1-based ID assignment across unique items, consistent IDs for duplicates, and parsing behavior. Lines changed: +190/-0.
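
A minimal sketch of the parsing behavior those tests exercise; the real parse_evaluation_output in processing.py has a different signature, so treat these names as illustrative only:

```python
from typing import Any

def extract_question_id(dataset_item: dict[str, Any]) -> Any:
    """Read question_id from a dataset item's metadata, if present.

    Missing IDs stay None so older datasets without the field keep working.
    """
    metadata = dataset_item.get("metadata") or {}
    return metadata.get("question_id")

with_id = extract_question_id({"input": "Q1", "metadata": {"question_id": 3}})
without_id = extract_question_id({"input": "Q2", "metadata": {}})
print(with_id, without_id)  # → 3 None
```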

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Evaluation: Fix score format #549 — touches the same evaluation pipeline and alters trace/score metadata formats; potentially related to score/metadata shape changes.

Suggested reviewers

  • Prajna1999

Poem

🐰 I hop through traces with a tiny tag,
A number for each question in my bag,
Duplicates march with the same small tune,
Counts start at one beneath the moon,
Metadata gleams — a carrot-sized flag. 🥕

🚥 Pre-merge checks | ✅ 3 passed

  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Evaluation: Add question id' directly aligns with the pull request's main objective: propagating and tracking question identifiers throughout the evaluation system across Langfuse dataset creation, item uploads, and trace data retrieval.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 93.33%, above the required 80.00% threshold.


@AkhileshNegi AkhileshNegi linked an issue Jan 22, 2026 that may be closed by this pull request
@AkhileshNegi AkhileshNegi self-assigned this Jan 22, 2026
@AkhileshNegi AkhileshNegi added the enhancement New feature or request label Jan 22, 2026
@AkhileshNegi AkhileshNegi marked this pull request as ready for review January 22, 2026 03:16

codecov bot commented Jan 22, 2026

Codecov Report

❌ Patch coverage is 98.87640% with 1 line in your changes missing coverage. Please review.

Files with missing lines:

  • backend/app/crud/evaluations/langfuse.py: patch coverage 91.66%, 1 line missing ⚠️


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/crud/evaluations/langfuse.py (1)

254-296: Fix question_id type hint (int vs str).

question_id is assigned as an integer, but the helper signature declares str, which will trip type checkers and contradicts the tests.

🐛 Suggested fix
-    def upload_item(item: dict[str, str], duplicate_num: int, question_id: str) -> bool:
+    def upload_item(item: dict[str, str], duplicate_num: int, question_id: int) -> bool:
🧹 Nitpick comments (3)
backend/app/tests/crud/evaluations/test_processing.py (1)

70-98: Add a type annotation to the new test method signature.

Per the project requirement, parameters should be typed; this applies to test methods too.

♻️ Suggested update
-from typing import Any
+from typing import Any, Self
@@
-    def test_parse_evaluation_output_without_question_id(self) -> None:
+    def test_parse_evaluation_output_without_question_id(self: Self) -> None:
As per coding guidelines, please add type hints to all function parameters.
backend/app/tests/crud/evaluations/test_langfuse.py (1)

499-571: Type annotate valid_items in the new tests.

The new test methods introduce untyped parameters.

♻️ Suggested update
-    def test_upload_dataset_to_langfuse_question_id_in_metadata(self, valid_items):
+    def test_upload_dataset_to_langfuse_question_id_in_metadata(
+        self, valid_items: list[dict[str, str]]
+    ) -> None:
@@
-    def test_upload_dataset_to_langfuse_same_question_id_for_duplicates(
-        self, valid_items
-    ):
+    def test_upload_dataset_to_langfuse_same_question_id_for_duplicates(
+        self, valid_items: list[dict[str, str]]
+    ) -> None:
As per coding guidelines, please add type hints to all function parameters.
backend/app/crud/evaluations/langfuse.py (1)

424-447: Keep question_id type consistent in fetched trace data.

Defaulting to "" introduces a str/int mix; prefer None when absent to align with upstream.

♻️ Suggested update
-                    "question_id": "",
+                    "question_id": None,
@@
-                    trace_data["question_id"] = trace.metadata.get("question_id", "")
+                    trace_data["question_id"] = trace.metadata.get("question_id")

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/crud/evaluations/langfuse.py (1)

256-297: Add question_id field to TraceData TypedDict and normalize type to int.

The question_id field is missing from the TraceData TypedDict definition in backend/app/crud/evaluations/score.py (lines 20–27), yet the code attempts to set it at lines 421–427 and 449. Additionally, the function parameter is annotated as str (line 256) but assigned as int from enumerate (line 287), then defaulted to "" string (line 426) when reading.

Fix:

  1. Add question_id: int to the TraceData TypedDict (score.py)
  2. Change the parameter annotation from question_id: str to question_id: int (langfuse.py, line 256)
  3. Initialize trace_data["question_id"] = 0 instead of "" (langfuse.py, line 426)
  4. Convert the metadata value to int on read: trace_data["question_id"] = int(trace.metadata.get("question_id", 0)) if trace.metadata.get("question_id") else 0 (langfuse.py, line 449)
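
Taken together, those four points might look like the sketch below; the TraceData fields other than question_id are placeholders, not the actual definition in score.py:

```python
from typing import TypedDict

class TraceData(TypedDict):
    """Abbreviated stand-in for the TypedDict in score.py."""
    trace_id: str       # placeholder field, not the real field set
    question_id: int    # suggested addition, kept as int throughout

def read_question_id(metadata: dict) -> int:
    """Normalize question_id from trace metadata to int, defaulting to 0."""
    raw = metadata.get("question_id")
    return int(raw) if raw else 0

print(read_question_id({"question_id": "7"}), read_question_id({}))  # → 7 0
```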


Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Evaluation: Parent ID to run traces

2 participants