[DO NOT MERGE] Raise Not Applicable exception for tool-based evaluators in case of no tool calls/no tool definitions provided #44470
base: main
Conversation
…nts' field + test cases
Pull request overview
This PR changes the behavior of tool-based evaluators (ToolCallAccuracyEvaluator, ToolInputAccuracyEvaluator, and ToolSelectionEvaluator) to raise EvaluationException with the NOT_APPLICABLE error category instead of returning "not applicable" results when tool calls or tool definitions are missing. This makes error handling more explicit and consistent.
Key Changes
- Evaluators now raise exceptions instead of silently returning "not applicable" results when encountering missing tool calls, missing tool definitions, or validation errors (see the sketch after this list)
- Error categories have been refined (e.g., INVALID_VALUE → NOT_APPLICABLE for missing tool definitions, FAILED_EXECUTION → INVALID_VALUE for invalid evaluator outputs)
- Added validation for missing 'arguments' field in tool calls with explicit error handling
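A minimal sketch of the new raising behavior, assuming the constructor keywords and enum members (`ErrorBlame.USER_ERROR`, `ErrorTarget.TOOL_CALL_ACCURACY_EVALUATOR`) match the package's `_exceptions` module; `ErrorCategory.NOT_APPLICABLE` is the category this PR uses, and the helper name and message text are illustrative, not the actual implementation:

```python
from azure.ai.evaluation._exceptions import (
    ErrorBlame,
    ErrorCategory,
    ErrorTarget,
    EvaluationException,
)


def _validate_tool_inputs(tool_calls, tool_definitions):
    # Hypothetical helper mirroring the validation this PR adds to the
    # tool-based evaluators; previously these cases returned a
    # "not applicable" result instead of raising.
    if not tool_calls:
        raise EvaluationException(
            message="No tool calls found in the response or provided as input.",
            blame=ErrorBlame.USER_ERROR,
            category=ErrorCategory.NOT_APPLICABLE,  # category used by this PR
            target=ErrorTarget.TOOL_CALL_ACCURACY_EVALUATOR,
        )
    if not tool_definitions:
        raise EvaluationException(
            message="Tool definitions must be provided to evaluate tool calls.",
            blame=ErrorBlame.USER_ERROR,
            category=ErrorCategory.NOT_APPLICABLE,
            target=ErrorTarget.TOOL_CALL_ACCURACY_EVALUATOR,
        )
```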
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| test_tool_selection_evaluator.py | Updated tests to expect EvaluationException with NOT_APPLICABLE category when tool calls or definitions are missing |
| test_tool_input_accuracy_evaluator.py | Modified tests to validate exception-based error handling and added test for missing 'arguments' field validation |
| test_tool_call_accuracy_evaluator.py | Converted tests to expect exceptions for missing tools/definitions and added tests for missing 'arguments' field |
| _tool_selection.py | Replaced error message returns with explicit exception raising for missing tool calls and definitions |
| _tool_input_accuracy.py | Changed to raise exceptions for missing response/tool calls/definitions and added 'arguments' field validation |
| _tool_call_accuracy.py | Updated to raise exceptions instead of returning error messages and added validation for 'arguments' field |
| _base_prompty_eval.py | Refined error categories for tool definition validation (INVALID_VALUE → NOT_APPLICABLE/MISSING_FIELD) |
| _base_eval.py | Added ensure_arguments parameter to _parse_tools_from_response for stricter validation (sketched after this table) |
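The last row describes stricter parsing in `_base_eval.py`. A hypothetical sketch of what that could look like, assuming a simple message/content shape for agent responses; only the method name and the `ensure_arguments` flag come from the PR, while the body and the `ValueError` stand-in are assumptions (per the summary above, the real change raises `EvaluationException` with the `MISSING_FIELD` category):

```python
from typing import Any, Dict, List


def _parse_tools_from_response(
    response: List[Dict[str, Any]], ensure_arguments: bool = False
) -> List[Dict[str, Any]]:
    # Hypothetical body: collect tool_call items from the response messages
    # and, when ensure_arguments is set, reject calls without an
    # 'arguments' field.
    tool_calls: List[Dict[str, Any]] = []
    for message in response:
        content = message.get("content") or []
        if not isinstance(content, list):
            continue
        for item in content:
            if isinstance(item, dict) and item.get("type") == "tool_call":
                if ensure_arguments and "arguments" not in item:
                    # The actual code raises EvaluationException with the
                    # MISSING_FIELD category here; ValueError is a stand-in.
                    raise ValueError(
                        f"Tool call '{item.get('name', '<unknown>')}' is "
                        "missing the 'arguments' field."
                    )
                tool_calls.append(item)
    return tool_calls
```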
Code context:

```python
with pytest.raises(EvaluationException) as exc_info:
    evaluator(query=query, tool_calls=tool_calls, tool_definitions=tool_definitions)

# The error message should mention the specific tool that's missing
```

Copilot AI commented on Dec 18, 2025:
The comment states "The error message should mention the specific tool that's missing" but the test case is about having no tool calls at all (an empty tool_calls list), not about a missing specific tool. The comment would be more accurate if it said something like "The error message should mention that no tool calls were found".

Suggested change:

```diff
-# The error message should mention the specific tool that's missing
+# The error message should mention that no tool calls were found
```
Code context:

```python
with pytest.raises(EvaluationException) as exc_info:
    evaluator(query=query, response=response, tool_definitions=tool_definitions)

# The error message should mention the specific tool that's missing
```

Copilot AI commented on Dec 18, 2025:
The comment states "The error message should mention the specific tool that's missing" but the test case is about having no tool calls at all (no tool_calls found in the response), not about a missing specific tool. The comment would be more accurate if it said something like "The error message should mention that no tool calls were found".

Suggested change:

```diff
-# The error message should mention the specific tool that's missing
+# The error message should mention that no tool calls were found
```
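Applying the suggestion, a complete version of such a test might look like the sketch below. The `evaluator` fixture, the input values, and the exact message assertion are assumptions, and it presumes `EvaluationException` exposes the `category` it was raised with:

```python
import pytest

from azure.ai.evaluation._exceptions import ErrorCategory, EvaluationException


def test_no_tool_calls_raises_not_applicable(evaluator):
    # Hypothetical test data; no tool calls are provided at all.
    query = "What's the weather in Seattle?"
    tool_calls = []
    tool_definitions = [{"name": "get_weather", "type": "function", "parameters": {}}]

    with pytest.raises(EvaluationException) as exc_info:
        evaluator(query=query, tool_calls=tool_calls, tool_definitions=tool_definitions)

    # The exception should carry the NOT_APPLICABLE category this PR uses.
    assert exc_info.value.category == ErrorCategory.NOT_APPLICABLE
    # The error message should mention that no tool calls were found.
    assert "no tool calls" in str(exc_info.value).lower()
```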
Description
Please add an informative description that covers the changes made by the pull request and link all relevant issues.
If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.
All SDK Contribution checklist:
- General Guidelines and Best Practices
- Testing Guidelines