Conversation

@salma-elshafey (Contributor)

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@salma-elshafey salma-elshafey requested a review from a team as a code owner December 18, 2025 12:22
Copilot AI review requested due to automatic review settings December 18, 2025 12:22
@github-actions github-actions bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Dec 18, 2025

Copilot AI left a comment

Pull request overview

This PR changes the behavior of tool-based evaluators (ToolCallAccuracyEvaluator, ToolInputAccuracyEvaluator, and ToolSelectionEvaluator) to raise EvaluationException with the NOT_APPLICABLE error category instead of returning "not applicable" results when tool calls or tool definitions are missing. This makes error handling more explicit and consistent.
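
For concreteness, a minimal sketch of the new raising behavior is shown below. It is an illustration only: the import path, the ErrorTarget member, and the exact messages are assumptions based on the azure-ai-evaluation exception module, not code copied from this PR's diff.

```python
# Minimal sketch of the exception-raising pattern described above.
# Assumptions: the azure.ai.evaluation._exceptions import path, the
# ErrorCategory.NOT_APPLICABLE member, and the ErrorTarget value are
# inferred from the review text, not taken from the PR itself.
from azure.ai.evaluation._exceptions import (
    ErrorBlame,
    ErrorCategory,
    ErrorTarget,
    EvaluationException,
)


def _validate_tool_inputs(tool_calls, tool_definitions):
    """Raise NOT_APPLICABLE instead of returning a 'not applicable' result."""
    if not tool_calls:
        raise EvaluationException(
            message="No tool calls found in the provided input.",
            category=ErrorCategory.NOT_APPLICABLE,
            blame=ErrorBlame.USER_ERROR,
            target=ErrorTarget.TOOL_CALL_ACCURACY_EVALUATOR,
        )
    if not tool_definitions:
        raise EvaluationException(
            message="No tool definitions were provided.",
            category=ErrorCategory.NOT_APPLICABLE,
            blame=ErrorBlame.USER_ERROR,
            target=ErrorTarget.TOOL_CALL_ACCURACY_EVALUATOR,
        )
```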

Key Changes

  • Evaluators now raise exceptions instead of silently returning "not applicable" results when encountering missing tool calls, missing tool definitions, or validation errors
  • Error categories have been refined (e.g., INVALID_VALUE → NOT_APPLICABLE for missing tool definitions, FAILED_EXECUTION → INVALID_VALUE for invalid evaluator outputs)
  • Added validation for missing 'arguments' field in tool calls with explicit error handling

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| test_tool_selection_evaluator.py | Updated tests to expect EvaluationException with the NOT_APPLICABLE category when tool calls or definitions are missing |
| test_tool_input_accuracy_evaluator.py | Modified tests to validate exception-based error handling; added a test for missing 'arguments' field validation |
| test_tool_call_accuracy_evaluator.py | Converted tests to expect exceptions for missing tools/definitions; added tests for the missing 'arguments' field |
| _tool_selection.py | Replaced error-message returns with explicit exception raising for missing tool calls and definitions |
| _tool_input_accuracy.py | Changed to raise exceptions for missing response/tool calls/definitions; added 'arguments' field validation |
| _tool_call_accuracy.py | Updated to raise exceptions instead of returning error messages; added validation for the 'arguments' field |
| _base_prompty_eval.py | Refined error categories for tool definition validation (INVALID_VALUE → NOT_APPLICABLE/MISSING_FIELD) |
| _base_eval.py | Added an ensure_arguments parameter to _parse_tools_from_response for stricter validation |
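
As a rough idea of what the ensure_arguments flag in _base_eval.py might do, consider the sketch below. The function body and the assumed response shape are illustrative, not the PR's actual implementation; the MISSING_FIELD category is taken from the _base_prompty_eval.py row above.

```python
from typing import Any, Dict, List

from azure.ai.evaluation._exceptions import ErrorCategory, EvaluationException


def _parse_tools_from_response(
    response: List[Dict[str, Any]], ensure_arguments: bool = False
) -> List[Dict[str, Any]]:
    """Collect tool calls from an agent response, optionally requiring
    that every call carries an 'arguments' field (illustrative sketch)."""
    tool_calls: List[Dict[str, Any]] = []
    for message in response:
        for item in message.get("content", []):
            if item.get("type") != "tool_call":
                continue
            if ensure_arguments and "arguments" not in item:
                # Explicit error handling for the missing field, as the
                # review describes; the message text is hypothetical.
                raise EvaluationException(
                    message=f"Tool call '{item.get('name')}' is missing the 'arguments' field.",
                    category=ErrorCategory.MISSING_FIELD,
                )
            tool_calls.append(item)
    return tool_calls
```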

with pytest.raises(EvaluationException) as exc_info:
    evaluator(query=query, tool_calls=tool_calls, tool_definitions=tool_definitions)

# The error message should mention the specific tool that's missing

Copilot AI Dec 18, 2025


The comment states "The error message should mention the specific tool that's missing" but the test case is about having no tool calls at all (empty tool_calls list), not about a missing specific tool. The comment would be more accurate if it said something like "The error message should mention that no tool calls were found".

Suggested change:
- # The error message should mention the specific tool that's missing
+ # The error message should mention that no tool calls were found
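
With the suggested comment applied, the full test would plausibly read like the sketch below. This assumes pytest, that EvaluationException exposes a category attribute, and that ErrorCategory.NOT_APPLICABLE exists as described; the test name and fixtures are hypothetical.

```python
import pytest

from azure.ai.evaluation._exceptions import ErrorCategory, EvaluationException


def test_empty_tool_calls_raise_not_applicable(evaluator, query, tool_definitions):
    # An empty tool_calls list should now raise rather than return a
    # 'not applicable' result.
    with pytest.raises(EvaluationException) as exc_info:
        evaluator(query=query, tool_calls=[], tool_definitions=tool_definitions)

    # The error message should mention that no tool calls were found.
    assert exc_info.value.category == ErrorCategory.NOT_APPLICABLE
    assert "no tool calls" in str(exc_info.value).lower()
```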

with pytest.raises(EvaluationException) as exc_info:
    evaluator(query=query, response=response, tool_definitions=tool_definitions)

# The error message should mention the specific tool that's missing

Copilot AI Dec 18, 2025


The comment states "The error message should mention the specific tool that's missing" but the test case is about having no tool calls at all (no tool_calls found in response), not about a missing specific tool. The comment would be more accurate if it said something like "The error message should mention that no tool calls were found".

Suggested change:
- # The error message should mention the specific tool that's missing
+ # The error message should mention that no tool calls were found

@salma-elshafey salma-elshafey changed the title Raise Not Applicable exception for tool-based evaluators in case of no tool calls/no tool definitions provided [DO NOT MERGE] Raise Not Applicable exception for tool-based evaluators in case of no tool calls/no tool definitions provided Dec 18, 2025