Conversation

@salma-elshafey (Contributor)

Description

Please add an informative description that covers the changes made by the pull request and link all relevant issues.

If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.

All SDK Contribution checklist:

  • The pull request does not introduce breaking changes.
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

@salma-elshafey salma-elshafey requested a review from a team as a code owner December 18, 2025 12:22
Copilot AI review requested due to automatic review settings December 18, 2025 12:22
@github-actions github-actions bot added the Evaluation Issues related to the client library for Azure AI Evaluation label Dec 18, 2025

Copilot AI left a comment

Pull request overview

This PR changes the behavior of tool-based evaluators (ToolCallAccuracyEvaluator, ToolInputAccuracyEvaluator, and ToolSelectionEvaluator) to raise EvaluationException with the NOT_APPLICABLE error category instead of returning "not applicable" results when tool calls or tool definitions are missing. This makes error handling more explicit and consistent.
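
For concreteness, a minimal sketch of the new raising behavior is shown below. It is an illustration only: the import path, the ErrorTarget member, and the exact messages are assumptions based on the azure-ai-evaluation exception module, not code copied from this PR's diff.

```python
# Minimal sketch of the exception-raising pattern described above.
# Assumptions: the azure.ai.evaluation._exceptions import path, the
# ErrorCategory.NOT_APPLICABLE member, and the ErrorTarget value are
# inferred from the review text, not taken from the PR itself.
from azure.ai.evaluation._exceptions import (
    ErrorBlame,
    ErrorCategory,
    ErrorTarget,
    EvaluationException,
)


def _validate_tool_inputs(tool_calls, tool_definitions):
    """Raise NOT_APPLICABLE instead of returning a 'not applicable' result."""
    if not tool_calls:
        raise EvaluationException(
            message="No tool calls found in the provided input.",
            category=ErrorCategory.NOT_APPLICABLE,
            blame=ErrorBlame.USER_ERROR,
            target=ErrorTarget.TOOL_CALL_ACCURACY_EVALUATOR,
        )
    if not tool_definitions:
        raise EvaluationException(
            message="No tool definitions were provided.",
            category=ErrorCategory.NOT_APPLICABLE,
            blame=ErrorBlame.USER_ERROR,
            target=ErrorTarget.TOOL_CALL_ACCURACY_EVALUATOR,
        )
```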

Key Changes

  • Evaluators now raise exceptions instead of silently returning "not applicable" results when encountering missing tool calls, missing tool definitions, or validation errors
  • Error categories have been refined (e.g., INVALID_VALUE → NOT_APPLICABLE for missing tool definitions, FAILED_EXECUTION → INVALID_VALUE for invalid evaluator outputs)
  • Added validation for missing 'arguments' field in tool calls with explicit error handling

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| test_tool_selection_evaluator.py | Updated tests to expect EvaluationException with the NOT_APPLICABLE category when tool calls or definitions are missing |
| test_tool_input_accuracy_evaluator.py | Modified tests to validate exception-based error handling; added a test for missing 'arguments' field validation |
| test_tool_call_accuracy_evaluator.py | Converted tests to expect exceptions for missing tools/definitions; added tests for the missing 'arguments' field |
| _tool_selection.py | Replaced error-message returns with explicit exception raising for missing tool calls and definitions |
| _tool_input_accuracy.py | Changed to raise exceptions for missing response/tool calls/definitions; added 'arguments' field validation |
| _tool_call_accuracy.py | Updated to raise exceptions instead of returning error messages; added validation for the 'arguments' field |
| _base_prompty_eval.py | Refined error categories for tool definition validation (INVALID_VALUE → NOT_APPLICABLE/MISSING_FIELD) |
| _base_eval.py | Added an ensure_arguments parameter to _parse_tools_from_response for stricter validation |
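
As a rough idea of what the ensure_arguments flag in _base_eval.py might do, consider the sketch below. The function body and the assumed response shape are illustrative, not the PR's actual implementation; the MISSING_FIELD category is taken from the _base_prompty_eval.py row above.

```python
from typing import Any, Dict, List

from azure.ai.evaluation._exceptions import ErrorCategory, EvaluationException


def _parse_tools_from_response(
    response: List[Dict[str, Any]], ensure_arguments: bool = False
) -> List[Dict[str, Any]]:
    """Collect tool calls from an agent response, optionally requiring
    that every call carries an 'arguments' field (illustrative sketch)."""
    tool_calls: List[Dict[str, Any]] = []
    for message in response:
        for item in message.get("content", []):
            if item.get("type") != "tool_call":
                continue
            if ensure_arguments and "arguments" not in item:
                # Explicit error handling for the missing field, as the
                # review describes; the message text is hypothetical.
                raise EvaluationException(
                    message=f"Tool call '{item.get('name')}' is missing the 'arguments' field.",
                    category=ErrorCategory.MISSING_FIELD,
                )
            tool_calls.append(item)
    return tool_calls
```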

with pytest.raises(EvaluationException) as exc_info:
    evaluator(query=query, tool_calls=tool_calls, tool_definitions=tool_definitions)

# The error message should mention the specific tool that's missing

Copilot AI Dec 18, 2025


The comment states "The error message should mention the specific tool that's missing" but the test case is about having no tool calls at all (empty tool_calls list), not about a missing specific tool. The comment would be more accurate if it said something like "The error message should mention that no tool calls were found".

Suggested change:
- # The error message should mention the specific tool that's missing
+ # The error message should mention that no tool calls were found
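
With the suggested comment applied, the full test would plausibly read like the sketch below. This assumes pytest, that EvaluationException exposes a category attribute, and that ErrorCategory.NOT_APPLICABLE exists as described; the test name and fixtures are hypothetical.

```python
import pytest

from azure.ai.evaluation._exceptions import ErrorCategory, EvaluationException


def test_empty_tool_calls_raise_not_applicable(evaluator, query, tool_definitions):
    # An empty tool_calls list should now raise rather than return a
    # 'not applicable' result.
    with pytest.raises(EvaluationException) as exc_info:
        evaluator(query=query, tool_calls=[], tool_definitions=tool_definitions)

    # The error message should mention that no tool calls were found.
    assert exc_info.value.category == ErrorCategory.NOT_APPLICABLE
    assert "no tool calls" in str(exc_info.value).lower()
```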

with pytest.raises(EvaluationException) as exc_info:
    evaluator(query=query, response=response, tool_definitions=tool_definitions)

# The error message should mention the specific tool that's missing

Copilot AI Dec 18, 2025


The comment states "The error message should mention the specific tool that's missing" but the test case is about having no tool calls at all (no tool_calls found in response), not about a missing specific tool. The comment would be more accurate if it said something like "The error message should mention that no tool calls were found".

Suggested change:
- # The error message should mention the specific tool that's missing
+ # The error message should mention that no tool calls were found

@salma-elshafey salma-elshafey changed the title Raise Not Applicable exception for tool-based evaluators in case of no tool calls/no tool definitions provided [DO NOT MERGE] Raise Not Applicable exception for tool-based evaluators in case of no tool calls/no tool definitions provided Dec 18, 2025