@sgmv sgmv commented Jan 28, 2026

This PR migrates the reporter from Status Dashboard API V1 to V2 for sending incidents. The migration introduces component ID resolution via a cached lookup system and updates the incident data structure to match the V2 API contract.

Changes

Core Migration

  • Replace /v1/component_status endpoint with /v2/incidents endpoint for incident creation
  • Implement new V2 incident data structure with fields: title, description, impact, components, start_date, system, and type
  • Use static title ("System incident from monitoring system") and description ("System-wide incident affecting one or multiple components. Created automatically.") to avoid exposing sensitive operational data on the public Status Dashboard
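As a rough illustration of the V2 payload described above, the following sketch models the listed fields as a plain Rust struct. The field names and the two static strings come from this description; the field types, the `build_incident` helper, and the placeholder values for `system` and `type` are assumptions made for the example and may not match the actual implementation.

```rust
/// Static, non-sensitive texts quoted in the PR description.
const INCIDENT_TITLE: &str = "System incident from monitoring system";
const INCIDENT_DESCRIPTION: &str =
    "System-wide incident affecting one or multiple components. Created automatically.";

/// Hypothetical shape of a V2 incident request body.
/// Field names follow the PR description; all field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct IncidentV2 {
    pub title: String,
    pub description: String,
    pub impact: u8,
    /// Component IDs resolved through the component cache.
    pub components: Vec<u32>,
    /// Timestamp string; the exact format expected by the API is not stated here.
    pub start_date: String,
    pub system: bool,
    /// Sent as `type` on the wire; renamed because `type` is a Rust keyword.
    pub incident_type: String,
}

/// Builds an incident that carries only the static title and description,
/// so no diagnostic details leave the reporter.
pub fn build_incident(component_id: u32, impact: u8, start_date: &str) -> IncidentV2 {
    IncidentV2 {
        title: INCIDENT_TITLE.to_string(),
        description: INCIDENT_DESCRIPTION.to_string(),
        impact,
        components: vec![component_id],
        start_date: start_date.to_string(),
        system: true,                          // assumed placeholder value
        incident_type: "incident".to_string(), // assumed placeholder value
    }
}
```

Keeping the title and description as constants makes it harder to leak metric names or values into the public payload by accident.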

Component Cache System

  • Add component cache that fetches from /v2/components at startup
  • Map (component name, attributes) to component ID with subset attribute matching
  • Auto-refresh cache when a component is not found
  • Retry initial cache load up to 3 times with 60-second delays
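The lookup and refresh-on-miss behaviour can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Component`, `ComponentCache`, and `find_or_refresh` names are hypothetical, the `fetch` closure stands in for the call to /v2/components, and "subset matching" is assumed to mean that every requested attribute must be present with the same value on the cached component.

```rust
use std::collections::BTreeMap;

/// Attribute name -> value pairs (hypothetical representation).
pub type Attributes = BTreeMap<String, String>;

/// A component as it might be returned by /v2/components (assumed shape).
#[derive(Debug, Clone)]
pub struct Component {
    pub id: u32,
    pub name: String,
    pub attributes: Attributes,
}

#[derive(Debug, Default)]
pub struct ComponentCache {
    components: Vec<Component>,
}

impl ComponentCache {
    /// Replaces the cached component list with a freshly fetched one.
    pub fn replace(&mut self, components: Vec<Component>) {
        self.components = components;
    }

    /// Returns the ID of the first component with the given name whose
    /// attributes contain every requested (key, value) pair.
    pub fn find_id(&self, name: &str, wanted: &Attributes) -> Option<u32> {
        self.components
            .iter()
            .find(|c| {
                c.name == name
                    && wanted.iter().all(|(k, v)| c.attributes.get(k) == Some(v))
            })
            .map(|c| c.id)
    }

    /// Looks up the ID and, on a miss, refreshes the cache once via `fetch`
    /// before trying again. A failed fetch leaves the old cache in place.
    pub fn find_or_refresh<F>(&mut self, name: &str, wanted: &Attributes, fetch: F) -> Option<u32>
    where
        F: FnOnce() -> Result<Vec<Component>, String>,
    {
        if let Some(id) = self.find_id(name, wanted) {
            return Some(id);
        }
        match fetch() {
            Ok(fresh) => {
                self.replace(fresh);
                self.find_id(name, wanted)
            }
            Err(_) => None,
        }
    }
}
```

Passing the fetch step as a closure keeps the matching logic testable without a running Status Dashboard.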

Configuration Updates

  • Increase HTTP timeout from 2s to 5s to accommodate V2 API response times
  • No changes required to existing configuration file format or authorization mechanism (HMAC-signed JWT)

Logging Enhancements

  • Log comprehensive diagnostic details locally: timestamp, service name, environment, component details, impact value, and triggered metrics with values
  • Diagnostic details are intentionally excluded from API requests for security

Behavioral Notes

  • Creates a new incident request for every detection and relies on the Status Dashboard's duplicate handling
  • Continues monitoring other services if incident creation fails for one service
  • Fails to start only if initial component cache load fails after all retries
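The startup rule in the last bullet can be pictured as a bounded retry loop. The sketch below is an assumption-laden illustration: `load_with_retries` is a hypothetical helper, the `load` closure stands in for the initial cache fetch, and "up to 3 times" is read here as three attempts in total.

```rust
use std::thread;
use std::time::Duration;

/// Calls `load` until it succeeds or `max_attempts` attempts have been made,
/// sleeping for `delay` between attempts. Returns the last error on failure.
/// `load` is always called at least once, even if `max_attempts` is 0.
pub fn load_with_retries<T, E, F>(mut load: F, max_attempts: u32, delay: Duration) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut attempt = 1;
    loop {
        match load() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the error so the caller can abort startup.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                attempt += 1;
                thread::sleep(delay);
            }
        }
    }
}
```

With the values from this description, a caller would pass `3` and `Duration::from_secs(60)` and refuse to start only when the returned value is `Err`.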

Testing

  • Verify incidents are created via V2 API with correct component IDs
  • Verify component cache refreshes correctly when components are added
  • Verify existing HMAC authorization works with V2 endpoints

sgmv and others added 30 commits January 20, 2026 17:38
- Define 6 prioritized user stories for test coverage
- Specify 25 functional requirements across 5 categories
- Target 95% coverage for core business functions
- Include 27 acceptance scenarios with Given-When-Then format
- Define 15 measurable success criteria
- Complete quality validation with all checklist items passing

This spec enables safe refactoring and provides regression protection
for the metrics-processor codebase.

- Phase 6: Configuration Processing Tests (T037-T047)
  * Template variable substitution and environment expansion
  * Threshold overrides and dash-to-underscore conversion
  * Service set population and expression copying
  * Config validation and multi-source loading
  * All 11 tests passing with 100% config.rs coverage

- Phase 7: API Endpoint Tests (T048-T060)
  * API v1 root, info, and health endpoints
  * Graphite compatibility endpoints (functions, tags, render)
  * Integration tests with mocked Graphite backend
  * Error response format validation
  * 10/13 tests complete with integration coverage

- Phase 9: Coverage & Documentation (T071-T080)
  * Overall library coverage: 71.56% (307/429 lines)
  * Core business functions: 89.9% coverage
  * Test execution time: < 1 second (target: < 2 minutes)
  * Comprehensive testing guide in docs/TESTING.md
  * Test count: 52 tests (target: ≥50 tests)

Test Results:
- Library tests: 44 passing
- Integration tests: 8 passing
- Total: 52 tests passing
- Execution time: < 0.2 seconds

Coverage by Module:
- src/config.rs: 100.0% ✅
- src/common.rs: 89.3% ✅
- src/types.rs: 82.6% ✅
- src/api/v1.rs: 74.4%
- src/graphite.rs: 56.8%

Phase 8 (Graphite Integration Tests T061-T070) mostly covered by
existing integration tests and unit tests in graphite.rs module.
@sgmv sgmv requested a review from bakhterets January 28, 2026 13:45
@sgmv sgmv self-assigned this Jan 28, 2026
sgmv added 11 commits January 28, 2026 14:59
# Conflicts:
#	.github/workflows/ci.yml
#	Makefile
#	src/api/v1.rs
#	src/config.rs
#	src/graphite.rs
#	src/types.rs
#	tests/fixtures/configs.rs
#	tests/fixtures/graphite_responses.rs
#	tests/fixtures/helpers.rs
#	tests/integration_api.rs
#	tests/integration_health.rs
bakhterets previously approved these changes Jan 29, 2026
@bakhterets bakhterets left a comment

tested

@bakhterets
Leave a single source for the E2E testing diagram (either the doc or the comment).