Monitoring, Telemetry, Grafana and Alerts
Chronix includes an extensive monitoring and telemetry layer around the trading system.
Grafana is the main surface for historical and operational dashboards:
- connector health;
- feed status;
- latency metrics;
- order round-trip metrics;
- rate-limit state;
- account and exposure metrics;
- strategy business metrics;
- algo runtime status;
- algo administration views: status, parameters, pauses/stops, inventory, orders, risk events and rate-limit state;
- technical service status;
- error and event timelines;
- historical risk and execution analytics.
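Latency panels such as order round-trip are usually read as percentiles rather than averages, because tail latency is what hurts a desk. The following is a minimal sketch of the summary such a panel shows. It assumes each order record carries a send timestamp and a venue acknowledgement timestamp in nanoseconds. The `OrderEvent` type and its field names are hypothetical and do not describe Chronix's actual schema.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical order record: when it was sent and when the venue acked it."""
    order_id: str
    sent_ns: int
    acked_ns: Optional[int]  # None while the order is still awaiting an ack


def round_trip_ms(events: list[OrderEvent]) -> list[float]:
    """Round-trip times in milliseconds, for acknowledged orders only."""
    return [(e.acked_ns - e.sent_ns) / 1e6 for e in events if e.acked_ns is not None]


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100], of a non-empty list."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def latency_summary(events: list[OrderEvent]) -> dict[str, float]:
    """p50 / p99 / max summary; raises ValueError if no order was acknowledged."""
    rtts = round_trip_ms(events)
    return {
        "count": float(len(rtts)),
        "p50_ms": percentile(rtts, 50),
        "p99_ms": percentile(rtts, 99),
        "max_ms": max(rtts),
    }
```

Unacknowledged orders are excluded here rather than counted as zero latency. A production pipeline would typically aggregate into histogram buckets instead of sorting raw samples; the sketch only shows what the panel summarizes.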
Telemetry is not only for engineering. It is a business and operations layer:
- traders see whether a workflow is behaving normally;
- risk managers see exposure and limit state;
- operators see service health and alerts;
- engineers diagnose latency, reconnects and failure modes;
- quants compare strategy behavior across live and historical runs.
Alerting is handled primarily through Grafana alerting, supplemented by lightweight scripts or custom notification routing where a deployment needs a specific channel or action. Keeping alerting in Grafana means alert rules evaluate the same metrics and event data that the monitoring dashboards display. Alerts turn abnormal states into action:
- risk alerts;
- connector health alerts;
- latency and rate-limit alerts;
- strategy error alerts;
- execution/order-state alerts;
- formula-based market condition alerts;
- acknowledgement, silence, resolution and escalation flows.
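The acknowledgement, silence, resolution and escalation flows can be pictured as a small state machine. The following is a minimal sketch under stated assumptions: the states, the 300-second acknowledgement deadline and the `Alert` class are hypothetical and are not Grafana's data model. Grafana alerting provides silences natively; acknowledgement and escalation are the kind of flow a deployment might add through its own scripts. Time is passed in explicitly so that the transitions are deterministic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AlertState(Enum):
    FIRING = "firing"
    ACKNOWLEDGED = "acknowledged"
    SILENCED = "silenced"
    ESCALATED = "escalated"
    RESOLVED = "resolved"


@dataclass
class Alert:
    """Hypothetical alert record; times are seconds on a monotonic clock."""
    name: str
    fired_at: float
    escalate_after_s: float = 300.0  # assumed ack deadline before escalation
    state: AlertState = AlertState.FIRING
    silenced_until: Optional[float] = None

    def acknowledge(self) -> None:
        # Someone has taken ownership, which stops the escalation clock.
        if self.state in (AlertState.FIRING, AlertState.ESCALATED):
            self.state = AlertState.ACKNOWLEDGED

    def silence(self, now: float, duration_s: float) -> None:
        # Silences are always time-bounded, never indefinite.
        if self.state is not AlertState.RESOLVED:
            self.state = AlertState.SILENCED
            self.silenced_until = now + duration_s

    def resolve(self) -> None:
        self.state = AlertState.RESOLVED
        self.silenced_until = None

    def tick(self, now: float) -> AlertState:
        """Apply time-based transitions: silence expiry, then escalation."""
        if (self.state is AlertState.SILENCED
                and self.silenced_until is not None
                and now >= self.silenced_until):
            # Silence ran out and the alert is unresolved: fire again and
            # restart the acknowledgement clock.
            self.state = AlertState.FIRING
            self.silenced_until = None
            self.fired_at = now
        if self.state is AlertState.FIRING and now - self.fired_at >= self.escalate_after_s:
            self.state = AlertState.ESCALATED
        return self.state
```

The point of the design is that every path out of `FIRING` is explicit. An alert nobody acknowledges escalates, and a silence expires back into `FIRING` instead of hiding the condition for good.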
Chronix should not be allowed to sit silently in an abnormal state. The goal is to surface failures, degradation and unsafe conditions quickly enough that a desk can act.
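One way to keep a system from sitting silently is to treat a missing signal as an alert in its own right. Grafana alert rules have settings for how to handle missing data. The following is a minimal sketch of the underlying logic as a heartbeat watchdog. The component names, thresholds and the `HeartbeatWatchdog` class are hypothetical and for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HeartbeatWatchdog:
    """Flags components whose last heartbeat is older than their allowed age.

    Component names and limits are hypothetical; times are seconds on a
    monotonic clock.
    """
    max_age_s: dict[str, float]
    last_seen: dict[str, float] = field(default_factory=dict)

    def beat(self, component: str, now: float) -> None:
        self.last_seen[component] = now

    def stale(self, now: float) -> list[str]:
        """Sorted names of components that are late or have never reported."""
        late = []
        for component, limit in self.max_age_s.items():
            seen: Optional[float] = self.last_seen.get(component)
            # Never having reported counts as stale: silence is not health.
            if seen is None or now - seen > limit:
                late.append(component)
        return sorted(late)
```

For example, with limits of 2 s for a market-data feed and 5 s for an order gateway, a feed that last reported 2.5 s ago is flagged while the gateway is not.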