Monitoring, Telemetry, Grafana and Alerts

Chronix wraps the trading system in a broad monitoring and telemetry layer that covers connectors, feeds, execution, risk and strategy behavior.

Grafana is the main surface for historical and operational dashboards:

connector health;
feed status;
latency metrics;
order round-trip metrics;
rate-limit state;
account and exposure metrics;
strategy business metrics;
algo runtime status;
algo administration views: status, parameters, pauses/stops, inventory, orders, risk events and rate-limit state;
technical service status;
error and event timelines;
historical risk and execution analytics.

Telemetry is not only an engineering tool. It is also a business and operations layer, and each role reads it differently:

traders see whether a workflow is behaving normally;
risk managers see exposure and limit state;
operators see service health and alerts;
engineers diagnose latency, reconnects and failure modes;
quants compare strategy behavior across live and historical runs.

Alerts are handled primarily through Grafana alerting, supplemented by lightweight scripts or notification routing where a deployment needs a custom channel or action. This keeps alerting close to the metrics and event data that drive the dashboards. Alerts convert abnormal states into action. The alert types and handling flows covered are:

risk alerts;
connector health alerts;
latency and rate-limit alerts;
strategy error alerts;
execution/order-state alerts;
formula-based market condition alerts;
acknowledgement, silence, resolution and escalation flows.
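
The acknowledgement, silence, resolution and escalation flows listed above can be pictured as a small state machine. The sketch below is illustrative only: the state names, the transition table, the Alert class and the escalate_after threshold are all hypothetical, and it does not represent the Chronix implementation or the Grafana alerting API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class AlertState(Enum):
    FIRING = "firing"
    ACKNOWLEDGED = "acknowledged"
    SILENCED = "silenced"
    ESCALATED = "escalated"
    RESOLVED = "resolved"


# Hypothetical transition table; a real deployment would define its own rules.
ALLOWED = {
    AlertState.FIRING: {AlertState.ACKNOWLEDGED, AlertState.SILENCED,
                        AlertState.ESCALATED, AlertState.RESOLVED},
    AlertState.ESCALATED: {AlertState.ACKNOWLEDGED, AlertState.SILENCED,
                           AlertState.RESOLVED},
    AlertState.ACKNOWLEDGED: {AlertState.SILENCED, AlertState.RESOLVED},
    AlertState.SILENCED: {AlertState.FIRING, AlertState.RESOLVED},
    AlertState.RESOLVED: set(),  # terminal: a new incident is a new alert
}


@dataclass
class Alert:
    name: str
    category: str        # e.g. "risk", "connector_health", "latency"
    fired_at: float      # seconds on any consistent clock
    state: AlertState = AlertState.FIRING
    history: List[Tuple[float, AlertState, AlertState]] = field(default_factory=list)

    def transition(self, new_state: AlertState, at: float) -> None:
        """Move to new_state, rejecting transitions the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.history.append((at, self.state, new_state))
        self.state = new_state

    def escalate_if_unacknowledged(self, now: float, escalate_after: float) -> bool:
        """Escalate an alert that is still firing after escalate_after seconds."""
        if self.state is AlertState.FIRING and now - self.fired_at >= escalate_after:
            self.transition(AlertState.ESCALATED, now)
            return True
        return False
```

In this sketch an alert that nobody acknowledges within the threshold is escalated, and every transition is recorded with its timestamp so the sequence can be reviewed afterwards.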

Chronix should not be allowed to sit silently in an abnormal state. The goal is to surface failures, degradation and unsafe conditions quickly enough that a desk can act.
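
One way to read "not sitting silently" is that a missing signal should itself be treated as an abnormal state. The sketch below shows a minimal staleness check over last-heartbeat timestamps. The function name, the source names and the max_age threshold are assumptions made for illustration and are not taken from Chronix.

```python
from typing import Dict, List


def stale_sources(last_seen: Dict[str, float], now: float, max_age: float) -> List[str]:
    """Return, sorted by name, the sources whose last heartbeat is older than max_age seconds."""
    return sorted(name for name, ts in last_seen.items() if now - ts > max_age)


# Hypothetical heartbeat timestamps, in seconds on a shared clock.
last_seen = {"feed_a": 100.0, "connector_b": 90.0, "algo_c": 99.5}
silent = stale_sources(last_seen, now=101.0, max_age=5.0)
```

Here connector_b was last seen 11 seconds ago against a 5 second limit, so it is the only source reported. Each reported source would then raise an alert rather than leaving a quiet gap on a dashboard.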