Alerts

commandit Alerts Overview

1. Purpose of Alerting

The commandit alerting system is designed to provide timely notification of events, potential issues, or critical conditions occurring within managed client environments or the commandit platform itself. By monitoring infrastructure, applications, security events, and backups, the alerting system enables proactive response, faster troubleshooting, and automated remediation, and ultimately contributes to maintaining service levels and system health.

2. How Alerts Work in commandit

The process involves generating alert events, processing them centrally, tracking their state, and triggering appropriate actions or notifications.

2.1 Alert Generation Sources

Alerts can originate from various sources integrated within commandit:

- Monitoring rules: defined checks for performance counters (CPU, RAM, disk), service states, event logs, ping responses, hardware health, etc., configured via monitoring policies (monitoringrules).
- Compliance rules: checks against defined compliance policies (compliancerules).
- Patching policies: notifications related to patch failures or pending reboots (patchpolicies).
- Backup status: monitoring of backup job success, failure, or missed schedules (backupjobstatus).
- Website & domain monitoring: checks for website uptime, SSL expiry, and domain registration expiry (monitoredwebsites, monitoreddomains).
- Security monitoring: findings from password checks, dark web scans, risky sign-ins, or EDR integrations (passwordsecurityfindings, darkwebbreachevents, azureadriskysigninevents).
- Email triage: emails identified as alerts, processed by the AI email triage agent.
- SNMP traps: received traps processed via an SNMP listener integration.
- API integrations: alerts pushed from external monitoring tools or platforms via commandit's API.
- Manual creation: technicians can manually create alerts if necessary.

2.2 Central Processing: Alert Processing Engine

Regardless of the source, incoming alert data is standardized and processed by
the Alert Processing Engine. This engine is the core intelligence behind the alerting system. Its key functions include:

- Signature identification: determining whether a new alert notification relates to the same underlying condition as an already active alert.
- State management: tracking the lifecycle of each unique alert condition (first seen, last seen, occurrence count, current status) using the alerts table.
- Policy & rule evaluation: applying hierarchical alertprocessingpolicies and alertprocessingrules to determine the appropriate response.
- Thresholding: evaluating frequency and duration thresholds before taking action, reducing noise from transient issues.
- Flapping suppression: detecting rapidly oscillating alert states and temporarily suppressing actions.
- Maintenance window awareness: checking applicable maintenancewindows and suppressing actions accordingly (unless overridden by a specific rule).
- Action execution: triggering actions defined in policies, such as creating tickets, running automation scripts, sending notifications, or calling webhooks.

For detailed information on the engine's internal logic, thresholds, state transitions, and configuration, please refer to the separate document [commandit Global Alert Processing Engine Functional Specification] (link to engine spec md).

2.3 Alert Lifecycle & Status (alerts.status)

Alerts progress through various statuses visible in the UI:

- new: a new alert condition has been detected, potentially awaiting threshold confirmation.
- acknowledged: a technician has seen the alert and acknowledged responsibility.
- ticketcreated: a commandit ticket has been automatically created from this alert via processing rules.
- actionattempted / actionsucceeded / actionfailed: an automated action (such as a script) was triggered.
- suppressed maint: the alert occurred during a maintenance window and action was suppressed.
- suppressed flapping: actions are temporarily suppressed due to rapid state changes.
- resolved: the underlying condition has cleared (detected automatically
or set manually).
- closed: the alert (and potentially the associated ticket) has been fully handled and closed by a technician.

(Other internal statuses such as processingerror and unknown may exist but are primarily for backend/admin view.)

2.4 Automated Actions

Based on configured alert processing policies, the engine can trigger various automated actions, including:

- Ignoring noisy or informational alerts.
- Creating detailed tickets assigned to the appropriate service board/technician using specific templates.
- Running remediation scripts via the commandit agent.
- Sending notifications via email, SMS, or other channels using notificationprofiles.
- Calling external systems via webhooks.

3. Working with Alerts: commandit UI

Technicians primarily interact with alerts through the following screens within the commandit platform.

3.1 Main Alerts Monitoring Screen / Dashboard

Location: Monitoring > Active Alerts / Dashboard (or similar). This is the primary workspace for NOC/service desk teams.

Purpose: provides a consolidated view of alerts requiring attention.

Key features:

- View: configurable dashboard, potentially with panels/widgets, and a primary list/table view of active alerts (e.g., status not 'resolved' or 'closed'). Should support auto-refreshing.
- Columns (configurable): severity (color coded), alert title/signature, affected CI name (device/app/domain etc., linked), client organization, status, first seen timestamp, last seen timestamp, occurrence count, linked ticket number (linked).
- Filtering & grouping: options to filter by severity, status, organization, device/CI type, time range, acknowledgement status, etc., and to group alerts (e.g., by client, by device).
- Sorting: sortable columns, typically defaulting to severity, then last seen time (descending).
- Bulk actions: ability to select multiple alerts and perform actions such as 'Acknowledge'.
- Row actions: quick actions available per alert, such as 'Acknowledge', 'View Details', 'Create Ticket' (if none exists), and 'View Ticket' (if one exists).
- Visualizations (optional): charts showing alert trends, top problematic CIs, and alerts by severity over time.

3.2 Individual Alert Detail View (Modal/Drill-Down Screen)

Trigger: accessed by clicking on an alert from the main monitoring screen, or potentially from related logs/tickets.

Purpose: provides comprehensive information and actions for a single alert instance.

Content:

- Header: displays key information prominently: severity, title/signature, current status.
- Context: associated organization, primary affected CI (device, app, etc., with link), originating source (monitoring rule name, email source, etc.).
- Timeline: first occurred at, last occurred at, last status change timestamp. If resolved/closed, shows those timestamps.
- State details: occurrence count and action taken flag (indicates whether a threshold action such as ticketing/scripting was performed for this instance).
- Alert payload: formatted display of the alerts.details JSONB content, showing the specific metrics or messages that triggered the alert.
- Related ticket: displays the linked ticket number (if any) with a direct link. Provides a 'Create Ticket' button if no ticket is linked and the alert status warrants it ('new', 'acknowledged').
- History: a log/timeline of status changes specifically for this alert signature or alert ID.
- Notes: section to view and add technician notes specifically related to the investigation/handling of this alert instance (utilizes the generic notes table linked to the alert ID).

Actions: buttons/options to:

- Acknowledge: changes status to 'acknowledged', records user/time.
- Resolve / Close: changes status, requires mandatory reason/resolution notes.
- Create/View Ticket: creates a new linked ticket or navigates to the existing one.
- Run Action (future enhancement?):
manually trigger associated remediation scripts.
- Mute/Suppress: temporarily silence notifications for this alert (future).

4. Related Configuration

The behavior of the alerting system is configured in several areas within commandit settings:

- Alert rules: define specific conditions and signature logic (alertprocessingrules).
- Alert policies: group rules and define actions, thresholds, flapping settings, and overrides (alertprocessingpolicies). Policy assignment happens on orgs, locations, or alert endpoints.
- Monitoring rules: define the actual checks performed by agents/probes (monitoringrules).
- Maintenance windows: define periods during which alerts may be suppressed (maintenancewindows).
- Notification profiles: define who gets notified, how, and when for actions (notificationprofiles).
- Scripts: define automation scripts that can be triggered by alert actions (scripts).

5. Further Details

For an in-depth explanation of the alert processing logic, state management, threshold calculations, and technical implementation details, please consult the docid\ fvropqtoyngjewvy7rnuo
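
Illustrative sketch: the engine functions listed in section 2.2 (signature identification, state management, thresholding, and maintenance window awareness) can be pictured with a small Python example. This is only a minimal sketch under stated assumptions, not the actual engine: the `AlertState` class, the `make_signature` and `process_event` functions, the occurrence threshold of 3, the status strings, and the choice of a SHA-256 hash over source, CI, and condition are all hypothetical names and choices made for illustration. The real logic is described in the engine specification.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AlertState:
    """Hypothetical in-memory stand-in for one row of the alerts table."""
    signature: str
    first_seen: datetime
    last_seen: datetime
    occurrence_count: int = 1
    status: str = "new"
    action_taken: bool = False


def make_signature(source: str, ci_id: str, condition: str) -> str:
    """Derive a stable key so repeat notifications map to the same alert."""
    raw = f"{source}|{ci_id}|{condition}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]


def in_maintenance(now, windows):
    """True if 'now' falls inside any (start, end) maintenance window."""
    return any(start <= now < end for start, end in windows)


def process_event(active, source, ci_id, condition, now, windows=(), threshold=3):
    """Update alert state for one incoming event and decide on an action."""
    sig = make_signature(source, ci_id, condition)
    state = active.get(sig)
    if state is None:
        # First time this condition is seen: start tracking it.
        state = AlertState(sig, first_seen=now, last_seen=now)
        active[sig] = state
    else:
        # Same underlying condition: bump the counters instead of duplicating.
        state.last_seen = now
        state.occurrence_count += 1

    if state.action_taken:
        return state  # an action was already performed for this alert
    if in_maintenance(now, windows):
        state.status = "suppressed_maint"  # assumed status label
    elif state.occurrence_count >= threshold:
        state.status = "ticketcreated"  # stands in for any configured action
        state.action_taken = True
    else:
        state.status = "new"
    return state


if __name__ == "__main__":
    alerts = {}
    start = datetime(2024, 1, 1, 12, 0)
    for minute in range(3):
        result = process_event(alerts, "monitoringrules", "srv-01", "cpu_high",
                               start + timedelta(minutes=minute))
        print(result.occurrence_count, result.status)
```

Keying state on a signature rather than on each raw notification is what lets the occurrence count and the first and last seen timestamps accumulate on a single alert, and it gives the threshold check something to count against before any ticket or script is triggered.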