System Health Monitoring in RouterOS: A Complete Guide
RouterOS Version: 7.x+
Difficulty: Beginner
Estimated Time: 20 minutes
Overview
System health monitoring provides real-time hardware status information including temperature, voltage, current, fan speed, and power consumption. Monitoring these values helps you:
- Detect overheating before hardware damage occurs
- Verify power supplies are functioning correctly
- Monitor fan operation to prevent thermal throttling
- Track power consumption for capacity planning
- Integrate with external monitoring via SNMP
Health data is available via CLI, WinBox, API, and SNMP, making it easy to integrate with monitoring systems like MRTG, Cacti, Zabbix, or The Dude.
Key limitation: Health monitoring requires hardware support. CHR (Cloud Hosted Router) and some low-end devices have no sensors and will show an empty health menu.
Menu Reference
| Menu | Purpose |
|---|---|
| /system/health | View sensor readings |
| /system/health/settings | Configure fan control (v7.9+) |
Understanding Health Readings
Common Sensors
| Sensor | Type | Description |
|---|---|---|
| cpu-temperature | C | CPU die temperature |
| board-temperature | C | Board/ambient temperature |
| temperature | C | General temperature sensor |
| voltage | V | Input voltage |
| current | A | Input current draw |
| power-consumption | W | Total power draw |
| fan1-speed | RPM | Fan rotation speed |
| psu1-state | ok/fail | Power supply status |
What's Normal?
| Sensor | Normal Range | Warning |
|---|---|---|
| CPU Temperature | 40-70°C | Above 80°C |
| Board Temperature | 20-50°C | Above 60°C |
| Voltage | 10-28V (varies) | Check device specs |
| Fan Speed | 2000-6000 RPM | 0 RPM when should be running |
Note: CPU temperatures of 70-80°C are normal under load. The "operating temperature" in device specs (-20°C to +70°C) refers to room/ambient temperature, not CPU temperature.
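The warning levels in the table above can be turned into a simple check on a monitoring host. The following is a minimal sketch, assuming readings have already been collected as numbers; the function names and the `cpu`/`board` keys are hypothetical labels chosen for illustration, not RouterOS identifiers.

```python
# Warning levels copied from the "What's Normal?" table.
# The keys are illustrative labels, not RouterOS sensor names.
WARNING_ABOVE_C = {
    "cpu": 80,
    "board": 60,
}


def classify_temperature(kind: str, celsius: float) -> str:
    """Return "warning" if the reading exceeds the table's warning level, else "ok"."""
    limit = WARNING_ABOVE_C[kind]
    return "warning" if celsius > limit else "ok"


def fan_stalled(rpm: int, expected_running: bool) -> bool:
    """A 0 RPM reading is only a problem when the fan should be spinning."""
    return expected_running and rpm == 0
```

A CPU reading of 75 falls in the band the note describes as normal under load, so `classify_temperature("cpu", 75)` reports `"ok"` under these thresholds. Adjust the limits to the device's own specifications.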
Configuration Examples
Example 1: View Current Health Status

    /system/health/print

Example output on a CCR device:

    # NAME                 VALUE  TYPE
    0 power-consumption    50.8   W
    1 cpu-temperature      43     C
    2 fan1-speed           5654   RPM
    3 board-temperature1   29     C
    4 voltage              24.5   V

Example 2: Filter Specific Readings
View only temperatures:

    /system/health/print where type="C"

View only fan speeds:

    /system/health/print where type="RPM"

Find a specific sensor:

    /system/health/print where name="cpu-temperature"

Example 3: Configure Fan Control (v7.9+)
Control when fans start and reach full speed:

    # View current settings
    /system/health/settings/print

    # Set temperature where fans start spinning
    /system/health/settings/set fan-target-temp=55

    # Set temperature where fans reach maximum speed
    /system/health/settings/set fan-full-speed-temp=65

    # Prevent fans from completely stopping (reduces cycling)
    /system/health/settings/set fan-min-speed-percent=15

Fan control is available on: CRS3xx, CRS5xx, CCR2xxx (v7.9+), CCR1036/CCR1016 (v7.14+)
Example 4: Enable CPU Overtemperature Protection
For ARM/ARM64 devices, enable automatic protection if CPU overheats:

    # Enable overtemperature monitoring
    /system/health/settings/set cpu-overtemp-check=yes

    # Set threshold (default 105°C)
    /system/health/settings/set cpu-overtemp-threshold=100

    # Delay after boot before monitoring (prevents false triggers)
    /system/health/settings/set cpu-overtemp-startup-delay=2m

Example 5: Create Temperature Alert Script
Send email when temperature exceeds threshold:

    # First configure email (required)
    /tool/e-mail/set server=smtp.example.com from=router@example.com

    # Create alert script
    /system/script/add name=temp-alert policy=read,write,test source={
        :local cpuTemp [/system/health/get [find name="cpu-temperature"] value]
        :local threshold 75
        :if ($cpuTemp > $threshold) do={
            :log warning "CPU temperature high: $cpuTemp C"
            /tool/e-mail/send to="admin@example.com" \
                subject="[ALERT] Router temperature high" \
                body="CPU temperature: $cpuTemp C (threshold: $threshold C)"
        }
    }

    # Schedule to run every 5 minutes
    /system/scheduler/add name=temp-check interval=5m on-event=temp-alert

Example 6: Temperature Alert with Rate Limiting
Prevent email spam by limiting alerts:

    /system/script/add name=temp-alert-limited policy=read,write,test source={
        :global lastTempAlert
        :local cpuTemp [/system/health/get [find name="cpu-temperature"] value]
        :local threshold 75
        :local cooldown 3600

        :if ($cpuTemp > $threshold) do={
            :local now [/system/clock/get time]
            # Seconds since midnight, to minute resolution
            :local currentSecs ([:pick $now 0 2] * 3600 + [:pick $now 3 5] * 60)

            # Alert if no alert has been sent yet, the clock wrapped past midnight,
            # or the cooldown has elapsed
            :if (([:typeof $lastTempAlert] = "nothing") or ($currentSecs < $lastTempAlert) or (($currentSecs - $lastTempAlert) > $cooldown)) do={
                :log warning "CPU temperature high: $cpuTemp C"
                /tool/e-mail/send to="admin@example.com" \
                    subject="[ALERT] Router temperature high" \
                    body="CPU temperature: $cpuTemp C"
                :set lastTempAlert $currentSecs
            }
        }
    }

Because the timestamp is seconds since midnight, an alert sent shortly before midnight can be followed by another shortly after it; without the wrap check, alerts would instead be suppressed for most of the following day.

Example 7: Monitor via SNMP
Get SNMP OIDs for external monitoring tools:

    /system/health/print oid

Output shows OID for each sensor:

    # NAME             VALUE  TYPE  OID
    0 cpu-temperature  43     C     .1.3.6.1.4.1.14988.1.1.3.11
    1 voltage          24.5   V     .1.3.6.1.4.1.14988.1.1.3.8

Poll from external system (Linux example):

    # Get CPU temperature (returns decidegrees - divide by 10)
    snmpget -v2c -c public 192.168.1.1 .1.3.6.1.4.1.14988.1.1.3.11.0

Example 8: Configure SNMP Temperature Traps
Send SNMP trap when temperature exceeds threshold:

    # Enable SNMP
    /snmp set enabled=yes

    # Configure trap destination
    /snmp set trap-target=192.168.1.100 trap-community=public trap-version=2

    # Enable temperature exception traps
    /snmp set trap-generators=temp-exception

Trap triggers at 100°C or the cpu-overtemp-threshold value.
Example 9: Check Power Supply Status (Dual-PSU Devices)
    # View PSU status
    /system/health/print where name~"psu"

Expected output:

    # NAME          VALUE  TYPE
    0 psu1-state    ok
    1 psu2-state    ok
    2 psu1-voltage  24.2   V
    3 psu2-voltage  24.3   V

If a PSU fails, psu1-state or psu2-state will show fail.
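If the printed table is captured as text on a management host, the PSU check can be automated. The sketch below assumes the column layout shown in the expected output above (index, name, value, optional type); how the text is retrieved (SSH, API, or otherwise) is left out, and both function names are hypothetical. The sample data is illustrative, with one PSU deliberately set to fail.

```python
def parse_health_table(text: str) -> dict[str, str]:
    """Map sensor name -> value from `/system/health/print` style output."""
    readings = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric index; the header row starts with "#".
        if len(fields) >= 3 and fields[0].isdigit():
            readings[fields[1]] = fields[2]
    return readings


def failed_psus(readings: dict[str, str]) -> list[str]:
    """Names of PSU state sensors whose value is anything other than "ok"."""
    return sorted(
        name
        for name, value in readings.items()
        if name.startswith("psu") and name.endswith("-state") and value != "ok"
    )


# Illustrative capture, not real device output.
SAMPLE = """\
# NAME          VALUE  TYPE
0 psu1-state    ok
1 psu2-state    fail
2 psu1-voltage  24.2   V
3 psu2-voltage  24.3   V
"""

if __name__ == "__main__":
    print(failed_psus(parse_health_table(SAMPLE)))
```

Whitespace splitting is enough here because sensor names and values contain no spaces in the outputs shown in this guide; a different layout would need a different parser.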
SNMP Integration
Common Health OIDs
| Reading | OID | Notes |
|---|---|---|
| voltage | .1.3.6.1.4.1.14988.1.1.3.8.0 | Decivolts (÷10) |
| temperature | .1.3.6.1.4.1.14988.1.1.3.10.0 | Decidegrees (÷10) |
| cpu-temperature | .1.3.6.1.4.1.14988.1.1.3.11.0 | Decidegrees (÷10) |
| power-consumption | .1.3.6.1.4.1.14988.1.1.3.12.0 | Deciwatts (÷10) |
| fan-speed | .1.3.6.1.4.1.14988.1.1.3.17.0 | RPM |
| psu1-state | .1.3.6.1.4.1.14988.1.1.3.15.0 | 0=fail, 1=ok |
| psu2-state | .1.3.6.1.4.1.14988.1.1.3.16.0 | 0=fail, 1=ok |
Important: SNMP returns values multiplied by 10. CLI shows 24.5V; SNMP returns 245.
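The scaling rule can be applied in a small post-processing step on the polling host. This is a sketch under stated assumptions: the divisors come from the table above, the function names are hypothetical, and the `snmpget` line format matched by the regular expression (`... = TYPE: value`) is the common net-snmp default and may differ depending on loaded MIBs and output options.

```python
import re

BASE = ".1.3.6.1.4.1.14988.1.1.3."

# Divisor per OID, taken from the table above (10 = reported in tenths, 1 = as-is).
DIVISORS = {
    BASE + "8.0": 10,   # voltage
    BASE + "10.0": 10,  # temperature
    BASE + "11.0": 10,  # cpu-temperature
    BASE + "12.0": 10,  # power-consumption
    BASE + "17.0": 1,   # fan-speed (RPM)
}


def scale(oid: str, raw: int) -> float:
    """Convert a raw SNMP integer to the unit shown by the CLI."""
    return raw / DIVISORS[oid]


def parse_snmpget_integer(line: str) -> int:
    """Extract the integer after '= TYPE:' from one line of snmpget output (assumed format)."""
    match = re.search(r"=\s*\w+:\s*(-?\d+)\s*$", line)
    if match is None:
        raise ValueError(f"no integer value in: {line!r}")
    return int(match.group(1))


if __name__ == "__main__":
    print(scale(BASE + "8.0", 245))  # the 24.5 V reading from the text
```

Keeping the divisors in one table means an unknown OID raises a `KeyError` rather than being scaled incorrectly without notice.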
Use in Monitoring Tools
Zabbix/Cacti/MRTG: Use the OIDs above with a multiplier of 0.1 for voltage/temperature/power.
The Dude: Supports MikroTik health OIDs natively; no configuration needed.
Common Problems and Solutions
Problem 1: Health Menu is Empty
Cause: Device has no hardware monitoring support.
Solution: Check device specifications at mikrotik.com. CHR and some low-end RouterBOARDs have no sensors.
Problem 2: Scripts Broken After v7 Upgrade
Cause: RouterOS v7 changed health menu structure.
v6 syntax (broken):

    :local temp [/system health get temperature]

v7 syntax (correct):

    :local temp [/system/health/get [find name="cpu-temperature"] value]

Problem 3: SNMP Values Look Wrong
Cause: SNMP returns decivolts/decidegrees (multiplied by 10).
Solution: Divide SNMP values by 10 in your monitoring tool.
Problem 4: Fan Shows 0 RPM
Possible causes:
- Temperature below fan-target-temp (fans not needed)
- Device doesn't support fan control
- Fan hardware failure
Check:

    /system/health/settings/print
    # If fan-target-temp is higher than current temp, fans won't spin

Problem 5: Cannot Control Fan Speed Directly
Cause: MikroTik doesn't allow direct RPM control.
Solution: Use temperature thresholds to influence behavior:

    /system/health/settings/set fan-target-temp=50 fan-full-speed-temp=60

Problem 6: Temperature Email Spam
Cause: Script runs repeatedly while temperature is high.
Solution: Add rate limiting (see Example 6) or track recovery:

    # Only alert once until temperature recovers
    :global tempAlertSent
    :local cpuTemp [/system/health/get [find name="cpu-temperature"] value]
    :if ($cpuTemp > 75) do={
        :if ($tempAlertSent != true) do={
            # Send alert (a log entry here; add /tool/e-mail/send as in Example 5)
            :log warning "CPU temperature high: $cpuTemp C"
            :set tempAlertSent true
        }
    } else={
        :set tempAlertSent false
    }

Problem 7: PoE Voltage Reads Lower Than Expected
Cause: PoE-powered devices have protection circuitry causing voltage drop in readings.
Solution: This is expected behavior. Actual input voltage is higher than displayed.
Fan Control Settings Reference
| Setting | Default | Description |
|---|---|---|
| fan-target-temp | 58°C | Temperature where fans start |
| fan-full-speed-temp | 65°C | Temperature where fans reach max |
| fan-min-speed-percent | 12% | Minimum fan speed (prevents cycling) |
| fan-control-interval | 30s | Seconds between temp readings |
Supported devices (v7.9+): CRS3xx, CRS5xx, CCR2xxx
Additional support (v7.14+): CCR1036, CCR1016
Verification Commands
    # Check if health monitoring is available
    /system/health/print
    # Empty = no hardware support

    # View all temperature sensors
    /system/health/print where type="C"

    # Check fan status
    /system/health/print where type="RPM"

    # View fan control settings
    /system/health/settings/print

    # Get SNMP OIDs
    /system/health/print oid

    # Check PSU status (dual-PSU devices)
    /system/health/print where name~"psu.*state"

Related Features
- SNMP (/snmp) - Export health data to monitoring systems
- Scheduler (/system/scheduler) - Run health checks periodically
- Email (/tool/e-mail) - Send alert notifications
- Scripts (/system/script) - Custom health monitoring logic
- Netwatch (/tool/netwatch) - Complement with connectivity monitoring
- The Dude - MikroTik's monitoring tool with native health support
Summary
Health monitoring in RouterOS provides essential hardware status information:
- View readings with /system/health/print
- Configure fans with /system/health/settings (v7.9+)
- Create alerts using scripts and scheduler
- Integrate externally via SNMP
Key points:
- Available sensors vary by device model
- SNMP values are multiplied by 10 (divide for actual values)
- Fan control is indirect via temperature thresholds
- v7 changed health menu structure (update scripts accordingly)
- Alert scripts need rate limiting to prevent spam
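For alerting done on an external monitoring host rather than on the router, the two ideas from Example 6 and Problem 6 (a cooldown between alerts, and re-arming once the temperature recovers) can be combined. The class below is a minimal sketch of that logic; `AlertGate` and its parameters are hypothetical names, and the caller supplies the current time in seconds.

```python
class AlertGate:
    """Decide whether to send an alert, combining a cooldown with recovery tracking."""

    def __init__(self, threshold: float, cooldown_s: float):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.last_alert_at = None  # time of the last alert in seconds, or None

    def should_alert(self, value: float, now: float) -> bool:
        if value <= self.threshold:
            # Recovered: forget the last alert so the next excursion alerts at once.
            self.last_alert_at = None
            return False
        if self.last_alert_at is not None and now - self.last_alert_at < self.cooldown_s:
            # Still above threshold but inside the cooldown window.
            return False
        self.last_alert_at = now
        return True
```

Passing `time.monotonic()` as `now` avoids the midnight and clock-adjustment edge cases that affect timestamps derived from wall-clock time.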
Related Topics
Monitoring Integration
- SNMP Configuration - export health data to monitoring systems
- Netwatch - complement with connectivity monitoring
Alerting
- Email Tool - send temperature alerts
- Scheduler - run health checks periodically
- Scripts - custom health monitoring logic
System Reliability
- Watchdog - automatic recovery on system issues
- System Backup - backup before potential hardware issues
Related Topics
- Logging - log health-related events