Troubleshooting Exchange 2013 with the Official Management Pack
Microsoft Exchange Server 2013 introduced a radical shift in monitoring philosophy compared to its predecessors. Moving away from passive Windows Event Log monitoring, Exchange 2013 relies on Managed Availability. This built-in feature constantly tests the system, automatically attempts repairs, and escalates unresolved issues to System Center Operations Manager (SCOM) via the Exchange Server 2013 Management Pack.
Understanding how this Management Pack functions is the key to maintaining a healthy Exchange environment.
The Philosophy Shift: Managed Availability
The Exchange 2013 Management Pack is completely different from the Exchange 2010 version. It acts as a reporter rather than an investigator. It relies entirely on Managed Availability, an internal Exchange service that runs on three primary components:
Probes: Synthetic transactions that actively test user experiences (e.g., sending a test email or logging into OWA).
Monitors: State engines that analyze probe data to determine if a health set is healthy, degraded, or unhealthy.
Responders: Automated workflows that trigger recovery actions when a monitor fails, such as restarting a service or recycling an IIS application pool.
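These three component types can be inspected on a live server by listing the monitoring items of a single health set and counting them by type. The following is a minimal sketch that assumes it runs in the Exchange Management Shell; the function name `Get-HealthSetComponentSummary` and the sample values `EX01` and `OWA` are placeholders chosen for illustration.

```powershell
# Hypothetical helper: counts the probes, monitors, and responders in one health set.
# Get-MonitoringItemIdentity is an Exchange cmdlet, so this needs the Exchange Management Shell.
function Get-HealthSetComponentSummary {
    param(
        [Parameter(Mandatory)] [string] $ServerName,
        [Parameter(Mandatory)] [string] $HealthSetName
    )

    Get-MonitoringItemIdentity -Identity $HealthSetName -Server $ServerName |
        Group-Object -Property ItemType |
        Select-Object -Property @{ Name = 'ItemType'; Expression = { $_.Name } }, Count
}

# Example call (server and health set names are placeholders):
# Get-HealthSetComponentSummary -ServerName 'EX01' -HealthSetName 'OWA'
```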
SCOM receives an alert only when the responders fail to self-heal the system. If you see an alert in SCOM, Exchange has already tried to fix itself and failed.
The Simplified SCOM Console Layout
Because Exchange handles its own health logic, the SCOM console for Exchange 2013 is highly streamlined. Instead of thousands of rules and monitors, the Management Pack focuses on Health Sets. The monitoring is broken down into three main dashboards:
Active Alerts: Shows only actionable, high-priority issues that require human intervention.
Organization Health: A high-level view of the overall Exchange infrastructure health.
Server Health: A granular view mapping health categories directly to individual servers.
Step-by-Step Troubleshooting Workflow
When the Exchange 2013 Management Pack surfaces an alert in SCOM, follow this systematic workflow using the Exchange Management Shell (EMS) to find the root cause.
Step 1: Identify the Unhealthy Health Set
Look at the SCOM alert to find the specific Health Set and Server Name reported. If you are already in the EMS, you can get a quick summary of every health set on a server, including the unhealthy ones, by running:
    Get-HealthReport -Identity <ServerName>
Step 3: Inspect Probe Failures and Execution History
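To understand why the monitor failed, first list the monitors in the health set and the probes behind them. The probe names are what you need to find the recent probe results, which carry the exact error message and execution time:
    $HealthSet = Get-ServerHealth -Identity <ServerName> -HealthSet <HealthSetName>
    $HealthSet | Format-Table Name, AlertValue -AutoSize
    # Get-MonitoringItemIdentity takes the health set name (not a monitor name) and requires -Server
    Get-MonitoringItemIdentity -Identity <HealthSetName> -Server <ServerName>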
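A health set can contain many monitors, so it may help to narrow the output to the ones that are not reporting Healthy. The sketch below assumes the `Name`, `TargetResource`, `AlertValue`, and `LastTransitionTime` properties returned by `Get-ServerHealth`; the function name `Get-UnhealthyMonitor` is hypothetical.

```powershell
# Hypothetical helper: lists the monitors in a health set that are not Healthy,
# most recent state change first. Requires the Exchange Management Shell.
function Get-UnhealthyMonitor {
    param(
        [Parameter(Mandatory)] [string] $ServerName,
        [Parameter(Mandatory)] [string] $HealthSetName
    )

    Get-ServerHealth -Identity $ServerName -HealthSet $HealthSetName |
        Where-Object { $_.AlertValue -ne 'Healthy' } |
        Sort-Object -Property LastTransitionTime -Descending |
        Select-Object -Property Name, TargetResource, AlertValue, LastTransitionTime
}

# Example call (placeholder names):
# Get-UnhealthyMonitor -ServerName 'EX01' -HealthSetName 'OWA'
```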
To see the direct output of a specific probe execution, query the crimson channel event logs with PowerShell, filtering on the probe name:
    Get-WinEvent -LogName "Microsoft-Exchange-ActiveMonitoring/ProbeResult" | Where-Object { $_.Message -like "*<ProbeName>*" } | Select-Object -First 5 | Format-List Message
Step 4: Review Responder Action History
Before you manually intervene, check what recovery steps Exchange already attempted. Reviewing the responder log shows whether a service was recently restarted or whether a bug is causing a loop:
    Get-WinEvent -LogName "Microsoft-Exchange-ActiveMonitoring/ResponderResult" | Where-Object { $_.Message -like "…" }
Handling Persistent False Positives
Sometimes, a probe might fail due to environmental factors unique to your organization, creating “noise” in SCOM. Managed Availability allows you to create overrides to tune these monitors globally or on specific servers.
To disable a monitor or modify its threshold globally for 60 days, use the Add-GlobalMonitoringOverride cmdlet:
    Add-GlobalMonitoringOverride -Identity "Exchange…"
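The general shape of a duration-based override can be sketched with a small wrapper. It builds the identity as `HealthSetName\ItemName`, sets the `Enabled` property to `0`, and passes the duration in `dd.hh:mm:ss` form. The function name `Disable-MonitoringItemGlobally` and all sample names are hypothetical, and the 1 to 180 day range simply mirrors the limit this guide cites.

```powershell
# Hypothetical wrapper: disables one monitoring item on all servers for a limited number of days.
# Add-GlobalMonitoringOverride is an Exchange cmdlet, so this needs the Exchange Management Shell.
function Disable-MonitoringItemGlobally {
    param(
        [Parameter(Mandatory)] [string] $HealthSetName,
        [Parameter(Mandatory)] [string] $ItemName,
        [ValidateSet('Probe', 'Monitor', 'Responder', 'Maintenance')]
        [string] $ItemType = 'Monitor',
        [ValidateRange(1, 180)] [int] $Days = 60
    )

    $params = @{
        Identity      = '{0}\{1}' -f $HealthSetName, $ItemName  # HealthSetName\ItemName
        ItemType      = $ItemType
        PropertyName  = 'Enabled'
        PropertyValue = '0'
        Duration      = '{0}.00:00:00' -f $Days                 # dd.hh:mm:ss
    }
    Add-GlobalMonitoringOverride @params
}

# Example calls (all names are placeholders):
# Disable-MonitoringItemGlobally -HealthSetName '<HealthSetName>' -ItemName '<MonitorName>' -Days 60
# Get-GlobalMonitoringOverride      # review the overrides currently in place
# Remove-GlobalMonitoringOverride -Identity '<HealthSetName>\<MonitorName>' -ItemType Monitor -PropertyName Enabled
```

For a single server, `Add-ServerMonitoringOverride` takes the same parameters plus `-Server`.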
Note: Overrides can only be set for a maximum of 180 days at a time and must be renewed.
Conclusion
The Exchange 2013 Official Management Pack changes the role of the Exchange Administrator from a continuous monitor to an escalations engineer. By understanding that SCOM alerts represent a failure of internal self-healing, you can use the Exchange Management Shell to quickly pinpoint the exact synthetic transaction that failed, saving hours of manual log digging.
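As a starting point for that triage, the first and third steps of the workflow can be wrapped in two small helpers. This is a sketch under stated assumptions, not a definitive implementation. It assumes the Exchange Management Shell, the `-Identity` parameter of `Get-HealthReport`, and that probe result events expose `ResultName`, `ExecutionStartTime`, and `Error` fields in their event XML. Both function names are hypothetical.

```powershell
# Hypothetical helpers; both assume the Exchange Management Shell on an Exchange 2013 server.

# Step 1: health sets on a server that are not reporting Healthy.
function Get-UnhealthyHealthSet {
    param([Parameter(Mandatory)] [string] $ServerName)

    Get-HealthReport -Identity $ServerName |
        Where-Object { $_.AlertValue -ne 'Healthy' } |
        Select-Object -Property HealthSet, AlertValue, LastTransitionTime
}

# Step 3: recent probe results as objects with named fields instead of free-text messages.
function Get-RecentProbeResult {
    param(
        [Parameter(Mandatory)] [string] $ProbeNamePattern,  # wildcard pattern for the probe name
        [int] $MaxEvents = 500,                              # newest events to scan
        [int] $First = 5                                     # matching results to return
    )

    Get-WinEvent -LogName 'Microsoft-Exchange-ActiveMonitoring/ProbeResult' -MaxEvents $MaxEvents |
        ForEach-Object { ([xml]$_.ToXml()).Event.UserData.EventXml } |
        Where-Object { $_.ResultName -like $ProbeNamePattern } |
        Select-Object -First $First -Property ResultName, ExecutionStartTime, Error
}

# Example calls (placeholder names):
# Get-UnhealthyHealthSet -ServerName 'EX01'
# Get-RecentProbeResult -ProbeNamePattern '*<ProbeName>*'
```

Reading the event XML rather than the rendered message keeps the error text and start time in separate fields, which makes the results easier to sort and export.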