Microsoft Exchange Server 2013 Management Pack: Monitoring Best Practices

Troubleshooting Exchange 2013 with the Official Management Pack

Microsoft Exchange Server 2013 introduced a radical shift in monitoring philosophy compared to its predecessors. Moving away from passive Windows Event Log monitoring, Exchange 2013 relies on Managed Availability. This built-in feature constantly tests the system, automatically attempts repairs, and escalates unresolved issues to System Center Operations Manager (SCOM) via the Exchange Server 2013 Management Pack.

Understanding how this Management Pack functions is the key to maintaining a healthy Exchange environment.

The Philosophy Shift: Managed Availability

The Exchange 2013 Management Pack is completely different from the Exchange 2010 version. It acts as a reporter rather than an investigator. It relies entirely on Managed Availability, a monitoring framework built into Exchange that consists of three primary components:

Probes: Synthetic transactions that actively test user experiences (e.g., sending a test email or logging into OWA).

Monitors: State engines that analyze probe data to determine if a health set is healthy, degraded, or unhealthy.

Responders: Automated workflows that trigger recovery actions when a monitor fails, such as restarting a service or recycling an IIS application pool.
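To see how these three component types map onto a real health set, the items behind it can be listed and counted by type. The following is a minimal sketch, assuming it runs in the Exchange Management Shell; the function name Get-HealthSetComposition is hypothetical, and the ItemType property is assumed to be present on the objects that Get-MonitoringItemIdentity returns.

```powershell
# Sketch: count the probes, monitors, and responders behind one health set.
# Get-HealthSetComposition is a hypothetical helper name, not an Exchange cmdlet.
function Get-HealthSetComposition {
    param(
        [Parameter(Mandatory)] [string] $HealthSet,
        [Parameter(Mandatory)] [string] $Server
    )

    # Get-MonitoringItemIdentity lists every monitoring item in the health set.
    # Grouping on ItemType (assumed property) gives one row per component type.
    Get-MonitoringItemIdentity -Identity $HealthSet -Server $Server |
        Group-Object -Property ItemType |
        Sort-Object -Property Name |
        Select-Object -Property Name, Count
}
```

Calling it as Get-HealthSetComposition -HealthSet <HealthSetName> -Server <ServerName> should return one row per item type with the number of items of that type.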

SCOM only receives an alert when a responder fails to self-heal the system. If you see an alert in SCOM, it means Exchange has already tried to fix itself and failed.

The Simplified SCOM Console Layout

Because Exchange handles its own health logic, the SCOM console for Exchange 2013 is highly streamlined. Instead of thousands of rules and monitors, the Management Pack focuses on Health Sets. The monitoring is broken down into three main dashboards:

Active Alerts: Shows only actionable, high-priority issues that require human intervention.

Organization Health: A high-level view of the overall Exchange infrastructure health.

Server Health: A granular view mapping health categories directly to individual servers.

Step-by-Step Troubleshooting Workflow

When the Exchange 2013 Management Pack surfaces an alert in SCOM, follow this systematic workflow using the Exchange Management Shell (EMS) to find the root cause.

Step 1: Identify the Unhealthy Health Set

Look at the SCOM alert to find the specific Health Set and Server Name reported. If you are already in the EMS, you can get a quick summary of all unhealthy health sets on a server by running:

    Get-HealthReport -Identity <ServerName> | Where-Object {$_.AlertValue -ne "Healthy"}

Step 2: Drill Down into the Monitors

Once you identify the unhealthy Health Set (for example, Autodiscover or HubTransport), find out which specific monitor triggered the failure:

    Get-ServerHealth -Identity <ServerName> -HealthSet <HealthSetName> | Where-Object {$_.AlertValue -ne "Healthy"}

Step 3: Inspect Probe Failures and Execution History

To understand why the monitor failed, you need to look at the recent probe results, which record the exact error message and execution time. Start by listing the probes that belong to the health set so that you know which probe to look for:

    Get-MonitoringItemIdentity -Identity <HealthSetName> -Server <ServerName> | Where-Object {$_.ItemType -eq "Probe"}
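The first two steps can also be chained so that a single call returns every unhealthy monitor on a server, grouped under its health set. This is a sketch under stated assumptions: the function name Get-UnhealthyMonitor is hypothetical, and the property names HealthSet (on Get-HealthReport output) and HealthSetName and Name (on Get-ServerHealth output) are written from memory, so confirm them with Get-Member in your environment before relying on them.

```powershell
# Sketch: list every non-healthy monitor on a server in one pass.
# Get-UnhealthyMonitor is a hypothetical helper name, not an Exchange cmdlet.
function Get-UnhealthyMonitor {
    param(
        [Parameter(Mandatory)] [string] $Server
    )

    # Step 1: health sets whose rolled-up state is anything other than Healthy.
    Get-HealthReport -Identity $Server |
        Where-Object { $_.AlertValue -ne 'Healthy' } |
        ForEach-Object {
            # Step 2: the individual monitors inside that health set
            # that are not Healthy. HealthSet is an assumed property name.
            Get-ServerHealth -Identity $Server -HealthSet $_.HealthSet |
                Where-Object { $_.AlertValue -ne 'Healthy' } |
                Select-Object -Property HealthSetName, Name, AlertValue
        }
}
```

Because the filter is "not Healthy", the output also includes monitors in states such as Degraded or Disabled, which matches the behavior of the one-line commands in the steps above.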

To see the direct output of a specific probe execution, query the crimson channel event logs with PowerShell:

    Get-WinEvent -LogName "Microsoft-Exchange-ActiveMonitoring/ProbeResult" | Where-Object {$_.Message -like "*<ProbeName>*"} | Select-Object -First 5 | Format-List Message

Step 4: Review Responder Action History

Before you manually intervene, check what recovery steps Exchange already attempted. Reviewing the responder log shows whether a service was recently restarted or whether a bug is causing a recovery loop. Substitute the name of the responder you are investigating for the placeholder:

    Get-WinEvent -LogName "Microsoft-Exchange-ActiveMonitoring/ResponderResult" | Where-Object {$_.Message -like "*<ResponderName>*"} | Select-Object -First 10 | Format-List TimeCreated, Message

Handling Persistent False Positives

Sometimes, a probe might fail due to environmental factors unique to your organization, creating “noise” in SCOM. Managed Availability allows you to create overrides to tune these monitors globally or on specific servers.

To disable a monitor or modify its threshold globally for 60 days, use the Add-GlobalMonitoringOverride cmdlet. The -Identity value takes the form HealthSetName\MonitorName:

    Add-GlobalMonitoringOverride -Identity "<HealthSetName>\<MonitorName>" -ItemType "Monitor" -PropertyName "Enabled" -PropertyValue "0" -Duration 60.00:00:00
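When the noise is limited to one server, the same override can be scoped to that server with Add-ServerMonitoringOverride instead of being applied globally. The wrapper below is a sketch only: the function name Disable-ServerMonitor and its parameters are hypothetical, and the 30-day default is an arbitrary illustration rather than a recommendation.

```powershell
# Sketch: disable one monitor on one server for a limited number of days.
# Disable-ServerMonitor is a hypothetical helper name, not an Exchange cmdlet.
function Disable-ServerMonitor {
    param(
        [Parameter(Mandatory)] [string] $Server,
        [Parameter(Mandatory)] [string] $HealthSet,
        [Parameter(Mandatory)] [string] $Monitor,
        [int] $Days = 30   # arbitrary default chosen for illustration
    )

    # Build the arguments once and splat them onto the cmdlet.
    $override = @{
        Identity      = "$HealthSet\$Monitor"       # HealthSetName\MonitorName
        Server        = $Server
        ItemType      = 'Monitor'
        PropertyName  = 'Enabled'
        PropertyValue = '0'
        Duration      = '{0}.00:00:00' -f $Days     # days.hours:minutes:seconds
    }
    Add-ServerMonitoringOverride @override
}

# Reviewing and removing overrides afterwards (shown as comments, not run here):
#   Get-ServerMonitoringOverride -Server <ServerName>
#   Get-GlobalMonitoringOverride
#   Remove-ServerMonitoringOverride -Identity "<HealthSetName>\<MonitorName>" -Server <ServerName> -ItemType Monitor -PropertyName Enabled
```

Scoping the override to a single server keeps the monitor active everywhere else, so a genuine failure on another server still reaches SCOM.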

Note: Overrides can only be set for a maximum of 180 days at a time and must be renewed.

Conclusion

The Exchange 2013 Official Management Pack changes the role of the Exchange Administrator from a continuous monitor to an escalations engineer. By understanding that SCOM alerts represent a failure of internal self-healing, you can use the Exchange Management Shell to quickly pinpoint the exact synthetic transaction that failed, saving hours of manual log digging.

