Many health care IT organizations create service-level agreements (SLAs) for new applications and infrastructure projects. It's important to collect metrics related to SLA line items and review your metrics on a regular basis.
In a health care environment, some applications have life-or-death importance, while others don't require significant oversight. To ensure every business unit gets the service it needs, prioritize metric review according to risk.
Sort Applications and Infrastructure Into Classes
John Halamka, MD, is CIO of the Beth Israel Deaconess Medical Center (BIDMC). On his Geekdoctor blog, he breaks down metrics according to availability classifications for hospital applications:
- AAAA. These applications run critical medical devices or are essential to patient safety, so they require continuous availability. These agreements promise no more than one hour of unscheduled downtime per year and no more than four hours of planned downtime per month, with recovery time objectives (RTO) of one hour for internal disruptions and 24 hours for external disasters.
- AAA. These key enterprise systems maintain smooth functioning within the health care environment and require ongoing availability. This means no more than 10 hours per year of unscheduled downtime, eight monthly planned downtime hours and RTOs of less than eight hours for internal disruptions and 48 hours for external disasters.
- AA. AA applications are classified as general availability because they're functionally important, if not as vital. This class allows no more than 100 hours of unscheduled downtime, fewer than 12 hours per month of planned downtime and RTOs of 24 hours maximum for internal issues and three days or fewer for external disasters.
- A. These applications aren't often used, so they're labeled as limited availability. Such low-priority systems are budgeted 150 hours or fewer of unscheduled downtime, intermittent planned downtime, next-business-day RTOs for internal problems and potentially weeks of non-functionality after external disasters.
Additionally, Halamka lists SLAs for recovery point objectives (RPO) according to application class. For AAAA and AAA applications, he guarantees no data loss. AA applications are recoverable in 15-minute increments, and A applications have no listed RPOs.
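To make the classes easier to compare, here is a minimal Python sketch that encodes the unscheduled downtime budgets and RPOs listed above and converts each budget into an availability percentage. The names (`AvailabilityClass`, `CLASSES`, `unscheduled_availability`, `within_budget`) are illustrative, not from Halamka's blog. The sketch also assumes the AA and A budgets are annual figures, like the AAAA and AAA budgets, although the list above does not state a period for them.

```python
from dataclasses import dataclass
from typing import Optional

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored for simplicity


@dataclass(frozen=True)
class AvailabilityClass:
    name: str
    unscheduled_hours_per_year: float  # assumed annual for AA and A
    rpo_minutes: Optional[int]         # 0 = no data loss, None = no RPO listed


# Budgets taken from the class descriptions above.
CLASSES = {
    "AAAA": AvailabilityClass("AAAA", 1, 0),
    "AAA": AvailabilityClass("AAA", 10, 0),
    "AA": AvailabilityClass("AA", 100, 15),
    "A": AvailabilityClass("A", 150, None),
}


def unscheduled_availability(cls: AvailabilityClass) -> float:
    """Fraction of the year the system is up if the whole budget is spent."""
    return 1 - cls.unscheduled_hours_per_year / HOURS_PER_YEAR


def within_budget(class_name: str, unscheduled_hours_so_far: float) -> bool:
    """True while year-to-date unscheduled downtime is inside the budget."""
    return unscheduled_hours_so_far <= CLASSES[class_name].unscheduled_hours_per_year


if __name__ == "__main__":
    for cls in CLASSES.values():
        print(f"{cls.name}: {unscheduled_availability(cls) * 100:.3f}% available")
```

Expressed this way, one hour of unscheduled downtime per year works out to roughly 99.99 percent availability, while 100 hours works out to a little under 99 percent. Planned downtime is not included in these figures.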
Metrics for SaaS Solutions
Most health groups use at least one SaaS application, and third-party SLAs are just as important as any agreement between IT and its business units. These SLAs should cover uptime guarantees, which can vary according to application classification; metrics related to functional performance; guarantees for how the third party responds to escalated application issues; and the remedies it offers, such as maintenance credits, when it fails to meet the SLA.
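As a rough illustration of checking a vendor's uptime guarantee, the following Python sketch turns a month's recorded downtime into an uptime percentage and compares it with the guaranteed figure. The function names, the 30-day default and the 99.9 percent figure in the usage lines are assumptions for the example; real contracts define their own measurement windows and exclusions.

```python
def monthly_uptime_percent(downtime_minutes: float, days_in_month: int = 30) -> float:
    """Uptime for the month as a percentage of total minutes."""
    total_minutes = days_in_month * 24 * 60
    return 100 * (total_minutes - downtime_minutes) / total_minutes


def breaches_guarantee(downtime_minutes: float, guaranteed_percent: float,
                       days_in_month: int = 30) -> bool:
    """True if measured uptime fell below the vendor's guarantee."""
    return monthly_uptime_percent(downtime_minutes, days_in_month) < guaranteed_percent


if __name__ == "__main__":
    # Hypothetical vendor guaranteeing 99.9% monthly uptime.
    for minutes in (30, 60):
        status = "breach" if breaches_guarantee(minutes, 99.9) else "ok"
        print(f"{minutes} min down: {monthly_uptime_percent(minutes):.3f}% ({status})")
```

A check like this makes it easier to decide when to request whatever remedy the contract provides.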
Leverage Predictive Monitoring
In many cases, application performance metrics that aren't strictly part of your SLAs can serve as tripwires that keep you from violating SLA components. Clive Longbottom, service director at Quocirca, writing for TechTarget, offers the example of a service's response time to user requests. If response time degrades past a certain threshold, IT should proactively assess, diagnose and fix the problem before availability suffers.
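The response-time tripwire can be sketched in a few lines of Python. This is one possible approach, not Longbottom's method: it averages the most recent samples and trips only when that rolling average exceeds a threshold, so a single slow request does not raise a flag. The class name, the window size and the 500 ms threshold are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean


class ResponseTimeTripwire:
    """Trips when the rolling average response time exceeds a threshold."""

    def __init__(self, threshold_ms: float, window: int = 5):
        self.threshold_ms = threshold_ms
        self.samples = deque(maxlen=window)  # keeps only the newest samples

    def record(self, response_ms: float) -> bool:
        """Add a sample; return True if the tripwire is currently tripped."""
        self.samples.append(response_ms)
        window_full = len(self.samples) == self.samples.maxlen
        # Wait for a full window so early samples cannot trip the wire alone.
        return window_full and mean(self.samples) > self.threshold_ms


if __name__ == "__main__":
    tripwire = ResponseTimeTripwire(threshold_ms=500, window=3)
    for sample in (200, 300, 400, 900, 1000):
        print(sample, tripwire.record(sample))
```

In practice the threshold would sit well inside the level at which users or the SLA would notice, which leaves time to diagnose the cause.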
Within your health care IT organization, identify critical application or system performance canary metrics that can prevent your SLA from failing. Infrastructure metrics may include memory or CPU utilization, while application metrics may include database query response time. You may also monitor network metrics related to bandwidth utilization and errors or middleware metrics such as average queue length.
For key canary metrics on crucial applications, set up automated responses and alerts when possible. Avoid triggering so many alerts that the beeps begin to lose their significance.
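One simple way to keep alert volume manageable is a per-metric cooldown: once an alert fires for a metric, further alerts for that metric are suppressed for a set period. The Python sketch below shows the idea under that assumption. `AlertThrottle`, the metric names and the 300-second cooldown are hypothetical, and production monitoring tools typically provide their own suppression or deduplication features.

```python
from typing import Dict


class AlertThrottle:
    """Suppresses repeat alerts for the same metric within a cooldown period."""

    def __init__(self, cooldown_s: float):
        self.cooldown_s = cooldown_s
        self._last_sent: Dict[str, float] = {}  # metric name -> last alert time

    def should_alert(self, metric: str, now_s: float) -> bool:
        """Return True if an alert for this metric should be sent now."""
        last = self._last_sent.get(metric)
        if last is not None and now_s - last < self.cooldown_s:
            return False  # still cooling down; stay quiet
        self._last_sent[metric] = now_s
        return True


if __name__ == "__main__":
    throttle = AlertThrottle(cooldown_s=300)
    print(throttle.should_alert("cpu_utilization", now_s=0))     # first alert is sent
    print(throttle.should_alert("cpu_utilization", now_s=100))   # suppressed
    print(throttle.should_alert("db_query_time", now_s=100))     # different metric
```

Timestamps are passed in as plain seconds so the logic is easy to test; a real deployment would take them from the monitoring system's clock.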
Annual review works fine for non-problematic SLAs, but when you know you're not meeting expectations, it's better to issue a mea culpa to your business unit and not wait for the annual review. Likewise, you can put the onus on the business unit to initiate SLA review for AA or A applications.
Most agreements fail because IT sets unrealistic expectations; it's often best to admit this and ask for some flexibility. For AAAA and AAA applications, where agreements are less negotiable, make the fixes or infrastructure investments needed to deliver an appropriate level of service. For example, if you're not meeting business unit expectations related to file transfer from electronic health records (EHR) or a picture archiving and communication system (PACS), and this failure represents a patient safety risk, a solution like managed file transfer can help you avoid SLA penalties and even save a patient's life.