Empowering SREs: A Closer Look at Mainframe Observability

March 19, 2024

I am confident that Broadcom's WatchTower Platform™ for mainframe AIOps will transform how IT ops teams oversee mainframe applications. If you haven't yet seen our recent announcement, WatchTower is a mainframe observability platform that offers a unified, user-friendly experience for users at every skill level. WatchTower also extends its support beyond the mainframe to performance analysts who are responsible for the health of business applications that span multiple platforms in hybrid IT environments.

How WatchTower Empowers SREs

Meet Kelly, a site reliability engineer (SRE) at Acme Corporation. Kelly's primary responsibility is to swiftly identify any underperforming business applications under her purview. She is determined to ensure that the user experience for Acme's applications never includes the dreaded "spinning beach ball of death!"

Kelly's toolkit includes a range of application performance monitoring (APM) and observability tools, such as Datadog, Grafana, New Relic, Splunk, and the open-source Jaeger tracing project. These tools give her full visibility into Acme's cloud and on-premises distributed systems. However, many of Acme's business applications include transactional components that run on the mainframe.

While Kelly's tools can show when an application invokes a mainframe-based service, she has no visibility into what happens inside the mainframe, which breaks her end-to-end view of the application flow. As a result, when a call to a mainframe service runs into trouble, Kelly struggles to identify the right mainframe support team to contact.

WatchTower's integrated real-time information streaming (z/IRIS®) capability addresses this challenge. z/IRIS streams mainframe performance information in real time to teams outside the mainframe, including SREs like Kelly. This lets them use their existing enterprise APM tools to pinpoint bottlenecks, even when those bottlenecks occur inside the mainframe.

For instance, imagine Kelly using an APM service map to track Acme's banking application, which is currently experiencing service degradation. The application makes two calls to Acme's mainframe, one through an API gateway and another through a direct connection to a transaction service, and both are highlighted in red to indicate issues. Previously, Kelly could see these calls reaching the mainframe but had no visibility into the underlying cause, so she could not determine which mainframe team to contact.

With WatchTower's z/IRIS, however, the traces in Kelly's service map now include details of the flow inside the mainframe. She can quickly pinpoint the problem (in this example, a downstream Db2 database) and hand it off to the mainframe Db2 team for resolution. The Db2 team can then use WatchTower's mainframe-specific capabilities, such as alert insights, anomaly detection, and infrastructure topology, to drill down further and remediate the problem quickly.

Full Enterprise Observability

Enterprise applications commonly have multiple downstream dependencies on mainframe resources. Enabling an SRE to quickly determine the root cause of a mainframe issue, whether it lies in a connection service, message queue, transaction manager, or database, is a significant step forward for enterprise-wide observability. Identifying the problem context promptly also has tangible business impact, because issues reach the right team sooner.

WatchTower fosters collaboration among observability team members with varying levels of mainframe expertise. It gives SRE teams end-to-end visibility across the enterprise while providing mainframe teams with deep insights into their applications. By bringing these insights together, WatchTower supports a comprehensive understanding of application health and strengthens remediation efforts.

To see what WatchTower can do for you, visit our website.