Workload Automation

Boosting Agility With OpenTelemetry Observability Integrations

November 14, 2023

Are you sitting on a ticking bomb?

Many organizations today struggle with workflow orchestration tools that fail to offer a comprehensive view of their hybrid environments. As these enterprises grow, maintaining SLA compliance becomes a daunting task, and the risk of inefficiency mounts. Limited to isolated views of their data, teams often get bogged down in minutiae instead of focusing on larger business objectives. A lack of visibility into key business processes can lead to extended troubleshooting and unsatisfactory mean time to detect (MTTD) and mean time to resolution (MTTR) when issues crop up. Without historical context, it is also hard to optimize for future challenges.

These limitations, combined with an inability to effectively predict or optimize, leave businesses vulnerable, leading to missed SLAs, dissatisfied customers, fines, and revenue hits.

This is where observability can be a game-changer. The IT industry is moving toward OpenTelemetry for observability. As a major vendor of mainframe observability solutions, Broadcom plans to support OpenTelemetry wherever possible in our toolset.

With a robust observability integration based on OpenTelemetry:

  • Visibility becomes panoramic, encompassing every aspect of the operational environment. You see not just what's happening but also why it's happening.
  • Predictive analysis is empowered by the historical context, enabling teams to anticipate potential challenges and act proactively.
  • Optimization opportunities surface, ensuring that processes run smoothly, efficiently, and in alignment with business objectives.
  • Rapid response becomes the norm. With real-time insights and analytics, teams can swiftly identify, address, and resolve issues, significantly slashing MTTR.

Deep Dive into the Dual Facets of Observability

In today's complex IT environments, understanding and managing the health of business processes and systems is paramount. In workflow orchestration, observability falls into two categories: workflow observability and infrastructure observability. Let's delve deeper into each.

Workflow Observability

Workflow observability is all about the bigger picture. It focuses on ensuring that workflows for entire business processes are running optimally. By monitoring the performance and status of workflows across the hybrid infrastructure, teams can ensure that service level agreements (SLAs) are consistently met from an end-to-end perspective. It aids in determining if workflows are being executed on time, if there are any delays, and if temporary or permanent changes need to be made.
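To make the "on time or delayed" question concrete, here is a minimal Python sketch of an end-to-end SLA check. The `WorkflowRun` record, its field names, and the sample workflows are hypothetical and are not taken from any Broadcom product; the sketch only assumes that each workflow run has an SLA deadline and an actual completion time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class WorkflowRun:
    name: str                # hypothetical workflow identifier
    sla_deadline: datetime   # time by which the workflow must finish
    finished_at: datetime    # actual completion time


def sla_delay(run: WorkflowRun) -> timedelta:
    """Return how late the run finished; zero if it met its deadline."""
    return max(run.finished_at - run.sla_deadline, timedelta(0))


def sla_compliance(runs: list[WorkflowRun]) -> float:
    """Fraction of runs that finished on or before their SLA deadline."""
    if not runs:
        return 1.0
    on_time = sum(1 for r in runs if sla_delay(r) == timedelta(0))
    return on_time / len(runs)


# Illustrative data: one run beats its 06:00 deadline, one misses it.
runs = [
    WorkflowRun("payroll", datetime(2023, 11, 14, 6, 0), datetime(2023, 11, 14, 5, 40)),
    WorkflowRun("billing", datetime(2023, 11, 14, 6, 0), datetime(2023, 11, 14, 6, 25)),
]
late = {r.name: sla_delay(r) for r in runs if sla_delay(r) > timedelta(0)}
compliance = sla_compliance(runs)
```

With this sample data, `late` contains only the billing run, and `compliance` is the share of runs that met their deadline. A real deployment would read these timestamps from scheduler telemetry rather than hard-coding them.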

Maintaining the health of workflows directly influences business outcomes, and this is where the Automation Analytics & Intelligence (AAI) solution plays a pivotal role.

AAI helps customers transition from merely overseeing workflow jobs to strategically monitoring and enhancing business processes. It offers an all-encompassing automation observability platform for hybrid environments and facilitates centralized monitoring for business processes, from Kubernetes deployments to mainframes. The consolidated views in AAI empower customers to oversee complete jobstreams, clearly identify critical paths, and expedite troubleshooting. AAI extends beyond Broadcom’s schedulers and also aggregates data from a variety of vendors, such as IBM, BMC, and others.
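For readers unfamiliar with the term, the critical path of a jobstream is the chain of dependent jobs that determines its earliest possible finish time. The following Python sketch illustrates the general idea on a hypothetical four-job stream; the job names and durations are invented, and this is not a description of how AAI computes critical paths. It assumes the dependency graph has no cycles.

```python
from functools import lru_cache

# Hypothetical jobstream: job -> (duration in minutes, jobs it must wait for).
JOBS = {
    "extract": (10, []),
    "validate": (5, ["extract"]),
    "transform": (30, ["extract"]),
    "load": (15, ["validate", "transform"]),
}


@lru_cache(maxsize=None)
def finish_time(job: str) -> int:
    """Earliest finish time of a job, assuming it starts when all predecessors end."""
    duration, predecessors = JOBS[job]
    return duration + max((finish_time(p) for p in predecessors), default=0)


def critical_path() -> list[str]:
    """Walk back from the last-finishing job through its latest-finishing predecessors."""
    job = max(JOBS, key=finish_time)
    path = [job]
    while JOBS[job][1]:
        job = max(JOBS[job][1], key=finish_time)
        path.append(job)
    return list(reversed(path))


path = critical_path()
total_minutes = finish_time(path[-1])
```

In this example the transform job, not the validate job, sits on the critical path, so speeding up validation would not shorten the jobstream at all.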

If the enterprise uses only one scheduler, there is no need for a unified view across multiple schedulers. In such cases, basic telemetry from CA 7 and ESP (logs, traces, metrics, and events) provides a general picture of workflow characteristics. This data can be fed into standard observability tools or application performance monitors (APMs) to visualize and analyze workflow events. Using these tools, enterprises can establish and monitor SLAs for their workflows. Standard APM and observability tools also offer predictive analytics capabilities.

Infrastructure Observability

This digs into the nitty-gritty. Observability of orchestration infrastructure components is centered on the individual pieces that enable the orchestration of workflows. It focuses on the health of these automation components: CA 7, ESP, Java REST Servers, and more. It’s essential to monitor the well-being of each component that plays a role in automating workflows.

This granular view enables:

  • Early problem detection: By continually checking the status and performance of individual components, issues can be spotted and addressed before they escalate, preventing larger system failures.
  • Lower MTTD and MTTR: With real-time data on infrastructure health, teams can rapidly pinpoint and address the root causes of problems, reducing both MTTD and MTTR.
  • Optimization for performance and efficiency: By understanding how each component functions within the larger system, teams can make adjustments to ensure each piece operates at peak efficiency, contributing positively to the overall system performance.
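As a rough illustration of the two metrics named above, the Python sketch below computes MTTD and MTTR from a list of incident timestamps. The `Incident` record and the sample data are hypothetical. Definitions of MTTR vary between teams; this sketch assumes it is measured from detection to resolution, while some organizations measure it from the start of the incident.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    started: datetime    # when the problem actually began
    detected: datetime   # when monitoring or a person noticed it
    resolved: datetime   # when normal service was restored


def mttd_minutes(incidents: list[Incident]) -> float:
    """Mean time to detect: average gap between start and detection."""
    return mean([(i.detected - i.started).total_seconds() / 60 for i in incidents])


def mttr_minutes(incidents: list[Incident]) -> float:
    """Mean time to resolution, measured here from detection to resolution."""
    return mean([(i.resolved - i.detected).total_seconds() / 60 for i in incidents])


# Illustrative incidents only.
incidents = [
    Incident(datetime(2023, 11, 14, 1, 0), datetime(2023, 11, 14, 1, 10), datetime(2023, 11, 14, 1, 40)),
    Incident(datetime(2023, 11, 14, 2, 0), datetime(2023, 11, 14, 2, 30), datetime(2023, 11, 14, 3, 30)),
]
mttd = mttd_minutes(incidents)
mttr = mttr_minutes(incidents)
```

Tracking both numbers over time is one simple way to check whether better component-level visibility is actually shortening detection and resolution.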

The CA 7 and ESP teams are working to provide telemetry data (logs, traces, metrics, and events) from all our Core and Java services using OpenTelemetry. This will enable customers to set up the OpenTelemetry Collector and feed the telemetry data into their preferred enterprise observability or APM tool.
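To give a feel for what OpenTelemetry data looks like on the wire, here is a small Python sketch that builds a metrics payload in the OTLP/JSON encoding using only the standard library. The service name, scope name, and metric name are hypothetical; this article does not specify which metrics CA 7 or ESP will emit. The URL assumes a Collector running locally with its OTLP/HTTP receiver on the default port 4318.

```python
import json
import time

# Default OTLP/HTTP metrics endpoint of a locally running OpenTelemetry Collector.
COLLECTOR_METRICS_URL = "http://localhost:4318/v1/metrics"


def gauge_payload(service_name, metric_name, value, unit="1", time_unix_nano=None):
    """Build an OTLP/JSON request body carrying a single gauge data point."""
    if time_unix_nano is None:
        time_unix_nano = time.time_ns()
    return {
        "resourceMetrics": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeMetrics": [{
                "scope": {"name": "example.scheduler.health"},  # hypothetical scope name
                "metrics": [{
                    "name": metric_name,
                    "unit": unit,
                    "gauge": {"dataPoints": [{
                        "asDouble": float(value),
                        # 64-bit integers are written as strings in OTLP/JSON.
                        "timeUnixNano": str(time_unix_nano),
                    }]},
                }],
            }],
        }]
    }


# Hypothetical metric: queue depth reported by a scheduler REST server.
payload = gauge_payload("scheduler-rest-server", "scheduler.queue.depth", 42,
                        time_unix_nano=1700000000000000000)
body = json.dumps(payload)
```

Sending `body` as an HTTP POST to `COLLECTOR_METRICS_URL` with a `Content-Type: application/json` header would hand the data point to the Collector, which can then export it to whichever observability backend is configured. In practice, an OpenTelemetry SDK or the product's own exporter would build and send these payloads for you.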

Impacts of Enhanced Observability on Personas and Corresponding Scenarios

  • Application Developers:
    • Scenario: Improving application performance by identifying problematic jobs in a critical path
    • Impact: Pinpoint areas in applications that need enhancement
  • Line of Business Owners:
    • Scenario: Keeping a vigilant eye on SLA adherence by leveraging historical trend reports and real-time projections
    • Impact: Higher confidence in applications’ SLA adherence
  • Operators / Schedulers:
    • Scenario: Monitoring batch jobs to identify potential bottlenecks in real time
    • Impact: Oversee workload performance and pinpoint potential slowdowns

A Holistic Approach to Observability

An ideal observability platform isn't just about one type or the other; it provides a holistic overview. Such a platform presents data on both business processes and infrastructure health, allowing teams to shift seamlessly between the two perspectives. This holistic approach ensures that while the broader business objectives (like SLA compliance) are being met, the underlying infrastructure is also functioning optimally.

Want to learn more? Fill out this form, check the Workload Automation box, and our team will reach out to you and help get you started on your OpenTelemetry journey.