    Using OpenTelemetry to Enhance Observability in Hybrid Environments

    December 18, 2024
    How CA 7™ Observability with OpenTelemetry Bridges the Mainframe Data Gap for Unified Insights

    In today’s interconnected world, customer experience hinges on the reliability of the services customers rely on. When technology fails, customers don’t hesitate to voice their frustrations online, instantly damaging brand trust. Observability has emerged as a vital means of protecting brand identity, providing IT teams with the insights needed to maintain uptime and deliver exceptional service. However, observability is not complete in hybrid IT environments without seamless integration of mainframe workload data. With Workload Automation Observability, available now with CA 7™ Workload Automation Intelligence, businesses can connect mainframe insights to their preferred observability platforms, ensuring real-time visibility, proactive alerts, and comprehensive control across their end-to-end IT landscape.

    Observability and Customer Experience

    The customer experience has become paramount for enterprises because it directly impacts brand reputation and trust. In today’s technology-driven world, customer trust depends on the reliability and availability of the services customers consume. If an ATM card fails to work, a mobile flight check-in doesn’t go through, or insurance details aren’t immediately accessible, it takes only a few minutes for customers to vent their frustrations on X, Facebook, or TikTok, negatively affecting the brand’s image. This is where observability comes in: it focuses on gathering, analyzing, and generating insights from infrastructure, application, and service telemetry. The goal of observability is to quickly troubleshoot issues by building contextual understanding from vast amounts of collected information, proactively prevent incidents through timely alerts, and continuously improve applications and services using ML/AI-generated insights.

    OpenTelemetry (OTel) is an open-source framework that helps IT teams collect and route observability data from infrastructure, apps, and services. OTel provides a standard way to capture and export telemetry data, such as events, metrics, traces, and logs. This data can be used to understand software performance and behavior. OTel is an incubating project of the Cloud Native Computing Foundation (CNCF).

    OTel makes it easier to capture and export telemetry data from hybrid applications, which can be distributed and complicated to work with. OTel also provides a standard format for collecting and sending observability data, which helps solve observability problems like availability or performance issues.

    The Problem

    In today’s hybrid IT environments, mission-critical applications often span multiple platforms—including mainframes. Without mainframe job processing data integrated into your observability platform, the overall problem context, proactive alerts, and ML/AI-driven insights may not be fully reliable. Simply put, the observability story remains incomplete until it includes data on mainframe job execution.

    To effectively manage SLAs, operators, site reliability engineers (SREs), and application developers need a unified view of all application data, including mainframe job execution metrics. This comprehensive perspective enables faster problem identification, speeds up resolution, prevents SLA breaches, and supports proactive alerts powered by machine learning. Yet, mainframe job monitoring often remains siloed, limiting visibility into how mainframe issues affect overall application performance.

    Key Challenges with Mainframe Data Access

    • People-Related Challenges: Access to mainframe job data typically relies on legacy interfaces like ISPF, which can be unfamiliar to distributed teams accustomed to unified tools.
    • Data-Related Challenges: Even when accessible, mainframe data is often not real-time. This stale information hinders quick responses and may compromise SLAs.
    • Security and Maintenance Issues: Allowing non-mainframe users to access mainframe data often requires creating multiple user or service IDs, complicating security and necessitating constant updates as team members change.
    • Imperfect Predictive Alerting: Without mainframe job execution data, predictive analytics for SLA adherence is less effective.
    • Underutilization of Observability Platforms: Observability platforms cannot fully deliver on their potential—such as self-service dashboards, advanced problem analytics, and seamless integration—if key mainframe job-related data from workload automation tools is missing.

    The Solution

    CA 7 Workload Automation Intelligence now includes observability, which integrates CA 7 events with OpenTelemetry-compliant observability platforms such as Splunk, Datadog, Grafana, Dynatrace, and ServiceNow Cloud Observability, to name a few.

    The CA 7 log holds valuable workload telemetry, including the most recent status of every active job. By making this log data available to an observability platform, you can uncover new insights, automate processes, track trends, and gain holistic visibility across hybrid IT environments. This integration eliminates data silos, supports proactive SLA management, reduces reliance on multiple tools, simplifies user ID management, and strengthens predictive analytics. In short, it lays the foundation for decentralized scheduling and self-service capabilities for SREs, application developers, and operators.

    Once CA 7 data is integrated with an observability platform, numerous benefits become possible:

    • Self-Service Dashboards for Job Monitoring: Teams can create their own dashboards to visualize end-to-end workflows, including mainframe jobs managed by CA 7.
    • Proactive Alerting and Faster Troubleshooting: By correlating CA 7 data with other application metrics, teams can detect anomalies before they cause incidents, reduce mean time to resolution (MTTR), and maintain smooth operations.
    • Out-of-the-Box Integrations (ServiceNow, PagerDuty, etc.): Observability platforms often feature native integrations with popular third-party tools. Critical alerts can automatically create ServiceNow incidents, send immediate notifications to PagerDuty or other paging systems, and even engage ChatOps solutions like Slack. While each observability platform varies in its integration ecosystem, these capabilities significantly enhance operational efficiency and collaboration.
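
    As a rough illustration of the proactive alerting idea above, the sketch below flags a running job whose elapsed time exceeds its historical mean by more than a chosen number of standard deviations. This is only an assumption about how such a rule might look: the function name, the threshold factor `k`, and the sample durations are all hypothetical, and a real observability platform would apply its own alerting or ML logic.

```python
from statistics import mean, stdev


def is_running_long(elapsed_min, past_durations_min, k=3.0):
    """Return True when a running job has exceeded mean + k * stdev of past runs.

    Hypothetical rule for illustration; not the logic of any specific platform.
    """
    if len(past_durations_min) < 2:
        return False  # not enough history to estimate the spread
    threshold = mean(past_durations_min) + k * stdev(past_durations_min)
    return elapsed_min > threshold


history = [30, 32, 29, 31, 30]  # made-up past run times, in minutes
print(is_running_long(33, history))  # False: still within the usual range
print(is_running_long(45, history))  # True: well beyond the usual range
```

    A rule like this only becomes possible once job status and timing data arrive on the platform in near real time, which is the point of the integration.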

    Integration

    CA 7 Observability with OpenTelemetry transforms CA 7 log data into OpenTelemetry format and streams it to an OpenTelemetry collector.

    Integration Requirements:

    • Updates to the CA 7 core engine (CA7ONL) to enable it to send data to the WLA Observability 12.1 Service
    • Deployment of the WLA Observability 12.1 Service (two containers on the distributed side)
    • An OpenTelemetry collector (provided by the customer)
    • An Observability platform (provided by the customer)

    Figure 1: Sample deployment

    The diagram illustrates three CA 7 instances configured for CA 7 Integration with OpenTelemetry. As each CA7ONL instance (the core scheduling engine) updates a job’s status in the active workload, it writes a corresponding log entry. A telemetry subtask running within CA7ONL retrieves this log entry, creates a JSON payload containing the status update, and sends it via POST to the WLA Observability service. The WLA Observability service then converts the received data into the OpenTelemetry format and passes it to the OpenTelemetry collector. The customer is responsible for configuring the collector to export this data to one or more chosen observability platforms.
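
    To make the conversion step more concrete, here is a minimal sketch of how a job status update could be wrapped as an OpenTelemetry log record in OTLP/JSON form and prepared for an OTLP/HTTP collector endpoint. It is not the WLA Observability implementation. The event fields (`instance`, `job`, `status`), the `ca7.` attribute prefix, the scope name, and the status-to-severity mapping are all assumptions made for illustration, and the product may well use a different signal type or transport. The `/v1/logs` path and port 4318 are the OTLP/HTTP defaults.

```python
import json
import time
import urllib.request

# Hypothetical mapping from job status to OpenTelemetry severity
# (9 = INFO, 13 = WARN, 17 = ERROR in the OpenTelemetry logs data model).
SEVERITY = {"COMPLETED": (9, "INFO"), "LATE": (13, "WARN"), "FAILED": (17, "ERROR")}


def _attr(key, value):
    """Encode one string attribute in OTLP/JSON key-value form."""
    return {"key": key, "value": {"stringValue": str(value)}}


def to_otlp_logs(event, now_ns=None):
    """Wrap one job-status event (a dict) in an OTLP/JSON logs payload."""
    number, text = SEVERITY.get(event["status"], (9, "INFO"))
    record = {
        # 64-bit integers are carried as strings in OTLP/JSON.
        "timeUnixNano": str(now_ns if now_ns is not None else time.time_ns()),
        "severityNumber": number,
        "severityText": text,
        "body": {"stringValue": f"job {event['job']} is {event['status']}"},
        "attributes": [_attr(f"ca7.{k}", v) for k, v in sorted(event.items())],
    }
    return {
        "resourceLogs": [{
            "resource": {"attributes": [_attr("service.name", event["instance"])]},
            "scopeLogs": [{"scope": {"name": "ca7-sketch"}, "logRecords": [record]}],
        }]
    }


def build_request(payload, collector="http://localhost:4318"):
    """Prepare (but do not send) an OTLP/HTTP POST to the collector."""
    return urllib.request.Request(
        url=f"{collector}/v1/logs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Made-up event, standing in for the JSON payload sent by the telemetry subtask.
event = {"instance": "CA7PROD1", "job": "PAYROLL1", "status": "FAILED"}
request = build_request(to_otlp_logs(event))
print(request.get_method(), request.full_url)
```

    Passing `request` to `urllib.request.urlopen` would deliver the record to a collector listening on that address; from there, the collector's exporters decide which observability platforms receive it.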

    Summary

    Integrating CA 7 job event monitoring into the observability platforms used by operators and SREs for distributed job management provides real-time visibility and prevents issues that arise from relying on outdated data. Application development teams can access CA 7 events immediately, eliminating slow overnight batch processes and improving turnaround times. Feeding CA 7 event data into an observability platform also enables proactive incident detection, automated ticket creation in ServiceNow, and instant notification through PagerDuty. With current CA 7 data at their fingertips and the ability to leverage AI/ML insights, teams can streamline workflows, gain a deeper understanding of their environments, and drive more informed, data-driven decisions.

    As we continue expanding our offerings beyond core CA 7 events to include additional data types, now is the time to get involved. Contact us today to explore how integrating more mainframe data can transform your observability strategy.

     

    Tag(s): AIOps, Mainframe