Having been active in the IT profession for more than a few decades (but who's counting?), I've been exposed to more over-hyped technology buzzwords than I can remember – heck, these 'innovations' are so widespread that Gartner had to invent their infamous Hype Cycle to help clients understand them.
After recently taking over marketing of the Broadcom Mainframe Software AIOps portfolio, I was introduced to yet another buzzword: observability. Every IT vendor has its own idea of the definition and scope for observability. For neutrality’s sake, I’ll pull a snippet from the Wikipedia page: “Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs.”
I found the term easiest to understand by thinking about a system that we can all identify with – the human body. Most of the time we can tell when something’s wrong with the state of our bodies and can express that with an alert of some form – crying, whining, swearing… depending on the age of our ‘system’.
Back in the early days of my particular system, diagnostics included a mercury-based thermometer. Collecting any data beyond temperature required a visit from the doctor and a few other tools like the stethoscope and maybe an x-ray. In other words, observability of human systems was very limited and highly reliant on expert opinion.
These days I wear a fitness tracking device on my wrist – my choice is the Apple Watch. At a point during the recent pandemic, I noticed that my heartbeat had gotten kind of funky. Almost immediately, my Apple Watch informed me that I was experiencing paroxysmal a-fib. I was able to use the watch to produce EKGs during these episodes and bring them to my doctor to discuss. My ‘watch’ had dramatically improved observability of my personal human system to get me the care I needed (and thankfully the a-fib episodes have since disappeared).
Yes, my watch is packed with sensors that deliver data, but it took artificial intelligence (AI) and machine learning (ML) to analyze that data and improve the observability of my human system. Producing data is great, but it takes advanced algorithms to detect, interpret, and classify patterns in that data – and to know whether these patterns should either be ignored (no real issue) or surfaced (requiring attention – or an automated response).
My point: there is now a huge spectrum of tools supporting human systems observability – ranging from the thermometer to the Apple Watch – and the observability of human systems is a function of both data collection and advanced analysis of that data.
Observability of IT systems works the same way.
IT operators have had tools to access systems data for as long as systems have been in existence. But, like thermometers, these tools simply present data. And like the doctors of my childhood, operators are tasked to interpret an ever-growing stream of information and use their experience to identify problems, discern their root causes, and take appropriate action.
Some vendors will claim that IT observability is about ‘monitors’ – tools that present system information in easily digested dashboards. While today’s monitors are certainly prettier than green-screen data scrolls, they don’t really raise the bar on observability because they use pre-defined metrics and thresholds that focus on the health of silos (e.g., network health, storage health, transactional health, etc.) rather than the health of systems. They are programmed to look for expected outcomes. They don’t help you deal with the unexpected.
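To make the limitation concrete, here is a minimal sketch of the threshold-style monitoring described above. The metric names and threshold values are purely illustrative, not taken from any real product:

```python
# Threshold-based monitoring: each metric is checked against a
# pre-defined limit, independently of every other metric (a silo).
THRESHOLDS = {
    "cpu_percent": 90.0,
    "storage_used_percent": 85.0,
    "response_time_ms": 500.0,
}

def check_metrics(sample: dict) -> list:
    """Return alerts only for metrics that cross a pre-defined threshold."""
    alerts = []
    for metric, limit in THRESHOLDS.items():
        value = sample.get(metric)
        if value is not None and value > limit:
            alerts.append("ALERT: %s = %s exceeds %s" % (metric, value, limit))
    return alerts

# Every metric is individually "healthy", so nothing fires -- even if
# the combination of values is an unusual pattern worth investigating.
sample = {"cpu_percent": 55.0, "storage_used_percent": 60.0, "response_time_ms": 480.0}
print(check_metrics(sample))  # []
```

The point of the sketch: a tool like this can only ever tell you about conditions someone thought to program in ahead of time.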
Handling unknown problems requires more than this level of observability. In fact, to me, the term ‘observability’ is just a subset of a broader initiative required to improve IT operations.
Customers struggle today to observe their systems at all because, as I mentioned, most IT tools present siloed data and don’t share that data easily. At Broadcom, we’ve enriched our mainframe IT management tools with open APIs that allow mainframe systems data to be extracted and correlated across technological silos – this allows the internals of mainframe systems to be truly seen and enables our tools to derive insights from this observed data.
This broader initiative is what the industry tends to refer to as AIOps – the application of AI and ML algorithms to detect, interpret, and classify patterns as either normal system behavior or outliers (aka, the unexpected and unknown!).
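To illustrate the shift in approach, here is a hedged sketch of that idea: learn what "normal" looks like from historical data, then flag outliers, rather than rely on fixed thresholds. The simple z-score test below is a stand-in for the far more sophisticated ML a real AIOps product would use, and the sample numbers are invented:

```python
from statistics import mean, stdev

def classify(history, value, z_limit=3.0):
    """Classify a new observation as 'normal' or 'outlier' relative to history."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return "normal" if value == mu else "outlier"
    z = abs(value - mu) / sigma  # how many standard deviations from normal?
    return "outlier" if z > z_limit else "normal"

# Hypothetical history of a metric, e.g. transactions per second.
history = [48, 51, 50, 49, 52, 50, 51, 49, 50, 51]
print(classify(history, 50.5))  # within normal variation -> "normal"
print(classify(history, 80.0))  # flagged as an outlier -- no threshold needed
```

Notice that nobody had to decide in advance that 80 was "too high"; the data itself defined what normal behavior looks like.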
AIOps is focused on analyzing correlated data in context so that IT operations can better maintain the health of their systems – reducing Mean Time to Remediation (MTTR) of existing issues as well as predicting probable future issues so they can be proactively avoided. And it offers the option of applying data-driven automation to remove the need for human intervention in treating and avoiding well-understood issues.
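The automation piece can be sketched as a simple dispatch from a classified issue to a playbook. The issue names and actions here are entirely hypothetical – the point is only that well-understood patterns get an automated response while unknown ones are surfaced to a person:

```python
# Map well-understood issue classifications to automated actions.
# These names are invented for illustration.
PLAYBOOKS = {
    "storage_pool_filling": "expand_storage_pool",
    "runaway_job": "cancel_job",
}

def remediate(issue: str) -> str:
    """Return the automated action for a known issue, else escalate."""
    action = PLAYBOOKS.get(issue)
    if action is None:
        return "escalate_to_operator"  # unknown issues still need a human
    return action

print(remediate("runaway_job"))    # well-understood: handled automatically
print(remediate("novel_anomaly"))  # unknown: surfaced to an operator
```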
Observability is the necessary starting point, but it is the combination of machine intelligence with human intelligence that will help you continuously improve mainframe operations.
So, the next time a vendor comes knocking at your door with an observability story, find out if their offering is just a slick new way to monitor siloed data, or if they are using machine learning to improve your IT health like Broadcom AIOps can. Whether it’s the next generation of the Apple Watch or the unfolding of Broadcom’s AIOps plan, I’m excited for the future of true observability, one that prescribes improvements and takes action just as well as it monitors changes.