Source – insidebigdata.com
The wonders of automation have brought incredible efficiencies to standard IT monitoring practices, especially when it comes to the detection-prevention-analysis-response (DPAR) cycle. Automating detection and remediation steps via alerts has alleviated massive amounts of stress for IT teams and businesses alike, providing the data needed to understand how and why issues happen. But it raises the question: are we doing enough with that data, and how can we do more? Is machine learning a viable solution for modern monitoring practices?
Why DPAR?
First, some background: DPAR cycles stem from the three ultimate goals of the information security process, which guide businesses on how to protect confidentiality, ensure integrity, and maintain availability. While these guidelines are fluid and should not be thought of as a cure-all, they nonetheless inform the process of building sophisticated monitoring, especially as it relates to leveraging the emerging area of machine learning.
The Current State of Monitoring
When it comes to monitoring today, we’ve gotten really good at collecting huge piles of data and creating fairly rudimentary triggers, like the popular “CPU is over 90” alert. From an alerting standpoint, our minds typically go to these single- or double-variable triggers, taking a recurring problem and creating a quick, repeatable response action. And while this certainly works, there is a huge missed opportunity to tap into the rich data that typically lies within the multiple variables that all feed into an issue. Doing so would make the DPAR process even smarter.
The reason IT teams are typically stuck in this single- or double-variable zone is that creating monitoring responses for multi-variable data is not a small or simple task. There is rarely one person with the range of experience, depth of knowledge, and ability to sort through vast amounts of data in order to understand all of the connection points and their potential impact. Additionally, disparate IT silos seldom come together to review complicated, multi-part inputs—that is, until something really becomes a problem.
In short, monitoring is much more than a blinking light or a ticket. It’s an ongoing collection of data and metrics from a set of target devices, and it produces a goldmine that IT teams could be utilizing much, much more.
Monitoring’s Expanding Realm
As if having this vast amount of untapped monitoring data isn’t enough of a challenge, the race to the cloud is also bringing new monitoring considerations. Traditionally, data center monitoring practices have involved a systems-centric view—making sure the lights are on, the temperature is good, systems are up and running, and resources are accounted for. Now, cloud-based and hybrid IT organizations have adopted a DevOps mindset, which focuses less on the question of whether (or how well) the technology is running, and more on the business experience being delivered by the technology.
Incorporating DevOps into traditional monitoring practices opens a whole new area for monitoring and ensuring IT performance, and significantly increases the amount of data collected. As organizations go further into the cloud, this definition of monitoring is only going to continue to expand.
Making the Case for Machine Learning
So, if the monitoring data we collect today could be better analyzed to identify and remediate issues, and the volume of monitoring data collected is only expected to grow, what does that mean for the future of monitoring?
It means machine learning is the perfect solution.
At its very core, machine learning is an advanced means of making sense of massive amounts of data, and for this reason, machine learning and monitoring should go hand-in-hand. With the ability to analyze different behavioral patterns and metric points and account for unique elements within businesses, the power of applying machine learning to monitoring is significant.
Consider also that today’s IT teams are not well-versed in analyzing the data they do have in order to identify the additional data needed to pinpoint issues. For example, many monitoring practitioners still analyze situations based on up-down patterns. However, we know this data isn’t telling the full story—but it’s difficult for humans to imagine what additional data could help complete the picture. Machine learning can fill that gap in pattern recognition, providing a holistic up-down-left-right view to detect anomalies in sequences, and even the tricky intermittent patterns that lead to an issue.
But this brings us back to the DPAR model. If we leverage machine learning for detection, we must then commit to determining how we can implement changes to prevent the event from occurring in the future—a step that, once again, many IT pros overlook.
Best Practices
To prepare for the future of machine learning-enabled monitoring, IT departments and professionals can take steps today:
- Gain visibility into data and metrics across the IT spectrum: Tools currently exist that allow IT pros to take monitoring data from across the infrastructure and synchronize it into an interrelated mass, looking at vastly different metrics such as bandwidth, CPU, disk array performance, database locking and blocking, number of connections to a web server, and more. Using such tools will help IT pros understand interdependencies, and is a stepping stone to machine learning capabilities that will run these analyses automatically.
- Think like a data scientist: In order to understand the interdependencies of data across the IT spectrum, IT professionals must become well-versed in mathematics and statistics, understanding what the numbers mean to conduct thorough analysis.
- Adopt a DevOps mindset: This is the expanded, business-minded version of monitoring that involves stepping out of the systems-centric view and understanding what monitoring data means for metrics like end-user experience and business performance.
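As a small, data-scientist-flavored illustration of the first two practices, the sketch below correlates two very different metric streams (web-server connections and database locks) with a hand-rolled Pearson coefficient to surface an interdependency. The metric series are invented for illustration:

```python
# Hypothetical sketch: quantify how strongly two unrelated-looking
# metrics move together. Metric series are made up for illustration.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

connections = [120, 150, 180, 220, 260, 310]   # web server connections
db_locks    = [3, 4, 6, 8, 11, 14]             # concurrent DB locks

# A coefficient near 1.0 suggests the two metrics rise together,
# hinting at an interdependency worth monitoring as a pair.
print(round(pearson(connections, db_locks), 2))
```

This is the kind of analysis a machine learning-enabled monitoring platform would run continuously and automatically, across far more than two metrics.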
Of course, it’s difficult to say when machine learning-driven monitoring solutions will become a reality, and naysayers may believe this is all too theoretical. But skeptics should look no further than current technologies like software-defined networking (SDN) to understand how this could come to life in a meaningful way. SDN actually produces a feedback loop with incredible similarities to how a machine learning-enabled DPAR cycle would work for IT monitoring: traffic occurs; monitoring gathers data about the traffic and identifies an issue; the SDN controller then analyzes the data and issues a configuration update to the networking devices, changing how the traffic is handled to better serve the overall business role. Sound familiar?
Versions of this intelligent monitoring and response cycle are already becoming a reality today, and with best practices in place and an eye toward the future, IT teams will be prepared to harness the power of machine learning in their monitoring solutions. We aren’t that far away from seeing this play out, and the benefits will be significant for businesses and IT departments alike.