I have a colleague in networking who said something that captures where we are as a society so well: “Slow is the new down.”

That line has always stuck with me as we look to serve our customers as well as we possibly can. Having an application or the network go down is awful, but even a subpar experience can frustrate clients and hurt public trust.

Applications bring a high level of complexity to any infrastructure. With new artificial intelligence (AI)-powered applications, we need to be mindful not only of availability but also of several other factors that go into building a resilient foundation.

To help out other IT leaders and practitioners, I’ve put together six questions to ask to make sure your organization has the strongest resilience posture it can before things get out of control.

Availability: Is It Functional to Terms?

My network colleagues know this area better than anyone because the network is often the first to experience an outage. However, that’s not always the case (as we’ve seen with recent outages). The wrong response here would be, “That’s a question for the network team.”

Applications are carrying more and more of the burden for the enterprise, and we can’t ignore that reality. Team up and think about areas where application and network SREs can work together to bolster availability and get to peak resilience.
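To make “functional to terms” a little more concrete, here is a rough sketch of the arithmetic, assuming the “terms” are an availability target over a fixed window. The 99.9% target, the 30-day window, the 90 minutes of downtime and the function names are all illustrative assumptions, not figures from any particular contract or tool.

```python
from datetime import timedelta


def downtime_budget(target_pct: float, window: timedelta) -> timedelta:
    """Downtime allowed in the window while still meeting target_pct availability."""
    return window * (1 - target_pct / 100)


def measured_availability(downtime: timedelta, window: timedelta) -> float:
    """Availability, as a percentage, given total downtime in the window."""
    return 100 * (1 - downtime / window)


# Assumed terms: 99.9% availability over a 30-day window.
window = timedelta(days=30)
budget = downtime_budget(99.9, window)

# Assumed outcome: 90 minutes of combined network and application downtime.
observed = measured_availability(timedelta(minutes=90), window)

print(f"downtime budget: {budget}")
print(f"observed availability: {observed:.3f}%")
print("within terms" if observed >= 99.9 else "outside terms")
```

Under these assumptions the budget works out to roughly 43 minutes a month, so a single 90-minute incident puts you outside the terms no matter which team owned the root cause.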

Recoverability: How Fast Does It Recover from Disruptions?

Like teenagers, we often think we’re invincible. The sad truth is that an outage will happen to everyone eventually – even if it’s only a partial one. When it happens, are you ready for it?

Getting to a good resilience posture in this case means you have a remediation tool in place to quickly fix what’s going wrong. AI-powered automation and remediation tools are becoming more popular because there might be a situation where you don’t know what’s happening, but there are tools that do.

I would also say that manual intervention, while it certainly works, can move at a snail’s pace, and that pace could determine whether a client stays with you long term.

Observability: Does It Give Visibility to Changes?

Observability is on most people’s minds now, and it comes up in most of my conversations with clients because of the growing complexity of our collective infrastructure. Applications have far too many dependencies for us to know what’s happening across the board. From experience, there’s a good chance you already have an observability tool. If you don’t, get one.

If you do, does it see your entire application ecosystem, and can it proactively identify and address gaps in your resilience posture? We’re going to be adding so many more AI applications into our ecosystem in the coming years that the presence of such tools will be absolutely critical to maintaining the environment.

Going back to the question above, one of the most common calls I get from clients is, “My application is down and I don’t know why. Can you help us figure it out?” The observability tool that you choose needs to have change management capabilities to track everyone’s updates, remediations and adjustments. With a simple view, you should be able to see every change made throughout the day.
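As a rough illustration of what that simple view buys you, here is a minimal sketch that lists the changes made shortly before an incident began. It is not how any particular observability product works; `ChangeEvent`, `changes_before`, the two-hour lookback and the sample log entries are all made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """One tracked update, remediation or adjustment (hypothetical record shape)."""
    when: datetime
    who: str
    what: str


def changes_before(events, incident_start, lookback=timedelta(hours=2)):
    """Return the changes made in the lookback window before the incident, newest first."""
    window_start = incident_start - lookback
    recent = [e for e in events if window_start <= e.when <= incident_start]
    return sorted(recent, key=lambda e: e.when, reverse=True)


# Made-up change log for a single day.
log = [
    ChangeEvent(datetime(2024, 1, 15, 8, 5), "network-team", "firewall rule update"),
    ChangeEvent(datetime(2024, 1, 15, 13, 40), "app-team", "config flag flipped"),
    ChangeEvent(datetime(2024, 1, 15, 14, 10), "platform-team", "library upgrade rollout"),
]

# Assumed incident start time.
incident = datetime(2024, 1, 15, 14, 25)
suspects = changes_before(log, incident)

for event in suspects:
    print(event.when.isoformat(), event.who, event.what)
```

The changes closest to the incident come first, which gives the “I don’t know why” conversation a starting point. A real tool would pull these events from deployment pipelines and ticketing systems rather than a hand-built list.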

I would recommend looking at your software after reading this article to ensure you can do this (while you’re not dealing with a fire drill).

Maintainability: Can It Be Managed Without Service Disruption?

Simple changes don’t normally impact most clients’ service nowadays, but even minor changes have shut things down. A small change shouldn’t land you on the news if it goes wrong.

Automation is the biggest lifesaver for today’s application ecosystem. It allows teams not only to set up maintenance windows during slow times but also to provide proactive, continuous improvements at any time of the day. It’s all about reducing costly downtime and keeping your infrastructure in a constant ready state so that teams can continue to build and deploy changes or applications into the environment without hiccups.

Scalability: Does It Scale and Expand to Meet Demands?

For retail, being in that ready state now is vastly different from being ready come holiday shopping season. Or if you’re using AI a lot more now, can you keep new AI-based applications running as they keep piling onto your ecosystem?

Either way, take a look at your application landscape. Are there areas with room to grow, or areas that will burst with any new addition (or activity)? The ones close to bursting are the most pressing for you to address.

Usability: Does It Perform?

This one might sound super simple, but as transformations keep happening, it’s always healthy to take a step back and look at the ecosystem in its entirety. How do your applications perform now? Will they need to be adjusted so they can work in different environments to perform even better? Applications can be finicky; make sure they are built in a way that adapts to the hybrid cloud and AI world we live in.

In general, being down or having applications fail is a business risk that we are all trying to mitigate. If you’re like most customers I speak with, you probably have a fragmented toolset and fix things reactively. Take the time now to ask yourself these six powerful questions. Make sure you take a data-driven approach to your resilience assessment. There are solutions out there to help you build the best resilience posture you can, one that positively impacts your bottom line and your reputation.
