It’s time to rethink how IT operations are managed.
Imagine you are an IT operations manager at a government agency. At a critical period, when almost the entire country is trying to access your IT systems, a manhole fire brings your server connections down. You get a call: the system appears to have failed over to a backup connection with far less bandwidth than the regular 10 Gbps connection to the data center. The help-desk team starts receiving frantic requests from citizens trying to submit their last-minute paperwork. The slower fallback link is delaying responses, which in turn is causing connection timeouts. Multiple teams are on the status call, scrambling to figure out the cause and a plan to fix it.
One thing is clear: this will reach the news soon, and you will be called into the undersecretary’s office to explain. You foresee an entire week of meetings to prepare a briefing report on the incident. You have been saying all along that the backup systems are not up to date and that the agency lacks the bandwidth and manpower to deal with a major data center connection failure at a critical juncture when high volumes are expected.
The same scenario plays out differently at an agency that has embraced cloud computing. You arrive at work Monday refreshed after a relaxing weekend. As you scroll through the weekend notifications, you notice that system usage dipped briefly on Sunday and then recovered. You click on the dip alarm and see that the systems failed over to an alternate zone because of an emergency on the main system. The failover was seamless and hardly noticeable. You also notice a surge in demand on Sunday as citizens scrambled to submit their last-minute paperwork. On-demand nodes had spun up to absorb it, and usage had returned to normal levels by Monday morning.
You open your dashboard and review the stats on the number of releases, the number of successful submissions and application performance levels. Your entire DevOps team is now focused on building the new set of features the business has asked for, while automation takes care of day-to-day issues.
How Cloud and DevOps Are Automating Operational Processes
As the scenarios above illustrate, digital technologies are changing not only the way we work but also the quality of our lives. The cloud and the advent of virtualized hardware have made it possible to provision instances on demand in response to predefined triggers. Artificial intelligence is introducing the ability to predict situations before they happen. This rapidly evolving technology should prompt organizations to restructure how they manage their IT ecosystems. Here are just some critical aspects of operations in the future:
Differentiated workforce: Operations and maintenance have traditionally been viewed as the back door of IT. Operations staff were primarily system and network administrators, along with help-desk support, who worked with machines in large server rooms. The operations of the future, however, will require a workforce skilled in development, adept at understanding and learning AI technologies, and able to move into development, testing or operational roles as needed. That means organizations, including federal agencies, need to be ready to train and retrain their workforce as needed.
A laser focus on automation: A cloud environment opens the door to automation and nearly limitless possibilities. The value of automation lies not only in CI/CD pipelines but also in streamlining environment creation and deployments, template versioning and control, and even account creation. Automation, continuous improvement and a DevOps-based approach are key to the operations of the future. Moreover, once O&M teams no longer spend a vast amount of their time creating environments, they can focus on mission objectives and efficiencies.
Informational insights with AI and machine learning: Going beyond automation, artificial intelligence in operations can help us build systems that are self-healing and capable of end-to-end performance management using real-time insights. AI in operations has two components: the use of real-time data from log analysis and data streams to improve performance, and the use of insights for predictive analytics, forecasting, anomaly detection, root cause determination and similar tasks. The desired outcome is continuous insights that yield continuous improvements.
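To make the anomaly-detection idea concrete, here is a minimal sketch that flags a metric sample when it falls more than a chosen number of standard deviations from the mean of a trailing window (a rolling z-score). It is a simple statistical stand-in for illustration, not a machine learning model and not the method of any particular operations product; the function name `detect_anomalies`, the window of 10 samples and the threshold of 3 are hypothetical choices.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(samples, window=10, threshold=3.0):
    """Return the indices of samples whose rolling z-score exceeds `threshold`.

    Hypothetical helper: each sample is compared with the mean and population
    standard deviation of the `window` samples immediately before it.
    """
    history = deque(maxlen=window)  # trailing window of recent samples
    anomalies = []
    for i, value in enumerate(samples):
        if len(history) == window:  # only score once the window is full
            mu = mean(history)
            sigma = pstdev(history)
            if sigma == 0:
                # A perfectly flat history has no spread; any change is unusual.
                if value != mu:
                    anomalies.append(i)
            elif abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        # Every sample, anomalous or not, enters the window.
        history.append(value)
    return anomalies


# Hypothetical requests-per-minute readings with a sudden dip at index 10.
traffic = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 20, 101, 99]
print(detect_anomalies(traffic))  # prints [10]
```

Because anomalous samples are also added to the window, a sustained shift in load gradually becomes the new baseline rather than raising alarms indefinitely; a production system would likely tune that behavior, along with the window and threshold, to its own traffic patterns.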
Technological disruptions are touching every facet of our organizations. As we architect the agencies of the future, we need to make sure that we do not ignore the group that keeps the lights on.
Geetika Tandon is a principal with Booz Allen Hamilton, leading the firm’s cloud practice within the financial, energy and economic development sector.