Cloud-Scale Management “Likes” Social Media
Managing IT resources across complex enterprises has historically been challenging, and those challenges are only growing. We’re adding more variables in terms of clouds, automated processes, and people, to name a few. Factor in the steady growth of the Internet of Things (IoT), and for many organizations management complexity can potentially expand by several orders of magnitude. If that’s not bad enough, there’s one variable that IT has never been able to control – people. People come and go and often play by their own rules. When it comes to managing an enterprise, we can no longer assume that people will conform to defined enterprise management standards. Instead, IT operations must adapt its standards to the customers it serves. That is why, going forward, social media can be an effective tool to bridge the gap between traditional management tools and processes and more collaborative work styles.
Some of you may be envisioning the scenario below, but there are serious and significant use cases for deep social integration into enterprise management.
Consider a typical problem that I hear frequently from our clients – if scheduled maintenance will impact specific application instances (VMs, containers, etc.), how does IT operations notify the respective application owners – or simply the members of the organization who care about a particular application or service? That problem may sound easy on the surface, but for many organizations it has long been a struggle. The original application owner may have left the company, and it may not be clear who cares about a particular application or service. Experience has already shown that mass emails are rarely effective. This is where social media can bridge the gap.
Consider the following workflow.
As employees and contractors come and go, the social platform can be quickly updated to reflect ownership and interest changes. Social streams can also be monitored to spot potential bugs, performance issues, or pending performance spikes. For example, I recently met with a client that was building a solution to monitor social streams and news feeds to predict the load on their trading systems and proactively expand capacity before the inevitable performance spike arrives.
There are many advantages to enhancing cloud management with a social fabric, including:
- Cutting down noise: Key stakeholders of any application or service can follow the objects relevant to that service (such as VMs, physical hosts, networks, containers, etc.) instead of getting inundated with notification emails that may not apply to them.
- Aggregation of notifications: Associated group notifications can be aggregated into a single Socialcast post. That single post can include information pulled or pushed from a variety of management tools such as vCenter Server, vRealize Operations and vRealize Log Insight.
- Identifying the right people to be involved in fault resolution: The social fabric can associate machines, networks, clusters, data centers and clouds with the relevant stakeholders in one social network / graph. That makes it easy to identify the right people associated with a data center problem and expedite troubleshooting (collaboration can happen from within the relevant team via embedded Socialcast). Updates on upcoming outages can also be broadcast to the right folks more easily – making sure the right people are aware up front.
- Looking at infrastructure history: Social updates generated automatically from management feeds and individual users provide a rich history of problems, making it easy to spot historical trends that impact specific applications or services.
- Determining impact of downtime: Social integration provides the ability to query groups of systems to understand the impact of downtime.
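To make the follow-and-notify idea a little more concrete, here is a minimal sketch of how a follow graph could drive targeted, aggregated maintenance notices. It is purely illustrative and not how our solution is implemented: the `FOLLOWERS` and `HOST_TO_VMS` data, the object names, and both functions are hypothetical, and no Socialcast or vCenter Server API is called.

```python
from collections import defaultdict

# Hypothetical follow graph: which people follow which infrastructure objects.
FOLLOWERS = {
    "vm-web-01": {"alice", "bob"},
    "vm-db-01": {"carol"},
    "host-esx-07": {"dave"},
}

# Hypothetical placement data: which VMs currently run on which host.
HOST_TO_VMS = {
    "host-esx-07": ["vm-web-01", "vm-db-01"],
}


def impacted_objects(host):
    """Return the host itself plus every VM placed on it."""
    return [host] + HOST_TO_VMS.get(host, [])


def build_maintenance_posts(host, window):
    """Build one aggregated message per follower.

    Each person gets a single message listing only the affected
    objects they follow, rather than one message per object.
    """
    per_person = defaultdict(list)
    for obj in impacted_objects(host):
        for person in sorted(FOLLOWERS.get(obj, ())):
            per_person[person].append(obj)
    return {
        person: f"Maintenance {window} affects: {', '.join(objs)}"
        for person, objs in per_person.items()
    }
```

Calling `build_maintenance_posts("host-esx-07", "Sat 02:00-04:00")` yields one message each for the four followers, and nobody who does not follow an affected object is contacted. The keys of the result also answer the "who is impacted by this downtime" question directly.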
To make this concept a reality, social interaction must be intuitive and native to the tools and apps that application owners, developers and IT administrators use daily. This is something that VMware has been working on for the past several years.
Aside from integrating with enterprise web portals and third-party management platforms, the Socialcast REST API can also be used to integrate with VMware’s own management products included in the vCloud Suite.
Our internal solution uses the following architecture.
The heart of the solution is our SCX software project, which is short for Socialcast Extender. Consider SCX to be a stream aggregator that can take inputs from vCenter Server, vRealize Operations, vRealize Log Insight and vRealize Automation, and use that data to populate social streams in Socialcast. In addition, it correlates information across those sources, similar to how a database does joins across different tables. For example, imagine vCenter Server knows that a VM named “X” has ID “Y”, and imagine that a log message from Log Insight has the ID “Y” in it. Rather than publishing a log message that contains only an ID, which is difficult for a human to understand, SCX can observe both vCenter Server and vRealize Log Insight and associate the name with the ID. Thus, information from Log Insight can be augmented on-the-fly by information from vCenter Server, leading to richer information pushed to Socialcast. This is just one possible application of our SCX layer, and it leads to many interesting use cases.
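The name-and-ID join described above can be sketched in a few lines. This is only an illustration of the idea, not SCX code: the `VM_NAMES` snapshot, the `vm-<number>` ID format, and the `enrich` function are all assumptions made for the example.

```python
import re

# Hypothetical inventory snapshot, standing in for what vCenter Server
# knows: VM ID -> human-readable VM name.
VM_NAMES = {"vm-1042": "payroll-db", "vm-2077": "web-frontend"}

# Assumed ID format, for illustration only.
VM_ID_PATTERN = re.compile(r"\bvm-\d+\b")


def enrich(log_message, inventory):
    """Append the VM name after each known VM ID found in the message.

    IDs that are missing from the inventory are left unchanged.
    """
    def add_name(match):
        vm_id = match.group(0)
        name = inventory.get(vm_id)
        return f"{vm_id} ({name})" if name else vm_id

    return VM_ID_PATTERN.sub(add_name, log_message)
```

With this sketch, `enrich("Disk latency high on vm-1042", VM_NAMES)` returns `"Disk latency high on vm-1042 (payroll-db)"`, which is the kind of human-readable message that could then be pushed to a social stream.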
Here are some screen shots that show how the integration can work.
Administrators can make status updates using the vSphere Web Client and those updates can automatically populate social streams.
Admins can also write a broadcast message to a Socialcast group, such as an update about a planned maintenance period.
Notifications can easily reach all affected parties in real time.
As you can see, social media can bring application owners and operations teams much closer together, and integration within existing toolsets takes collaboration to a whole new level. What’s possible from an aggregation and alerting perspective is limited only by your imagination. You no longer have to be in the dark regarding who cares about particular applications and services. Enriching collaboration between development and ops teams can also accelerate DevOps transformations and remove much of the trepidation that Ops teams feel about lacking visibility. We see tremendous potential in this approach.
Note that this is an internal project that we think has significant applicability in enterprise environments. The purpose of this post is to hear your thoughts. How do you feel about this level of integration in your environment? VMware customers who I have spoken with about this have been extremely excited. I would love to hear your feedback to better understand how we can bring this technology to market.
Special thanks to Ravi Soundararajan, Badi Azad, and Jens Koerner for their contributions to this post. Also, I’d like to thank VMware SE Jason Dion for his work in the field on this project.