Introducing VMware vCenter Log Insight
Today VMware announced its latest analytics product, VMware vCenter Log Insight. The product enables you to easily perform advanced analytics on log data aggregated across your physical, virtualized, and cloud infrastructure, leading to across-the-board improvements in IT metrics. Log Insight is fully integrated with vCenter Operations, and the technology behind it comes from our acquisition of Pattern Insight last year. The product not only increases the quality and breadth of analysis but also significantly improves the productivity of IT admins and the quality of the services they provide.
If you’re not familiar with log data, think of it as the Twitter feed of the datacenter: each piece of software and hardware emits a constant chatter of status updates. These status updates (short text messages called log messages) provide rich information on the state of the environment and on the actions of individual components within it. With log analytics, you can not only detect trends and issues in real time but also, thanks to that insight, resolve those issues faster.
Log data provides a critical yet greatly underutilized source of information on the health of your IT environment. When an IT issue occurs (say a transaction is failing, a service becomes unavailable, or performance degrades significantly), IT workers must turn to the logs to identify the root cause. Log analysis is both critical and time-consuming. When we surveyed our customers, we found that the majority need to analyze logs every week. Even in the smallest IT installations the volume of log data is large, and analyzing it with traditional tools is tedious, slow, and error-prone.
As the CTO for the Log Insight product, I’ve been eagerly awaiting the moment when I can finally talk about what the Log Insight team has been working on! So, enough background: let’s take a deep dive (demo-style) into what Log Insight does and how it can help you.
First, we’ve made deploying, configuring, and getting started with Log Insight incredibly easy, particularly if you have a vSphere environment. Just deploy the virtual appliance, point a web browser at it, and walk through a short configuration wizard. Because Log Insight supports all the variants of the syslog protocol, you can quickly configure hardware, operating systems, and applications to send their logs to Log Insight. To make it even easier for vSphere environments, we provide a utility that configures logging across your entire vSphere environment in a single step. In less than 30 minutes, you get your first insights, with no need to bring in an expert! Want to retain more data? Just add a virtual disk. Want to support more concurrent queries? Add more vCPUs or vRAM. vSphere makes it really easy!
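Most software can already emit syslog. The sketch below gives a rough idea of what pointing a custom application at a syslog collector such as Log Insight involves. It is a minimal Python example that formats a BSD-style (RFC 3164) syslog line and sends it over UDP to port 514, the conventional syslog port.

Several details are assumptions rather than product facts:

- The helper names and the appliance hostname are hypothetical.
- The collector is assumed to accept plain UDP syslog.
- In practice, an existing syslog daemon or Python's `logging.handlers.SysLogHandler` would usually do this for you.

```python
import socket
from datetime import datetime
from typing import Optional

# A small subset of the RFC 3164 facility and severity codes.
FACILITY_USER = 1
SEVERITY_ERROR = 3


def build_syslog_message(facility: int, severity: int, hostname: str,
                         tag: str, text: str,
                         now: Optional[datetime] = None) -> bytes:
    """Format a BSD-style syslog line: <PRI>TIMESTAMP HOSTNAME TAG: MSG."""
    pri = facility * 8 + severity  # the PRI value encodes both codes
    now = now or datetime.now()
    # RFC 3164 timestamps look like "Jun  3 14:05:09" (day padded with a space).
    timestamp = f"{now:%b} {now.day:2d} {now:%H:%M:%S}"
    return f"<{pri}>{timestamp} {hostname} {tag}: {text}".encode("utf-8")


def send_syslog_udp(message: bytes, server: str, port: int = 514) -> None:
    """Send one syslog datagram; UDP is fire-and-forget, so no reply is read."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (server, port))
```

For example, `send_syslog_udp(build_syslog_message(FACILITY_USER, SEVERITY_ERROR, "app01", "myapp", "disk quota exceeded"), "loginsight.example.com")` would send one error-level message to a hypothetical appliance at `loginsight.example.com`.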
Once the logs are flowing, Log Insight provides immediate insight into your vSphere environment through packaged knowledge that we’ve gathered from the VMware support and engineering teams. This packaged analysis shows up in Log Insight through a mechanism that we refer to as a “Content Pack.” The built-in vSphere Content Pack includes dashboards, saved queries, field definitions, and alerts. This image gives you a sample of one dashboard in the vSphere content pack:
Through this content pack mechanism, we are putting some of the same tools and knowledge in your hands that our own support teams use to resolve customer support cases. We find that customers learn something interesting (or even concerning) about their vSphere environment during their first use of Log Insight, even if they have never looked at a vSphere log before. For example, in the picture below, the chart on the right allows a customer to immediately see trends in vSphere alarms.
The content pack mechanism also allows third parties to easily create and distribute their own content packs. Every application or hardware component generates its own unique logs, and those logs contain rich information. Knowledge about the structure of the messages in those logs, and about the most effective analytics and visualizations for them, can be packaged into content packs. Content pack authoring is a big topic that we can cover in detail in a later post, but it’s super easy to export saved analytics and visualizations as content packs for others to use.
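The sketch below makes the idea of packaging concrete. It models a content pack as a plain bundle of saved queries and field definitions that can be serialized to JSON and loaded elsewhere.

The class names and JSON layout are hypothetical illustrations, not Log Insight's actual content-pack format. Dashboards and alerts are left out for brevity.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SavedQuery:
    name: str
    text: str  # full-text search string


@dataclass
class FieldDefinition:
    name: str
    regex: str  # pattern whose first capture group is the field value


@dataclass
class ContentPack:
    """Hypothetical bundle of reusable analysis for one log source."""
    name: str
    queries: List[SavedQuery] = field(default_factory=list)
    fields: List[FieldDefinition] = field(default_factory=list)

    def export(self) -> str:
        # asdict() recursively converts the nested dataclasses to plain dicts.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def load(cls, data: str) -> "ContentPack":
        # Rebuild the nested dataclasses from the exported JSON.
        raw = json.loads(data)
        return cls(
            name=raw["name"],
            queries=[SavedQuery(**q) for q in raw["queries"]],
            fields=[FieldDefinition(**f) for f in raw["fields"]],
        )
```

An author could share the string returned by `pack.export()`, and a colleague could rebuild an identical pack with `ContentPack.load(...)`.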
Underneath these colorful charts is a powerful engine that allows you to perform analytical queries on unstructured data. This provides significant agility advantages over a traditional database environment, where we are handicapped by the need to make data fit a schema. First we must foresee which questions we want to ask, then design a schema accordingly, and finally transform the data into that schema (ETL: Extract, Transform, Load). This time- and resource-consuming process creates significant friction whenever we want to ask new questions or generate new reports. Log Insight provides database-like analysis of the data embedded in log streams without the traditional ETL overhead.
One way this is achieved is through full-text search: think of it as “Google for your logs,” with the additional ability to visualize and analyze the aggregate results of a query. The following screenshot shows how I quickly found all exceptions that occurred across my entire infrastructure in the past six hours.
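The core of such a search can be sketched in a few lines:

1. Filter events to a time window.
2. Keep those whose text contains the search term.
3. Bucket the hits by hour to get the counts behind a results-over-time chart.

This brute-force Python scan uses a hypothetical `LogEvent` record. It stands in for the indexed storage a real log engine would use, so it shows the shape of the query, not how Log Insight implements it.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class LogEvent:
    timestamp: datetime
    hostname: str
    text: str


def search(events: List[LogEvent], term: str, now: datetime,
           window: timedelta = timedelta(hours=6)) -> List[LogEvent]:
    """Case-insensitive substring search restricted to the last `window`."""
    start = now - window
    needle = term.lower()
    return [e for e in events
            if start <= e.timestamp <= now and needle in e.text.lower()]


def hourly_histogram(matches: List[LogEvent]) -> Dict[datetime, int]:
    """Count matches per hour: the data behind a results-over-time chart."""
    buckets = Counter(e.timestamp.replace(minute=0, second=0, microsecond=0)
                      for e in matches)
    return dict(sorted(buckets.items()))
```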
But that is just the beginning. Where Log Insight sets itself apart is in the ability to dynamically apply structure to unstructured data, and construct database-like queries and visualizations without any query language. To understand this capability, let’s follow an example through the system.
Assume that I’m the administrator of a private cloud powered by vSphere. One of my internal customers reports performance issues during vSphere administrative activities, such as launching virtual machines (VMs) or reviewing the VM inventory. These management services are provided by VMware vCenter® Server, and most vCenter performance issues can be traced to the third-party SQL database that supports it. Suspecting SQL, I point my web browser at Log Insight and search for SQL-related log messages (image below):
Sure enough, I’m seeing some log activity that suggests vCenter thinks the database server is too slow. The chart at the top gives me further information: SQL-related messages have been appearing periodically over the past six hours. To better understand the scope of the potential issue, I’d like to answer questions like “When did this error start?” and “Is this only occurring in one datacenter?”
To find out when the error started, I can increase the time range of the search query. When I get to the last seven days, I can see that I’ve been having recurring SQL performance issues since mid-day May 31st (image below).
Next question: does this affect any instances of vCenter beyond the one that I see in the first 50 results? Every log message that arrives in Log Insight is tagged with the hostname of the node that originated it. The screenshot below shows the Fields view on the right side of the screen, which lets us quickly break down a query’s results by any field.
From the field breakdown chart on the right, we see that this error occurs only on a single vCenter server, named ‘strata-vc’.
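A field breakdown is essentially a group-by-and-count over the matching events. The sketch below represents each event as a dictionary of fields (a hypothetical layout) and counts matches per value of a chosen field, such as `hostname`. A result with a single entry is what tells us the problem is confined to one server.

```python
from collections import Counter
from typing import Dict, List, Tuple


def breakdown(events: List[Dict[str, str]], field: str) -> List[Tuple[str, int]]:
    """Count events per value of `field`, most frequent first."""
    return Counter(e[field] for e in events if field in e).most_common()


def is_confined_to_one(events: List[Dict[str, str]], field: str) -> bool:
    """True when every event carrying `field` shares the same value."""
    return len(breakdown(events, field)) == 1
```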
Now to the last question: “is there a particular query that is causing the issues?” I need to apply structure to the unstructured messages, because the information on the SQL query is embedded in the text of the message and has not been extracted as a field. But no problem! Watch how easy this is. With my mouse I highlight the embedded field in the message (see blue highlight in image below):
Note that when I select text in the message with my mouse, a new ‘Extract Field’ button appears. In this case I’ve highlighted the SQL query “BEGIN; select insert_stats_proc(?, ?, ?, ?); END;”. When I click ‘Extract Field’, Log Insight automatically constructs a pattern to dynamically extract that field. Log Insight also gives me a preview of the results of applying the field extraction to the other messages in the view. The dark green highlighted text in the image below indicates the text in each message that will be extracted to form the new field. I give this newly discovered field the name ‘slow_sql’.
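One plausible way to turn a highlighted example into an extraction rule works in three steps:

1. Treat the text immediately around the selection as a literal anchor.
2. Generalize any digits in that anchor, since they tend to vary from message to message.
3. Capture whatever sits in the selection's position.

The Python sketch below implements that assumed approach for illustration only; it is not Log Insight's actual pattern-construction algorithm. The sample message format used in the example is made up, not copied from a real vCenter log.

```python
import re
from typing import List, Optional


def _generalize(literal: str) -> str:
    """Escape literal text, then let any run of digits match other numbers."""
    return re.sub(r"\d+", r"\\d+", re.escape(literal))


def build_extractor(sample: str, start: int, end: int,
                    context: int = 12) -> re.Pattern:
    """Build a regex from a sample message and the selected span [start, end)."""
    prefix = sample[max(0, start - context):start]
    suffix = sample[end:end + context]
    # If the selection runs to the end of the line, anchor on end-of-string.
    tail = _generalize(suffix) if suffix else "$"
    return re.compile(_generalize(prefix) + "(.+?)" + tail)


def extract_field(pattern: re.Pattern, messages: List[str]) -> List[Optional[str]]:
    """Apply the extractor to each message; None where it does not match."""
    results: List[Optional[str]] = []
    for message in messages:
        match = pattern.search(message)
        results.append(match.group(1) if match else None)
    return results
```

With a hypothetical sample such as `"Slow query (1203 ms): BEGIN; select insert_stats_proc(?, ?, ?, ?); END;"`, the prefix anchor `" (1203 ms): "` becomes a pattern that also matches `" (87 ms): "`. That is how one highlighted example generalizes to the other messages in the preview.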
Now that I have defined this new field, it becomes available on the list of fields on the right side. I can quickly expand it (image below) to discover that there are multiple SQL queries that are experiencing performance issues.
By clicking on the mini-chart, I can also promote it to the primary chart. Based on this visualization (shown in the image below), I can see that one particular SQL query (insert_stats_proc()) accounts for the overwhelming majority of the performance issues.
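Promoting an extracted field to the main chart amounts to splitting the hourly counts by that field's value. A claim like "the overwhelming majority" can then be checked by computing each value's share of the total.

Below is a minimal sketch of both steps. It assumes each event has already been reduced to a timestamp plus a dictionary of extracted fields, such as `slow_sql`; that event layout is hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

Event = Tuple[datetime, Dict[str, str]]


def series_by_value(events: List[Event], field: str) -> Dict[str, Counter]:
    """Hourly counts per field value: the data behind a stacked chart."""
    series: Dict[str, Counter] = defaultdict(Counter)
    for timestamp, fields in events:
        value = fields.get(field)
        if value is None:
            continue  # the extractor did not match this message
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        series[value][hour] += 1
    return dict(series)


def dominant_value(series: Dict[str, Counter]) -> Tuple[str, float]:
    """Return the value with the most events and its share of all events."""
    totals = {value: sum(counts.values()) for value, counts in series.items()}
    top = max(totals, key=totals.get)
    return top, totals[top] / sum(totals.values())
```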
In a few minutes, Log Insight’s analytical capabilities have helped me narrow down the scope significantly. Now I know that there is a performance issue caused by a database server. I know which database server. And I even know which SQL query. I could call my database administrator and leave it to him to figure out the rest. But we’re not done yet! Let’s bring in VMware vCenter Operations Manager, which provides advanced operational analytics for numeric time series data. vCenter Operations Manager and Log Insight are integrated out of the box, starting with Operations Manager 5.7.1, so vCenter Operations customers can now analyze this critical log data and gain deeper insight into the health of their environment.
vCenter Operations Manager comes with built-in integration with vSphere and has been monitoring the performance statistics across my vSphere environment. I’m going to use the integration between Log Insight and vCenter Operations Manager to solve two problems at once:

- First, to leverage vCenter Operations Manager’s advanced analytics to refine the possible root cause of the performance issues.
- Second, to create an alerting mechanism so that in the future I’m notified immediately if any vCenter SQL query experiences slow performance.

From the Interactive Analytics page, we create an alert (image below):
The alert dialog allows us to choose how the alert should be delivered. The first option is email; the second is a notification event sent to vCenter Operations Manager. As the screenshot below shows, I selected both options.
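Conceptually, an alert like this is a saved query evaluated on a schedule, with a threshold and a list of delivery channels. The sketch below models that with a hypothetical `AlertRule` class whose notifiers are plain callables.

In a real deployment, one notifier would send email and another would post an event to vCenter Operations Manager. Neither integration is shown here, and all names are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

Notifier = Callable[[str], None]  # hypothetical delivery channel


@dataclass
class AlertRule:
    name: str
    term: str             # text the saved query searches for
    window: timedelta     # how far back each evaluation looks
    threshold: int        # fire when matches in the window >= threshold
    notifiers: List[Notifier] = field(default_factory=list)

    def evaluate(self, events: List[Tuple[datetime, str]], now: datetime) -> bool:
        """Run the query over the window and notify every channel if it fires."""
        start = now - self.window
        needle = self.term.lower()
        hits = sum(1 for ts, text in events
                   if start <= ts <= now and needle in text.lower())
        if hits < self.threshold:
            return False
        message = f"{self.name}: {hits} matching events in the last {self.window}"
        for notify in self.notifiers:
            notify(message)
        return True
```

Selecting both delivery options in the dialog corresponds here to passing two notifiers, so every firing reaches both channels.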
Within a few minutes, I receive an alert within vCenter Operations Manager, shown below.
Most interesting is the “Root Cause” section of the alert window. vCenter Operations Manager has applied its analytics to identify the most likely culprit for the event: memory pressure on the cluster on which vCenter and the vCenter SQL database are running (actually a host within the cluster). This makes sense because in this cluster, vCenter and the SQL database share a compute and memory pool with other virtual machines. So I now know enough to take action. Options include reducing the number of virtual machines in the cluster, creating memory reservations for the SQL database, or adding more RAM to the hosts in the cluster.
In summary, Log Insight extends VMware’s operational analytics into log data with software that is engineered to be very easy to use and that quickly gives you insight into your IT environment. Its integration with vCenter Operations Manager provides unified analytics across numeric and unstructured data, and Log Insight comes pre-loaded with rich knowledge about how to identify and diagnose issues in your vSphere ecosystem.
Hopefully this blog post has given you enough of a taste of Log Insight that you’re ready to take the next step: check out the product. Log Insight is expected to be generally available for purchase in Q3, but there’s no need to wait until then! A fully functional Beta version of Log Insight is now available for download. The download link and documentation to get you started can be found in the Log Insight Beta Community on My VMware. Or check out the product videos below, which demonstrate how to install and use Log Insight.