The Amazing VM Record/Replay Feature in VMware Workstation 6
VMware Workstation 6 is coming shortly, and we’re quite excited about its many new features: support for paravirtualized Linux kernels, integration with Microsoft Visual Studio and Eclipse, high-speed USB support, multiple monitor support, and the new VIX 1.1 VM scripting API, to name a few.
One other new feature that we’re extra excited about is “VM Record/Replay” (shown enabled below). The idea behind Record/Replay is relatively straightforward. When executing software within a virtual machine (VM), our virtualization layer can record the complete execution behavior. Having saved this behavioral information, the user can go back in time (from the VM’s point of view) and replay that exact and complete behavior over and over again. The keywords here are exact and complete, and that’s where both the implementation challenge and the power come from.
First, let’s examine why this is so challenging. The execution of any software on a computer consists of a fairly complex set of interactions. The computer’s CPU is constantly fetching and executing instructions, accessing memory, and trapping into the operating system for a variety of services. Meanwhile, the computer’s I/O devices (e.g., disks, network cards, mice, keyboards, and timers) do things on their own schedule, interrupting the CPU when someone types, when a network packet arrives, or whenever they otherwise need the operating system’s attention. When these “asynchronous” events occur, the CPU responds almost immediately, heading down a new execution path to handle the event. And because the devices are on their own schedules, the complete execution path of the software can differ substantially from one run to the next. Add multi-threaded programs to the mix and you have a situation where two executions of the same software are almost never exactly the same.
It’s this non-deterministic behavior that makes computers complex systems and causes such pain for programmers trying to make their code flawless. Many software problems occur only when one particular path out of many millions of possible execution paths is taken, and this class of bug leads many programmers to tear out their hair, claiming, “Well, that customer problem doesn’t occur for me.” The end result is that bugs often go unfixed.
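To make the point concrete, here is a toy model in Python (not anything taken from VMware Workstation): a guest runs a fixed loop, and a single simulated device interrupt arrives at an instruction count chosen by the device’s own schedule. The function `run_guest` and its arithmetic are hypothetical stand-ins. The only thing that varies between runs is when the interrupt lands, yet that alone changes the final state.

```python
import random


def run_guest(interrupt_at: int, steps: int = 20) -> int:
    """Toy guest: a fixed instruction loop plus one device interrupt.

    `interrupt_at` is the instruction count at which the simulated device
    interrupts the CPU; nothing else about the run changes.
    """
    acc = 0
    for icount in range(steps):
        if icount == interrupt_at:
            acc += 1000          # interrupt handler updates shared state
        acc = acc * 3 + 1        # ordinary, deterministic guest instruction
    return acc


# The device runs on its own schedule, so each run sees a different arrival time.
device = random.Random()
for run in range(3):
    arrival = device.randrange(20)
    print(f"run {run}: interrupt at instruction {arrival} -> final state {run_guest(arrival)}")
```

In this model every one of the 20 possible arrival points produces a different final state, so a bug that depends on one particular arrival point may simply never show up on the developer’s machine.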
Now let’s discuss how life can be better in a VM. Because our virtualization software sees and controls all of the execution of software within a VM (the “guest” operating system and applications), it can do many things that normal hardware cannot do. One such thing is this VM Record/Replay capability.
When you enable the Record/Replay feature, VMware Workstation immediately takes a snapshot of the full VM state, continues guest software execution, and begins tracking its execution behavior. We’re not talking about a movie of what’s on the screen, but the full system behavior, including all CPU and device activity. It notes the exact point in time when every device interrupt or other asynchronous event occurs and records this information to a compressed log file until you tell it to stop. It also has to save a few other things, such as the contents of all incoming network packets.
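Here is a rough sketch of what such a log might contain, assuming a toy guest whose whole state is one integer. The `Event` fields, device names, JSON-lines encoding, and zlib compression are all illustrative assumptions, not Workstation’s actual log format. What the sketch shows is that only the nondeterministic inputs (when each interrupt arrived, plus data such as packet contents) need to be written; the guest’s own deterministic instructions can be recomputed and are never logged.

```python
import json
import random
import zlib
from dataclasses import asdict, dataclass


@dataclass
class Event:
    icount: int   # guest instruction count at which the event was delivered
    kind: str     # hypothetical device names: "timer", "keyboard", "net_rx"
    payload: str  # data the guest cannot recompute (e.g. packet bytes, as hex)


def record(steps, seed=None):
    """Run the toy guest from its snapshot state, logging only nondeterministic input."""
    devices = random.Random(seed)  # stands in for unpredictable real hardware
    log = []
    acc = 0                        # snapshot of the (tiny) guest state
    for icount in range(steps):
        if devices.random() < 0.1:  # some device raises an interrupt
            kind = devices.choice(["timer", "keyboard", "net_rx"])
            payload = devices.randbytes(4).hex() if kind == "net_rx" else ""
            log.append(Event(icount, kind, payload))
            # The guest's interrupt handler: packet contents affect its state.
            acc += int(payload, 16) if payload else len(kind)
        acc = (acc * 31 + icount) % 65_521  # deterministic work: never logged
    lines = "\n".join(json.dumps(asdict(e)) for e in log)
    return acc, zlib.compress(lines.encode())


final_state, blob = record(10_000, seed=42)
events = [json.loads(line) for line in zlib.decompress(blob).decode().splitlines()]
print(f"{len(events)} events logged for 10,000 instructions, {len(blob)} compressed bytes")
```

The log grows with the number of asynchronous events rather than with the number of instructions executed, which is what makes recording long runs practical.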
When you choose to replay the recording, it restarts the VM from the snapshot and faithfully re-creates the recorded execution by feeding the logged events and data back to the VM at the exact points in time when they occurred during the original execution. The result is that the exact same execution path is followed during replay. And since the log is saved to disk, you can share the exact execution scenario with others and replay it over and over and over again. We also allow you to “go live” at any time, aborting the rest of the replay and allowing new interactions and new behaviors to proceed. One analogy is autopilot for an airplane. You can disengage it at any point in the trip, go to manual control, and head off in a new direction from that point.
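The record, replay, and go-live cycle can be sketched with a toy guest. This is a minimal model under stated assumptions, not VMware’s implementation: `guest_step`, `run`, and the `live` device function are hypothetical, the snapshot is simply the initial value `acc = 0`, and the log maps an instruction count to the input delivered there.

```python
import random


def guest_step(acc, icount, event):
    """One toy guest instruction, plus the interrupt handler if an event arrives."""
    if event is not None:
        acc += event                     # handler consumes the device's data
    return (acc * 31 + icount) % 65_521  # ordinary, deterministic guest work


def run(steps, device, log=None, replay=None, go_live_at=None):
    """Run the toy guest from its snapshot state (acc = 0).

    device(icount) returns live input or None; `log` (a dict) collects
    {icount: input} while recording; `replay` supplies recorded input
    until `go_live_at`, after which live devices take over.
    """
    acc = 0  # restore the snapshot
    for icount in range(steps):
        replaying = replay is not None and (go_live_at is None or icount < go_live_at)
        event = replay.get(icount) if replaying else device(icount)
        if log is not None and event is not None:
            log[icount] = event
        acc = guest_step(acc, icount, event)
    return acc


rng = random.Random()


def live(_icount):
    """Simulated live devices: occasionally deliver one byte of input."""
    return rng.randrange(256) if rng.random() < 0.1 else None


log = {}
original = run(5_000, live, log=log)      # record
replayed = run(5_000, live, replay=log)   # replay: live() is never called
assert replayed == original
branched = run(5_000, live, replay=log, go_live_at=2_500)  # go live halfway through
```

Replay never consults the live devices, which is why it follows the recorded path exactly. `go_live_at` plays the role of disengaging the autopilot: from that instruction on, fresh input can send execution somewhere new.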
Following this idea further, combining VM Record/Replay with snapshot management lets a user create a whole tree of alternative execution paths and replay from any point in the tree. Here’s an example tree showing how one might use this to home in on a problem.
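One way to picture that tree as a data structure, assuming each node holds one recorded stretch of execution that begins where its parent ended. `Segment`, `branch`, and `replay` are hypothetical names used only for illustration; reaching any node means restoring the root snapshot and replaying every segment on the path down to it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def guest_step(acc, icount, event):
    """One toy guest instruction, plus the interrupt handler if an event arrives."""
    if event is not None:
        acc += event
    return (acc * 31 + icount) % 65_521


@dataclass(eq=False)
class Segment:
    """One recorded stretch of execution, starting at the parent's end state."""
    name: str
    steps: int
    events: dict[int, int]  # instruction offset within this segment -> device input
    parent: Segment | None = field(default=None, repr=False)
    children: list[Segment] = field(default_factory=list)

    def branch(self, name, steps, events):
        """Record an alternative continuation from the end of this segment."""
        child = Segment(name, steps, events, parent=self)
        self.children.append(child)
        return child

    def replay(self, acc=0):
        """Restore the root snapshot (acc) and replay every segment down to this one."""
        chain, node = [], self
        while node is not None:
            chain.append(node)
            node = node.parent
        icount = 0
        for seg in reversed(chain):  # root first
            for offset in range(seg.steps):
                acc = guest_step(acc, icount, seg.events.get(offset))
                icount += 1
        return acc


boot = Segment("boot", 1_000, {10: 7, 600: 3})
try_a = boot.branch("try-A", 500, {5: 1})
try_b = boot.branch("try-B", 500, {5: 2})
deeper = try_a.branch("try-A, then more input", 200, {50: 9})
print(try_a.replay(), try_b.replay(), deeper.replay())
```

Sibling branches share their parent’s history exactly, so `try-A` and `try-B` differ only in what happened after they split, which is what lets you compare alternatives one decision at a time.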
To make VM Record/Replay extra useful for programmers, we’re rolling out a variety of tools and procedures that leverage it in interesting ways:
- We’ve integrated Record/Replay with the gdb debugger. Users can record execution of the VM and then attach gdb to the guest OS or applications during replay. At that point, they can inspect memory, set breakpoints, and single-step through the execution. Yes, we can do this in any part of the kernel! Furthermore, this works with unmodified kernels (no kdb needed), and the breaking and stepping are completely transparent to the guest OS. The example below shows the debugging of a Linux device driver that has an error in its interrupt handler. Unlike traditional debugging, here we can hit the issue repeatedly with replay and then single-step through this highly sensitive, privileged code!
- We’ve added the ability to produce detailed instruction traces from the recorded execution. Unlike existing tracing frameworks, the Record/Replay approach is non-intrusive; it can capture extremely detailed information without affecting the system during capture. Heisenberg would be proud.
- We’ve also added “in-guest recording control”, which lets guest software start and stop VM Record/Replay itself. For example, an ISV might ship their beta software in a VM and have it invoke recording whenever someone uses a feature that has a history of sporadic problems in the field. Should a beta user hit a problem while running this feature, the log of the full activity is already there. They email it to the ISV and voilà… there is now a deterministic case of the bug for developers to work on!
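The instruction-trace idea from the list above can be illustrated with a toy model: the live run logs only device input, and the per-instruction trace is built afterwards, during replay. `record` and `replay_with_trace` are hypothetical names, and the guest is a stand-in. The sketch shows why tracing at replay time cannot perturb the run being studied: the timing of every interrupt is already fixed by the log, no matter how much work the tracer does.

```python
import random


def guest_step(acc, icount, event):
    """One toy guest instruction, plus the interrupt handler if an event arrives."""
    if event is not None:
        acc += event
    return (acc * 31 + icount) % 65_521


def record(steps, rng):
    """Live run: only the device input that actually arrives is saved."""
    log, acc = {}, 0
    for icount in range(steps):
        event = rng.randrange(256) if rng.random() < 0.1 else None
        if event is not None:
            log[icount] = event
        acc = guest_step(acc, icount, event)
    return acc, log


def replay_with_trace(steps, log):
    """Replay from the log while recording every instruction's effect.

    The tracing work happens here, after the fact, so however slow it is,
    it cannot shift when interrupts arrived in the recorded run.
    """
    trace, acc = [], 0
    for icount in range(steps):
        event = log.get(icount)
        acc = guest_step(acc, icount, event)
        trace.append((icount, event, acc))  # as detailed as we like
    return acc, trace


final, log = record(2_000, random.Random())
replayed, trace = replay_with_trace(2_000, log)
print(f"traced {len(trace)} instructions; replay matches recording: {replayed == final}")
```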
We’ll be publishing instructions for using these and other tools with VM Record/Replay shortly so stay tuned. In fact, one of the developers, Vyacheslav (Slava) Malyugin, has already started blogging additional technical nuggets on using tools with this technology.
A few caveats… VM Record/Replay is an experimental feature in VMware Workstation 6.0. This is an extremely challenging problem to solve and our incredible engineers have been hard at work on it for some time now. As an aside, they sure wish they could use record/replay to debug problems in the record/replay implementation. Nonetheless, things are working quite well within a handful of hardware and virtual machine configuration constraints. For example, we’re currently unable to properly record execution when USB or sound devices are in use by the VM.
I should add that the core VM Record/Replay technology is useful for many tasks beyond debugging those nasty race conditions. Some customers are telling us this will help them create better sales and training demos that they’re sure will work time after time. With “go-live” mode, they can play the demo, but also let their customers begin interactions at any point. Other customers have indicated they’ll use this for forensic purposes. For example, with such a record of someone breaking into a VM (a honeypot or a production server), experts can study and understand exactly what went on during the break-in. We have lots of other plans in the works for this technology as well, so stay tuned!
So to all you programmers out there, download the latest release candidate (RC2 is coming shortly), and check out this amazing new capability. Once you’ve experienced debugging in a world of deterministic execution, you may never go back to the non-virtual world!