In ThinApp’s upcoming 4.5 release (currently in beta), we’ve managed to accelerate the startup time for some virtualized applications like Office by up to 200% compared with previous versions! One big secret to the optimization comes from internal file format changes that occur when building packages. ThinApp performs static analysis on the applications and rearranges the bits of executables and DLLs on disk so they can be mapped into memory with minimal disk and CPU overhead. Because this optimization relies on drastic file format changes, it could be a nightmare of backward and forward compatibility issues when rolling out to customers and technology partners. But, thanks to some clever design, it’s just another regular release with no process changes for our customers and partners.
File format backward and forward compatibility is a classically difficult problem to solve. Even the most carefully planned file formats evolve after years of use, and the difficulty of changing a file format is a large barrier to innovation.
As an example, let’s look at ZIP, the most popular archive file format in the world. In the early days of PCs, 4GB of disk seemed impossibly big, so ZIP used 32-bit numbers to represent file sizes and block offsets. As a result, the original ZIP file format could not support archives larger than 4GB or individual compressed files in archives that were larger than 4GB. PKWARE eventually released an update to the file format specification (ZIP64) which used 64-bit numbers instead of 32-bit numbers, allowing files of any size to be supported. While this solved one problem, it created another. Older applications (like Explorer in Windows XP) didn’t know anything about ZIP64 and couldn’t read this new format. Explorer in Vista supports ZIP64, but if it creates new files in this format, Windows XP users may not be able to read them. Going from 32-bit to 64-bit is one obvious reason to change a file format, but there may be many more subtle reasons. For example, what if PKWARE discovered they could achieve a 10% compression boost by tweaking their compression and decompression algorithms? They would be prevented from making even such a small change because the file format is set in stone, and the same adoption problems would result from it. At some point there could be so many incompatible file formats and apps as to cause developers and users to abandon the format for something simpler. In effect, the need to maintain a widely usable file format prevented PKWARE from further innovating in the very area which brought them their original fame: compression.
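To make the 32-bit limit concrete, here is a minimal sketch using Python’s standard `struct` module. The helper names (`pack_size_32`, `pack_size_64`, `fits_in_32_bits`) are made up for this illustration and are not part of any ZIP library; the point is only that a 4-byte unsigned field cannot hold a size of 4GB or more, while an 8-byte field can.

```python
import struct

UINT32_MAX = 0xFFFFFFFF  # largest value a 4-byte unsigned field can hold


def pack_size_32(size: int) -> bytes:
    """Pack a size as a little-endian unsigned 32-bit integer."""
    return struct.pack("<I", size)


def pack_size_64(size: int) -> bytes:
    """Pack a size as a little-endian unsigned 64-bit integer."""
    return struct.pack("<Q", size)


def fits_in_32_bits(size: int) -> bool:
    """Return True if the size can be stored in a 32-bit field."""
    try:
        pack_size_32(size)
        return True
    except struct.error:
        # struct refuses values outside 0..UINT32_MAX for the "I" format.
        return False


four_gib = UINT32_MAX + 1
print(fits_in_32_bits(UINT32_MAX))   # a size just under 4GB still fits
print(fits_in_32_bits(four_gib))     # exactly 4GB does not
print(len(pack_size_64(four_gib)))   # the 64-bit field costs 8 bytes instead of 4
```

A reader that only knows the 4-byte layout has no way to interpret the 8-byte one, which is exactly the compatibility break described above.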
Most developers turn to pre-made libraries which they can use to open and read ZIP files, and either link these libraries into their applications or load them as dynamic-link libraries (DLLs). If a new file format specification for ZIP becomes available, a developer can upgrade the libraries and ship out new versions of their product. If the interface between the library and the application is designed well enough, a developer only needs to replace a single DLL file on end-user machines to enable reading of old and new file formats alike. This works, but not without effort:
- The library maintainer must create a new library which reads old and new file formats and make it available to app vendors.
- The app vendor must obtain the latest version of the library and ship it out. Often shipping any changes to end-users, no matter how simple they are, requires a full Testing/Quality Assurance cycle.
- The end-user must update their applications. Even with persistent nag dialogs, users typically have better things to do with their time than sitting around updating their applications.
File format changes take a long time to propagate to apps and users. More often, we are stuck with old files and applications forever. The result is that innovation is slowed and everyone, from library maintainer to app vendor to end user, is taxed in time and effort.
VMware ThinApp avoids this tax on innovation in many new ways.
Binding format logic & data together. ThinApp creates self-contained EXE files which are similar to self-extracting ZIP archives. They contain information such as a virtual file system, virtual registry, package options, and, most importantly, a copy of the ThinApp runtime. Like self-extracting ZIP files, ThinApp packages have all the logic needed to read their own contents baked into the file itself. Because applications can be run directly from the “archived” state, the time, disk space, and install/configuration changes related to self-extracting ZIPs are eliminated. And because each ThinApp package contains its own copy of the runtime which understands the attached file format, VMware is free to change the internal file formats with every release. Additionally, users and administrators don’t need to worry about compatibility between runtimes previously deployed and currently available.
ThinApp is able to efficiently bind the runtime with each package because of continuous focus on three key areas.
- Small Runtime. We know that if the runtime gets too large, people may become concerned with the extra disk and bandwidth involved in including a copy with every application. Over the last 9 years of development we’ve managed to keep the runtime to only 600 KB! Blaise Pascal once wrote, “I have made this letter longer than usual, only because I have not had the time to make it shorter.” We expect that with more time, we can make the runtime smaller while accomplishing even more.
- Fast Startup time. Often applications and runtimes use the installation process to do expensive, one-time tasks in order to optimize subsequent executions. ThinApp is designed for fast startup every time, so there is no need for a separate runtime installation process. One way this is accomplished is by carefully arranging data and code on disk so it can be mapped into memory with minimal disk and CPU overhead. The ThinApp runtime can be loaded into memory and “mount” the virtual filesystem, registry, etc. in a matter of milliseconds!
- Restricted User-mode only code. ThinApp pioneered device driver-less application virtualization so it can eliminate the installation (and uninstallation) steps needed to perform operations requiring Administrator rights, like device driver installs. Today, all new AppVirt solutions are based on this approach because of its clear advantages.
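The “map it into memory” idea from the Fast Startup point can be sketched with Python’s standard `mmap` module. This is only an illustration under assumptions: the two-field header layout and the names `write_demo_file` and `read_blob_mapped` are hypothetical and say nothing about how ThinApp actually lays out its packages. The sketch shows data being used straight from a mapped view at a known offset, with no parsing or decompression pass over the rest of the file.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: a header holding (offset, length) of one data blob.
HEADER = struct.Struct("<II")


def write_demo_file(path: str, blob: bytes) -> None:
    """Write the header followed directly by the blob it points to."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(HEADER.size, len(blob)))
        f.write(blob)


def read_blob_mapped(path: str) -> bytes:
    """Map the file read-only and slice the blob out of the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            offset, length = HEADER.unpack_from(view, 0)
            # Only the pages actually touched need to come off disk.
            return bytes(view[offset:offset + length])


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    write_demo_file(demo_path, b"virtual registry data")
    mapped_blob = read_blob_mapped(demo_path)

print(mapped_blob)
```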
Exposing ThinApp file format libraries to 3rd parties.
Over the last few years, VMware’s ThinApp team has worked with technology partners in the desktop management systems and security space to provide a library (ThinApp Management SDK) to interrogate information inside ThinApp packages and processes. These partners are using this library to do things like inventory scanning, anti-virus protection, and user management. This library was made available to all VMware partners at the same time as the ThinApp 4.0.4 release. As mentioned earlier, in ThinApp 4.5 we will be changing our internal file format because we have found we can greatly accelerate application startup times by rearranging the data on disk in specific and non-obvious ways. Despite this file format change, neither our partners nor their customers will need to perform any updates in order to support the new file format.
How is this possible? The ThinApp Management SDK library doesn’t actually have any logic to read package contents. Rather, it simply loads the runtime from the package into memory and then asks the loaded runtime to read the package contents on its behalf. In this fashion, older versions of the management SDK library can work with newer ThinApp packages even as file formats have changed, and can support both backward and forward compatibility.
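Here is a toy sketch of that delegation idea, not the real ThinApp Management SDK. Everything in it is an assumption made for illustration: the container layout (a 4-byte length, embedded reader code, then the payload), the function names, and the use of Python source as the “runtime”. What it shows is that one unchanged `sdk_list_files` function can read two packages whose payload formats differ, because each package brings its own reader.

```python
import json
import struct


def build_package(reader_source: str, payload: bytes) -> bytes:
    """Toy package: [4-byte code length][reader code][payload]."""
    code = reader_source.encode("utf-8")
    return struct.pack("<I", len(code)) + code + payload


def sdk_list_files(package: bytes) -> list:
    """The 'SDK' knows only the container, never the payload format."""
    (code_len,) = struct.unpack_from("<I", package, 0)
    code = package[4:4 + code_len].decode("utf-8")
    payload = package[4 + code_len:]
    namespace = {}
    # Toy only: running embedded code is unsafe unless it has been verified
    # first, which is the concern the next paragraph of the article raises.
    exec(code, namespace)
    return namespace["list_files"](payload)


# "Old" format: one file name per line.
READER_V1 = '''
def list_files(payload):
    return payload.decode("utf-8").splitlines()
'''

# "New" format: a JSON document. The SDK above does not change.
READER_V2 = '''
import json

def list_files(payload):
    return json.loads(payload.decode("utf-8"))["files"]
'''

old_package = build_package(READER_V1, b"a.dll\nb.exe")
new_package = build_package(
    READER_V2, json.dumps({"files": ["a.dll", "b.exe"]}).encode("utf-8")
)

print(sdk_list_files(old_package))
print(sdk_list_files(new_package))
```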
Because the ThinApp management SDK is loading and executing code from random packages, we need to make sure the loaded code isn’t booby-trapped to do bad things. Packages may come from random dangerous sources, and code which inspects the packages may be running from elevated security accounts. As a result, we need to ensure the ThinApp runtime inside the package hasn’t been tampered with. This is accomplished by using a public/private key signature system. All runtimes are signed by VMware using a private key, and the management SDK verifies that the code has not been modified by checking the signature on the code hash against the matching public key.
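The sign-then-verify flow can be illustrated with a deliberately simplified sketch. This is textbook RSA with no padding, built on two well-known Mersenne primes, and it is an assumption chosen purely so the example runs with the standard library alone. It is not secure, it is not what VMware uses, and real code should rely on a vetted cryptography library. The names `sign` and `verify` are hypothetical. Computing the modular inverse with `pow(E, -1, ...)` requires Python 3.8 or later.

```python
import hashlib

# Toy key material, for illustration only (NOT secure).
P = 2**89 - 1    # Mersenne prime
Q = 2**107 - 1   # Mersenne prime
N = P * Q        # public modulus
E = 65537        # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, held by the signer


def code_hash(code: bytes) -> int:
    """Hash the code and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(code).digest(), "big") % N


def sign(code: bytes) -> int:
    """Signer side: needs the private exponent D."""
    return pow(code_hash(code), D, N)


def verify(code: bytes, signature: int) -> bool:
    """Verifier side: needs only the public values N and E."""
    return pow(signature, E, N) == code_hash(code)


runtime_code = b"pretend this is the runtime binary"
runtime_signature = sign(runtime_code)

print(verify(runtime_code, runtime_signature))                # untouched code
print(verify(runtime_code + b"\x90", runtime_signature))      # modified code
```

The useful property is the asymmetry: the verifier ships with only the public values, so it can detect a modified runtime without being able to produce a valid signature itself.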
ThinApp provides the option to produce MSI (Microsoft Installer) files when creating virtual packages for easier deployment through software delivery systems. The Quantum compression algorithm used in MSI CAB files was one of the first compression algorithms to significantly beat ZIP. Microsoft licensed the algorithm from David Stafford (a good friend of mine) in order to save millions of dollars on floppy disks for software distributions. Similar to ZIP, Windows MSI files are limited by their format specifications to 2GB or less. Unlike ZIP, Microsoft never released an update to the MSI format to enable larger packages. Instead, users must create a single MSI file plus multiple separate CAB files, each no larger than 2GB. This limitation is a pain. If you want to point a user to a large installer located on a website via an HTTP link, you need to provide the files inside another archive format like ZIP or ISO. In ThinApp 4.5, we have solved this problem, and for the first time you can deploy a single MSI file of any size. Our solution is to create a small MSI file which contains only bootstrap installation logic. Then, we append compressed file data to the end of the MSI file in our own format. When the bootstrap logic is executed by Windows Installer, it can read and decompress the required file data found at the end of the MSI file.
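The “small stub plus appended data” approach can be sketched as follows. The trailer layout, the `APPENDED` magic tag, the use of zlib, and the function names are all assumptions for illustration; the actual ThinApp format is not described in this article. The sketch shows how bootstrap logic can locate appended data by reading a fixed-size trailer from the end of the file, and how a 64-bit length field avoids reintroducing a size cap.

```python
import struct
import zlib

# Hypothetical trailer: an 8-byte magic tag plus a 64-bit payload length.
TRAILER = struct.Struct("<8sQ")
MAGIC = b"APPENDED"


def append_payload(stub: bytes, data: bytes) -> bytes:
    """Return stub + compressed data + trailer describing the data."""
    compressed = zlib.compress(data)
    return stub + compressed + TRAILER.pack(MAGIC, len(compressed))


def read_payload(blob: bytes) -> bytes:
    """Find the appended data by reading the trailer at the very end."""
    if len(blob) < TRAILER.size:
        raise ValueError("file too small to contain a trailer")
    magic, length = TRAILER.unpack_from(blob, len(blob) - TRAILER.size)
    if magic != MAGIC:
        raise ValueError("no appended payload found")
    start = len(blob) - TRAILER.size - length
    return zlib.decompress(blob[start:start + length])


stub = b"<pretend this is a small bootstrap installer>"
original_data = b"application files " * 1000
combined = append_payload(stub, original_data)

print(combined.startswith(stub))                 # the stub is left untouched
print(read_payload(combined) == original_data)   # the data round-trips
```

Because the stub bytes are unmodified and come first, a tool that only understands the stub’s own format can still open the file, while the bootstrap logic knows to look at the end.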
Besides being limited to 2GB, one additional downside of the CAB format is that it doesn’t offer random access to data inside of a file. In order to read the byte located at 1GB, you need to decompress everything preceding that location. This can be very slow, especially for larger files. We wanted the management SDK to work with both ThinApp EXEs and ThinApp MSIs at the same blazing speed. Since we use our own format for compressed data, the bulk of ThinApp 4.5 MSI data is not actually stored in CAB format, and the ThinApp management SDK can operate on ThinApp-produced MSIs in the same manner as ThinApp EXEs.
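One common way to get random access into compressed data is to compress it in independent fixed-size chunks. The article does not say this is how ThinApp does it, so treat the sketch below as a generic illustration with hypothetical names (`compress_chunked`, `read_byte`) and an arbitrary 64 KB chunk size. Reading one byte then costs one chunk’s worth of decompression instead of everything before it.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size chosen for the example


def compress_chunked(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Compress each fixed-size slice of the data independently."""
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def read_byte(chunks: list, position: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Decompress only the chunk that contains the requested position."""
    index, offset = divmod(position, chunk_size)
    return zlib.decompress(chunks[index])[offset]


sample = bytes(range(256)) * 1024   # 256 KB of sample data, so 4 chunks
chunks = compress_chunked(sample)

print(len(chunks))
print(read_byte(chunks, 200_000) == sample[200_000])
```

The trade-off is a slightly worse compression ratio, since each chunk starts with no history from the previous one.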
ThinApp has creatively solved file format compatibility issues in many areas by combining code and data to eliminate discontinuities between old applications and new formats. This results in faster release cycles, faster innovation, and a smoother user experience. Because all of these innovations are invisible to most people, the next time you launch a ThinApp application, it will just work!