Sunday, August 21, 2011

KVM Forum 2011 Highlights

KVM Forum 2011 was co-located with LinuxCon North America in Vancouver, Canada. KVM Forum ran Monday and Tuesday, August 15 & 16, and featured two tracks packed with developer-oriented talks on KVM and related open source projects.

Here is a summary with links to some of the most interesting talks:

Big picture and roadmaps


Daniel Berrange gave an excellent overview of libvirt and libguestfs. These two tools, along with other virt-tools like virt-v2v, form the user-visible toolkit and APIs around KVM. Anyone who needs to automate KVM or develop custom applications that integrate with KVM should learn about the work being done by the libvirt community to provide both APIs and tools for managing virtualized guests.

Alon Levy's SPICE Roadmap talk explains how remote graphics, input, sound, and USB are being added to KVM. SPICE goes far beyond today's VNC server, which can only scrape screen updates and send them to the client. SPICE has a deeper insight into the graphics pipeline and is able to provide efficient remote displays. In addition, channels for input, sound, and USB pass-through promise to bring good desktop integration to KVM.

Subsystem status and plumbing


Fixing the USB Disaster explains the state of USB, where Gerd Hoffmann has been working to remove limitations and add support for the USB 2.0 standard. These much-needed improvements will allow USB pass-through to actually work across devices. In the past I've had mixed results when passing through VoIP handsets and consumer electronics devices. Thanks to Gerd's work, I'm hoping that USB pass-through will work more consistently in future releases.

Migration: One year later tells the story of live migration in KVM. Juan Quintela has been working on this area of QEMU. Partly because live migration support is wired into today's device model, it has been a struggle to keep live migration working as device emulation is extended or fixed. In particular, migrating from an old qemu-kvm to a new one, and vice versa, is a challenging problem.

New features


AMD IOMMU Version 2 covers the enhanced I/O Memory Management Unit from AMD. Joerg Roedel, who has been working on nested virtualization on AMD CPUs, presents the key features of this new IOMMU. It adds PCI ATS-based support for demand paging. This eliminates the need to lock guest memory when doing PCI pass-through, since it's now possible to swap in a page and resume the I/O when a fault occurs.

Along the lines of Kemari, Kei Ohmura presents Rapid VM Synchronization with I/O Emulation Logging-Replay. The new trick is that I/O logging enables shared-nothing configurations where the primary and secondary host do not have access to shared storage. Instead the primary sends I/O logs to the secondary, where they are replayed to bring the secondary disk image up to date.

Performance


For a tour of performance tweaks across memory, networking, and storage, check out Mark Wagner's talk on KVM Performance Improvements and Optimizations. He covers Transparent Huge Pages, vhost-net, SR-IOV, block I/O schedulers, Linux AIO, NUMA tuning, and more.

...and more


Each talk was only 30 minutes long, and with two tracks that meant lots of talks. To see all presentations, go to the KVM Forum 2011 website.

This post only covered non-IBM presentations. There were so many IBMers working on KVM there that I'd like to bring those talks together and show all the areas they touch on. In my next post I will give an overview of the KVM presentations given by IBM Linux Technology Center people at KVM Forum and LinuxCon North America.

Wednesday, August 3, 2011

My KVM Architecture guest post is up at Virtualization@IBM

Update: These links no longer work. For a more recent overview of KVM's architecture, check out the slides available here.

Earlier this year IBM launched the Virtualization@IBM blog and I'm pleased to have contributed a guest post on KVM Architecture!

KVM Architecture: The Key Components of Open Virtualization with KVM explains the open virtualization stack built around KVM. It highlights the performance, security, and management characteristics of the architecture. I hope it is a good overview if you want a quick idea of how KVM works.

You can follow @Linux_at_IBM and @OpenKVM on Twitter for more official updates - for example, news about the Open Virtualization Alliance, which HP, IBM, Intel, Red Hat, and many others are coming together around.

Wednesday, June 8, 2011

LinuxCon Japan 2011 KVM highlights

I had the opportunity to attend LinuxCon Japan 2011 in Yokohama. The conference featured many interesting talks, with a virtualization mini-summit taking place over the first two days. Here are highlights from the conference and some of the latest KVM-related presentation slides.

KVM Ecosystem

Jes Sorensen from Red Hat presented the KVM Weather Report, a status update and look at recent KVM work. Check out his slides if you want an overview of where KVM is today. The main points I took away are:
  • KVM performance is excellent and continues to improve.
  • In the future, expect advanced features that round out KVM's strengths.
Jes' presentation ends on a nice note with a weather forecast that reads "Cloudy with a chance of total world domination" :).

Virtualization End User Panel

The end user panel brought together three commercial users of open virtualization software to discuss their deployments and answer questions from the audience. I hope a video of this panel will be made available because it is interesting to understand how virtualization is put to use.

All three users were in the hosting or cloud market. One was running KVM, another Xen, and the third offered KVM on standard plans and VMware on premium plans. It is clear that KVM is becoming the hypervisor of choice for hosting due to its low cost and good Linux integration - the Xen user had started several years ago but is now evaluating KVM. My personal experience with USA and UK hosting is that Xen is still widely deployed, although KVM is growing quickly.

All three users rely on custom management tools and web interfaces. Although libvirt is being used, the management layers above it are seen as an opportunity to differentiate. The current breed of open cloud and virtualization management tools wasn't seen as mature or scalable enough. I expect this area of the virtualization stack to solidify, with several of the open source efforts consolidating in order to reach critical mass.

Storage and Networking

Jes Sorensen from Red Hat covered the KVM Live Snapshot Support work that will allow disk snapshots to be taken for backup and other purposes without stopping the VM. My own talk gave An Updated Overview of the QEMU Storage Stack and covered other current work in the storage area, as well as explaining the most important storage configuration settings when using QEMU.

Stephen Hemminger from Vyatta presented an overview of Virtual Networking Performance. KVM is looking pretty good relative to other hypervisors, no doubt thanks to all the work that has gone into network performance under KVM. Michael Tsirkin and many others are still optimizing virtio-net and vhost_net to reduce latency, improve throughput, and reduce CPU consumption, and I think the results justify virtio and paravirtualized I/O. KVM can continue improving network performance by extending its virtio host<->guest interface to work more efficiently - something that is impossible when emulating existing real-world hardware.

Other KVM-related talks

Isaku Yamahata from VA Linux gave his Status Update on QEMU PCI Express Support. His goal is PCI device assignment of PCI Express adapters. I think his work is important for QEMU in the long term since eventually hardware emulation needs to be able to present modern devices in order for guest operating systems to function.

Guangrong Xiao from Fujitsu presented KVM Memory Virtualization Progress, which describes how guest memory access works. It covers both hardware MMU virtualization in modern processors as well as the software solution used on older hardware. Kernel Samepage Merging (KSM) and Transparent Hugepages (THP) are also touched upon. This is a good technical overview if you want to understand how memory management virtualization works.

More slides and videos

The full slides and videos for talks should be available within the next few weeks. Here are the links to the LinuxCon Japan 2011 main schedule and virtualization mini-summit schedule.

Saturday, May 7, 2011

KVM Slides from Red Hat Summit 2011

Update: Added link to Mark Wagner's KVM Performance Improvements and Optimizations slides that Andrew Cathrow posted on IRC.

This year at Red Hat Summit 2011 many presentations touched on KVM and virtualization. Slide decks are mostly online now, so I took a look and highlighted those I found most interesting. It's also worth checking again in a few days; hopefully the remaining slide decks will come online.

Converting, Inspecting, & Modifying Virtual Machines with Red Hat Enterprise Linux 6.1

Slides: libguestfs material, virt-v2v/virt-p2v material

Richard Jones (libguestfs) and Matthew Booth (virt-v2v/virt-p2v) cover the tools they have developed for manipulating virtual machines and their disk images. This looks really, really cool. KVM needs great tools for working with VM data rather than requiring the user to manually stack up disk partitioning, volume management, file system, and other functionality from scratch every time.

libguestfs has a small Linux-based appliance VM containing disk, volume, and file system tools. There are a bunch of command-line tools and a shell for interacting with the appliance VM, which can access guest file systems without requiring root privileges on the host. Files can be downloaded/uploaded, partitions can be inspected, guest operating systems can be detected, and even the Windows registry can be edited using libguestfs.
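As a sketch of what that workflow looks like (the disk image path and filesystem device here are placeholders, and this assumes libguestfs is installed):

```shell
# Sketch only: inspect a guest disk image without booting it and without
# root privileges on the host. The image path and the device to mount
# are placeholders - use list-filesystems output to pick the real one.
guestfish --ro -a /var/lib/libvirt/images/guest.img <<'EOF'
run
list-filesystems
mount /dev/sda1 /
download /etc/fstab /tmp/guest-fstab
EOF
```

The --ro flag opens the image read-only, which is the safe choice if the guest might be running.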

KVM Performance Optimizations

Slides: KVM Performance Optimizations

Rik van Riel gave an overview of recent and future KVM performance optimizations:
  • vhost-net, the in-kernel virtio-net host accelerator.
  • Kernel Samepage Merging (KSM) memory deduplication.
  • Transparent hugepages, which provide automatic hugepages without administrator management.
  • Pause loop exiting as a solution to the lock holder preemption problem.
  • Free page hinting and dynamic memory resizing to do intelligent swapping and waste fewer resources.

This is definitely worth reading if you're interested in virtualization internals. The presentation also answers the practical question of how ksm and transparent hugepages interact (both are memory management features that are not trivially compatible with each other). Asynchronous page faults weren't mentioned but I've linked to Gleb Natapov's KVM Forum 2010 slides on this feature since it fits in the same category.

System Resource Management Using Red Hat Enterprise Linux 6 cGroups

Slides: System Resource Management Using Red Hat Enterprise Linux 6 cGroups

Linda Wang and Bob Kozdemba explain the cgroups resource control features that have been added to Linux. Processes can be assigned to control groups which kernel subsystems like the scheduler or block layer take into account when arbitrating resources. Cgroups can be used to divide CPU, memory, block, and network resources. This looks much better than nice(1), ionice(1), and friends, although cgroups are a complementary feature and don't replace them. Next time a compile or a download is affecting interactive foreground processes I'll be sure to try out cgroups.

What do cgroups have to do with KVM? Since KVM is based on Linux and VMs are in fact userspace processes, cgroup features can be used to apply resource controls to VMs. Libvirt will play a role here and take care of setting up the right cgroups behind the scenes, but it is interesting to learn about the underlying mechanism and what it can do.
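The underlying mechanism can be sketched with cgroup v1 as shipped in RHEL 6. This is a configuration sketch only: the controller mount point, group name, and $QEMU_PID are assumptions, writing to these files requires root, and libvirt normally manages the groups for you.

```shell
# Sketch only (cgroup v1, assuming the cpu controller is mounted at
# /sys/fs/cgroup/cpu as on RHEL 6; requires root):
mkdir /sys/fs/cgroup/cpu/vms
echo 512 > /sys/fs/cgroup/cpu/vms/cpu.shares   # half the default weight of 1024
echo "$QEMU_PID" > /sys/fs/cgroup/cpu/vms/tasks  # move the VM's QEMU process in
```

Every QEMU process moved into the group then shares that CPU weight with the rest of the group.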

KVM Performance Improvements and Optimizations

Slides: KVM Performance Improvements and Optimizations

Mark Wagner gives an overview of performance tuning across CPU, memory (NUMA), disk, and network I/O. Lots of keywords and tweaks to dig into for anyone tuning KVM installations.

Saturday, April 23, 2011

How to capture VM network traffic using qemu -net dump

This post describes how to save a packet capture of the network traffic a QEMU
virtual machine sees. This feature is built into QEMU and works with any
emulated network card and any host network device except vhost-net.

It's relatively easy to use tcpdump(8) with tap networking. First the
tap device for the particular VM needs to be identified and then packets can be
captured:
# tcpdump -i vnet0 -s0 -w /tmp/vm0.pcap
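Finding the right tap device is the fiddly part. When libvirt manages the VM, virsh domiflist reports it; here is a small helper to pull the interface name out of that output (a sketch; "vm0" is a placeholder domain name):

```shell
# Extract the first interface name from `virsh domiflist` output, which
# prints a two-line header followed by one row per NIC with the
# interface name in the first column.
first_iface() { awk 'NR > 2 && $1 != "" { print $1; exit }'; }

# Usage (assumes libvirt): capture traffic for the guest named vm0
#   tcpdump -i "$(virsh domiflist vm0 | first_iface)" -s0 -w /tmp/vm0.pcap
```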

The tcpdump(8) approach cannot be easily used with non-tap host network devices, including slirp and socket.

Using the dump net client

Packet capture is built into QEMU and can be done without tcpdump(8). There are some restrictions:
  1. The vhost-net host network device is not supported because traffic does not cross QEMU so interception is not possible.
  2. The old-style -net command-line option must be used instead of -netdev because the dump net client depends on the mis-named "vlan" feature (essentially a virtual network hub).

Without further ado, here is an example invocation:
$ qemu -net nic,model=e1000 -net dump,file=/tmp/vm0.pcap -net user
This presents the VM with an Intel e1000 network card using QEMU's userspace network stack (slirp). The packet capture will be written to /tmp/vm0.pcap. After shutting down the VM, either inspect the packet capture on the command-line:
$ /usr/sbin/tcpdump -nr /tmp/vm0.pcap

Or open the pcap file with Wireshark.
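If a capture tool ever produces an empty or truncated file, it helps to sanity-check it before opening it elsewhere. Classic pcap files begin with a fixed magic number, so a quick check is possible with od(1) alone (a sketch; it recognizes classic pcap as written by tcpdump and QEMU, not the newer pcapng format):

```shell
# Check the 4-byte pcap magic number (0xa1b2c3d4, in either byte order
# depending on the endianness of the machine that wrote the file).
is_pcap() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "d4c3b2a1" ] || [ "$magic" = "a1b2c3d4" ]
}
```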

Wednesday, April 13, 2011

KVM-Autotest Install Fest on April 14

Mike Roth has just posted a nice guide to getting started with KVM-Autotest, the suite of acceptance tests that can be run against KVM. KVM-Autotest is able to automate guest installs and prevent regressions being introduced into KVM.

I'm looking forward to participating in the KVM-Autotest Install Fest tomorrow and encourage all QEMU and KVM developers to do the same. I only dabbled with KVM-Autotest once in the past and this is an opportunity to begin using it more regularly and look at contributing tests.

Adam Litke has helped organize the event and set up a wiki page here.

I look forward to seeing fellow KVM-Autotesters on #qemu IRC tomorrow :).

Saturday, April 9, 2011

How to pass QEMU command-line options through libvirt

An entire virtual machine configuration can be passed on QEMU's extensive
command-line, including everything from PCI slots to CPU features to serial
port settings. While defining a virtual machine from a monster
command-line may seem insane, there are times when QEMU's rich command-line
options come in handy.

And at those times one wishes to side-step libvirt's domain XML and specify
QEMU command-line options directly. Luckily libvirt makes this possible and I
learnt about it from Daniel Berrange and Anthony Liguori on IRC. This libvirt
feature will probably come in handy to others and so I want to share it.

The <qemu:commandline> domain XML tag

There is a special namespace for QEMU-specific tags in libvirt domain XML. You
cannot use QEMU-specific tags without first declaring the namespace. To enable
it use the following:
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>

Now you can add command-line arguments to the QEMU invocation. For example, to load an option ROM with -option-rom:
<qemu:commandline>
   <qemu:arg value='-option-rom'/>
   <qemu:arg value='path/to/my.rom'/>
</qemu:commandline>

It is also possible to add environment variables to the QEMU invocation:
<qemu:commandline>
   <qemu:env name='MY_VAR' value='my_value'/>
</qemu:commandline>
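Forgetting the xmlns:qemu declaration on the <domain> element is an easy mistake when adding these tags. A grep-based sanity check for a domain XML file (a sketch; it assumes the single-quoted attribute style used in the examples above):

```shell
# Warn if a domain XML uses <qemu:...> tags without declaring the qemu
# namespace. Assumes single-quoted attributes as in the examples above.
check_qemu_ns() {
    if grep -q '<qemu:' "$1" && \
       ! grep -q "xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'" "$1"; then
        echo "missing xmlns:qemu declaration" >&2
        return 1
    fi
}
```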

Setting qdev properties through libvirt

Taking this a step further we can set qdev properties through libvirt. There is no domain XML for setting the virtio-blk-pci ioeventfd qdev property. Here is how to set it using <qemu:arg> and the -set QEMU option:
<qemu:commandline>
  <qemu:arg value='-set'/>
  <qemu:arg value='device.virtio-disk0.ioeventfd=off'/>
</qemu:commandline>

The result is that libvirt generates a QEMU command-line that ends with -set device.virtio-disk0.ioeventfd=off. This causes QEMU to go back and set the ioeventfd property of device virtio-disk0 to off.

More information

The following libvirt wiki page documents mappings from QEMU command-line options to libvirt domain XML. This is extremely useful if you know which QEMU option to use but are unsure how to express that in domain XML.

That page also reveals the <qemu:commandline> tag and shows how it can be used to invoke QEMU with the GDB stub (-s).