
Monday, September 15, 2014

OVMF split image support

Gerd Hoffmann's Fedora OVMF builds have been updated to support installing the split CODE/VARS binaries.  Wherever you get your OVMF binaries, the advantage of the split layout is that the EFI variables, e.g. bootloader information, are stored separately from the executable code of the firmware, allowing the firmware to be updated without blasting the variable store.  The libvirt update mentioned the other day already supports this quite nicely.  Rather than having a loader entry with a single read-write image, we switch that to a read-only entry and add nvram storage.  The XML looks like this:

<domain type='kvm'>
  ...
  <os>
    <loader readonly='yes' type='pflash'>/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd</loader>
    <nvram template='/usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd'/>
    ...
  </os>
</domain>

Once the guest is started, a copy of the NVRAM template is made and placed under /var/lib/libvirt/qemu/nvram/$DOMAIN_VARS.fd.  This then becomes part of the state of the VM.

On the QEMU commandline, you'll need to manually create a copy of the VARS file for each VM and specify the CODE and VARS as:

/usr/libexec/qemu-kvm ... \
    -drive if=pflash,format=raw,readonly,file=/path/to/OVMF_CODE.fd \
    -drive if=pflash,format=raw,file=/copy/of/OVMF_VARS.fd
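The per-VM copy of the VARS image can be made with a plain cp; something like the following, where the destination path is purely illustrative:

cp /usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd /var/lib/libvirt/images/VM1_VARS.fd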

I'm also told that virt-install and virt-manager support for OVMF is coming real soon, and the interface will be similar to the XML, allowing selection of both a CODE image and a template VARS file.  The libvirt config file, /etc/libvirt/qemu.conf, also allows a default VARS template image to be specified per CODE image, so that the <nvram> entry gets filled in automatically based on the file used for the <loader> entry.
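For reference, that qemu.conf setting is a list of CODE:VARS pairs; with Gerd's paths it might look something like this (treat the exact entries as an example for your installation):

nvram = [
   "/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd:/usr/share/edk2.git/ovmf-x64/OVMF_VARS-pure-efi.fd"
]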

Finally, how do you tell whether you have a split or unified image for OVMF?  Lacking some sort of parser, apparently the best way to tell is by file size.  A unified image will be exactly 2MB while the split CODE image will be 2MB-128KB and the VARS image will be 128KB.  Unsurprisingly then, you can also create a split image with dd, taking the first 128K as VARS and the rest as CODE.
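A minimal sketch of that split, assuming the 2MB unified layout described above (check the sizes on your image first):

dd if=OVMF-pure-efi.fd of=OVMF_VARS.fd bs=1k count=128
dd if=OVMF-pure-efi.fd of=OVMF_CODE.fd bs=1k skip=128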

Good luck.

Thursday, September 11, 2014

libvirt now supports OVMF

Thanks to the work of Michal Privoznik and the support of Laszlo Ersek and others, libvirt can now manage VMs using OVMF natively.  If you're on Fedora and using Gerd's OVMF RPMs, you simply need to create a copy of /usr/share/edk2.git/ovmf-x64/OVMF-pure-efi.fd for each VM (put it somewhere like /var/lib/libvirt/images/), and make it writable (support is still new and libvirt doesn't yet seem to adjust file permissions for the VM).  Then, edit the domain XML to include this:

<domain type='kvm'>
  ...
  <os>
    ...
    <loader type='pflash'>/var/lib/libvirt/images/VM1-OVMF.fd</loader>
  </os>
</domain>
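The copy-and-permission step described above might look something like this (the VM1 name is illustrative and the ownership change assumes a Fedora-style qemu user; adjust for your setup):

cp /usr/share/edk2.git/ovmf-x64/OVMF-pure-efi.fd /var/lib/libvirt/images/VM1-OVMF.fd
chown qemu:qemu /var/lib/libvirt/images/VM1-OVMF.fd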

Since the OVMF image we're using is a "unified" image, it contains both the UEFI code itself as well as variable storage space, so the above adds it as writable by the VM.  There are also ways to have a split image so you can maintain the UEFI code separate from the variables, but I'll wait for builds from Gerd that support that before I attempt to document it.

With support for both the kvm=off cpu option and OVMF in libvirt, we're now able to run completely native libvirt VMs with GeForce and Radeon GPU assignment.  Support is already underway for virt-manager and virt-install of OVMF.

Also, a VM CPU selection tip: since we don't care about migration with an assigned GPU, there are few reasons left not to use the -cpu host option for QEMU.  To enable that through libvirt, change the CPU definition in the XML to this:

<domain type='kvm'>
  ...
  <cpu mode='host-passthrough'/>
  ...
</domain>

vCPU pinning is also available:

<domain type='kvm'>
  ...
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
  </cputune>
  ...
</domain>

And yes, hugepage support is also available; see the libvirt documentation for details.
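A minimal sketch of hugepage backing in the domain XML, assuming hugepages are already reserved on the host:

<domain type='kvm'>
  ...
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  ...
</domain>

Enjoy.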

Thursday, August 28, 2014

Upstream updates for August 28th 2014

qemu.git now includes the MTRR fixes that eliminate the long delay in guest reboot when using OVMF with an assigned device on Intel hardware that does not support IOMMU snoop control.

Wednesday, August 27, 2014

Fixes for Linux Radeon with 440FX guests

The DRM and Radeon drivers in Linux assume that there's always a parent device to the GPU.  We can break this assumption easily with either the 440FX or Q35 QEMU machine models by attaching the GPU to the root bus.  This has been one of the problems drawing users to more complicated Q35 models, which more accurately reflect the host hardware.  We can also fix the driver to avoid such assumptions:

Tuesday, August 26, 2014

Upstream updates for August 26th 2014

A couple updates relevant to Nvidia GeForce assignment:

QEMU

fe08275d is now in qemu.git, decoupling the primary Nvidia GPU device quirk from the x-vga=on option.  This means that an Nvidia GPU assigned to a legacy-free OVMF VM will now enable this quirk automatically.

libvirt

d0711642 is now in libvirt.git enabling libvirt support for the kvm=off QEMU cpu option.  To enable this in your XML, add this to your VM definition:

<domain type='kvm'...>
  <features>
    <kvm>
      <hidden state='on'/>
    </kvm>
  </features>
  ...
</domain>

Monday, August 25, 2014

Primary graphics assignment without VGA

We really have a love-hate relationship with VGA when talking about graphics assignment.  We love that it initializes our cards and provides a standard interface, but we hate all the baggage that it brings along.  There is however an alternative emerging, UEFI by way of OVMF.  UEFI is a legacy-free firmware for PCs (among other things) that aims to replace the BIOS.  Let me repeat that, "legacy-free".  It doesn't get any more legacy than VGA.

So how do we get primary graphics without VGA?  Well, assuming your graphics card isn't too terribly old, it probably already contains support for both UEFI and VBIOS in the ROM, and OVMF will use that new entry point to initialize the card.  There are however some additional restrictions in going this route.  First, the guest operating system needs to support UEFI.  This means a relatively recent version of Linux, Windows 8+, or some of the newer Windows Server versions.  A gaming platform is often the target for "enthusiast" use of GPU assignment, so Windows 8/8.1 is probably a good target if you can bear the user interface long enough to start a game.  AFAIK, Windows 7 does not support UEFI natively, requiring the CSM (Compatibility Support Module), which I believe defeats the purpose of using UEFI.  If none of these guests meets your needs, turn away now.

Next up, UEFI doesn't (yet) support the Q35 chipset model.  In a previous post I showed Windows 7 happily running on a 440FX QEMU machine with both GeForce and Radeon graphics.  The same is true for Windows 8/8.1 here.  Linux isn't so happy about this though.  The radeon driver in particular will oops while blindly looking for a downstream port above the graphics card.  You may or may not have better luck with Nvidia; neither nouveau nor nvidia plays nice with my GT635.  The radeon driver problem may be fixable without Q35, but it needs further investigation.  fglrx is untested.  Therefore, if Windows 8/8.1 is an acceptable guest, or you're willing to help test or make Linux guests work, let's move on; otherwise turn back now.

Ok, you're still reading, let's get started.  First you need an OVMF binary.  You can build this from source using the TianoCore EDK2 tree, but it is a massive pain.  Therefore, I recommend using a pre-built binary, like the one Gerd Hoffmann provides.  With Gerd's repo setup (or one appropriate to your distribution), you can install the edk2.git-ovmf-x64 package, which gives us the OVMF-pure-efi.fd OVMF image.
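Assuming Gerd's repo file is already in place (or an equivalent repository for your distribution), installing the package is roughly:

yum install edk2.git-ovmf-x64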

Next create a basic libvirt VM using virt-manager or your favorite setup tool.  We'll need to edit the domain XML before we install, so just point it to a pre-existing (empty) image or whatever gets it to the point that you can have the VM saved without installing it yet.  Also, don't assign any devices just yet.  Once there, edit the domain XML with virsh edit.  To start, we need to add a namespace declaration to the <domain> element on the first line to make libvirt accept QEMU commandline options.  libvirt does not yet have support for natively handling the OVMF image, but it is under active development upstream, so these instructions may change quickly.  Make the first line of the XML look like this:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>

The xmlns specification is the part that needs to be added.  Next we need to add OVMF; just before the closing </domain> tag at the end of the file, add something like this:

  <qemu:commandline>
    <qemu:arg value='-drive'/>
    <qemu:arg value='if=pflash,format=raw,readonly,file=/usr/share/edk2.git/ovmf-x64/OVMF-pure-efi.fd'/>
  </qemu:commandline>

Adjust the path as necessary if you're using a different installation.  If you're using selinux, you may wish to copy this file to /usr/share/qemu/ and run restorecon on it to set up the labeling QEMU needs to use it.
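That copy might be done along these lines (assuming the package path above); remember to point the file= argument at the new location afterwards:

cp /usr/share/edk2.git/ovmf-x64/OVMF-pure-efi.fd /usr/share/qemu/
restorecon /usr/share/qemu/OVMF-pure-efi.fd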

Save the XML and you should now be able to start the VM with OVMF.  From here, we can do the rest of the VM management using virt-manager.  You can add the GPU and audio function as standard assigned PCI devices.  If you remove the "Graphics" (i.e. VNC/Spice) and "Video" (i.e. VGA/QXL/Cirrus) devices from the VM, the assigned GPU will be the primary display.
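For reference, an assigned GPU shows up in the domain XML as a <hostdev> entry along these lines (the PCI address is just an example; use your card's address from lspci):

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
</hostdev>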

If you are not assigning the audio function or otherwise need to use the host audio for the guest, I recommend using QXL+Spice.  When the guest is running you can turn the QXL display off and use the connection only for sound.

UEFI purists will also note that I'm not providing a second flash drive for EFI variable storage.  In simple configurations this is unnecessary, but if you find that you've lost your boot options, this is the reason why.

In the video below I'm running two separate Windows 8.1 VMs using OVMF and managed through libvirt.  The one on the left is assigned a GeForce GT635, the one on the right a Radeon HD8570.  The host processor is an i5-3470T (dual-core + threads) and each VM is given 2 vCPUs, exposed as a single core with 2 threads.
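That vCPU layout can be expressed in the domain XML roughly like this (a sketch of the single-core, two-thread topology mentioned above):

<cpu mode='host-passthrough'>
  <topology sockets='1' cores='1' threads='2'/>
</cpu>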

Astute viewers will note that I'm using the version of the Nvidia driver which requires the kvm=off QEMU option.  There are two ways around this currently.  The first is to add additional qemu:commandline options:

<qemu:arg value='-cpu'/>
<qemu:arg value='host,hv_time,kvm=off'/>

We can do this because QEMU doesn't explode at the second instance of -cpu on the commandline, but it's not an ideal option.  The preferred method, and what I'm using, is support just going into libvirt that adds a new feature element for this purpose.  The syntax there is:

<features>
  <kvm>
    <hidden state='on'/>
  </kvm>
</features>

This will add kvm=off to the set of parameters that libvirt uses.

Also note that while Radeon doesn't seem to need any of the device quirks previously enabled with the x-vga=on option, Nvidia still does.  This commit, which will soon be available in qemu.git, enables the necessary quirk anytime an Nvidia VGA device is assigned.

To summarize: for Radeon, no non-upstream patches are required.  For Nvidia, libvirt will soon be updated with the hidden feature above (until then, use the additional commandline option), and QEMU will soon be updated to always quirk Nvidia VGA devices (until then, the referenced commit needs to be applied).  Both cases currently need the qemu:commandline approach for OVMF, which should also be supported natively in libvirt soon.

In the video below, I'm running a stock Fedora 20 kernel.  The IGD device is used by the host with full DRI support (as noted by glschool running in the background... which I assume doesn't work when DRI is disabled by using the i915 patch).  QEMU is qemu.git plus the above referenced Nvidia quirk enablement patch and libvirt is patched with this, both of which have been accepted upstream and should appear in the development tree at any moment.  If you're running OVMF from a source other than above, make sure it includes commit b0bc24af from mid-June of this year.  If it doesn't, OVMF will abort when it finds an assigned device with a PCIe capability.  Also, you'll likely experience a long (~1 minute) delay on guest reboot.  This is due to lack of MTRR initialization on reboot; patches have been accepted upstream and will be in qemu.git shortly.


Enjoy.