|
|
Subscribe / Log in / New account

4.6 Merge window part 2

By Jonathan Corbet
March 23, 2016
As of this writing, Linus has pulled 11,118 non-merge changesets into the mainline repository for the 4.6 development cycle; just over 10,000 of those came since last week's summary. As can be seen, the flow of patches can no longer be described as "slow." A number of significant features can be found in that flood of patches.

The most notable user-visible changes include:

  • Support for memory protection keys has been merged. This is an Intel feature allowing user space to partition its memory into zones and apply additional access restrictions to each. The system calls for the manipulation of memory protection keys have not yet been merged; they are waiting for a bit more review. But keys will be used by the kernel in 4.6 to implement truly execute-only memory that cannot be read by the executing process.

  • Control groups are now namespace-aware; there is a new CLONE_NEWCGROUP flag to clone() to create a process in a new control-group namespace. See this patch for documentation on this new feature. The control-group filesystem can also now be mounted within user namespaces.

  • The preadv2() and pwritev2() system calls, which take an extra "flags" argument, have finally been merged, allowing for the addition of new functionality. The first flag is RWF_HIPRI, which enables the use of polling for a high-priority request.

  • Page poisoning has traditionally been a kernel debugging feature; it fills freed pages with a special pattern that is easy to spot when looking for things that went wrong. In 4.6, poisoning can be enabled independently of the debugging options, and the "poison" value can be set to zero; this results in pages being simply cleared when they are freed. This behavior, inspired by the grsecurity/PaX patches, reduces the chances of the kernel leaking sensitive data.

  • The memory-management subsystem's thrash-detection code has never worked properly within control groups; that has been rectified. The result should be better behavior when specific control groups are experiencing memory pressure.

  • The integrity measurement architecture (IMA) subsystem now requires that its policy be signed, and the integrity of that policy is measured prior to loading.

  • The ARM64 architecture now supports the "user access override" feature found in ARMv8.2. It allows user space to be accessed (by the kernel) using ordinary unprivileged instructions that check the owning process's permissions in the normal way. That, in turn, offers extra protection against the kernel being fooled into accessing memory it shouldn't.

  • ARM64 also now supports kernel address-space layout randomization.

  • The kernel's representation of general-purpose I/O (GPIO) devices has been massively reworked; the gpio_chip structure is a proper device within the device model now. There is a new ABI for getting information about the GPIOs on the system, but some work remains to be done. As Linus Walleij noted: "We can now discover GPIOs properly from userspace. We still have not come up with a way to actually *use* GPIOs from userspace." See tools/gpio/lsgpio.c for an example of the new ABI; note that the old sysfs-based ABI is now considered obsolete (even though it has not yet been completely replaced).

  • A process's timer slack value — the amount by which timer requests may be delayed to cause them to coincide with others — can now be seen and modified via /proc/PID/timerslack_ns.

  • The extended BPF virtual machine now implements per-CPU maps for high-speed statistics collection. There is also a new map type to store stack traces.

  • There is a new network-control API called "devlink," intended for the setting of various parameters that are not related to any specific device class. This protocol is, naturally, undocumented; some information can be found in this merge changelog.

  • The kernel connection multiplexer, which allows for certain types of higher-level protocol handling in the kernel, has been merged.

  • A number of network-oriented sysctl knobs (tcp_syn_retries, tcp_synack_retries, tcp_syncookies, tcp_reordering, tcp_retries1, tcp_retries2, tcp_orphan_retries, tcp_fin_timeout, tcp_notsent_lowat, igmp_max_memberships, igmp_max_msf, igmp_llm_reports, and igmp_qrv) have been made network-namespace aware, so that different namespaces can have different values.

  • The "local checksum offload" mechanism (described in this article) has been merged. Local checksum offload speeds checksum calculations, making tunneled protocol implementations faster. See Documentation/networking/checksum-offloads.txt for more information.

  • Netlink support over shared memory segments has been removed; it has never worked correctly and there does not appear to be any user-space code using it.

  • A couple of new filesystem ioctl() commands (Q_GETNEXTQUOTA and Q_XGETNEXTQUOTA) have been added to enable efficient iteration through all of the disk quotas on a filesystem.

  • The Btrfs filesystem has a new mount option, nologreplay, which prevents the replaying of the log tree; this can be used with ro to obtain a truly read-only mount. The new mount option usebackuproot is meant to replace the existing recovery option.

  • New hardware support includes:

    • Audio: Maxim MAX9867 and max98926 codecs, Realtek RT5514 codecs, AMD audio coprocessors, and Allwinner A10 S/PDIF controllers.

    • GPIO: WinSystems WS16C48 GPIO controllers, ACCES 104-DIO-48E GPIO controllers, Technologic TS-4800 FPGA GPIO controllers, TI TPIC2810 8-Bit I2C GPO expanders, TI TPS65218 GPIO controllers, TI TPS65086 GPO controllers, and MEN 16Z127 GPIO controllers.

    • Input: BYD BTP10463 touchpads, MELFAS MIP4 touchscreens, Freescale i.MX25 integrated touchscreens, and numerous devices using the Synaptics "Register Mapped Interface" protocol.

    • Media: TI "camera adaptation layer" capture engines.

    • Miscellaneous: Xilinx NWL PCIe controllers, Cavium ThunderX PEM PCIe host controllers, Microchip PIC32 random number generators, ST Microelectronics adjunct processors, Qualcomm HIDMA DMA engines, Active-semi ACT8945A charger controllers, NXP LPC18XX EEPROM memory, BCM2835 auxiliar mini UARTs, Marvell EBU serial ports, Moxa SmartIO MUE multiport serial cards, AT91 SAMA5D2 analog to digital converters (ADCs), Texas Instruments ADC0831/ADC0832/ADC0834/ADC0838 ADCs, Texas Instruments ADS1015 ADCs, Freescale MX25 ADCs, Analog Devices AD5761/61R/21/21R digital to analog converters, Freescale MPL115A1 pressure sensors, Atlas Scientific pH-SM sensors, TI AFE4404 heart rate and pulse oximeter sensors, TI AFE4403 heart rate monitors, TI TPS65086 power management integrated chips, APM SoC X-Gene SLIMpro mailbox controllers, Rockchip SoC integrated mailboxes, Hisilicon Hi6220 mailboxes, ARM high-definition color LCD controllers, Microchip PIC32MZDA SDHCI controllers, and MediaTek M4U I/O memory-management units.

    • Network: MediaTek MT7623 Gigabit Ethernet controllers and Intel Ethernet X722 iWARP cards.

    • USB: Rockchip EMMC PHYs and Rockchip DisplayPort PHYs.

    • Watchdog: Intel MEI iAMT watchdogs, National Instruments 903x/913x watchdog timers, WinSystems EBC-C384 watchdog timers, and ARM SBSA generic watchdogs.

Changes visible to kernel developers include:

  • The compile-time stack validation patches have been merged, providing a tool that ensures that the call stack will always be valid. The result will be more reliable stack traces for developers; this feature is also needed for the further development of the live-patching mechanism.

  • There is a new function for freeing a set of objects:

        void kfree_bulk(size_t size, void **objects);
    

    It differs from kmem_cache_free_bulk() in that there is no pointer to a kmem_cache structure, meaning that objects from multiple slabs can be freed together. There is a cost to doing things this way, though, so kmem_cache_free_bulk() is preferred in cases where it is applicable.

  • The cpufreq subsystem, charged with setting CPU frequencies to match the current system load, has seen some significant changes. In current kernels, it uses timers to periodically sample the load on the CPU and, perhaps, make changes. As of 4.6, instead, the cpufreq governors will be called directly from the scheduler when things change, eliminating the timers. Eventually the governors will also use the projected load information from the scheduler to make (hopefully) better decisions, but that is work for a future development cycle.

  • sscanf() now has basic support for matching sets of characters using the %[ operator (e.g. "%[abc]" to match any of abc). Only literal sets can be matched; there is, for example, no special meaning for "-" within a character set.

  • The new dtx_diff tool, in the scripts/dtc directory, can calculate the differences between device trees in a number of formats.

  • The generic code supporting encrypted filesystems has been moved into the VFS layer (in fs/crypto) so that it can be used beyond the ext4 and f2fs filesystems.

  • The I2C subsystem has a new pin-controller-based bus demultiplexor allowing runtime selection between multiple I2C controllers. See i2c-demux-pinctrl.txt for an overview.

  • The "kcov" kernel code-coverage analyzer has been merged; it can be useful to ensure that fuzzing and other testing efforts have exercised as much code as possible. See Documentation/kcov.txt for more information.

At this point, it would appear that the bulk of the changes for this development cycle have been merged. The merge window will likely stay open through March 27, though, so one never knows whether something else of interest might turn up. Next week's Kernel Page will summarize any significant changes that appear at the tail end of the 4.6 merge window.

Index entries for this article
KernelReleases/4.6


to post comments


Copyright © 2016, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds