How many page flags do we really have?

By Jonathan Corbet
June 3, 2009
The recently-discussed kernel memory sanitization patch was criticized on a number of points, one of which was its use of a dedicated page flag. Andi Kleen's HWPOISON patch (enabling upcoming Intel CPU features for dealing with memory errors) has run into trouble on similar grounds. The desperate shortage of page flags has been an article of faith among kernel developers for years. But, interestingly, not everybody agrees that a problem exists, and almost nobody can answer the simple question of how many flags are available in the first place. So a look at the Linux page flags issue seems in order.

"Page flags" are simple bit flags describing the state of a page of physical memory. They are defined in <linux/page-flags.h>. Flags exist to mark "reserved" pages (kernel memory, I/O memory, or simply nonexistent), locked pages, those under writeback I/O, those which are part of a compound page, pages managed by the slab allocator, and more. Depending on the target architecture and kernel configuration options selected, there can be as many as 24 individual flags defined.
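
As a rough illustration of what "simple bit flags" means here, the sketch below keeps a handful of flags in a single unsigned long. The `demo_` names and the bit numbers are made up for this example; the real definitions are in <linux/page-flags.h>, and the kernel manipulates them with atomic bit operations rather than the plain read-modify-write shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag numbers only; not the kernel's actual assignments. */
enum demo_pageflags {
    DEMO_PG_locked,
    DEMO_PG_reserved,
    DEMO_PG_writeback,
    DEMO_PG_slab,
    DEMO_NR_PAGEFLAGS   /* number of flags defined above */
};

/* Stand-in for struct page, reduced to the one field under discussion. */
struct demo_page {
    unsigned long flags;
};

static void demo_set_flag(struct demo_page *p, enum demo_pageflags f)
{
    assert(f < DEMO_NR_PAGEFLAGS);
    p->flags |= 1UL << f;
}

static void demo_clear_flag(struct demo_page *p, enum demo_pageflags f)
{
    assert(f < DEMO_NR_PAGEFLAGS);
    p->flags &= ~(1UL << f);
}

static bool demo_test_flag(const struct demo_page *p, enum demo_pageflags f)
{
    assert(f < DEMO_NR_PAGEFLAGS);
    return (p->flags >> f) & 1UL;
}
```

Each new flag consumes one more bit position in the enumeration, which is why the number of defined flags matters so much in what follows.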

These flags live in the flags field of struct page. This field is declared to be an unsigned long, so one might think that figuring out how much space is left for new flags would be a straightforward task. To a casual observer, it would look like, on a 32-bit system, 24 flags have been used, leaving eight available:

[Figure: page flags]

In other words, the situation is starting to get tight, but it is not a crisis quite yet.

But little is straightforward when it comes to struct page. One of these structures exists for every physical page in the system; on a 4GB system with 4KB pages, there will be roughly one million page structures. Given that every byte added to struct page is amplified a million times, it is not surprising that there is a strong motivation to avoid growing this structure at any cost. So struct page contains no fewer than three unions and is surrounded by complicated rules describing which fields are valid at which times. Changes to how this structure is accessed must be made with great care.
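
The amplification is easy to work out. The following sketch assumes 4KB pages (typical on x86, but an assumption here) and uses hypothetical helper names; it simply multiplies the number of page structures by the number of bytes added to each.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages. */
#define PAGE_SIZE_BYTES 4096ULL

/* Number of struct page instances needed to describe ram_bytes of memory. */
static uint64_t page_struct_count(uint64_t ram_bytes)
{
    /* The sketch only handles memory sizes that are a whole number of pages. */
    assert(ram_bytes % PAGE_SIZE_BYTES == 0);
    return ram_bytes / PAGE_SIZE_BYTES;
}

/* Total extra memory consumed if every struct page grows by extra_bytes. */
static uint64_t growth_cost_bytes(uint64_t ram_bytes, uint64_t extra_bytes)
{
    return page_struct_count(ram_bytes) * extra_bytes;
}
```

For a 4GB machine, `page_struct_count(4ULL << 30)` gives 1,048,576 structures, so growing the flags field by a single four-byte word would cost 4MB across the system.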

Unions are not the only technique used to shoehorn as much information as possible into this small structure. Non-uniform memory access (NUMA) systems need to track information on which node each page belongs to, and which zone within the node as well. Rather than add fields to struct page, the NUMA hackers grabbed the free bits at the top of the flags field, yielding something like this:

[Figure: page flags]

So, on a 32-bit system with 24 page flags defined (a pessimistic scenario), there are eight bits available for the node and zone information, practically limiting 32-bit NUMA systems to 64 nodes, which is almost certainly adequate. But the addition of more page flags would come at the cost of supporting fewer NUMA nodes, and that would be unwelcome.
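
The arithmetic behind the 64-node figure can be sketched as follows. The split of the eight spare bits into two zone bits and six node bits is inferred from the numbers above rather than taken from the kernel source, and every macro and function name below is hypothetical. A fixed 32-bit type models the 32-bit flags word regardless of the host machine.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative widths for a 32-bit flags word; assumptions, not kernel values. */
#define FLAGS_BITS   32
#define NR_PAGEFLAGS 24   /* the pessimistic flag count from the text */
#define ZONE_BITS    2    /* assumed: enough for four zones */
#define NODE_BITS    (FLAGS_BITS - NR_PAGEFLAGS - ZONE_BITS)  /* 6 bits remain */

/* The node number sits at the very top, the zone number just below it. */
#define NODE_SHIFT   (FLAGS_BITS - NODE_BITS)
#define ZONE_SHIFT   (NODE_SHIFT - ZONE_BITS)
#define NODE_MASK    ((1u << NODE_BITS) - 1)
#define ZONE_MASK    ((1u << ZONE_BITS) - 1)

/* Store node and zone in the top bits, leaving the ordinary flags untouched. */
static uint32_t pack_node_zone(uint32_t flags, uint32_t node, uint32_t zone)
{
    assert(node <= NODE_MASK && zone <= ZONE_MASK);
    flags &= ~((NODE_MASK << NODE_SHIFT) | (ZONE_MASK << ZONE_SHIFT));
    return flags | (node << NODE_SHIFT) | (zone << ZONE_SHIFT);
}

static uint32_t flags_to_node(uint32_t flags)
{
    return (flags >> NODE_SHIFT) & NODE_MASK;
}

static uint32_t flags_to_zone(uint32_t flags)
{
    return (flags >> ZONE_SHIFT) & ZONE_MASK;
}

/* Largest node count the remaining bits can represent. */
static uint32_t max_nodes(void)
{
    return 1u << NODE_BITS;
}
```

Under these assumptions, raising NR_PAGEFLAGS by one shrinks NODE_BITS to five and halves the supported node count to 32, which is the tradeoff described above.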

Things get worse on systems with complicated physical memory layouts. On such systems, memory is not organized into a single, contiguous range of physical addresses; instead, it is spread out with holes in the middle. Memory management on these "sparse memory" systems requires that each page have a "section" number associated with it. That section number is stored - you guessed it - in the spare bits at the top of the flags field. If space gets too tight, the kernel will move the node number into a separate array, slowing things down in the process. Either way, it seems clear that there is not a whole lot of spare room in the flags field on these systems.
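
The decision about whether the node number still fits can be thought of as a simple bit budget. The sketch below is a hypothetical model of that check, with all widths supplied by the caller rather than taken from any real kernel configuration; the kernel makes the equivalent decision at compile time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of how the flags word is carved up. */
struct flags_layout {
    unsigned word_bits;     /* width of the flags word: 32 or 64 */
    unsigned nr_flags;      /* ordinary page flags in use */
    unsigned zone_bits;     /* bits for the zone number */
    unsigned node_bits;     /* bits for the NUMA node number */
    unsigned section_bits;  /* bits for the sparse-memory section number */
};

/* Bits left over at the top of the word once the flags are accounted for. */
static unsigned spare_bits(const struct flags_layout *l)
{
    assert(l->nr_flags <= l->word_bits);
    return l->word_bits - l->nr_flags;
}

/* True if section, zone, and node all fit; otherwise the node number
 * would have to be looked up elsewhere, such as in a separate array. */
static bool node_fits_in_flags(const struct flags_layout *l)
{
    return l->section_bits + l->zone_bits + l->node_bits <= spare_bits(l);
}
```

With 24 flags, two zone bits, and six node bits, a 32-bit word has no room left for even a single section bit, while a 64-bit word has 40 spare bits and fits everything comfortably.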

So the real answer to "how many page flags are free?" is, for all practical purposes, "zero," at least on 32-bit NUMA systems. Making room for more would require expanding struct page, which is a heavy cost to pay. Developers should, thus, not be surprised when proposals to use new page flags run into stiff opposition. It's only one bit, but that bit is in the middle of some of the most sought-after real estate in the entire kernel.

In the case of Andi's HWPOISON patch, this opposition has come in the form of a number of alternative suggestions. One was to simply use the "reserved" bit, but that could lead to difficulties in parts of the code where that usage is not expected. Then it was suggested that the combination of the "reserved" and "writeback" flags could indicate a poisoned page, but Andi claims that this approach cannot work. Andrew Morton has suggested that HWPOISON could be made into a 64-bit-only feature; Andi allows as to how that might be possible, but he clearly doesn't like the idea.

Instead, Andi takes the position that the page flag shortage does not really exist. It's not a problem at all on 64-bit systems, where unsigned long is twice as wide. The number of 32-bit systems with a large number of NUMA nodes is small and shrinking; it's not something that the developers need be concerned about. And, says Andi, if things get really bad, the sparse memory section number can be moved into a separate array like the NUMA node number. Given this view of the problem, worries about adding a useful new feature over concerns about a single page flag bit seem misplaced.

Nobody has challenged Andi's view that the problem is not as severe as most people think, though Andrew Morton has hinted that Andi should go ahead and prove his ideas about moving the section number out of the page structure. That might not be a bad idea. Even if page flags are a little more abundant than most developers think, it still is not hard to foresee a time when they are exhausted, at least on 32-bit systems. Proposals involving new page flags are not particularly rare; unless we want to restrict features needing page flags to 64-bit systems, we'll need to make some more flags available before too long.

How many page flags do we really have?

Posted Jun 4, 2009 14:17 UTC (Thu) by nlucas (subscriber, #33793) [Link]

Andi's position makes sense. In retrospect it seems the really bad idea was to put the NUMA bits in the page flags.

Of course, it's always easy to say these things later rather than earlier.

BUG()

Posted Jun 5, 2009 9:01 UTC (Fri) by k3ninho (subscriber, #50375) [Link] (2 responses)

At the end of the third-last paragraph, I can't parse the last sentence:

>Andrew Morton has suggested that HWPOISON could be made into a 64-bit-only feature; Andi allows as to how that might be possible, but he clearly doesn't like the idea.

How should 'Andi allows as to how that might be possible' read?

BUG()

Posted Jun 5, 2009 21:22 UTC (Fri) by samlh (subscriber, #56788) [Link] (1 responses)

'Andi says that might be possible'

BUG()

Posted Jun 6, 2009 2:39 UTC (Sat) by xanni (subscriber, #361) [Link]

Not quite. The meaning is closer to "Andi accepts the argument that it might be possible", preserving the fact that it is not Andi who originates the argument.

How many page flags do we really have?

Posted Jun 7, 2009 1:12 UTC (Sun) by giraffedata (guest, #1954) [Link] (2 responses)

Why are people willing to accept 4 bytes per physical page waste on a 64 bit system, but not on a 32 bit system?

How many page flags do we really have?

Posted Jun 7, 2009 9:33 UTC (Sun) by farnz (subscriber, #17727) [Link]

One difference is the size of lowmem. On a 32-bit system, those 4 bytes per page all come out of the first 896MB of physical RAM. On a 64-bit system, they come out of your total RAM. It's a lot easier to accept waste when it's not in such a tightly restricted block of memory; we already have problems with running low on lowmem on 32-bit systems with more than around 8GB RAM, which are best solved by going to 64-bit.

How many page flags do we really have?

Posted Jul 10, 2009 17:45 UTC (Fri) by sethml (guest, #8471) [Link]

Yeah, I was left wondering why the page flags are an unsigned long rather than a uint32_t? Surely in such a space constrained data structure it's best to use only as many bits as necessary?


Copyright © 2009, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds