Robert Haas

Thursday, March 15, 2012

Tuning shared_buffers and wal_buffers

I spend a lot of time answering questions about PostgreSQL, and one of the questions I get asked frequently is: how should I set shared_buffers? And, a bit less often, how should I set wal_buffers? I've got canned answers that I can rattle off so fast it'll make your head spin. Exceptions to my canned answers keep popping up, and it's starting to get hard to give an answer that actually captures all the complexity in this area, so here's a longer explanation.

Security Barrier Views

People sometimes want to use PostgreSQL to implement row-level security, and historically it has not been very easy to do that securely. You can try to do it by creating a view which exposes just some of the rows in the underlying table, and grant access to the view but not the underlying table to the user to whom you wish to expose the data; but this turns out not to be secure. Consider the following example.

First Results for Write Performance on IBM POWER7

In a previous blog post, I posted some SELECT-only pgbench results on IBM POWER7, and promised to post read-write results when I had them. That took a little longer than expected due to periodic lock-ups on the machine, which seem to have been resolved by a kernel update (thanks to Brent Baude at IBM for some timely help with this issue). But now I have some.

The Perils of Collation-Aware Comparisons

If you use psql's \l command to list all the databases in your PostgreSQL cluster, you'll notice a column in the output labelled "Collate"; on my system, that column has the value "en_US.UTF-8". This means that when, for example, you sort strings, you'll use the "English" sort order rather than the traditional "C" sort order. You might ask: what's the difference?

In the "C" sort order, all capital letters come before all lower-case letters, whereas in en_US.UTF8, a comes before A which comes before b which comes before B, and so on. In other words, every collation can have its own rules for sorting strings, consistent with the way that the people who speak that language like to alphabetize things; or at least with the way that the people who wrote the locale definitions for that language think that they like to alphabetize things. Collations are OS-dependent: some operating systems don't support them at all, while Windows has a completely different naming convention from every other operating system, and probably different behaviors as well.

The Git Workflow

When the PostgreSQL project decided to migrate to git, we decided not to allow merge commits. A number of people made comments, in a number of different fora, to the effect that we weren't following "the git workflow". While a few commentators seemed to think that was reasonable, many thought that it demonstrated our ignorance and stupidity, and some felt it outright heresy.

So I noted with some interest Julio Hamano's blog post about the forthcoming release of git 1.7.10, which is slated to include a change to the way that merge commits work: users will now be prompted to edit the commit message, rather than just accepting a default one. Actually, what I found most interesting where Linus Torvalds' comments on this change, particularly where he says this: "This change hopefully makes people write merge messages to explain their merges, and maybe even decide not to merge at all when it's not necessary." His comments are quoted more fully in the above-linked blog article; unfortunately I don't know how to link directly to his Google+ post. And Julio Hamano makes this remark: "Merging updated upstream into your work-in-progress topic without having a good reason is generally considered a bad practice. [...] Otherwise, your topic branch will stop being about any particular topic but just a garbage heap that absorbs commits from many sources, both from you to work on a specific goal, and also from the upstream that contains work by others made for random other unfocused purposes."

Performance and Scalability on IBM POWER7

I recently had a chance to run some benchmarks of PostgreSQL 9.2devel on an IBM POWER7 machine provided by IBM and hosted by Oregon State University's Open Source Lab. I'd like to run more tests, but for a first pass I decided to run a SELECT-only pgbench test at various client counts and scale factors. This basically measures how quickly we can do primary key lookups in a completely read-only environment. This generated a pretty graph. Here it is.

My Patches Are Breeding

One of the great things about being a long-term contributor to an open source project like PostgreSQL is that you get to see other people take the stuff you've done and use it as a stepping stone to bigger and better things. One of my early PostgreSQL hacking projects was a patch to extend the syntax of EXPLAIN. Prior to 9.0, the grammar looked like this:

EXPLAIN [ ANALYZE ] [ VERBOSE ] statement

Now, there's nothing particularly wrong with that grammar from a usability perspective, but it turns out to be pretty terrible for extensibility. Let's suppose we want to add a new EXPLAIN option that does something new and different - say, omit the costing information from the output. Then we have to change the grammar to something like this:

EXPLAIN [ ANALYZE ] [ VERBOSE ] [ NOCOSTS ] statement

There are a couple of problems with this. One is that, as the number of options increases, it gets hard to remember the order in which they must be specified. You might think that it would be easy enough to recode the grammar to look like this:

EXPLAIN [ ANALYZE | VERBOSE | NOCOSTS ]... statement

...and it might be, but it's surprisingly easy, when using bison, to create situations that bison finds ambiguous, even though a human being might not. Another problem is that NOCOSTS has to become what's called a keyword, which has a very small but nonzero distributed cost across our entire grammar. Rightly or wrongly, a number of patches that proposed to enhance EXPLAIN in various ways got shot down because of these issues.

Robert Haas

Thursday, March 15, 2012

Tuning shared_buffers and wal_buffers

Wednesday, March 14, 2012

Security Barrier Views

Monday, March 12, 2012

First Results for Write Performance on IBM POWER7

Monday, March 05, 2012

The Perils of Collation-Aware Comparisons

Friday, March 02, 2012

The Git Workflow

Thursday, March 01, 2012

Performance and Scalability on IBM POWER7

Wednesday, February 08, 2012

My Patches Are Breeding