Robert Haas

Tuesday, July 02, 2013

MVCC Catalog Access

Some wag, riffing on Rudyard Kipling, once wrote that "if you can keep your head when all around you are losing theirs, maybe you just don't understand the situation". I thought of that line this morning, while committing a patch to use MVCC snapshots for system catalog access.

Query Planning Gone Wrong

Over the past few years, I've been making notes on pgsql-performance postings, specifically those postings which relate to query performance issues. Today, I gave a talk at PGCon on the data I've been able to gather.

If you attended the talk, please leave feedback through the PGCon web site or feel free to leave a comment below with your thoughts. If not, you can find the slides on my Presentations web page. A few people asked me to post the raw data on which the talk was based, including links to the original threads. I have created a Query Performance section on my Google Site and posted the information there.

The version posted on the web site incorporates a few minor corrections as compared to what I presented in the talk; and I have left out (for the sake of politeness) the cases I attributed to user error. There were actually only 2 such cases, not 3 as I said in the talk, but either way it seems more polite not to post specific links. Please contact me if you find other mistakes in what I have posted and I will correct them.

Many thanks to all those who said nice things about my talk!

Monday, April 01, 2013

Confronting The Big Issues

In the last few years, PostgreSQL has added a number of impressive new features, such as built-in replication (both synchronous and asynchronous), SQL/MED foreign tables (which, in the forthcoming 9.3 release, will also support writes), per-column collation support and in-place upgrade. There are have also been so fairly major performance improvements and a host of new smaller features. While all of this progress is impressive, it is clear that too little time and energy has been spent confronting the project's biggest weakness: its unpronounceable name.

As Wikipedia helpfully explains, PostgreSQL's ancestor was the INGRES project of the University of California at Berkeley. INGRES stands for "interactive graphics retrieval system", but hackers like plays on words, so after a decent amount of hacking on that code had been done, some wit thought it would be a fine idea to replace the prefix "in" with "post", to connote that the project had gone beyond Ingres. Furthermore, since the system was modified to support SQL, some other wit decided it would be a good idea to replace the last syllable (but only half of it) with the name of the query language it now supported. Fortunately, we're moving away from that, but that's how we ended up the name PostgreSQL which, as you can now understand, stands for "beyond the graphics retrieval structured query language".

Absurd Shared Memory Limits

Today, I fixed a problem. Or at least, I think I fixed it. Time will tell. But Thom Brown seems pretty happy, and so does Dan Farina. So let me tell you about it. Here's the executive summary: assuming the patch I committed today holds up, PostgreSQL 9.3 will largely eliminate the need to fiddle with operating system shared memory limits.

perf: the good, the bad, the ugly

For the past few months, I've spent a good deal of time using the Linux utility perf to profile PostgreSQL. On the whole, I've been favorably impressed. Before I started using perf, trying to profile generally meant using oprofile or gprof. Unfortunately, both of those tools have serious problems.

PostgreSQL 9.3 Development Schedule

At the PGCon 2012 Developer Meeting, we discussed the schedule for PostgreSQL 9.3 development. Here's what we agreed:

CommitFest #1: June 15, 2012 - July 15, 2012
CommitFest #2: September 15, 2012 - October 15, 2012
CommitFest #3: November 15, 2012 - December 15, 2012
Planning Week: December 8, 2012 - December 15, 2012
CommitFest #4: January 15, 2013 - February 15, 2013
Final Triage: February 1, 2013 - February 7, 2013

This is basically the same schedule we used for the PostgreSQL 9.2 development cycle, but we've added a "planning week" and a "final triage". The idea of the planning week, and also of the final triage week, is to identify which large patches still have a hope of getting done in time to be committed to PostgreSQL 9.3. In that way, if a patch has no chance of being ready for commit, people can stop working on it and focus their efforts on those that do. Hopefully, this will make the final CommitFest go a little more smoothly than it did this time, with less burn-out and a quicker denouement.

We also agreed on one significant change to the review process: if you submit a patch, you must also volunteer to review a patch. In earlier development cycles, we have been able to rely on volunteer reviewers to do much of the reviewing work, but that didn't work as well during the 9.2 cycle. Although I'm hopeful that this will ease some of the reviewing crunch, I'd also like to underscore the importance of volunteer reviewers. Reviewing is a great way to get involved in the development process without spending too much time on it, and often leads to significant improvements in design and usability as well as code quality. It also helps to ensure a variety of opinions about important patches, which usually leads to a stronger product overall, and, at least in my experience, it is also a great way to learn more about PostgreSQL.

Monday, April 09, 2012

Don't Take Me Too Seriously

My recent blog post showing linear read-scaling out to 64 cores has generated a lot of attention, which I frankly did not expect. To be sure, it's a cool result, but keep in mind that it's testing a certain very specific workload: very short transactions running very simple queries to do repeated primary key lookups on a single table. This, of course, is not an unimportant workload; in fact, you could argue that some of the NoSQL products on the market are basically purpose-built databases for exactly this kind of workload (sometimes coupled with a facility for durable or non-durable writes).

On the flip side, it's also not the only workload. The reason why PostgreSQL 9.2devel so completely stomps the performance of earlier releases on this test is that the transactions are very, very short. If you ran the same test, but with BEGIN before the first query and END after the last one, or if you striped the queries across multiple tables, you would eliminate the lock manager contention that holds performance on PostgreSQL 9.1 in check on this test.

For most people, therefore, there's probably no reason to panic just because you're running one of the existing releases. You may get better performance on PostgreSQL 9.2, when it's released, but chances are that if you had the sort of workload for which these changes make a truly dramatic difference, you wouldn't have picked PostgreSQL in the first place. What I think is exciting about these changes is not so much that existing users will see huge performance benefits (although some will; we have a lot of good changes in this release) but that PostgreSQL will become usable in environments where it currently can't compete.

Of course, that's not to say that we're going to put memcached out of business. There will probably always be cheaper alternatives to an RDBMS if the only work you need to do is primary key lookups and stores, and especially if you don't need durability. But many people need good performance on large numbers of simple queries and additionally need the ability to do some more complex processing, and I'm hopeful that these scalability changes will make it much simpler to deploy PostgreSQL effectively in such environments.

Although for most people there's no huge rush to upgrade, if you're running PostgreSQL 8.3.x or older, it's time to think hard about getting onto a newer version. Community support for PostgreSQL 8.3.x will end in Q1 of next year. If you're running anything older than that, it's already unsupported; moreover, every release from 7.4 through 8.3 featured major performance improvements.