Robert Haas

Tuesday, March 31, 2015

PostgreSQL Shutdown

PostgreSQL has three shutdown modes: smart, fast, and immediate. For many years, the default has been "smart", but Bruce Momjian has just committed a patch to change the default to "fast" for PostgreSQL 9.5. In my opinion, this is a good thing; I have complained about the current default, and agreed with others complaining about it, many times, at least as far back as December of 2010. Fortunately, we now seem to have achieved consensus on this change.
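Until the new default reaches you, the mode can be requested explicitly with pg_ctl's -m option. Here is a minimal sketch in Python of a wrapper that does so. The helper names build_stop_command and stop_server are hypothetical, and the sketch assumes pg_ctl is on your PATH and that you pass a real data directory.

```python
import subprocess

# The three shutdown modes named above, as accepted by "pg_ctl stop -m".
VALID_MODES = ("smart", "fast", "immediate")


def build_stop_command(data_dir, mode="fast"):
    """Return the pg_ctl argument list for stopping a server.

    Hypothetical helper: it only builds the command and does not run it.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown shutdown mode: {mode!r}")
    return ["pg_ctl", "stop", "-D", data_dir, "-m", mode]


def stop_server(data_dir, mode="fast"):
    """Run pg_ctl with an explicit mode (assumes pg_ctl is on PATH)."""
    return subprocess.run(build_stop_command(data_dir, mode), check=True)
```

Naming the mode explicitly means a script behaves the same way regardless of which default the installed release uses.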

Parallel Sequential Scan for PostgreSQL 9.5

Amit Kapila and I have been working very hard to make parallel sequential scan ready to commit to PostgreSQL 9.5. It is not all there yet, but we are making very good progress. I'm very grateful to everyone in the PostgreSQL community who has helped us with review and testing, and I hope that more people will join the effort. Getting a feature of this size and complexity completed is obviously a huge undertaking, and a significant amount of work remains to be done. Not a whole lot of brand-new code remains to be written, I hope, but there are known issues with the existing patches where we need to improve the code, and I'm sure there are also bugs we haven't found yet.

Parallelism Update

It's been over a year since I last blogged about parallelism, so I think I'm past due for an update, especially because some exciting things are happening.

First, Amit Kapila has published a draft patch for parallel sequential scan. Many things remain to be improved about this patch, which is neither as robust as it needs to be nor as performant as we'd like it to be nor as well-modularized as it really should be. But it exists, and it passes simple tests, and that is a big step forward. Even better, on most of Amit's tests, it shows a very substantial speed-up over a non-parallel sequential scan.

Memory Matters

Database performance and hardware selection are complicated topics, and a great deal has been written about them over the years by many very smart people, like Greg Smith, who wrote a whole book about PostgreSQL performance. In many cases, the answers to performance questions require a deep understanding of software and hardware characteristics, along with careful study and planning.

But sometimes the explanation is something very simple, such as "you don't have enough memory".

Linux disables vm.zone_reclaim_mode by default

Last week, Linus Torvalds merged a Linux kernel commit from Mel Gorman disabling vm.zone_reclaim_mode by default. I mentioned that this change might be in the works when I blogged about attending LSF/MM and again when I blogged about how the page cache may not behave quite the way we want even with vm.zone_reclaim_mode disabled.

For those who haven't read previous discussion on this topic, either on my blog, on pgsql-performance, or elsewhere around the Internet, enabling vm.zone_reclaim_mode can cause a lot of problems for applications, such as PostgreSQL, that make use of more page cache than will fit on a single NUMA node. Pages may get evicted from memory in preference to using memory on other nodes, effectively resulting in a page cache that is much smaller than available free memory. See the second of the two blog posts linked above for more details.

PostgreSQL isn't the only application that suffers from non-zero values of this setting, so I think a lot of people will be happy to see this change merged (like the guy who said that this setting is the essence of all evil). It will doubtless take some time for this to make its way into mainstream Linux distributions, but getting the upstream change made is the first step. Thanks to Mel Gorman for pursuing this.
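If you want to check what a given machine is currently using, the setting is exposed on Linux as the file /proc/sys/vm/zone_reclaim_mode. Here is a minimal sketch in Python that reads it. The function names are hypothetical, and the path argument exists only so the logic can be exercised against an ordinary file.

```python
from pathlib import Path

# Where Linux exposes the vm.zone_reclaim_mode sysctl.
ZONE_RECLAIM_PATH = "/proc/sys/vm/zone_reclaim_mode"


def read_zone_reclaim_mode(path=ZONE_RECLAIM_PATH):
    """Return the setting as an integer, or None if it cannot be read.

    None covers systems where the file is missing (for example, non-Linux
    hosts) and files whose contents are not a plain integer.
    """
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def zone_reclaim_enabled(path=ZONE_RECLAIM_PATH):
    """True only when the setting is readable and non-zero."""
    mode = read_zone_reclaim_mode(path)
    return mode is not None and mode != 0
```

A True result from zone_reclaim_enabled() on a database server is a hint that the page cache behavior described above may apply there.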

Tuesday, May 13, 2014

Troubleshooting Database Corruption

When your database gets corrupted, one of the most important things to do is figure out why that happened, so that you can try to ensure that it doesn't happen again. After all, there's little point in going to a lot of trouble to restore a corrupt database from backup, or in attempting to repair the damage, if it's just going to get corrupted again. However, there are times when root cause analysis must take a back seat to getting your database back online.

Why The Clock is Ticking for MongoDB

Last month, ZDNet published an interview with MongoDB CEO Max Schireson which took the position that document databases, such as MongoDB, are better-suited to today's applications than traditional relational databases; the title of the article implies that the days of relational databases are numbered. But the reality is not, as Schireson would have us believe, that the relational database community is ignorant of, or has not tried, the design paradigms which he advocates; rather, they have been tried and found, in many cases, to be anti-patterns. Certainly, there are some cases in which the schemaless design pattern that is perhaps MongoDB's most distinctive feature is just the right tool for the job, but it is misleading to think that such designs must use a document store. Relational databases can also handle such workloads, and their capabilities in this area are improving rapidly.

Tuesday, March 31, 2015

PostgreSQL Shutdown

Wednesday, March 18, 2015

Parallel Sequential Scan for PostgreSQL 9.5

Monday, December 22, 2014

Parallelism Update

Wednesday, August 06, 2014

Memory Matters

Tuesday, June 10, 2014

Linux disables vm.zone_reclaim_mode by default

Tuesday, May 13, 2014

Troubleshooting Database Corruption

Wednesday, April 16, 2014

Why The Clock is Ticking for MongoDB