Wednesday, December 08, 2021

Surviving Without A Superuser - Part One

PostgreSQL users and developers are generally aware that it is best to minimize the number of tasks performed as superuser, just as at the operating system level most Linux and UNIX users are aware that it's best not to do too many things as root. For that reason, PostgreSQL has over the last few years introduced a number of predefined roles that have special privileges and which in some cases can be used in place of the superuser role. For instance, the pg_read_all_data role, new in version 14, has the ability to read all data in every table in the database - not only the tables that currently exist, but any that are created in the future. In earlier versions, you could achieve this effect only by handing out superuser permissions, which is not great, because the superuser role can do much more than just read all the data in the database. The new predefined role allows for a very desirable application of the principle of least privilege.
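As a rough illustration of how a predefined role is handed out, the sketch below builds the GRANT statement that makes an ordinary role a member of pg_read_all_data. The helper name grant_predefined_role and the role name reporting_user are made up for this example; only pg_read_all_data and the GRANT ... TO ... syntax come from PostgreSQL itself. The snippet only produces the SQL text and does not connect to a database.

```python
def grant_predefined_role(member: str, predefined_role: str = "pg_read_all_data") -> str:
    """Build a GRANT statement making `member` a member of a predefined role.

    `member` is a hypothetical role name supplied by the caller. It is
    double-quoted, with embedded quotes doubled, so unusual names stay valid.
    """
    quoted_member = '"' + member.replace('"', '""') + '"'
    return f"GRANT {predefined_role} TO {quoted_member};"


# "reporting_user" is an assumed role name used only for illustration.
print(grant_predefined_role("reporting_user"))
```

Running the resulting statement as a suitably privileged user on version 14 or later would give the member role read access to all tables without making it a superuser.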

Monday, November 29, 2021

Collation Stability

When PostgreSQL needs to sort strings, it relies on either the operating system (by default) or the ICU collation library (if your PostgreSQL has been built with support for ICU and you have chosen to use an ICU-based collation) to tell it in what order the strings ought to be sorted. Unfortunately, operating system behaviors are confusing and inconsistent, and they change relatively frequently for reasons that most people can't understand. That's a problem for PostgreSQL users, especially PostgreSQL users who create indexes on text columns. The first step in building a btree index is to sort the data, and if this sort order differs from the one used for later index lookups, data that is actually present in the index may not be found, and your queries may return wrong answers.
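To make the failure mode concrete, here is a toy sketch in Python, not PostgreSQL's actual btree code. A sorted list stands in for the index, a binary search stands in for the index lookup, and two different key functions stand in for the collation before and after an operating system change. The names index, lookup, old_collation, and new_collation are invented for this example.

```python
from bisect import bisect_left

words = ["cherry", "Zebra", "apple", "Banana"]

# Stand-ins for two collations: one ignores case, the other compares code points.
old_collation = str.lower
new_collation = lambda s: s

# The "index" is built (sorted) once, under the old collation.
index = sorted(words, key=old_collation)  # ['apple', 'Banana', 'cherry', 'Zebra']


def lookup(sorted_items, target, collation):
    """Binary search that assumes sorted_items is ordered by `collation`."""
    keys = [collation(item) for item in sorted_items]
    pos = bisect_left(keys, collation(target))
    return pos < len(sorted_items) and sorted_items[pos] == target


# Same collation for build and lookup: the entry is found.
print(lookup(index, "Banana", old_collation))

# Different collation at lookup time: the entry is present but the search misses it.
print(lookup(index, "Banana", new_collation))
```

The second lookup fails even though "Banana" is in the list, because the binary search takes its left and right turns according to an ordering that the list was never sorted by.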

Monday, June 21, 2021

Talking about the PostgreSQL Optimizer at CMU

Professor Andy Pavlo, at CMU, seems to be a regular organizer of technical talks about databases; this year, he organized the vaccination database tech talks, and invited me to give one about the PostgreSQL query optimizer. So I did. It was great. There were a few PostgreSQL community members present, but more importantly, a bunch of smart people who know a lot about other database systems showed up to the talk, including Andy Pavlo himself, and I got some feedback on where PostgreSQL could perhaps be improved. Here are the highlights, with links to the relevant portion of the YouTube video.

Tuesday, December 15, 2020

CVE-2019-9193

There's a new article out in Computer Weekly talking about CVE-2019-9193. The PostgreSQL project has issued a statement saying that this is not a security vulnerability, and PostgreSQL core team member Magnus Hagander also wrote a blog about it, saying the same thing. If you're curious about this issue, I suggest reading not only what Magnus wrote but also the comments section of that blog post, where you can see some of the perspectives that other people have on what Magnus said. But, in this blog post, I'd like to comment a bit on what is said in the Computer Weekly article: is there any truth to the allegations offered there?

Monday, May 11, 2020

Don't Manually Modify The PostgreSQL Data Directory!

I was lucky enough to get a chance to give my talk Avoiding, Detecting, and Recovering From Data Corruption at PGCONF.IN in February, before everything got shut down. The conference organizers did an amazing job with the video, which shows both me speaking and the slides I was presenting side by side. That's the first time a PostgreSQL conference has done the video that way, and I love it. One of the points that I raised in that talk was that you should not manually modify the contents of the PostgreSQL data directory in any way. To my surprise, the most frequent question that I was asked after giving the talk was "Really? What if I do XYZ?"

Tuesday, May 05, 2020

Who Contributed to PostgreSQL Development in 2019?

This is my fourth annual post on who contributes to PostgreSQL development. See previous posts in this series for methodology. I calculate that this year, 189 people were primary authors of at least one PostgreSQL commit. 37 of those people accounted for 90% of the new lines, and 12 people accounted for 66% of the new lines. In total, there were 2127 commits by 26 committers. The work of committing patches written by someone other than the committer was principally shared by 5 committers who committed 66% of the lines of non-self-authored code; 10 committers accounted for 90% of the lines of non-self-authored code.

Thursday, February 13, 2020

Useless Vacuuming

In previous blog posts that I've written about VACUUM, and I seem to be accumulating an uncomfortable number of those, I've talked about various things that can go wrong with vacuum, but one that I haven't really covered is when autovacuum seems to be running totally normally but you still have a VACUUM problem. In this blog post, I'd like to talk about how to recognize that situation, how to figure out what has caused it, how to avoid it via good monitoring, and how to recover if it happens.