Last week, a blog post by an Uber engineer explained why Uber chose to move from PostgreSQL to MySQL. This article was widely reported and discussed within the PostgreSQL community, with many users and developers expressing the clear sentiment that Uber had indeed touched on some areas where PostgreSQL has room for improvement. I share that sentiment. I believe that PostgreSQL is a very good database, but I also believe there are plenty of things about it that can be improved. When users - especially well-known names like Uber - explain what did and did not work in their environment, that helps the PostgreSQL community, and the companies which employ many of its active developers, figure out what things are most in need of improvement. I'm happy to see the PostgreSQL community, of which I am a member, reacting to this in such a thoughtful and considered way.
Showing posts with label mysql. Show all posts
Tuesday, August 02, 2016
Tuesday, April 19, 2011
PostgreSQL East, and The MySQL Conference and Expo
Last month, I attended (and spoke at) PostgreSQL East in New York City, which this year featured a MongoDB track. This past week, I was in Santa Clara at the O'Reilly MySQL Conference & Expo, which had a substantial PostgreSQL track this year, where I also spoke.
Both conferences had some very good talks. The first talk I attended at the MySQL conference turned out to be one of the best - it was entitled Linux and H/W optimizations for MySQL. I had a little difficulty understanding Yoshinori Matsunobu's accent at times, but the slides were excellent, and very detailed. Some of his more interesting findings: (1) SSDs speed things up both on the master and on replication slaves, but the speedup is larger on the slaves, so it's useful to put hard disks on the master and SSDs on the slave to make it possible for single-threaded recovery there to keep up with the master; (2) while SSDs are much faster for random access, they are actually slower for sequential access and fsync, so a RAID array with a battery-backed or flash-backed write cache may still be a better option in those cases; (3) Fusion I/O drives were FAR faster than Intel drives; (4) the Intel Nehalem architecture was much faster than the AMD Opteron architecture when used in combination with SSDs; and (5) HyperThreading helps more in SSD environments than it does otherwise, because the system, overall, becomes more heavily CPU-bound, and for the same reason mutex contention hurts more.
Another very good talk was Peter Zaitsev's discussion of Innodb and XtraDB Architecture and Performance Optimization, which gave me the feeling of looking into a sort of carnival mirror, where you recognize yourself, but it's all distorted. Two of the problems that give PostgreSQL DBAs heartburn - bloat, and checkpoint I/O spikes (and less frequently, purge not keeping up a la vacuum not keeping up) - are apparently problems for MySQL as well, though with significantly different details. I'm not even going to attempt to summarize the differences, or say which problem is worse or occurs more often, because I honestly have no idea. I was a bit surprised to hear dump-and-reload recommended to recover from certain worst-case scenarios, though.
There were other good talks, too, which helped me understand what's going on in the world of MySQL forks. Apparently, the Drizzle team is busy removing features that they consider half-baked and modularizing the code so that it is easier to understand and improve, while the MariaDB team is busy adding optimizer features, including support for hash joins and persistent statistics. From what I understand, the MySQL optimizer has typically worked by gathering statistics through on-the-fly index probes, which can be a problem in some situations. It's not so easy to categorize the work that Oracle is doing, but it seems to involve a fair amount of filing down of rough edges, and various improvements to replication, including, perhaps most significantly, parallel replication apply.
At PostgreSQL East, I think my favorite talk was Ken Rosensteel's talk, somewhat misleadingly titled Large Customers Want PostgreSQL, Too. This talk turned out to be about migrating a large Oracle mainframe application to use PostgreSQL, and the challenges faced during that migration. He, or his team, built an Oracle-to-PostgreSQL converter for stored procedures; it was interesting to see that they got bitten by our bizarre casting rules around the smallint data type. They also ended up doing some very interesting work optimizing the performance of ECPG for small FETCH statements; these are areas of the code that I think don't normally get a lot of attention, and it was great to hear about the optimization work that got done.
I was disappointed that Jon Hoffman's talk on Experiences with Postgres and MongoDB at foursquare.com got cancelled; I think that would have been an interesting talk. I did have an opportunity to attend Jake Luciani's talk Comparing the Apache Cassandra Architecture to PostgreSQL, which turned out to be more about Cassandra than PostgreSQL, but was nevertheless interesting. I would have been interested to hear a more technical talk, though, about how problems like distributed serialization anomalies and distributed checkpointing are handled.
Next month, I'll be speaking at PGCon 2011 on Using The PostgreSQL System Catalogs and How To Get Your PostgreSQL Patch Accepted. And after that, unlike Bruce, I'm going to stay home for a few months!
Tuesday, February 01, 2011
MySQL vs. PostgreSQL, Part 2: VACUUM vs. Purge
Almost two months ago, I wrote part one of what I indicated would be an occasional series of blog posts comparing the architecture of PostgreSQL to that of MySQL. Here's part two. Please note that the caveats set forth in part one apply to this and all future installments as well, so if you haven't read part one already, please click on the link above and read at least the first two paragraphs before reading this post.
Monday, November 29, 2010
MySQL vs. PostgreSQL, Part 1: Table Organization
I'm going to be starting an occasional series of blog postings comparing MySQL's architecture to PostgreSQL's architecture. Regular readers of this blog will already be aware that I know PostgreSQL far better than MySQL, having last used MySQL a very long time ago when both products were far less mature than they are today. So, my discussion of how PostgreSQL works will be based on first-hand knowledge, but discussion of how MySQL works will be based on research and - insofar as I can make it happen - discussion with people who know it better than I do. (Note: If you're a person who knows MySQL better than I do and would like to help me avoid making stupid mistakes, drop me an email.)
Wednesday, November 10, 2010
Rob Wultsch's MySQL Talk at PostgreSQL West
I thought this talk deserved a blog post of its own, so here it is. I have to admit that I approach this topic with some trepidation. The MySQL vs. PostgreSQL debate is one of those things that people get touchy about. Still, I'm pleased that not only Rob, but a number of other MySQL community members who I did not get a chance to meet, came to the conference, and it sounds like it will be our community's turn to visit their conference in April of next year. Rob was kind enough to offer to introduce me to some of the MySQL community members who were there, and I, well, I didn't take him up on it. That's something I'd like to rectify down the road, but unfortunately this was a very compressed trip for me, and the number of people I had time to talk to and meet with was much less than what I would have liked.
Monday, November 08, 2010
PostgreSQL West Talks
As I blogged about before the conference, I gave two talks this year at PostgreSQL West. The first was a talk on the query optimizer, which I've given before, and the second talk was on using the system catalogs, which was new. While the second one was well-attended, the first one was packed. I keep hoping I'll think of something to talk about that people find even more interesting than the query planner, but so far no luck. Slides for both presentations are now posted; I've added two slides to the system catalogs presentation that weren't there when I gave the talk, but probably should have been.
Nearly all the talks I attended were good. Some of the best were Greg Smith's talk on Righting Your Writes (slides), Gabrielle Roth's talk on PostgreSQL monitoring tools, and Joe Conway's talk on Building an Open Geospatial Technology Stack (which was actually given in part by Jeff Hamann, who has a company, and a book). All three of these, and a number of the others, were rich with the sort of anecdotal information that it's hard to get out of the documentation: How exactly do you set this up? How well does it actually work? What are its best and worst points?
Another memorable talk was Rob Wultsch's talk entitled "MySQL: The Elephant in the Room". But that talk really deserves a blog post all of its own. Stay tuned.