So, what's next? What should our community be focusing on for PostgreSQL 9.1? If you've been following developments on pgsql-hackers, you might be tempted to pick out some of the big patches that weren't completed in time for 9.0, like KNNGIST (using indexes to accelerate nearest-neighbor queries, i.e. queries that ORDER BY the distance from some constant and then take the closest few with LIMIT).
But I think it might be good to step back and look at the problem a bit more broadly. Ignoring for a minute what people actually are working on, what should they be working on? Where is PostgreSQL strong as a product and where does it need improvement? Here are a few things to think about - please leave a comment with your thoughts.
1. Performance and Scalability. When I first started using PostgreSQL, the product had a reputation for being slow, principally because of issues relating to VACUUM. That reputation wasn't entirely justified even back then, and I think we've made enormous progress here in 8.3 and 8.4, but there might be more improvements we can make. Where are the remaining bottlenecks? What features can we provide to make better use of machines with lots and lots of RAM, CPU, and/or I/O bandwidth? Or what if you have more than one machine available? Are there classes of queries that we don't optimize well, and could do better with?
2. SQL Features. We're pretty close to having all of the SQL features required by the standard - and many of the ones that are left are not terribly interesting, which may be why no one has gone to the trouble of implementing them. Still, there are a few more things that would be nice to have, including updateable views, materialized views, LATERAL, and global temporary tables. Do you need these features? Are there other things we're missing?
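To pick one example, LATERAL would let a subquery in the FROM list refer to columns of tables that precede it. A sketch of the kind of query it enables, with made-up tables (the top three sales for each department):

  SELECT d.name, best.amount
  FROM departments d,
       LATERAL (SELECT s.amount
                FROM sales s
                WHERE s.dept_id = d.id
                ORDER BY s.amount DESC
                LIMIT 3) best;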
3. Indexes and Data Types. We currently support btree, hash, gin, and gist indices; hash indices are somewhat limited at present because they are not WAL-logged, and can therefore become corrupted after a system crash. We also support a broad array of datatypes, including all of the usual base types plus user-defined record and enum types, an XML datatype, and the ability to create new datatypes using PostgreSQL's powerful extensibility model. What else do we need, and which of the things we already have need improvement?
4. Procedural Languages. Our core distribution includes support for SQL functions and four procedural languages: PL/pgSQL, PL/Perl, PL/Python, and PL/Tcl. Additional languages such as PL/R and the ever-popular PL/LOLCODE are available as extensions. It seems unlikely to me that we need more procedural languages, but the existing ones might need improvement.
5. Security. Security can be further subdivided into connection security (preventing the bad guys from connecting to your database) and database privileges (allowing access to some of the data in the database, but not all of it). Two major features that we don't have in this area are row-level security and SE-Linux integration, but there may be other things as well. What are they, and which ones are important?
6. Administration. I think that the simplicity of administering PostgreSQL is one of its greatest strengths: installing PostgreSQL doesn't mean that you need to hire a dedicated DBA. Still, there's always something that's hard to do. What is it?
7. Replication and High Availability. I alluded to some possible projects in this area near the top of this post, and of course some of our needs in this area will continue to be met by third-party projects such as Slony, Bucardo, and Londiste; but there may be other enhancements needed in the core product as well.
8. Other Stuff. What else is there?
I would love to see some native ability to view past versions of a record or resurrect a deleted row without the need to build a soft delete system. I've read some blog postings about how such a feature could be built. See Scott Bailey's slides from PG West 2009 on future enhancements.
http://scottrbailey.files.wordpress.com/2009/10/temporal_data.pdf
I would love to see better documentation around configuration settings as well as the ability to change more from within PostgreSQL.
I'd love to see PostgreSQL provide some of the functionality of a graph database.
Built-in materialized views would be *awesome*.
How about making multiple instances able to operate from separate machines on shared storage? In other words, would it be possible for several machines to run the PG daemon safely against the same files on an NFS/SAN/Ceph-based share? Maybe even directly on RADOS files?
I'd vote for better JDBC drivers - one may argue that's not a core issue, but look: if you can't use Postgres's object system from Java, is that a small issue?
I'd like to see:
- automatically maintained clustered indexes (pg clustered indexes are awesome but can be painful to keep up-to-date in terms of load)
- materialized views could be really cool for caching in the DB, though they're easily worked around
- better performance monitoring tools
To my way of thinking, documentation is key at this point in time. People know MySQL is the default and use that instead of looking at PG to see if this DB might be a better solution. We used MySQL at DirtyPhoneBook without even taking a look at PG because of that, I'm afraid. Now we're scaling well and all, but missing several features that would make life a little easier.
An automatically maintained clustered index would be my favorite.
The feature I would love most is the ability to modify a VIEW or COMPOSITE TYPE that functions depend on, so that it's not necessary to DROP ... CASCADE, re-create the object, and then re-create the functions.
Real procedures that can start and commit transactions would also be a benefit.
I would also appreciate being able to better integrate PostgreSQL databases into larger information systems: for instance, by EASILY and EFFICIENTLY joining two tables, one belonging to a pg database and the other managed by another RDBMS that respects the SQL/MED standard.
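For reference, SQL/MED structures this as a foreign data wrapper, a server, and foreign tables that can then be joined against local ones. A sketch with illustrative names (the wrapper implementation itself has to come from a driver):

  CREATE FOREIGN DATA WRAPPER other_rdbms;
  CREATE SERVER legacy FOREIGN DATA WRAPPER other_rdbms
      OPTIONS (host 'legacy-host', dbname 'erp');
  CREATE FOREIGN TABLE legacy_orders (
      order_id integer,
      customer text
  ) SERVER legacy;

  -- A local table joined against the remote one:
  SELECT c.name, o.order_id
  FROM customers c
  JOIN legacy_orders o ON o.customer = c.name;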
What I personally would love to see is the ability to do locale-based sorting on a query-by-query basis instead of only being able to set LC_COLLATE at database creation time.
Even just being able to define different indexes with varying LC_COLLATE and then choosing which to use would be perfect.
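A sketch of what per-query collation could look like, along the lines of the SQL standard's COLLATE clause (the collation names and table are illustrative):

  -- Sort one query with Swedish rules and another with German rules,
  -- regardless of the database's LC_COLLATE:
  SELECT name FROM customers ORDER BY name COLLATE "sv_SE";

  -- An index built for one collation, usable by queries that match it:
  CREATE INDEX customers_name_de ON customers (name COLLATE "de_DE");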
CUBE() is my #1 feature
1. Extending partitioning.
ReplyDeleteWhat's wanted hasn't entirely solidified yet; there have been proposals to this end.
2. Autonomous stored procedures that can manage transactions.
I suspect that the Right Idea here is to have an approach that is conspicuously different from current Stored Functions, so that people don't waffle about which to use in any given case.
Please focus on these:
1. The website itself
2. Documentation on the website itself (as tested by finding the correct documentation for the current version via Google)
3. Ease of administration 'out of the box' (i.e. a backup scenario that works with minimal understanding of underlying issues for simple deployments).
4. Cloud tools (i.e. backups going to S3)
5. Deprecating some of the arcane syntax like \dt in favor of 'show tables' or something similarly simple and descriptive.
Those are the biggest obstacles for me.
L10N/I18N sorting in tables.
For my reporting/BI work: the MERGE statement and materialized views.
Two things I have hoped to see for a long time are:
Proper collation support, since I am a bit of a language nerd and hate the way collation works right now. A global locale is never right.
MERGE to be implemented since it is so common to want to either insert or update. Almost every project includes that.
Adam, I disagree with almost all of your ideas. I prefer \dt over SHOW, and the documentation and website of PostgreSQL are some of the best on the Internet. Sure, they can always be improved, but you make it sound like there's a problem now. And backing up databases is trivial in simple cases for Postgres; it only gets hard when you try to do replication. Cloud tools also seem out of scope for the Postgres project, better suited to a separate project.
The only feature we've missed so far: the ability to control transactions from PL/pgSQL!
How about a Hot Standby where you can pause and resume WAL replay, but also rewind?
I could imagine situations where I want to see what my live server looked like at some point in history. So I pop over to the Hot Standby, hit 'rewind' and go back a few days, check the data and then hit 'play' again.
It's probably more an issue for associated projects, but more flexible asynchronous replication is very important for us. In particular:
- Multi-master, so we can fail over to DR without having to get replication up-to-date first
- Some ability to transform replication events, so we can replicate to a different (incompatible) schema version, and thus perform upgrades without outages.
..... John
1) MERGE statement or some form of upsert would top my list.
2) CUBE and ROLLUP
3) Dependency invalidation or something to help the view/function dependency hell.
4) XMLTABLE and binary XML data type
5) Better parameter support in plpython. For instance returning record.
@Alex - Our proposal for a period data type was shot down on hackers. Instead they wanted a more generic mechanism to create ranges from many scalar data types, not just timestamps. Jeff Davis is working on this for 9.1.
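To illustrate the generic-range idea: rather than a timestamp-only period type, the same machinery would apply to any orderable scalar. Hypothetical usage, assuming range constructors and operators along these lines:

  -- A range over timestamps containing a point in time...
  SELECT tsrange('2010-01-01', '2010-02-01') @> timestamp '2010-01-15';

  -- ...and the same machinery over plain integers:
  SELECT int4range(1, 10) && int4range(5, 20);  -- overlaps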
I'm not sure what to say to the folks asking for better documentation. Postgres' docs are top notch. Go look at the website/docs for Firebird and then we'll talk. Perhaps we can do something about getting Google to prefer "current" instead of a particular version.
This request might be more of a 10.0 feature than a 9.1 feature... I would like to have support for historical queries, similar to Oracle's Query Flashback:
http://wiki.oracle.com/page/Query+Flashback
If PostgreSQL supported SELECT ... AS OF TIMESTAMP queries, that would make it much easier to develop applications that require an audit trail. Without that feature today, you could only develop such an application via an awkward convention of INSERTing log entries with timestamps — never UPDATEing or DELETEing — and querying views that "replay" the log. It would be much easier if the database supported it natively.
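A rough sketch of that append-only convention, with illustrative names:

  CREATE TABLE account_log (
      account_id integer,
      balance    numeric,
      valid_from timestamptz NOT NULL DEFAULT now()
  );

  -- "Replay" the log to see each account as of a given moment:
  SELECT DISTINCT ON (account_id) account_id, balance
  FROM account_log
  WHERE valid_from <= timestamptz '2010-06-01 00:00'
  ORDER BY account_id, valid_from DESC;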
I would second better drivers, especially asynchronous drivers (see ADBCJ). That would vastly improve scalability and make PG an attractive option for large scale applications.
A stronger focus on performance on SSDs. RethinkDB is building a new storage engine for MySQL that's optimized around the performance characteristics of SSDs. It would be great to see a similar initiative in the PG community.
Check out what the guys at RethinkDB are doing for more detail.
Compared to MySQL, you definitely need simpler installation, user creation, and database creation. All of that is very simple in MySQL, and can also be done easily from phpMyAdmin.
Beyond that I think psql is in very good shape.
For me the most important thing is data safety. Maybe you should check on startup whether write barriers and sync/fsync are actually working. Some disks, filesystems, volume managers, and crypto layers just cheat: they ignore these requests and sometimes don't even return proper error codes. :(
Documentation!!
I am a traditional Chinese user. I want to produce a traditional Chinese version of the SGML documentation.
A Chinese-language version of the documentation would win many users in the China area.
I'd like to see better support for databases with thousands or tens of thousands of schemas, especially when most are almost identical.
Driver support.
I mainly do Python, and http://wiki.python.org/moin/PostgreSQL shows that there's no real consensus on what driver to use (I think there was also a recent discussion on the mailing list).
For the main languages out there, I'd love to see "official" drivers from PostgreSQL in sync and async variants.
Also Multi-Master without shared storage
MERGE is at the top of my list. After that, probably rounding out the windowing functions, and then materialized views.
I don't personally care much for updateable views; we can do that now with RULEs and it's not much work. Materialized views are more problematic IMO to do by hand.
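For readers who haven't used them, the RULE approach is roughly this (the table and view are illustrative):

  CREATE TABLE people (id serial PRIMARY KEY, name text, hidden boolean);
  CREATE VIEW visible_people AS
      SELECT id, name FROM people WHERE NOT hidden;

  -- Redirect UPDATEs against the view to the base table:
  CREATE RULE visible_people_upd AS
      ON UPDATE TO visible_people DO INSTEAD
      UPDATE people SET name = NEW.name WHERE id = OLD.id;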
I TOTALLY agree with pilif, this would be a very important feature!
You could also create indexes on a per-locale basis so it's always fast.
"what I personally would love to see is the ability to do locale based sorting on a query-by-query basis instead of only being able to set LC_COLLATE on database creation time.
Even just being able to define different indexes with varying LC_COLLATE and then choosing which to use would be perfect."
Storage optimized for SSD drives
Nested Transactions
Hehe, there are 34 comments before mine, listing about 25 different, significant and nontrivial features. Seems like we're not going to run out of things to do anytime soon. ;-)
1) Manage transactions inside functions (allow commit)
ReplyDelete2) autonomous transactions
3) Altering composite types and enums without recreating them.
Much better XPath support.
XQuery support.
Better features for key/value pairs.
SSD tuning.
I'd like to see more development involved around Postgres-XC: a multi-master synchronous pgsql cluster...
I have three performance suggestions:
1. Implement partial sorts (a sketch of the query shape follows suggestion 3). This is useful when you have a fast source of tuples that are sorted by field a, but you need tuples "ORDER BY a, b". Currently, Postgres loads the whole lot into memory and sorts it, whereas it only needs to sort each group of similar values of a.
2. A generalised multi-column R-tree index. What I mean is, in a similar way that you can create a multi-column B-tree index, which can answer queries like "a = 1 AND b > 1", it would be great to have a multi-column R-tree index, which can answer queries like "a < 1 AND b > 1". This may be what Jeff Davis is doing, but I haven't looked.
3. Planner risk analysis. There are queries where picking the lowest cost plan is not the sensible choice, if that plan is more risky than other plans. An example is a query with EXISTS. If matching rows are common in the table, then Postgres will use a sequential scan to answer EXISTS, assuming that it will find a matching row very quickly. However, if the table is ordered in some way, and all the matching rows are at the end of the table, a full sequential scan is performed, which can cause bafflingly slow queries, where an index would be very quick. We see this crop up on the performance mailing list quite frequently.
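A sketch of the query shape from suggestion 1, with illustrative names:

  CREATE TABLE events (a integer, b integer);
  CREATE INDEX events_a_idx ON events (a);

  -- The index scan already returns rows ordered by a; a partial sort
  -- would only have to sort each run of equal a values by b, instead
  -- of re-sorting the entire result set:
  SELECT * FROM events ORDER BY a, b;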
1) The ability to use multiple cores to execute a single CPU-heavy transaction.
2) Concurrent partition reading+processing for a single query
3) Clean fkey/pkey handling in partitions
I don't ask much, do I ;-) . I'm well aware of how absurdly improbable the first two are, and I'm not actually asking anyone to do anything. It's nothing more than a "gee, it'd be nice if..." .
I do think Pg's single-thread-per-backend, single-backend-per-connection design will begin to seriously hurt it as machines stop increasing in single-core performance as much and start sprouting more cores than you can count. At some point the horrid task of getting multi-threaded backends going is going to have to be tackled.
For that matter, separating query executor from connection state may become necessary, too. You can get around the very expensive connections (because each connection has a private executor that's idle most of the time) in Pg using an external pool, but it'd be really nice if people didn't have to worry about poolers to get the most out of Pg in situations with high client counts.
@Peter Eisentraut: it's a basic fact of economics that humans always want (demand) more but resources (people, time, etc.) are scarce.
For those that have lots of resources (CPU, I/O bandwidth, etc.), some form of parallelization of queries would be interesting, e.g., breaking down a query over a partitioned table so that each partition is processed by a separate process (or thread).
Based on past data warehouse experiences:
1) Materialized views.
2) Merge Statement
3) Multiple storage-engine architecture
4) Simpler Partitioning
I put number 3 in there because, as a MySQL fan (though not as much anymore), I love the fact that you can change the underlying storage engine and still retain the same interface. It has led to some very cool projects like RethinkDB, Infobright, PBXT, etc. What I'm looking at specifically is the ability to plug in a column-oriented storage engine, as they provide significant benefits for OLAP systems.
Writable CTEs. This will change postgres. There are a couple of other things that would be really nice but may never happen, like an in-process postmaster.
I think updateable and materialized views would be useful.
Ability to maximize use of multi-core CPUs and I/O partitions in single queries.
More cowbell. :)
@MartinMarcher
Both SQLAlchemy and Django use psycopg2 by default. While there is no absolute consensus (choice is good!) there is, I would say, 90% consensus.
"I'd love to see PostgreSQL provide some of the functionality of a graph database."
This. Having CTEs makes it easier to do "graph stuff" in PostgreSQL, but graph operations done this way don't scale particularly well from what I've seen. Putting some more thought into ways to make PostgreSQL excel at storing and querying graphs would be a wonderful thing.
Clustered indexes. Not clustering a table using an index (seems to be some confusion here), but actual clustered indexes aka index-organized tables. That is, storing the entire table inside an index. The result is that when we're looking up something by primary key we only need to do the index scan to find the data we're interested in. Today we need to scan the index and then actually find the row on the heap.
On a similar (very) note, index-only queries. With this we can select data directly from an index without needing to look up the data in the underlying table.
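A sketch of the kind of query that would benefit, with illustrative names:

  CREATE TABLE orders (customer_id integer, order_date date);
  CREATE INDEX orders_cust_date_idx ON orders (customer_id, order_date);

  -- Every column the query touches is in the index, so in principle
  -- the executor would never need to visit the heap:
  SELECT customer_id, order_date FROM orders WHERE customer_id = 42;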
I agree with many of the suggestions:
- Materialized/indexed views (e.g. useful for calculated fields in views that don't exist in underlying tables, etc.)
- Built-in partitioning syntax
- Index-only scans (essential for people who like to use natural keys for the primary key)
...But there is another feature that is related to the replication features that I don't hear much about:
- Built-in "database resynchronization"
Not everyone replicates for performance. I need replication for fault-tolerance/fail-over. It's useful to have hot-standby, but what do I do when the "down" server comes back up again? How do I re-sync the databases in a way that will minimize downtime, so that the data in both is the most current?
Materialized views and partitioning in the base syntax without the overhead required to do it today would be positively amazing. Index-organized tables and further scalability/reliability improvements as discussed in the post would also be very, very welcome.
Concentrate entirely on user friendliness. Postgres is so much less user friendly than MySQL. People like me are just running simple web sites, and the awesome database stuff beneath the covers in Postgres doesn't make any difference to us. Thus, we end up using MySQL just because it's so much easier. We even use *gasp* the dreaded MyISAM, because it doesn't make any difference for our applications.
Make Postgres way way way more user friendly in pretty much every aspect. Now that replication is there, this should be priority number one.
After hearing the effusive praise for Postgres I've been giving it a shot. If I could make a few requests: strengthen the UPDATE capabilities. For instance, allow group-by syntax in an update with joins. Also, I hear there's no "replace into" support. And this: "" doesn't equate to an empty string? Really? It's 2010, not 1980.
My vote: materialized views.
Currently we maintain our own "materialized views" as separate tables that we update via Java code. It works, but it's a pain, and it's easy to mess up. I would love it if the DB handled all that work for us.
Thanks!
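For the curious, the hand-rolled approach looks roughly like this when done inside the database rather than in Java code; the names are illustrative, and this sketch only handles INSERTs:

  CREATE TABLE sales (region text, amount numeric);
  CREATE TABLE sales_by_region (region text PRIMARY KEY, total numeric NOT NULL);

  CREATE FUNCTION sales_summary_trg() RETURNS trigger AS $$
  BEGIN
      UPDATE sales_by_region SET total = total + NEW.amount
          WHERE region = NEW.region;
      IF NOT FOUND THEN
          INSERT INTO sales_by_region VALUES (NEW.region, NEW.amount);
      END IF;
      RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER sales_summary AFTER INSERT ON sales
      FOR EACH ROW EXECUTE PROCEDURE sales_summary_trg();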
I'm no database guy, but my business associate won't use postgres because it doesn't have true record-level locking for serial transactions. I know for most uses MVCC and concurrent transactions are faster, but some people have to use the serial transaction features and would really like true record-level locking.
Parallel queries would also be nice.
I would love to see SSI (serializable snapshot isolation) for truly serializable transactions.
Then you would know for sure there are no isolation anomalies despite having very complex transactions.
Materialized Views.
They're the only thing I really miss from my days using commercial database software.
Live materialized views would be massively useful for web apps.
At the moment a postgres user is forced to make tradeoffs between good design (normalization, referential integrity) and performance - basically it's too slow to do all the joins, and you're better off duplicating the data.
A live materialized view would give the best of both worlds - create a well structured DB, but then use materialized views to cache away the overhead of flattening the tables.
Web sites frequently run at a 10,000:1 or higher read/write ratio, so it really doesn't matter a bit if there is significant cost to maintaining the view (as long as it's not a complete rebuild on every change...).
1. Autoconfiguration: I would love postgres to suggest or set its best guess at the optimal settings for shared_buffers, work_mem, etc. given my hardware.
2. Query optimization with stored procedures: while the query optimizer is great for plain SQL queries, it is not able to dig down and take into account the queries within stored procedures. That is, if I have a SQL query with a subquery that calls a PL/pgSQL stored procedure, expand the ultimate query from all the calls and optimize that.
3. Documentation: yes, the technical docs are good, but what about books for newbs: "postgres for dummies", "postgres in 24 hours", etc.? These are important in lowering the barrier to entry for non-DBAs. That your average Barnes & Noble or Borders has 10+ books on MySQL and none on postgres, I am sure, drives traffic to MySQL.
4. Schema support: schemas are not yet first-class concepts with full support. E.g., in 8.3 there is no schema option for log_line_prefix, and I cannot specify a schema in psql -c.
Can you tackle a big problem?
Distributed database with automated load-balancing, optimized querying, data replication, et cetera.
While temporal databases, graph-database-like interaction, data-cubes, and the multitude of RDBMS standards are all very nice to have and support...
RDBMSes are inherently based on relational algebra. Relational algebra ought to be distributable. I would like to be able to access my database as though it were a single server. But, when I need additional performance, add a few systems to a cloud and call it good. I don't want to worry that if one of those systems goes down, the data is lost. I don't want to be limited in my data capacity to that amount of storage available on the smallest machine. This might require that I have a separate disk performing asynchronous saves like git change-sets to a central data store. Each system would have to be ACID compliant with its operations including a commit to the queue on the persistent data store manager.
That's a big task, eh?
I vote for:
Parallel Query Optimizer (from Greenplum)
Converts SQL or MapReduce into a physical execution plan, using a cost-based optimization algorithm that evaluates a vast number of potential plans and selects the one it believes will lead to the most efficient query execution. It takes a global view of execution across the cluster and factors in the cost of moving data between nodes in any candidate plan.
Polymorphic Data Storage (from Greenplum)
Customers can tune the storage types and compression settings of different partitions within the same table. A single partitioned table could (for example) have older data stored as 'column-oriented with deep/archival compression', more recent data as 'column-oriented with fast/light compression', and the most recent data as 'read/write optimized' to support fast updates and deletes.
BITMAP INDEX, like Oracle 9i does:

  create bitmap index idx_product_local
      on pos_product_local (ag_prod_cod_barra, ag_gdet_local)
    from pos_product_local ag, pos_product a, pos_local pg
   where ag.ag_prod_cod_barra = a.prod_cod_barra
     and ag.ag_gdet_local = pg.gon_codigo;
MATERIALIZED VIEWS
Built-in materialized views, including updateable ones, are needed.
CONFIGURATION PARAMETERS
The ability to use multiple cores to execute a single CPU-heavy transaction.
Fun post Robert! Here's a PostGIS take!
The killer database feature nobody seems to have been able to implement yet is good temporal support. Read "Temporal Data and the Relational Model" by Date (no flame wars ;)) and Darwen; the stuff in there would save *tons* of application development time in lots of areas (accounting, contracts, scheduling) and improve application correctness and performance as a bonus.
Server-side support for graph algorithms would also be great. DFS, BFS, all-pairs shortest paths, single-source shortest paths, connected components, minimum spanning trees, etc. all have lots of applications but sometimes require access to huge amounts of data to produce quite small result sets.
I would love to see better integration in .NET. A good LINQ to Postgres implementation would be really useful.
Being able to alter column position in a way that is better than what is listed at http://wiki.postgresql.org/wiki/Alter_column_position would be appreciated.
MERGE or a simpler non-standard upsert-y statement.
+1 for better JDBC drivers. The ability to use postgresql to its full potential with Hibernate and JPA in general would be huge. It's my understanding that it doesn't yet fully support JDBC 4.0.
Postgres has been kind to me, but I still have two large problems with Postgres:
1) Partitioning
2) Clustered Indexes
The data sets I deal with are time-ordered sets of continuous data, 24/7. This naturally leads me to want to both cluster on timestamp and partition by time.
After realizing how much server-side code had to be written to support partitioning (building up the next partition at the correct time, dropping old partitions, replacing triggers, etc.) I was actually a bit shocked. I suspect that many users besides myself are looking to partition by time ranges, so I think that support for automated partitioning based on some simple rules (per day, per 1,000,000 records) would be very useful. (A sketch of the code involved is below.)
Additionally, the lack of automatic clustering significantly hurts reporting on my latest partition, the one still receiving incoming data. It is relatively easy for me to get our "historical" partitions clustered, but the current partition is not (and can't reasonably be) routinely clustered. Right now there is no real solution besides making the clustered range a bit smaller to limit the size of the non-clustered data set, but making too many partitions seems to have a significant impact on insert performance. I suspect that this has improved in 9.0, but I haven't yet had time to test.
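For context, the server-side code in question follows the trigger-based idiom from the manual, roughly like the sketch below; the names and the single hard-coded day are illustrative, and a real deployment also needs a job that creates each new partition and swaps the routing trigger as time advances:

  CREATE TABLE readings (ts timestamptz NOT NULL, value numeric);

  -- One child table per day, with a constraint the planner can use:
  CREATE TABLE readings_2010_06_01 (
      CHECK (ts >= '2010-06-01' AND ts < '2010-06-02')
  ) INHERITS (readings);

  -- Route inserts on the parent into the right child:
  CREATE FUNCTION readings_route() RETURNS trigger AS $$
  BEGIN
      IF NEW.ts >= '2010-06-01' AND NEW.ts < '2010-06-02' THEN
          INSERT INTO readings_2010_06_01 VALUES (NEW.*);
      ELSE
          RAISE EXCEPTION 'no partition for %', NEW.ts;
      END IF;
      RETURN NULL;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER readings_ins BEFORE INSERT ON readings
      FOR EACH ROW EXECUTE PROCEDURE readings_route();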
First of all - shame on me for not having sent you guys a contribution yet. I will remedy this.
I've been using postgres for a small personal project: a web-based (but socket-driven) MMO space strategy game - infinite time/space, no "winner". The client is in Silverlight, of all things.
Anyway, this is my first experience with postgres and I like it! Since I'm in a continuous, heavy dev cycle and still in early beta, I haven't learned much about the diagnostics and monitoring...
One thing I have used a lot is pg_dump, while trying to automate beta deployments. It would be nice if -c supported "if exists, drop" and if -T also filtered out sequences. Other than that...
YAY.
(Oh also - if you want to play... I'll make you a beta account.)
Although I'm happy with the WAL shipping changes in 9.0, I'd still like to see full-blown multi-master replication.
Top 10 mentioned features so far:
1. Materialized views
2. Multiple CPUs/parallel query
3. MERGE
4. Automatically clustered indexes (index-only scans?)
5. Improved partitioning
6. Multimaster replication
7. Easier configuration/administration
8. Temporal features
9. "Real" procedures with transaction handling
10. Locale per column or query
Which is, up to small variations, almost exactly what http://postgresql.uservoice.com/forums/21853-general says.
Personally I'd like to see more "enterprisey" stuff:
1. Auditing capabilities similar to Oracle's. PG is getting pushed aside at my shop because of the lack of SOX-compliance auditing.
2. Triggers on "create" and "alter" SQL commands (a hypothetical sketch follows this list). I think this could potentially make replication much, much easier to administer (you could trigger the replication system to replicate the change). Triggers on login/logout, startup/shutdown, and on errors (especially privilege errors, for auditing) would be extremely useful.
3. More monitoring metrics to measure and tune PG parameters, such as autovacuum and bgwriter.
4. Read/write capability while re-indexing. I have a 24x7x365 system, so reindexing never happens.
5. Separate logs for connection logs, SQL logs, and system notice/warning/error logs.
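Nothing like item 2 exists in core today, so what follows is purely a hypothetical sketch, loosely modeled on ordinary triggers, of what a DDL trigger might look like:

  CREATE TABLE ddl_audit (logged_at timestamptz, statement text);

  CREATE FUNCTION log_ddl() RETURNS trigger AS $$
  BEGIN
      INSERT INTO ddl_audit VALUES (now(), current_query());
      RETURN NULL;
  END;
  $$ LANGUAGE plpgsql;

  -- Hypothetical syntax; no such clause exists yet:
  CREATE TRIGGER audit_ddl ON DDL COMMAND
      EXECUTE PROCEDURE log_ddl();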
I would like to see new features in table partitioning:
a) Horizontal partitioning. Right now horizontal partitioning is already very useful, but it would be nice to have support for it at the GUI level.
It would also be nice to improve the current horizontal partitioning: version 8.2.x needed triggers and some manual work.
I think it would be nice to create a table, define the horizontal partitioning with DDL, and let PostgreSQL solve the implementation details.
b) Altering a horizontally partitioned table, meaning altering the partitioning scheme using DDL.
c) Vertically partitioned tables. Same thing: it would be nice to have support at the GUI and DDL level.
Perhaps assigning columns or groups of columns to different tablespaces; the same for indexes.
d) Altering a vertically partitioned table.
:D
More monitoring features. We want to know which component really slows a query down: the parser? the executor? I/O time (read/write)? For us, log_statement_stats, log_parser_stats, log_planner_stats, and log_executor_stats are a little too much, but log_min_duration_statement = 0 with the duration is not enough to drill down into what's really going on inside PG.
I'll add another one for "more information from the database". This could be:
* More built-in queries/views for determining index/table sizes, which indexes are getting used and which aren't, when things are getting bloated, etc. I've been devouring some of the pgCon talks about these things, but the information sometimes seems very difficult to get at unless someone posts some helpful system catalog queries to build that info.
* More options for logging useful information (into separate files that can be rotated/handled differently) that can later be parsed and analyzed in useful ways. Especially if some contrib modules were added to do that analysis.
* Anything to help me make more sense out of EXPLAIN ANALYZE. My company's main product is maturing to the point where I can go back to some of our problem queries to optimize, but I often don't know where to start. Identifying problems and hinting at potential solutions would help me immensely. Documentation, tools, GUIs, I don't care, I just need some help.
* Some way to sanity check various configuration/tuning options. So many of the optimization recommendations and best practices are outdated and confusing. I'd like a simple way to determine either what my best settings are or be able to create test suites to benchmark various settings. Really, even an updated best practices documentation with each release would be awesome.
Simple and easy things that would make my life easier:
* Being able to modify ENUMs without having to rebuild my table would be super dandy.
* Some kind of "if 0 results on UPDATE, do an INSERT instead" functionality (see the sketch just below). Currently I implement this in application logic (which makes sense), but being able to fire a single statement at the DB instead would be faster and more efficient code-wise.
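The usual single-statement workaround today is a small PL/pgSQL function that retries, along the lines of the example in the documentation; the table and function names here are illustrative:

  CREATE TABLE counters (key text PRIMARY KEY, n integer NOT NULL);

  CREATE FUNCTION upsert_counter(k text, delta integer) RETURNS void AS $$
  BEGIN
      LOOP
          UPDATE counters SET n = n + delta WHERE key = k;
          IF FOUND THEN
              RETURN;
          END IF;
          BEGIN
              INSERT INTO counters (key, n) VALUES (k, delta);
              RETURN;
          EXCEPTION WHEN unique_violation THEN
              NULL;  -- a concurrent insert beat us; loop and retry the UPDATE
          END;
      END LOOP;
  END;
  $$ LANGUAGE plpgsql;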
Beyond that, next-gen pie-in-the-sky type features I'd like to see include:
* Materialized views would be helpful in some circumstances. I'd actually been investigating rolling my own here without knowing about this generalized functionality.
* Better partitioning support. Being able to split my data up without all the caveats and manual setup would be dandy.
* Multi-master clusters, when combined with that partitioning support would really enhance scalability.
I've been using PostgreSQL since 7.0/7.1 and I have to say the progress that's been made has been incredible. That so many things have dropped off my wishlist since those days is very encouraging and I really look forward to seeing what the future holds.
Thanks for sharing the link, but unfortunately it seems to be offline... Does anybody have a mirror or another source? Please reply to my post if you do!
I would appreciate it if a staff member here at rhaas.blogspot.com could post it.
Thanks,
William
@Anonymous - Which link is not working for you?
Hello,
I have a message for the webmaster/admin here at rhaas.blogspot.com.
Can I use part of the information from this blog post above if I provide a link back to this website?
Thanks,
Peter