Latest Blog Posts

Dispelling Myths About PostgreSQL Default Privileges
Posted by Sagar Jadhav in Percona on 2024-08-06 at 12:25
The ALTER DEFAULT PRIVILEGES command allows us to set the privileges that will be applied to objects created in the future. It’s important to note that this does not affect privileges assigned to existing objects; default privileges can be set globally for objects created in the current database or in specified schemas. There are many […]
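
As a quick illustration (the role and schema names here are made up for this example), default privileges for tables that a given role will create in a given schema can be set like this:

-- Affects only tables that role app_owner creates in schema app from now on;
-- existing tables keep their current privileges
ALTER DEFAULT PRIVILEGES FOR ROLE app_owner IN SCHEMA app
    GRANT SELECT ON TABLES TO readonly_user;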

Enhanced security: Logging PostgreSQL connections
Posted by Hans-Juergen Schoenig in Cybertec on 2024-08-06 at 07:00

Logging connections in PostgreSQL can be incredibly valuable for monitoring and troubleshooting purposes. However, that’s not all: auditing is also an important aspect that should be taken into account. Being able to trace and track activity is therefore vital to ensuring security, compliance, and more. This information can help identify potential security risks, such as unauthorized access or suspicious activity, allowing you to take swift action to mitigate threats.

Using log_connections in PostgreSQL

To achieve this in PostgreSQL, we can make use of a simple configuration variable:

test=# SHOW log_connections;
 log_connections 
-----------------
 on
(1 row)

By default, this configuration variable is set to “off”. However, turning this feature on can give us some really important insights into the activity of your PostgreSQL database server.
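
If it is off on your system, a minimal sketch of enabling it (the change takes effect for new connections after a configuration reload; no restart is needed):

test=# ALTER SYSTEM SET log_connections = 'on';
ALTER SYSTEM
test=# SELECT pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)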

The purpose of this variable is to instruct the server to add additional information to the log file and provide us with some valuable insights. In PostgreSQL 17, the output we can expect consists of three log entries, which are as follows:

2024-06-11 11:44:26.429 CEST [19362] LOG:  connection received: 
host=[local]

The first entry tells us a bit more about the connect string. In my case, I tried to connect to a database called “postgres” through a local UNIX socket. The “host” entry confirms that. 

However, the second line is even more important. Consider the following output:

2024-06-11 11:44:26.431 CEST [19362] LOG:  connection authenticated: 
user="hs" method=trust (/Users/hs/db17/pg_hba.conf:117)

This information helps us uncover what the authentication process is up to. First, we see which user is trying to connect. In my case, it is my local user called “hs”. The next piece of information indicates that PostgreSQL was configured to use “trust”. In other words, no password had to be supplied. What is really important here is that we can also see which line in which configuration file is responsible for this decision, maki

[...]

Posting Your Patch On pgsql-hackers
Posted by Robert Haas in EDB on 2024-08-05 at 21:16
Sometimes, people post patches to pgsql-hackers and... nothing happens. No replies, no reviews, nothing. Other times, people post patches to pgsql-hackers and a bunch of discussion ensues, but nothing gets committed. If you're the sort of person who likes to write patches for PostgreSQL, or if you're being paid to do so, you'd probably like to avoid having these things happen to you. In this blog post, I'm going to explain what I think you should do to maximize the chances of a good outcome (no guarantees!). Read more »

Open Source AI Database Agent Part 1: Introduction
Posted by Sergey Pronin in Percona on 2024-08-05 at 15:22
Generative AI is top of mind for many engineers. The questions of how it can be applied to solve business problems and boost productivity are still up in the air. Recently I wrote a blog post about the impact of AI on platform engineers, where I talked about various AI Agents and how they can […]

PostgreSQL and Instrumentation
Posted by Grant Fritchey on 2024-08-05 at 13:46

I’m still learning PostgreSQL and one of the things I’ve been looking at a lot lately is instrumentation. What do I mean? Well, if you’re in SQL Server, think, Dynamic Management Views (DMV), Extended Events, Query Store, <hack, spit> Trace <spit>. How do we know how long a query took to run? PostgreSQL can tell […]

The post PostgreSQL and Instrumentation appeared first on Grant Fritchey.

Six Degrees of Kevin Bacon - Postgres Style
Posted by Paul Ramsey in Crunchy Data on 2024-08-05 at 12:00

Back in the 1990s, before anything was cool (or so my children tell me) and at the dawn of the Age of the Meme, a couple of college students invented a game they called the "Six Degrees of Kevin Bacon".

The conceit behind the Six Degrees of Kevin Bacon was that actor Kevin Bacon could be connected to any other actor, via a chain of association of no more than six steps.

Why Kevin Bacon? More or less arbitrarily, but the students had noted that Bacon said in an interview that "he had worked with everybody in Hollywood or someone who's worked with them" and took that statement as a challenge.

Bacon Number

The number of steps necessary to get from some actor to Kevin Bacon is their "Bacon Number".

For example, comedy legend Steve Martin has a Bacon Number of 1, since Kevin Bacon appeared with him in the 1987 road trip comedy Planes, Trains and Automobiles.

[Diagram: Kevin Bacon → Steve Martin]

Zendaya has a Bacon Number of 2. In 2017 she appeared with Marisa Tomei in Spider-Man: Homecoming, and in 2005 Tomei appeared with Bacon in Loverboy (which Bacon also directed).

[Diagram: Kevin Bacon → Marisa Tomei → Zendaya]

IMDB Data

The challenge of the original '90s Six Degrees of Kevin Bacon was to link up two actors using only the knowledge in your head. This seems improbably difficult to me, but perhaps people were smarter in the '90s.

In our modern age we don't need to be smart, we can attack the Bacon Number problem by combining data and algorithms.

The data half of the problem is relatively straightforward -- the Internet Movie Database (aka IMDB) allows direct download of the information we need.

In particular, the

  • name.basics.tsv.gz file for actor names and jobs
  • title.basics.tsv.gz file for movie names and dates
  • title.principals.tsv.gz file for relationships between actors and movies

The IMDB database files actually include information about every job on a film (writers, directors, producers, casting, etc, etc) and we are only interested in actors for the Kevin Bacon game.
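
As for the algorithm half, one possible sketch (with a made-up edges table, not the article's actual implementation) is a breadth-first search written as a recursive CTE:

-- Hypothetical edge list: edges(actor_a, actor_b), one row per ordered pair
-- of actors who appeared in the same title
WITH RECURSIVE bacon AS (
    SELECT e.actor_b AS actor, 1 AS bacon_number
    FROM   edges e
    WHERE  e.actor_a = 'Kevin Bacon'
  UNION
    SELECT e.actor_b, b.bacon_number + 1
    FROM   bacon b
    JOIN   edges e ON e.actor_a = b.actor
    WHERE  b.bacon_number < 6          -- six degrees, so stop at depth 6
)
SELECT actor, min(bacon_number) AS bacon_number
FROM   bacon
GROUP  BY actor;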

ETL process to download and pr
[...]

Five Months Later – My Talk at SCaLE
Posted by Henrietta Dombrovskaya on 2024-08-05 at 03:06

That was my first presentation of the Security talk, and I hope it won’t be the last one! I still hope, despite many rejections 🙂

Does PostgreSQL respond to the challenge of analytical queries?
Posted by Andrei Lepikhov in Postgres Professional on 2024-08-05 at 03:00

This post was triggered by Crunchy Data's recent article and by the YugabyteDB approach of using Postgres as an almost stateless entry point: it parses incoming analytic queries and splits the work among multiple instances managed by another database system, one better suited to storing and processing massive data volumes and to executing relatively simple queries.

The emergence of foreign data wrappers (FDW) and partitioning features has made these solutions possible. It seems that being compatible with the Postgres infrastructure and its mature parser/planner is valuable enough for vendors to consider implementing such hybrid data management systems.

So, now we face the fact that Postgres is used to do analytics on big data. An essential question immediately emerges: does it have enough tools to process complex queries?

What Are Analytic Queries?

It is quite a general term. But after reading the course materials on the subject, I can summarise that analytic queries typically:

  • involve multiple joins

  • use aggregates, often mathematical ones

  • need to process large subsets of table data in a single query

  • are ad hoc in nature and difficult to predict before they arrive
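
As a toy illustration of a query with these properties (the tables and columns are invented for this sketch):

-- Multiple joins, aggregates, and a scan over a large slice of the fact table
SELECT d.region,
       date_trunc('month', s.sold_at) AS month,
       sum(s.amount)                  AS revenue,
       avg(s.amount)                  AS avg_order
FROM   sales s
JOIN   customers c ON c.id = s.customer_id
JOIN   dealers   d ON d.id = c.dealer_id
WHERE  s.sold_at >= date '2023-01-01'
GROUP  BY d.region, date_trunc('month', s.sold_at);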

So, looking into the Postgres changes, we should discover what has changed in aggregate processing, join ordering and estimation, and table scanning.

What was the rationale?

The technique of using Postgres as middleware between the user and the storage layer was triggered by the emergence of the FDW and partitioning features. Parallel execution doesn't help much with processing foreign tables (partitions). Still, it is beneficial for speeding up the local part of the work.

The basics of these features were introduced in 2010 - 2017. Now, Postgres can push queries containing scan operations, joins, and orderings down to the foreign server. We also have asynchronous append, which allows us to gather data from foreign instances simultaneously. As a perspectiv

[...]

How to Get or Create in PostgreSQL
Posted by Haki Benita on 2024-08-04 at 21:00

"Get or create" is a very common operation for syncing data in the database, but implementing it correctly may be trickier than you may expect. If you ever had to implement it in a real system with real-life load, you may have overlooked potential race conditions, concurrency issues and even bloat!

In this article I explore ways to "get ot create" in PostgresSQL.

Illustration by Abstrakt Design

Implementing "Get or Create"

Say you have a system that allows users to tag items. You set up the following tags table:

db=# CREATE TABLE tags (
  id INT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  name VARCHAR(50) NOT NULL
);
CREATE TABLE

The table contains an auto incrementing primary key and a name. To make sure tags are reused, you add a unique constraint on the name:

db=# ALTER TABLE tags ADD CONSTRAINT tags_name_unique UNIQUE(name);
ALTER TABLE

Adding tags to the table is simple:

db=# INSERT INTO tags (name) VALUES ('A'), ('B') RETURNING *;
 id │ name
────┼──────
  1 │ A
  2 │ B
(2 rows)

INSERT 0 2

If someone tries to add a tag with a name which already exists, they will get an error:

db=# INSERT INTO tags (name) VALUES ('B') RETURNING *;
ER
[...]

The state of the Postgres community
Posted by Tomas Vondra on 2024-08-03 at 10:00

About a month ago I presented a keynote at Swiss PGDay 2024 about the state of the Postgres community. My talk included a couple charts illustrating the evolution and current state of various parts of the community - what works fine and what challenges will require more attention.

Judging by the feedback, those charts are interesting and reveal things that are surprising or at least not entirely expected. So let me share them, with a bit of additional commentary.

Contributions of w/c 2024-07-29 (week 31)
Posted by Jimmy Angelakos in postgres-contrib.org on 2024-08-02 at 19:49

Chaired by Karen Jex, the PGConf.EU Program Committee (Afsane Anand, Andreas Karlsson, Derk van Veen, Gülçin Yıldırım Jelínek, and Stefan Fercot) published the schedule for the 2024 conference.

Chris Ellis has revamped the PGConf.EU 2024 website and schedule with numerous look and feel, usability and accessibility fixes.

The CloudNativePG community has released the first Release Candidate of CloudNativePG 1.24.

Robert Haas has announced the first PostgreSQL Hacking Workshops for August and September 2024: “PostgreSQL Optimizer Methodology” and “Walk-through of Implementing Simple Postgres Patch: From sources to CI”.

Shining a Light on PGConf.EU 2024 Talk Selection
Posted by Karen Jex in Crunchy Data on 2024-08-02 at 11:56

We just published the PGConf.EU 2024 schedule and I couldn't be more excited to share it with everyone. We hope you love the talks we've chosen as much as we do.

PostgreSQL Europe 2024 Athens Schedule

A lot of the work of the talk selection committee (a.k.a. program committee, CfP committee or CfPC) happens behind closed doors, and people are understandably interested in how many submissions we had, how the process works, how the talks were chosen etc.

This post brings some of the inner workings of the CfPC out into the open and talks about how we chose 51 talks out of a whopping 380 submissions!

Note that this blog post contains my own personal thoughts and ideas and doesn't necessarily represent the opinions of the rest of the PGConf.EU organisation team or selection committee.


Gathering the Team


I was lucky enough to have plenty of volunteers to be on the committee, so I could choose a team of 5 people (plus me) with diverse backgrounds, experience levels, roles, interests and ideas. That part is really important to make sure we create a program that appeals to all of the different PostgreSQL users that we love to see at PGConf.EU.

If you spot any of the team out and about, please say a huge thank you. The hard work, enthusiasm and professionalism of this amazing group of individuals meant it was a pleasure to work with them. I would absolutely, without hesitation, work with any of them again:


Profile pics of the PGConf.EU Talk Selection Committee: Afsane Anand (Precisely), Andreas Karlsson (Independent Consultant), Derk van Veen (Adyen), Gülçin Yıldırım Jelínek (EDB), Karen Jex (chair) (Crunchy Data), Stefan Fercot (Data Egret)

Number of Submissions


We limited the number of submissions per person this year, mainly in an attempt to keep the CfPC workload reasonable, but also to improve the overall quality of the submissions.

The limit was designed to:

  • Encourage speakers to think carefully about what they most want to share.
  • Encourage speakers to tailor submissions to be relevant to the event.
  • Reduce the disproportionate time spent reviewing the 10-20 talks each submitted by just a handful of people.

Since the vast majority of speakers submit just 1 or 2 talks, we set the limit to 3 to make sure it didn't impact

[...]

Simplify PostgreSQL Execution Plan Analysis with StatsViaExplainAnalyze.
Posted by Deepak Mahto on 2024-08-01 at 22:16

Tuning SQL is an integral part of most developers' or DBAs' primary responsibilities. Understanding and dissecting problematic execution plans are critical steps in the overall tuning process.

One key ingredient of SQL performance tuning, along with the runtime plan, is object statistics. Looking only at the execution plan is often insufficient for making solid recommendations; we need a fair understanding of current object statistics. As the saying goes,

Behind every optimized SQL query lies a deep understanding of the underlying object stats.

StatsViaExplainAnalyze – All In One HTML Report.

During my Oracle performance tuning days, we used to have a couple of scripts that extracted the underlying objects' current statistics based on the execution plan stored in Plan_Table. Having the statistics on a single page, along with the execution plan, accelerates analysis and helps pinpoint many assumptions.

Simplifying Execution Plan Analysis in PostgreSQL.

At DataCloudGaze Consulting, we want to simplify execution plan analysis for our PostgreSQL community. Our goal is to accelerate problem identification and promote a stats-driven approach to query tuning, making it easier to share insights within the team or external forums.

StatsViaExplainAnalyze

Analyzing execution plans is made easier by curating execution plans and statistics of database objects, such as tables, indexes, or columns involved in the actual runtime execution plan, all within a single report. This makes it easier to share among team members or external forums and reduces the need for additional information requests.

This tool automates the curation of object statistics when analyzing problematic execution plans in PostgreSQL using an HTML template embedded in an SQL file and the psql command line.

How Does It Work?

The execution plan is generated either with EXPLAIN (ANALYZE, BUFFERS) or with plain EXPLAIN, and stored in a plan table.
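
For reference, a generic example of capturing such a plan (the table name is made up; this is not output from the tool itself):

-- Runtime plan with buffer statistics
EXPLAIN (ANALYZE, BUFFERS)
SELECT customer_id, count(*)
FROM   orders
GROUP  BY customer_id;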

Using the psql command line, the current statistics

[...]

PGConf.EU 2024 - Get Ready: Schedule Now Live!
Posted by Karen Jex in PostgreSQL Europe on 2024-08-01 at 09:38

We invite you to the 14th annual PostgreSQL Conference Europe that will take place in Athens on October 22-24!

The schedule for the conference has now been published, and as you can see, we’ll have a great, diverse speaker lineup this year:

https://www.postgresql.eu/events/pgconfeu2024/schedule/

The conference is the largest PostgreSQL event in Europe and year after year brings together both renowned and emerging names in the Postgres ecosystem. Database developers, administrators, users, and open-source enthusiasts from all around the globe will come together to discuss the challenges, solutions, future of the database, and the upcoming release.

The first day of the conference is packed with half- and full-day Postgres training sessions, taught by PostgreSQL experts from around the world. These sessions have a limited number of places, so please register for those as soon as possible.

The conference always ends up with a long waiting list, so if you want to guarantee a seat, book your tickets today!

We look forward to seeing you in the Greek capital in October!

Contributions of w/c 2024-07-22 (week 30)
Posted by Jimmy Angelakos in postgres-contrib.org on 2024-07-30 at 20:48

How to keep a community alive
Posted by Florent Jardin in Dalibo on 2024-07-30 at 09:30

PG Day France took place on June 11th and 12th in Lille, my hometown. It is the event of the French PostgreSQL community and it settles in a different city each year. The opportunity was too good to pass up, and I met many people from all over France and beyond to discuss PostgreSQL during two days of workshops and conferences.

For this edition, I had the pleasure of speaking and sharing my experience of running the local Meetup group that I took over four years ago. In this article, I want to write down the main points discussed during this presentation, while waiting for the video of the conference to be posted online.

Master PostgreSQL in Kubernetes with CloudNativePG at the PostgreSQL European Conference
Posted by Gabriele Bartolini in EDB on 2024-07-30 at 05:40

Join me and PostgreSQL/Kubernetes expert Leonardo Cecchi in Athens on October 22 for the first-ever public training session on CloudNativePG at the 2024 PostgreSQL European Conference, where you’ll master essential day-2 operations for PostgreSQL in Kubernetes. Learn from the founders and maintainers of CloudNativePG through hands-on sessions covering monitoring, scaling, automated failover, and more.

What you should know about Linux memory overcommit in PostgreSQL
Posted by Laurenz Albe in Cybertec on 2024-07-30 at 05:00
If overbooking were handled like memory overcommit: a killer is sent to shoot two airline passengers
© Laurenz Albe 2024

Useful as memory overcommit may be for other applications, it is bad news for your PostgreSQL database. With this article I am trying to raise awareness of this problem. I'll give you hints for the proper kernel configuration and discuss the implications for a containerized setup.

What is memory overcommit?

Linux tries to conserve memory resources. When you request a chunk of memory from the kernel, Linux does not immediately reserve that memory for you. All you get is a pointer and the promise that you can use the memory at the destination. The kernel allocates the memory only when you actually use it. That way, if you request 1MB of memory, but use only half of it, the other half is never allocated and is available for other processes (or the kernel page cache).

Overbooking is a concept that airlines all over the world have been using for a long time. They sell more seats than are actually in the plane. From experience, airlines know that some passengers don't show up for the flight, so overbooking allows them to make more profit. By default, Linux does the same: it deals out (“commits”) more memory than is actually available in the machine, in the hope that not all processes will use all the memory they allocate. This memory overcommit is great for using resources efficiently, but it has one problem: what if all the flight passengers show up, that is, what if the processes actually use more memory than is available? After all, you cannot offer a computer process a refund or a free overnight hotel room.

If a process tries to use memory that it has allocated, but the kernel cannot provide it, there is only one thing Linux can do: it activates a component called out-of-memory killer that kills a hapless process that uses a lot of memory. Killing processes frees their memory, and Linux now has that memory at its disposal. (Think of that the next time you don't get the flight you booked — it could be worse.)

Why is memory overcommit a problem for Postgre

[...]

Ensuring Safe Data Modifications in PostgreSQL – Part 2
Posted by semab tariq in Stormatics on 2024-07-29 at 10:11

Concurrency control in databases ensures that multiple transactions can occur simultaneously without causing data errors. It’s essential because, without it, two people updating the same information at the same time could lead to incorrect or lost data. There are different ways to manage this, including optimistic locking and pessimistic locking. Optimistic locking assumes that conflicts […]
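
As a loose illustration of the pessimistic approach (hypothetical table, not taken from the post), SELECT ... FOR UPDATE makes concurrent writers wait:

BEGIN;
-- Lock the row; other transactions trying to update it block until COMMIT
SELECT balance FROM accounts WHERE id = 42 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 42;
COMMIT;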

The post Ensuring Safe Data Modifications in PostgreSQL – Part 2 appeared first on Stormatics.

PostgreSQL Hacking Workshop - August 2024
Posted by Robert Haas in EDB on 2024-07-29 at 02:42

I'm pleased to be able to formally announce the PostgreSQL Hacking Workshop, as well as our first two topics, planned for August and September 2024. 

Read more »

Designing a Prototype: Postgres Plan Freezing
Posted by Andrei Lepikhov in Postgres Professional on 2024-07-29 at 01:00

This story is about a controversial PostgreSQL feature - a query plan freezing extension (see its documentation for details) - and the coding techniques underpinning it. The design process had one constraint: I had to invent it from scratch in no more than three months or throw the idea away, and because of that, solutions were devised and implemented on the fly. The time limit led to accidental, impromptu findings that may be helpful in other projects.

Developers are aware of the plan cache module in Postgres. It enables a backend to store the query plan in memory for prepared statements, extended protocol queries, and SPI calls, thereby saving CPU cycles and potentially preventing unexpected misfortunes resulting from suboptimal query plans. But what about pinning the plan for an arbitrary query if someone thinks it may bring a benefit? Can it be useful and implemented without core changes and without a massive decline in performance? Could we make such a procedure global, applied to all backends? This is especially important because prepared statements are still limited to the backend where they were created.

Before doing anything, I walked around and found some related projects: pg_shared_plans, pg_plan_guarantee, and pg_plan_advsr. Unfortunately, at this time, they looked like research projects and didn't demonstrate any inspiring ideas on credible matching of cached query plans to incoming queries.

My initial reason to commence this project was far from plan caching: at the time, I was designing distributed query execution based on FDW machinery and the postgres_fdw extension in particular. That project is now known as 'Shardman'. While implementing and benchmarking distributed query execution, I found out that the worst issue limiting the speed-up of queries that have to extract a small number of tuples from large distributed tables (distributed OLTP) is repeated query planning on each remote server, even when you know that your tables are distributed uniformly and the plan could be fully identical on each instance. Wor

[...]

PostgreSQL Connection Poolers
Posted by Stefanie Janine on 2024-07-28 at 22:00


The reason to use connection poolers is that it is relatively expensive to open a new connection to a PostgreSQL server.

This is due to the architecture of PostgreSQL, as each connection is a forked process. Forking obviously takes some time, in fact longer than checking the user credentials.

A connection pooler can step in to help solve this problem. When a client connects to PostgreSQL through a connection pooler, the pooler takes care of opening connections. It keeps unused connections open so they can be reused. That way, the cost and time of opening a database connection are substantially reduced.

In addition, poolers can queue incoming queries and handle more client connections than the database server itself is capable of handling.

PostgreSQL itself does not come with a connection pooler.

Available Connections

The maximum number of connections is configured in the configuration file with max_connections.

But to calculate the connections that are really available, one also has to check the connections reserved for superusers, configured with superuser_reserved_connections.

PostgreSQL 16 introduced a new parameter to reserve connections for certain roles, reserved_connections.

These two settings reduce the number of connections available for regular users.

-- default is 100
show max_connections;

-- default is 3
show superuser_reserved_connections;

-- default is 0
show reserved_connections;

-- calculation, usable in all PostgreSQL versions
SELECT sum(
    CASE name
      WHEN 'max_connections' THEN
        setting::int
      ELSE
        setting::int * (-1)
      END)
  AS available_connections
FROM pg_settings
WHERE name IN
  (
    'max_connections',
    'superuser_reserved_connections',
    'reserved_connections'
  )
;

The formula to calculate available connections for regular users is:

max_connections - superuser_reserved_connections - reserved_connections

Client Side Connection Pooler

There are several client side connection pooling O

[...]

Enhancing PostgreSQL Performance Monitoring: A Comprehensive Guide to pg_stat_statements
Posted by Umair Shahid in Stormatics on 2024-07-26 at 18:23

PostgreSQL has a rich set of features designed to handle complex queries efficiently. Much like any database system, however, its performance can degrade over time due to inefficient queries, improper indexing, and various other factors. To tackle these challenges, PostgreSQL provides several tools and extensions, among which `pg_stat_statements` stands out as a critical component for performance monitoring and tuning.

Introduction to pg_stat_statements

`pg_stat_statements` is an extension for PostgreSQL that tracks execution statistics of SQL statements. It is designed to provide insight into the performance characteristics of database queries by collecting data on various metrics such as execution time, number of calls, and I/O operations. This extension is immensely useful for database administrators and developers looking to optimize their SQL queries and improve overall database performance.
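
As a minimal sketch of typical usage (column names follow recent PostgreSQL versions; older releases use total_time rather than total_exec_time):

-- The module must be listed in shared_preload_libraries (restart required)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 5 statements by cumulative execution time
SELECT calls,
       round(total_exec_time) AS total_ms,
       round(mean_exec_time)  AS mean_ms,
       left(query, 60)        AS query
FROM   pg_stat_statements
ORDER  BY total_exec_time DESC
LIMIT  5;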

The post Enhancing PostgreSQL Performance Monitoring: A Comprehensive Guide to pg_stat_statements appeared first on Stormatics.

Contributions of w/c 2024-07-15 (week 29)
Posted by Jimmy Angelakos in postgres-contrib.org on 2024-07-24 at 16:06

To Preload, or Not to Preload
Posted by David Wheeler in Tembo on 2024-07-24 at 14:00

To preload, or not to preload, that is the question:
Whether ’tis nobler in the ram to suffer
The slings and arrows of pointer functions,
Or to take arms against a sea of backends,
And by alter role limit them: to session, to user

— William Shakespeare, DBA (probably)

Recently I’ve been trying to figure out when a Postgres extension’s shared libraries should be preloaded. By “shared libraries” I mean libraries provided or used by Postgres extensions, whether LOADable libraries or CREATE EXTENSION libraries written in C or pgrx. By “preloaded” I mean under what conditions should they be added to one of the Shared Library Preloading variables, especially shared_preload_libraries.

The answer, it turns out, comes very much down to the extension type. Read on for details.

Normal Extensions

If your extension includes and requires no shared libraries, congratulations! You don’t have to worry about this question at all.

If your extension’s shared library provides functionality only via functions called from SQL, you also don’t need to worry about preloading. Custom types, operators, and functions generally follow this pattern. The DDL that creates objects, such as CREATE FUNCTION, uses the AS 'obj_file', 'link_symbol' syntax to tell PostgreSQL what library to load when SQL commands need them.

For certain extensions used by nearly every connection, there may be performance benefits to preloading them in shared_preload_libraries, but it’s not required. See below for details.
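
A rough sketch of what that looks like (pg_stat_statements is just an example library; changing shared_preload_libraries requires a server restart):

-- What is currently preloaded at server start
SHOW shared_preload_libraries;

-- Add a library to the preload list (takes effect after a restart)
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';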

Initializing Extensions

If your shared library needs to perform tasks before PostgreSQL would load it --- or if it would never be loaded implicitly by SQL statements --- then it must be explicitly loaded before it’s used. This is typically the case for libraries that modify the server’s behavior through hooks rather than providing a set of functions.

To accommodate these requirements, PostgreSQL provides three preloading levels that correspond to the configuration variables for

[...]

The fastest way to copy data between Postgres tables
Posted by Anthony Sotolongo León in OnGres on 2024-07-24 at 11:00

Introduction

Data cloning from one table to another in relational databases is a commonly used process to copy data from an existing table to a new or pre-existing table definition within the same database. This process can be performed for various reasons, such as data backup, information replication, and testing, among other purposes. Postgres and other third-party tools offer several techniques to clone data from one table to another.

The purpose of this blog is to try to find the fastest way to clone data in Postgres to be prepared for this type of request in the future. To do this, several examples and the results will be shown.

For the sake of the example, the following table definitions will be used. To simplify the testing activity, the tables will not have indexes or triggers:

postgres=# CREATE TABLE origin_table (i bigint, d date, j int, t character varying (10));
CREATE TABLE
postgres=# INSERT INTO origin_table SELECT i,'2024-02-15', i/10, md5(i::text)::character varying(10) FROM generate_series(1,20000000) as i; 
INSERT 0 20000000
postgres=# ANALYZE origin_table ;
ANALYZE
postgres=# CREATE TABLE target_table (i bigint, d date, j int, t character varying (10));
CREATE TABLE

Tests

INSERT INTO SELECT (IIS)

This is one of the most common ways to clone data using almost standard SQL. An IIS statement is used to select data from the original table and then insert it into the target table.

postgres=# INSERT INTO target_table SELECT * FROM origin_table ;
INSERT 0 20000000
Time: 12795.623 ms (00:12.796)
--query plan
postgres=# EXPLAIN (ANALYZE,VERBOSE) INSERT INTO target_table SELECT * FROM origin_table ;
                                                                QUERY PLAN                                                           
------------------------------------------------------------------------------------
 Insert on public.target_table  (cost=0.00..347059.24 rows=0 width=0) (actual time=12757.949..12767.355 rows=0 loops=1)
   ->  Seq Scan on public.origin_table  
[...]

Speeding up index creation in PostgreSQL
Posted by Hans-Juergen Schoenig in Cybertec on 2024-07-23 at 08:00

Indexes are, by far, the most important performance-related feature provided by every general-purpose relational database. Without indexes, there are no such things as efficient search operations, no unique constraints, and no primary keys. Achieving good performance on reasonable amounts of data without indexes is therefore totally impossible.

Generating sample data

The main question now is: what happens during index creation, and how can we speed up the process? Creating indexes on billions of rows is quite costly because data has to be sorted and turned into an efficient search structure. To demonstrate how we can make indexes quickly, I have created a simple set of demo data:

blog=# CREATE TABLE t_data AS 
SELECT  id::int4, 
(random() * 1000000000)::int4 AS i, 
(random() * 1000000000) AS r, 
(random() * 1000000000)::numeric AS n 
FROM    generate_series(1, 1000000000) AS id;
Time: 1569002.499 ms (26:09.002)

This sample data has a couple of interesting properties: the “id” column is an ascending number. During index creation, this makes a huge difference. The second column contains a random value multiplied by the number of rows, as an integer value. The third column contains a “double precision” number, and finally, at the end, we are storing similar data as “numeric”, which is a floating point number that does not use the FPU (floating point unit of the CPU) internally.

Once these billion rows have been created, we can easily check the size of the table by using the following command:

blog=# SELECT pg_size_pretty(pg_relation_size('t_data'));
 pg_size_pretty
----------------
 56 GB
(1 row)

Once the test data has been created, it makes sense to set a thing in PostgreSQL called “hint bits” which will allow us to make a fair comparison between various runs. If you are interested in hint bits and their relation to VACUUM, consider checking out our blog post about this topic. 

blog=# VACUUM ANALYZE;
VACUUM
Time: 91293.971 ms (01:31.294)

While VACUUM is running, we can check the progr

[...]

Wait a minute! — PostgreSQL extension pg_wait_sampling
Posted by Andrew Atkinson on 2024-07-23 at 00:00

PostgreSQL uses a complex system of locks to balance concurrent operations and data consistency, across many transactions. Those intricacies are beyond the scope of this post. Here we want to specifically look at queries that are waiting, whether on locks or for other resources, and learn how to get more insights about why.

Balancing concurrency with consistency is an inherent part of the MVCC system that PostgreSQL uses. One of the operational problems that can occur with this system is that queries get blocked waiting to acquire a lock, and that wait time can be excessive, causing errors.

In order to understand what’s happening with near real-time visibility, PostgreSQL provides system views like pg_locks and pg_stat_activity that can be queried to see what is currently executing. Is that level of visibility enough? If not, what other opportunities are there?

Knowledge and Observability

When a query is blocked and waiting to acquire a lock, we usually want to get more information when debugging.

The query holding the lock is the “blocking” query. A waiting query and a blocking query don’t always form a one-to-one relationship though. There may be multiple levels of blocking and waiting.

Real-time observability

In Postgres, we have “real-time” visibility using pg_stat_activity.

We can find queries in a “waiting” state:

SELECT
    pid,
    wait_event_type,
    wait_event,
    LEFT(query, 60) AS query,
    backend_start,
    query_start,
    (CURRENT_TIMESTAMP - query_start) AS ago
FROM
    pg_stat_activity
WHERE
    datname = 'rideshare_development';

We can combine that information with lock information from the pg_locks catalog.
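
Before joining pg_locks directly, a simpler starting point (a sketch, separate from the script linked below) is the pg_blocking_pids() function:

-- Sessions that are currently blocked, and the PIDs blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       left(query, 60)       AS query
FROM   pg_stat_activity
WHERE  cardinality(pg_blocking_pids(pid)) > 0;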

Combining lock information from pg_locks and active query information from pg_stat_activity becomes powerful. The query below joins these sources together.

https://github.com/andyatkinson/pg_scripts/blob/main/lock_blocking_waiting_pg_locks.sql

The result row fields include:

  • blocked_pid
  • blocked_user
[...]

A Follow up on Key PostgreSQL Configuration Parameters for Enhanced Performance – Part 2
Posted by semab tariq in Stormatics on 2024-07-22 at 14:44

In a previous blog post, we configured an EC2 instance and installed PostgreSQL on it. After the initial setup, we ran pgbench to measure the initial TPS (transactions per second). Then, we tuned PostgreSQL and reran pgbench to see how much we could improve the TPS. To tune PostgreSQL, we adjusted various parameters based on the system's available resources. In this blog, we will identify which of those parameters contributed the most to the performance improvements.

We will use the same instance size on AWS EC2 as before, which is t2.large. Here is the list of parameters that we initially tuned...

The post A Follow up on Key PostgreSQL Configuration Parameters for Enhanced Performance – Part 2 appeared first on Stormatics.

pg_statviz 0.7 released with new features, PG17 support
Posted by Jimmy Angelakos on 2024-07-22 at 12:37


I'm pleased to announce release 0.7 of pg_statviz, the minimalist extension and utility pair for time series analysis and visualization of PostgreSQL internal statistics.

This release comes with a huge new feature: the implementation of resampling with pandas to permit effectively unlimited data analysis. Now you can take thousands, potentially millions of full statistics snapshots and squash all those data points to just 100 plot points (by default, using downsampling with mean values) to enable you to get a long-term view of your system's performance and statistics.

[Chart: Over 7 months of wait event statistics squashed down to 100 plot points]

The chart above shows what 7 months of wait event statistics (over 300,000 data point snapshots) look like, downsampled to just 100 plot points.

Other new features:

  • The addition of server I/O stats from the view pg_stat_io, which was added in PostgreSQL 16
  • Update for upcoming PostgreSQL 17 release (pg_stat_bgwriter is broken up!)
  • The JSONB in the conf table, which stores Postgres configuration snapshots, has been changed from "setting": "work_mem", "value": "4MB" to "work_mem": "4MB" to save approximately half the storage space on disk.

pg_statviz takes the view that everything should be light and minimal. Unlike commercial monitoring platforms, it doesn't require invasive agents or open connections to the database — it all lives inside your database. The extension is plain SQL and pl/pgSQL and doesn't require modules to be loaded, the visualization utility is separate and can be run from anywhere, and your data is free and easy to export.

  • You can download and install pg_statviz from the PostgreSQL repositories or PGXN.
  • The utility can also be installed from PyPi.
  • Manual installation is also possible.
