I think 2024.pgconf.dev was a great event. I am really grateful to the organizing team for all the work that they did to put this event together, and I think they did a great job. I feel that it was really productive for me and for the PostgreSQL development community as a whole. Like most things in life, it was not perfect. But it was really good, and I'm looking forward to going back next year. It was also a blast to see Professor Margo Seltzer again; I worked for her as a research assistant many years ago. She gave a wonderful keynote.
Thursday, June 06, 2024
Wednesday, May 01, 2024
Hacking on PostgreSQL is Really Hard
Hacking on PostgreSQL is really hard. I think a lot of people would agree with this statement, though not all for the same reasons. Some might point to the character of discourse on the mailing list, others to the shortage of patch reviewers, and others still to the difficulty of getting the attention of a committer, or of feeling like a hostage to some committer's whimsy. All of these are problems, but today I want to focus on the purely technical aspect of the problem: the extreme difficulty of writing reasonably correct patches.
Monday, January 29, 2024
Who Contributed to PostgreSQL Development in 2023?
As in previous years, I've pulled together a few statistics on code contributions to PostgreSQL. See previous posts in this series for methodology and caveats. I calculate that, in 2023, there were 221 people who were the principal author of at least one PostgreSQL commit. 66% of the new lines of code were contributed by one of 18 people, and 90% of the new lines of code were contributed by one of 50 people. Here they are. Asterisks indicate non-committers.
# | author | lines | pct_lines | commits
----+----------------------------------+-------+-----------+---------
1 | Tom Lane | 15686 | 9.27 | 225
2 | Robert Haas | 12272 | 7.25 | 42
3 | Jeff Davis | 9035 | 5.34 | 61
4 | Alvaro Herrera | 8750 | 5.17 | 51
5 | Peter Eisentraut | 8301 | 4.91 | 240
6 | Michael Paquier | 7404 | 4.38 | 111
7 | Nikita Glukhov [*] | 6880 | 4.07 | 3
8 | Andres Freund | 6510 | 3.85 | 114
9 | Hou Zhijie [*] | 4956 | 2.93 | 24
10 | Heikki Linnakangas | 4389 | 2.59 | 48
11 | Bruce Momjian | 4259 | 2.52 | 95
12 | Melanie Plageman [*] | 4220 | 2.49 | 44
13 | Nathan Bossart | 3982 | 2.35 | 69
14 | David Rowley | 3923 | 2.32 | 65
15 | Thomas Munro | 3731 | 2.21 | 83
16 | Bertrand Drouvot [*] | 3398 | 2.01 | 33
17 | Joseph Koshakow [*] | 2893 | 1.71 | 9
18 | Tomas Vondra | 2481 | 1.47 | 29
19 | Georgios Kokolatos [*] | 2464 | 1.46 | 7
20 | Andrey Lepikhov [*] | 2455 | 1.45 | 2
21 | Dean Rasheed | 2382 | 1.41 | 23
22 | Amit Langote | 2117 | 1.25 | 27
23 | Pavel Stehule [*] | 1879 | 1.11 | 2
24 | Bharath Rupireddy [*] | 1825 | 1.08 | 36
25 | Richard Guo [*] | 1710 | 1.01 | 40
26 | Daniel Gustafsson | 1652 | 0.98 | 47
27 | Juan Jose Santamaria Flecha [*] | 1650 | 0.98 | 1
28 | Brar Piening [*] | 1512 | 0.89 | 3
29 | Peter Geoghegan | 1471 | 0.87 | 39
30 | Hayato Kuroda [*] | 1410 | 0.83 | 18
31 | Dag Lem [*] | 1315 | 0.78 | 1
32 | Jacob Champion [*] | 1287 | 0.76 | 10
33 | Jelte Fennema [*] | 1205 | 0.71 | 11
34 | Justin Pryzby [*] | 1018 | 0.60 | 13
35 | Alexander Korotkov | 975 | 0.58 | 27
36 | Jim Jones [*] | 941 | 0.56 | 2
37 | Stephen Frost | 875 | 0.52 | 8
38 | Tommy Pavlicek [*] | 866 | 0.51 | 1
39 | Onder Kalaci [*] | 852 | 0.50 | 4
40 | Anastasia Lubennikova [*] | 830 | 0.49 | 1
41 | Masahiro Ikeda [*] | 780 | 0.46 | 9
42 | Andrei Zubkov [*] | 749 | 0.44 | 2
43 | Alexander Pyhalov [*] | 725 | 0.43 | 2
44 | Matthias van de Meent [*] | 716 | 0.42 | 7
45 | Alexander Lakhin [*] | 695 | 0.41 | 22
46 | Andrew Dunstan | 686 | 0.41 | 20
47 | John Naylor | 653 | 0.39 | 9
48 | Konstantin Knizhnik [*] | 644 | 0.38 | 2
49 | Maxim Orlov [*] | 635 | 0.38 | 5
50 | Vignesh C [*] | 626 | 0.37 | 14
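The "66% by 18 people" and "90% by 50 people" figures can be checked roughly against the table above. The following is a minimal sketch, not the script used to produce these statistics: it takes the rounded `pct_lines` column as input (the published figures were presumably computed from raw line counts), and the function name `authors_needed` is made up for illustration.

```python
from itertools import accumulate

# pct_lines column from the author table above, in rank order (top 50 authors).
# These are the rounded values as published, so sums are approximate.
PCT_LINES = [
    9.27, 7.25, 5.34, 5.17, 4.91, 4.38, 4.07, 3.85, 2.93, 2.59,
    2.52, 2.49, 2.35, 2.32, 2.21, 2.01, 1.71, 1.47, 1.46, 1.45,
    1.41, 1.25, 1.11, 1.08, 1.01, 0.98, 0.98, 0.89, 0.87, 0.83,
    0.78, 0.76, 0.71, 0.60, 0.58, 0.56, 0.52, 0.51, 0.50, 0.49,
    0.46, 0.44, 0.43, 0.42, 0.41, 0.41, 0.39, 0.38, 0.38, 0.37,
]


def authors_needed(shares, threshold):
    """Return how many top-ranked authors it takes to reach `threshold` percent.

    `shares` must already be sorted from largest to smallest contributor.
    Returns None if the listed authors never reach the threshold.
    """
    for count, running_total in enumerate(accumulate(shares), start=1):
        if running_total >= threshold:
            return count
    return None


if __name__ == "__main__":
    print(authors_needed(PCT_LINES, 66))  # authors needed to reach 66%
    print(authors_needed(PCT_LINES, 90))  # authors needed to reach 90%
```

With the rounded values, the running total crosses 66% at the 18th author and 90% at the 50th, which agrees with the figures quoted in the text.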
As usual, I'm also interested in which committers did the most work to commit patches for which they themselves were not the principal author. Here's how that looked in 2023.
# | committer | lines | pct_lines | commits
----+--------------------+-------+-----------+---------
1 | Tom Lane | 13527 | 18.24 | 113
2 | Michael Paquier | 10959 | 14.78 | 209
3 | Amit Kapila | 9119 | 12.30 | 78
4 | Alexander Korotkov | 6448 | 8.70 | 26
5 | Alvaro Herrera | 5850 | 7.89 | 18
6 | Tomas Vondra | 4265 | 5.75 | 18
7 | Andres Freund | 4239 | 5.72 | 40
8 | Daniel Gustafsson | 4228 | 5.70 | 55
9 | Dean Rasheed | 3571 | 4.82 | 9
10 | Peter Eisentraut | 2948 | 3.98 | 45
11 | David Rowley | 1914 | 2.58 | 41
12 | Amit Langote | 1398 | 1.89 | 3
13 | Andrew Dunstan | 1021 | 1.38 | 10
14 | Robert Haas | 1007 | 1.36 | 15
15 | Masahiko Sawada | 904 | 1.22 | 7
16 | Nathan Bossart | 600 | 0.81 | 19
17 | Thomas Munro | 497 | 0.67 | 10
18 | Peter Geoghegan | 455 | 0.61 | 7
19 | John Naylor | 234 | 0.32 | 6
20 | Bruce Momjian | 221 | 0.30 | 26
21 | Noah Misch | 212 | 0.29 | 5
22 | Heikki Linnakangas | 208 | 0.28 | 13
23 | Tatsuo Ishii | 121 | 0.16 | 3
24 | Jeff Davis | 98 | 0.13 | 9
25 | Etsuro Fujita | 94 | 0.13 | 1
26 | Stephen Frost | 7 | 0.01 | 1
27 | Fujii Masao | 1 | 0.00 | 1
Finally, here are people who sent at least 100 emails to pgsql-hackers in 2023.
count | name
-------+-----------------------
1772 | Tom Lane
1690 | Andres Freund
1508 | Michael Paquier
1020 | Nathan Bossart
988 | Amit Kapila
793 | Peter Eisentraut
775 | Robert Haas
558 | Tomas Vondra
528 | Thomas Munro
520 | Daniel Gustafsson
516 | Alvaro Herrera
510 | Peter Geoghegan
500 | Jeff Davis
463 | Peter Smith
416 | David Rowley
402 | Andrew Dunstan
384 | Bertrand Drouvot
382 | Hayato Kuroda
372 | Bruce Momjian
340 | Justin Pryzby
337 | Masahiko Sawada
320 | Vignesh C
319 | Kyotaro Horiguchi
316 | Bharath Rupireddy
294 | Pavel Stehule
281 | Richard Guo
263 | Ashutosh Bapat
259 | Melanie Plageman
253 | John Naylor
243 | Aleksander Alekseev
226 | Matthias van de Meent
212 | Heikki Linnakangas
208 | Zhijie Hou
206 | Jian He
203 | Tristan Partin
197 | Shveta Malik
184 | Jacob Champion
178 | Amit Langote
177 | Laurenz Albe
163 | Jelte Fennema
161 | David G. Johnston
160 | Dean Rasheed
154 | Dilip Kumar
148 | Tatsuo Ishii
144 | Stephen Frost
144 | Jonathan S. Katz
142 | Alexander Korotkov
131 | Karl O. Pinc
124 | Julien Rouhaud
124 | Alexander Lakhin
115 | Noah Misch
113 | Joe Conway
101 | Vik Fearing
100 | Gurjeet Singh
As always, it's important to keep in mind that there are many important contributions to the PostgreSQL project other than development, and that these statistics don't fully or accurately capture even the work that goes into development. I present this just as an aid to understanding some of what goes on in the development community, not as the last word in any way.
Tuesday, January 09, 2024
Incremental Backups: Evergreen and Other Use Cases
As of this writing, I know of three ways to make use of the incremental backup feature that I committed near the end of last month. I'll be interested to see how people deploy it in practice. The first idea is to replace some of the full backups you're currently doing with incremental backups, saving backup time and network transfer. The second idea is to do just as many full backups as you do now, but add incremental backups between them, so that if you need to do PITR, you can use pg_combinebackup to reach the latest incremental backup before the point to which you want to recover, reducing the amount of WAL that you need to replay, and probably speeding up the process quite a bit. The third idea is to give up on taking full backups altogether and only ever take incremental backups.
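To make the second idea concrete, here is a minimal sketch of picking the backup chain for a PITR target and building the matching pg_combinebackup invocation. It rests on assumptions that are not from the post: the `Backup` record, the helper names, and the directory paths are hypothetical, and each incremental backup is assumed to be based on the backup taken immediately before it. Only the `pg_combinebackup -o OUTPUT_DIR BACKUP...` form, with backups listed oldest first, is taken from the tool itself.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Backup:
    """Hypothetical catalog entry for one backup directory."""
    path: str
    taken_at: datetime   # when the backup completed
    incremental: bool    # False for a full backup


def chain_for_target(backups, target):
    """Return the backups needed to reconstruct the latest backup taken at or
    before `target`, oldest first.

    Assumes a linear chain: every incremental backup is based on the backup
    that immediately precedes it in time.
    """
    ordered = sorted(backups, key=lambda b: b.taken_at)
    usable = [b for b in ordered if b.taken_at <= target]
    # Walk backwards to the most recent full backup; the chain starts there.
    for i in range(len(usable) - 1, -1, -1):
        if not usable[i].incremental:
            return usable[i:]
    raise ValueError("no full backup at or before the recovery target")


def combine_command(chain, output_dir):
    """Build the argv for pg_combinebackup: output directory first, then the
    full backup followed by each incremental, oldest to newest."""
    return ["pg_combinebackup", "-o", output_dir, *[b.path for b in chain]]


if __name__ == "__main__":
    catalog = [
        Backup("/backups/full1", datetime(2024, 1, 1), False),
        Backup("/backups/incr1", datetime(2024, 1, 2), True),
        Backup("/backups/incr2", datetime(2024, 1, 3), True),
    ]
    chain = chain_for_target(catalog, datetime(2024, 1, 2, 12, 0))
    print(" ".join(combine_command(chain, "/restore/data")))
```

For a recovery target of midday on January 2, the sketch selects the full backup plus the first incremental, so WAL replay would only need to cover the hours after that incremental rather than everything since the full backup.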
Wednesday, January 03, 2024
Incremental Backup: What To Copy?
Five days before Christmas I committed my patch to add incremental backup to PostgreSQL. Actually, I've been committing preparatory patches for some months now, but December 20 saw the two main patches land. Since then, there have been a bunch of bug-fix commits, and there are still a few pending items that need to be addressed, but the core of the feature is now committed. If you want a quick overview of the feature, Lukas Fittl has a great video about that. Here, I'd like to talk about the architecture of the feature itself in a little more detail, and specifically about how we decide which data to copy.
Wednesday, December 20, 2023
Praise, Criticism, and Dialogue
Wednesday, June 14, 2023
The PostgreSQL Documentation and the Limitations of Community
In my opinion, the PostgreSQL documentation is simultaneously excellent and fairly poor, and both its excellence and its shortcomings are direct results of the process by which the documentation is produced. The PostgreSQL documentation is stored in the same git repository as the source code, and anyone who patches the source code so as to change documented behavior must also patch the documentation to match.