Robert Haas: glibc

Showing posts with label glibc. Show all posts

Monday, November 29, 2021

Collation Stability

When PostgreSQL needs to sort strings, it relies on either the operating system (by default) or the ICU collation library (if your PostgreSQL has been built with support for ICU and you have chosen to use an ICU-based collation) to tell it in what order the strings ought to be sorted. Unfortunately, operating system behaviors are confusing and inconsistent, and they change relatively frequently for reasons that most people can't understand. That's a problem for PostgreSQL users, especially PostgreSQL users who create indexes on text columns. The first step in building a btree index to sort the data, and if this sort order differs from the one used for later index lookups, data that is actually present in the index may not be found, and your queries may return wrong answers.

The Perils of Collation-Aware Comparisons

If you use psql's \l command to list all the databases in your PostgreSQL cluster, you'll notice a column in the output labelled "Collate"; on my system, that column has the value "en_US.UTF-8". This means that when, for example, you sort strings, you'll use the "English" sort order rather than the traditional "C" sort order. You might ask: what's the difference?

In the "C" sort order, all capital letters come before all lower-case letters, whereas in en_US.UTF8, a comes before A which comes before b which comes before B, and so on. In other words, every collation can have its own rules for sorting strings, consistent with the way that the people who speak that language like to alphabetize things; or at least with the way that the people who wrote the locale definitions for that language think that they like to alphabetize things. Collations are OS-dependent: some operating systems don't support them at all, while Windows has a completely different naming convention from every other operating system, and probably different behaviors as well.

Linux and glibc Scalability

As some of you probably already know from following the traffic on pgsql-hackers, I've been continuing to beat away at the scalability issues around PostgreSQL. Interestingly, the last two problems I found turned out, somewhat unexpectedly, not to be internal bottlenecks in PostgreSQL. Instead, they were bottlenecks with other software with which PostgreSQL was interacting during the test runs.

Robert Haas

Monday, November 29, 2021

Collation Stability

Monday, March 05, 2012

The Perils of Collation-Aware Comparisons

Thursday, August 04, 2011

Linux and glibc Scalability