Query Optimization in PostgreSQL. EXPLAIN Basics – Part 2

In my previous article, we started to describe the basics of the EXPLAIN command and analyzed what happens in PostgreSQL when executing a query.

I am going to continue writing about the basics of EXPLAIN in PostgreSQL. The information is a short review of Understanding EXPLAIN by Guillaume Lelarge. I highly recommend reading the original since some information is missed out.

Cache

What happens at the physical level when executing our query? Let’s figure it out. I deployed my server on Ubuntu 13.10 and used disk caches of the OS level.

I stop PostgreSQL, commit changes to the file system, clear cache, and run PostgreSQL:

When the cache is cleared, run the query with the BUFFERS option

We read the table by blocks. The cache is empty. We had to access 8334 blocks to read the whole table from the disk.

Buffers: shared read is the number of blocks PostgreSQL reads from the disk.

Run the previous query

Buffers: shared hit is the number of blocks retrieved from the PostgreSQL cache.

With each query, PostgreSQL takes more and more data from the cache, thus, filling its own cache.

Cache read operations are faster than disk read operations. You can see this trend by tracking the Total runtime value.

Cache storage size is defined by the shared_buffers constant in the postgresql.conf file.

WHERE

Add the condition to the query

There are no indexes on the table. When executing the query, each record of the table is scanned sequentially (Seq Scan)and compared with the c1 > 500 condition. If the condition is met, the record is added to the result. Otherwise, it is discarded. Filter indicates this behavior, as well as the cost value increases.

The estimated number of rows decreases.

The original article explains why cost takes this value, and how the estimated number of rows is calculated.

It is time to create indexes.

The estimated number of rows has changed. What about the index?

Only 510 rows of more than 1 million are filtered. PostgreSQL had to read more than 99.9 % of the table.

We will force to use the index by disabling Seq Scan:

In Index Scan and Index Cond, the foo_c1_idx index is used instead of Filter.

When selecting the whole table, using the index will increase the cost and time to execute the query.

Enable Seq Scan:

Modify the query:

Here the planner uses the index.

Now, let’s complicate the value by adding the text field.

As you can see, the foo_c1_idx index is used for c1 < 500. To perform c2 ~~ ‘abcd%’::text, use the filter.

It should be noted that the POSIX format of the LIKE operator is used in the output of the results. If there is only the text field in the condition:

Seq Scan is applied.

Build the index by c2:

The index is not applied because my database for test fields uses the UTF-8 encoding.

When building the index, it is necessary to specify the class of the text_pattern_ops operator:

Great! It worked!

Bitmap Index Scan uses the foo_c2_idx1 index to determine the records we need. Then, PostgreSQL goes to the table (Bitmap Heap Scan) to make sure that these records actually exist. This behavior refers to the versioning of PostgreSQL.

If you select only the field, on which the index is built, instead of the whole row:

Index Only Scan will be performed faster, than Index Scan due to the fact that it is not necessary to read the row of the table:  width=4.

Conclusion

  • Seq Scan reads the whole table
  • Index Scan uses the index for the WHERE statements and reads the table when selecting rows
  • Bitmap Index Scan uses Index Scan and selection control through the table. Effective for a large number of rows.
  • Index Only Scan is the fastest block, which reads only the index.

Further reading:

Query Optimization in PostgreSQL. EXPLAIN Basics – Part 3

Alexandr Shelemetyev

Alexandr Shelemetyev

Since Alexandr was a child, he has been interested in information technologies and has developed simple programs. As time went on, he started to be interested in databases. Alexander got a lot of valuable experience: he developed databases, came up with calculation algorithms, passed data, performed information system integration and was engaged in system administration. Thanks to his experience, Alexander also owned his enterprise.
Alexandr Shelemetyev

Alexandr Shelemetyev

Since Alexandr was a child, he has been interested in information technologies and has developed simple programs. As time went on, he started to be interested in databases. Alexander got a lot of valuable experience: he developed databases, came up with calculation algorithms, passed data, performed information system integration and was engaged in system administration. Thanks to his experience, Alexander also owned his enterprise.