The Database Redemption: Escaping the Shawshank of Poor SQL

Mayur (Do not drink & database)
4 min read · Jul 5, 2024


Every Monday, a financial department’s service piles up a huge backlog of unprocessed events. The application team, as usual, blamed the poor PostgreSQL database for the performance issue and wanted to upscale the instance. This single-tenant Postgres database runs on an EC2 r5b.16xlarge instance (64 vCPUs, 512 GB) with io2 Block Express storage at 32,000 IOPS. It is already costing the company a fortune.

A beautiful piece of code forming the backbone of the service catering to the finance department looked like this:

with join_tab1 as
(
select ......
from tabl1
where col1 = 'xyz'
),
.
.
.
.
.
join_tab10 as
(
select ......
from tabl10
where col1 = 1234
),
updated_tab1 as (
update tab1
set xx_amount = total_xx_amount,
xy_amount = total_xy_amount,
last_modified_at = now()
from ......
where ......
returning ........
),
insert_tab3 as (
insert into tab3
select col1,col2,col3,...coln
from ......
where ......
returning ........
),
updated_tab4 as (
update tab4
set xx_amount = total_xx_amount,
xy_amount = total_xy_amount,
last_modified_at = now()
from ......
where ......
returning ........
)
select id_column, count(id_column) processed_ids
from join_tab1, join_tab3, updated_tab4
where join_conditions.....
group by id_column;

Of course, the real code was much longer and featured a logical naming convention for tables and columns, indicating to readers what each section was supposed to be doing. The SecOps team prevented me from showing it in its full glory.

[Chart: unprocessed events piling up on Monday. Yellow indicates unprocessed events/sources; green indicates processed events/sources.]

The database appeared to be in excellent shape when I examined it: no CPU, I/O, or memory pressure, and no unusual wait events. Other queries from the same microservice and from Airflow DAGs showed zero performance degradation. That made a fluctuating query execution plan the most logical suspect, which is a nightmare scenario when the plan covers more pages than I have ever written on this blog.

Then I began to take a deep look at the execution plan. Unfortunately, I cannot share the glorious 3,400 lines of the explain plan for security reasons, and it would have made this blog insanely long to scroll anyway. I am sharing only a small portion of it, but it shows huge LIO (logical I/O) on one of the CTE scans, a pattern repeated on multiple CTEs throughout the plan.

As this was a production issue and the solution was time-sensitive, I began to split the query into temp tables, analyzed them, and ran explain (analyze, buffers) on each separate statement. Surprisingly, each of those separate select statements completed in milliseconds with very low LIO. Obviously, I could not run the DMLs on production, but rewriting them as selects and running explain (analyze, buffers) on them showed no problems either. The Monday spike of new events caused parts of the execution plan that normally sit idle to actually execute. Analyzing the temp tables enabled the Postgres planner to optimize for the new reality, and the split helped avoid correlated subqueries.
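To illustrate the diagnostic approach (with hypothetical table and column names, not the actual production objects), each CTE was checked roughly like this: materialize its result into a temp table, refresh statistics, and look at the buffers touched by the next step.

-- hypothetical illustration of the diagnostic split
create temporary table diag_join_tab1 as
select id_column, some_amount      -- placeholder column names
from tabl1
where col1 = 'xyz';

-- give the planner real statistics about the intermediate result
analyze diag_join_tab1;

-- check logical I/O (shared buffers hit/read) of the follow-up step
explain (analyze, buffers)
select d.id_column, d.some_amount
from diag_join_tab1 d
join tabl10 t on t.id_column = d.id_column
where t.col1 = 1234;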

CTEs are great but not always.
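A quick aside (general PostgreSQL behaviour, not specific to this incident): since version 12, Postgres can inline a simple CTE into the outer query, but a CTE that is referenced more than once, or one that contains an insert/update/delete, is materialized, and the materialized result carries no column statistics for the planner to use further up the plan. You can make the choice explicit:

-- force one-shot materialization of the CTE result
with j as materialized (
    select id_column from tabl1 where col1 = 'xyz'
)
select count(*) from j;

-- allow the planner to inline the CTE into the outer query (PostgreSQL 12+)
with j as not materialized (
    select id_column from tabl1 where col1 = 'xyz'
)
select count(*) from j;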

There was no doubt about the next steps. We split up the queries and used inserts into temp tables instead of the WITH clause. As an added benefit of splitting up, you can analyze the temp tables on the fly and add indexes if necessary. This time, we only split the queries and added intermediate analyze steps on the temp tables. The modified code looked as follows:

create temporary table join_tab1 as
select ......
from tabl1
where col1 = 'xyz';

analyze join_tab1;

.
.
.
.

create temporary table join_tab10 as
select ......
from tabl10
where col1 = 1234;

analyze join_tab10;
.
.
.
.
update tab4
set xx_amount = total_xx_amount,
xy_amount = total_xy_amount,
last_modified_at = now()
from ......
where ......;

-- follow-up query, since we removed returning when separating the inserts/updates
create temporary table updated_tab4 as
select ....
from tab4
where ......;

analyze updated_tab4;
.
.
.
.

select id_column, count(id_column) processed_ids
from join_tab1, join_tab3, updated_tab4
where join_conditions.....
group by id_column;

The redesigned, visually unappealing but efficient code was deployed at 13:40. I have not heard back from the application team in months, and no expensive hardware upgrade was required.

[Chart: processing increased, and unprocessed events remained very low even after a spike in the new-event creation rate.]

So the next time you’re tempted to blame your database for slow performance, pause and ask yourself: Is it the database that’s guilty, or is your code doing the time for crimes it did commit?

PS: A large number of sessions and the continuous recreation of temporary tables can bloat pg_attribute. The simple trick below minimizes bloat of the pg_catalog tables when the same session executes the queries repeatedly (ours was a Java-based service doing exactly this).

-- at the start of processing; IF NOT EXISTS avoids recreation within the same session
create temporary table if not exists temp_xyz (col_1 datatype, ..., col_n datatype);

-- if the same session calls the procedure again, clean up the previous data,
-- since we do not use on commit delete rows
truncate temp_xyz;

insert into temp_xyz (col_1, ..., col_n) select col_1, ..., col_n from abcd...;

-- don't forget analyze; fresh statistics are what give the edge over the complex plans the CTE version generated
analyze temp_xyz;
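If each processing run is wrapped in its own transaction, an alternative to the manual truncate (not what we used, per the comment above) is to let Postgres clear the rows itself with on commit delete rows:

-- alternative, assuming one processing run per transaction
create temporary table if not exists temp_xyz (col_1 datatype, ..., col_n datatype)
on commit delete rows;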

Mayur (Do not drink & database)

Database Specialist @Veeam, Ex-Wise. Data, Scaling and Database Comedy.