When designing large relational databases, we often make a decision to diverge from a normal form, i.e. denormalization.
The reasons for this can be different, such as an attempt to speed up access to the specified data, constraints of the used platform/framework/development tools, and lack of skills of a database developer/designer.
Strictly speaking, a reference to the framework constraints, etc. is actually an attempt to justify the lack of skills.
The denormalized data is a vulnerability, through which it is easy to bring our database to a non-consistent (non-integral) state.
What can we do with this?
In a database, there is a table with some financial operations: the receipt and disposal of funds on different accounts.
We always need to know the balance of the account.
In the normalized data, the fund balance is always a calculated value. We are going to calculate the total of the receipts without debiting.
However, it is too expensive to calculate the balance each time when there are many operations. Therefore, it was decided to store the actual balance in a separate table. How do we update the data in this table?
The solution is ‘as usual’
Almost in all information systems with which I had to work, this task was performed by an external application, which implemented the business logic. You are lucky if the application is simple and there is only one data change point, from the form in the user interface. However, what if there are some imports, APIs, third-party applications, etc. performed by different people and teams? What if there are several tables with totals instead of one? What if there is more than one table with operations?
It is getting harder to monitor whether a developer updated a bunch of tables when updating operations. The data loses integrity. The account balance does not correspond to operations. Of course, testing must reveal such situations. However, our world is not ideal.
As an alternative, triggers are used to control the integrity of denormalized data.
I heard that triggers slow down a database greatly, so using them makes no sense.
The second argument was that all the logic lies in a separate application and keeping business logic in different places is unreasonable.
Let’s figure out.
A trigger fires inside the transaction that modifies the data in the table. The transaction cannot be completed until the trigger has executed the required steps. Therefore, the conclusion is that triggers must be ‘light’.
The example of the ‘heavy’ query in the trigger is as follows:
set total = select sum(operations.amount) from operations where operations.account = current_account
where totals.account = current_account
A query refers to the table with operations and sums up the total amount of operations for the account.
When the database increases, such a query will consume more and more time and resources. However, we can receive the same result using the light query of the following type:
set total = totals.total + current_amount
where totals.account = current_account
When adding a new row, this trigger will simply increase the total by the account without calculating it. The total does not depend on the data amount in tables. It does not make any sense to calculate the total again, as we can be sure that the trigger fires each time when adding a new operation.
Removing or modifying rows is processed in the same way. The triggers of this type will not slow down operations, however, will ensure data coupling and integrity.
Every time I experienced “lags” when adding data to a table with a trigger, it was an example of such a “heavy” query. In most cases, it was possible to rewrite it in an “easy” query.
We must distinguish functions that provide data integrity from the business logic. In each case, I ask a question if the data were normalized, would we need such a function? If positive, the function is business logic. If negative, the function is to provide data integrity. You can wrap these functions into triggers.
However, there is an opinion that it is easy to implement all the business logic through DBMS, such as PostgreSQL or Oracle.
I hope this article will help reduce the number of bugs in your information system.
Of course, I’m far from thinking that everything written here is the ultimate truth. In the real life, of course, everything is much more complicated. Therefore, you must make a decision in each specific case. Use your engineering thinking!
- In the article, I drew attention to the only aspect of using triggers as a powerful tool.
- The approach described in the article allows avoiding indexes in the Operations table, which, in turn, can speed up the process of adding data to this table. At high volumes, this approach easily compensates the time spent on the trigger.
- It is important to understand what tools we need to use for. In this case, you will avoid many issues.
Latest posts by Alexandr Shelemetyev (see all)
- Query Optimization in PostgreSQL. EXPLAIN Basics – Part 3 - November 17, 2017
- Query Optimization in PostgreSQL. EXPLAIN Basics – Part 2 - November 6, 2017
- Query Optimization in PostgreSQL. EXPLAIN Basics – Part 1 - October 23, 2017