How to Write Complex Queries in SQL

Typical queries in the SELECT * FROM table format are sometimes not enough. When the data for a query is not in one table, but in several, or when it is necessary to specify several selection parameters at once, you will need more sophisticated queries.

This article will explain how to build such queries and provide examples of complex SQL queries.

CosingSight - How to Write Complex Queries in SQL

What Does a Complex Query Look Like?

First, let’s define the conditions for composing the SQL query. In particular, you will need to use the following selection parameters:

  • the names of the tables from which you want to extract data;
  • the values of fields that must be returned to the original ones after making changes to the database;
  • the relationships between tables;
  • the sampling conditions;
  • the auxiliary selection criteria (restrictions, ways of presenting information, type of sorting).

To better understand the topic, let’s consider an example that uses the following four simple tables. The first line is the name of the table that in complex queries acts as a foreign key. We will consider this in detail further with an example:

conditions for composing the SQL query

Each table has rows related to some other tables. We’ll explain why it is necessary further.

Now, let’s look at the basic SQL query:

SELECT * FROM companies WHERE companies_name %STARTSWITH 'P';

The %STARTSWITH predicate selects rows beginning with the specified character/characters.

The result looks like this:

The %STARTSWITH predicate selects rows beginning with the specified character/characters.

Now, let’s consider a complex SQL query:

SELECT 
	companies.companies_name,
	SUM(CASE WHEN call.id IS NOT NULL THEN 1 ELSE 0 END) AS calls,
	AVG(ISNULL(DATEDIFF(SECOND, calls.start_time, calls.end_time),0)) AS avgdifference
FROM companies 
LEFT JOIN offices ON offices.companies_id = companies.id
LEFT JOIN customers ON offices.id = customers.offices_id
LEFT JOIN calls ON calls.customers_id = customers.id
GROUP BY 
	companies.id,
	companies.companies_name
HAVING AVG(ISNULL(DATEDIFF(SECOND, calls.start_time, calls.end_time),0)) > (SELECT AVG(DATEDIFF(SECOND, calls.start_time, calls.end_time)) FROM calls)
ORDER BY calls DESC, companies.id ASC;

The result is the following table:

output of a complex SQL query

The table shows the companies, the corresponding number of phone calls, and their approximate duration.

Further, it lists only those company names where the average call duration is greater than the average call duration in other companies.

What Are the Main Rules for Creating Complex SQL Query?

Let’s try to create a multipurpose algorithm for composing complex queries.

First of all, you need to decide on the tables consisting of the data that takes part in the query.

The example above involves the companies and calls tables. If the tables with the required data are not directly related to each other, you also need to include the intermediate tables that join them.

For this reason, we also connect tables, such as offices and customers, using foreign keys. Therefore, any result of the query with tables from this example will always include the below lines:

SELECT 
	...
FROM companies 
LEFT JOIN offices ON offices.companies_id = companies.id
LEFT JOIN customers ON offices.id = customers.offices_id
LEFT JOIN calls ON calls.customers_id = customers.id
...;

After that, you must test the correctness of the behavior in the following part of the query:

SELECT * FROM companies 
LEFT JOIN offices ON offices.companies_id = companies.id
LEFT JOIN customers ON offices.id = customers.offices_id
LEFT JOIN calls ON calls.customers_id = customers.id;

A combined table suggests the three most important points:

  • Pay attention to the list of fields after SELECT. The operation of reading data from joined tables requires that you specify the name of the table to be joined in the name field.
  • Your complex query will always have the main table (companies). Most of the fields are read from it. The attached table, in our example, uses three tables – offices, customers, and calls. The name is determined after the JOIN operator.
  • In addition to specifying the name of the second table, be sure to specify the condition for performing the join. We will discuss this condition further.
  • The query will display a table with a large number of rows. There is no need to publish it here, as it displays intermediate results. However, you can always check its output yourself. This is very important, as it helps to avoid mistakes in the final result.

Now let’s look at the part of the query that compares call durations within each company and between all companies. We need to calculate the average duration of all calls. Use the following query:

SELECT AVG(DATEDIFF(SECOND, calls.start_time, calls.end_time)) FROM calls
the average duration of all calls

Note that we used the DATEDIFF function which outputs the difference between the specified periods. In our case, the average call duration is equal to 335 seconds.

Now let’s add data on calls from all companies to the query.

SELECT 
	companies.companies_name,
	SUM(CASE WHEN calls.id IS NOT NULL THEN 1 ELSE 0 END) AS calls,
	AVG(ISNULL(DATEDIFF(SECOND, calls.start_time, calls.end_time),0)) AS avgdifference
FROM companies 
LEFT JOIN offices ON offices.companies_id = companies.id
LEFT JOIN customers ON offices.id = customers.offices_id
LEFT JOIN calls ON calls.customers_id = customers.id
GROUP BY 
	companies.id,
	companies.companies_name
ORDER BY calls DESC, companies.id ASC;

In this query,

  • SUM (CASE WHEN calls.id IS NOT NULL THEN 1 ELSE 0 END) – to avoid unnecessary operations, we summarize only existing calls – when the number of calls in a company is not zero. This is very important in large tables with possible null values.
  • AVG (ISNULL (DATEDIFF (SECOND, calls.start_time, calls.end_time), 0)) – the query is identical to the AVG query above. However, here we use the ISNULL operator which replaces NULL with 0. It is necessary for companies with no calls at all.

Our results:

DATEDIFF function which outputs the difference between the specified periods.

We are almost done. The above table presents the list of companies, the corresponding number of calls for each of them, and the average call duration in each of them.

The only thing left is to compare the numbers from the last column with the average duration of all calls from all companies (335 seconds).

If you enter the query that we presented at the very beginning, just adding the HAVING part, you will get what you need.

We strongly recommend adding comments on each line so that you won’t get confused in the future when you need to correct some existing complex SQL queries.

Final Thoughts

Though each complex SQL query requires an individual approach, some recommendations are suitable for the preparation of most such queries.

  • determine which tables will participate in the query;
  • create complex queries from simpler parts;
  • check the accuracy of queries sequentially, in parts;
  • test the accuracy of your query with smaller tables;
  • write detailed comments on each line containing the operand, using the symbols ‘-‘.

Specialized tools make this job much more straightforward. Among them, we’d recommend using the Query Builder – a visual tool that allows constructing even the most complex queries much faster in a visual mode. This tool is available as a stand-alone solution or as a part of the multi-featured dbForge Studio for SQL Server.

We hope that this article has helped you clarify this specific issue.

Philipp Zhenov

Philipp Zhenov

Philipp expertly writes articles on technical topics. He has an extensive experience with both large companies and small businesses in the field of IT, engineering and mechanical engineering.

Leave a Reply

Your email address will not be published. Required fields are marked *