Last Updated on: 27th February 2024, 08:22 pm
In the realm of SQL, subqueries, often referred to as nested queries, stand as versatile tools that allow you to nest one query within another. This capability empowers you to perform intricate data manipulations and extract valuable insights from your databases. Here we will delve into the world of SQL subqueries, exploring their syntax, types, and practical examples to unravel their true potential.
What are SQL Subqueries
An SQL subquery is a query nested within another query, enabling you to execute complex operations by breaking them down into smaller, more manageable components. These subqueries can appear in various parts of a SQL statement, such as the SELECT, FROM, WHERE, or HAVING clauses, depending on the specific use case.
Types of SQL Subqueries
Let’s explore the different types of subqueries and delve into each with examples:
1. Scalar Subquery
A scalar subquery returns a single value and can be used in places where an expression is expected to produce a single value.
Examples
SELECT column1, (SELECT MAX(column2) FROM table2) AS max_value FROM table1;
In this query, the subquery (SELECT MAX(column2) FROM table2)
calculates the maximum value of column2
from table2
and returns it as max_value
.
Example 1: Retrieving Customer Information with Total Orders
SELECT customer_id, customer_name, (SELECT COUNT(*) FROM orders WHERE orders.customer_id = customers.customer_id) AS total_orders FROM customers;
- Explanation:
- This query selects
customer_id
andcustomer_name
from thecustomers
table. - Within the SELECT clause, there’s a scalar subquery
(SELECT COUNT(*) FROM orders WHERE orders.customer_id = customers.customer_id)
which calculates the total number of orders for each customer. It counts the rows in theorders
table where thecustomer_id
matches thecustomer_id
in the outer query. - The result of the scalar subquery, i.e., the total number of orders for each customer, is aliased as
total_orders
.
Example 2: Finding the Highest Salary for Each Employee
SELECT employee_id, employee_name, (SELECT MAX(salary) FROM salaries WHERE salaries.employee_id = employees.employee_id) AS highest_salary FROM employees;
- Explanation:
- This query selects
employee_id
andemployee_name
from theemployees
table. - Within the SELECT clause, there’s a scalar subquery
(SELECT MAX(salary) FROM salaries WHERE salaries.employee_id = employees.employee_id)
which calculates the highest salary for each employee. It finds the maximum value of thesalary
column in thesalaries
table where theemployee_id
matches theemployee_id
in the outer query. - The result of the scalar subquery, i.e., the highest salary for each employee, is aliased as
highest_salary
.
Example 3: Calculating the Age of Employees
SELECT employee_id, employee_name, (SELECT YEAR(NOW()) - YEAR(date_of_birth)) AS age FROM employees;
- Explanation:
- This query selects
employee_id
andemployee_name
from theemployees
table. - Within the SELECT clause, there’s a scalar subquery
(SELECT YEAR(NOW()) - YEAR(date_of_birth))
which calculates the age of each employee. It subtracts the birth year extracted from thedate_of_birth
column of theemployees
table from the current year (retrieved using theNOW()
function). - The result of the scalar subquery, i.e., the age of each employee, is directly used in the main query.
Example 4: Determining the Total Price of All Orders
SELECT (SELECT SUM(price) FROM order_items) AS total_price;
- Explanation:
- This query selects the total price of all orders.
- The scalar subquery
(SELECT SUM(price) FROM order_items)
calculates the sum of theprice
column in theorder_items
table, which represents the total price of all orders. - Since this query doesn’t involve a main query, the result of the scalar subquery directly represents the total price.
Example 5: Counting Employees in Each Department
SELECT department_id, department_name, (SELECT COUNT(*) FROM employees WHERE employees.department_id = departments.department_id) AS num_employees FROM departments;
- Explanation:
- This query selects
department_id
anddepartment_name
from thedepartments
table. - Within the SELECT clause, there’s a scalar subquery
(SELECT COUNT(*) FROM employees WHERE employees.department_id = departments.department_id)
which calculates the number of employees in each department. It counts the rows in theemployees
table where thedepartment_id
matches thedepartment_id
in the outer query. - The result of the scalar subquery, i.e., the number of employees in each department, is aliased as
num_employees
.
2. Row Subquery
A row subquery returns a single row of data and can be used in places where a single row is expected.
Example
SELECT column1, column2 FROM table1 WHERE (column1, column2) = (SELECT column1, column2 FROM table2);
Here, the subquery (SELECT column1, column2 FROM table2)
returns a single row, which is then compared with each row of table1
to filter the results.
Example 1: Finding Customers with Specific Orders
SELECT customer_id, customer_name FROM customers WHERE (customer_id, customer_name) IN (SELECT customer_id, customer_name FROM orders WHERE total_amount > 1000);
- Explanation:
- This query selects
customer_id
andcustomer_name
from thecustomers
table. - In the WHERE clause, there’s a row subquery
(SELECT customer_id, customer_name FROM orders WHERE total_amount > 1000)
which retrieves thecustomer_id
andcustomer_name
of customers who have placed orders with a total amount greater than 1000. - The outer query filters customers based on whether their
customer_id
andcustomer_name
are present in the result set of the row subquery.
Example 2: Selecting Employees with Highest Salaries
SELECT employee_id, employee_name FROM employees WHERE (employee_id, employee_name) = (SELECT employee_id, employee_name FROM salaries ORDER BY salary DESC LIMIT 1);
- Explanation:
- This query selects
employee_id
andemployee_name
from theemployees
table. - In the WHERE clause, there’s a row subquery
(SELECT employee_id, employee_name FROM salaries ORDER BY salary DESC LIMIT 1)
which retrieves theemployee_id
andemployee_name
of the employee with the highest salary by sorting salaries in descending order and selecting the top record using LIMIT. - The outer query filters employees based on whether their
employee_id
andemployee_name
match those of the employee with the highest salary.
Example 3: Retrieving Products with the Most Sales
SELECT product_id, product_name FROM products WHERE (product_id, product_name) = (SELECT product_id, product_name FROM sales ORDER BY quantity_sold DESC LIMIT 1);
- Explanation:
- This query selects
product_id
andproduct_name
from theproducts
table. - In the WHERE clause, there’s a row subquery
(SELECT product_id, product_name FROM sales ORDER BY quantity_sold DESC LIMIT 1)
which retrieves theproduct_id
andproduct_name
of the product with the highest quantity sold by sorting sales in descending order of quantity sold and selecting the top record. - The outer query filters products based on whether their
product_id
andproduct_name
match those of the product with the highest quantity sold.
Example 4: Identifying Employees with Birthdays Today
SELECT employee_id, employee_name FROM employees WHERE (EXTRACT(MONTH FROM date_of_birth), EXTRACT(DAY FROM date_of_birth)) = (SELECT EXTRACT(MONTH FROM CURRENT_DATE), EXTRACT(DAY FROM CURRENT_DATE));
- Explanation:
- This query selects
employee_id
andemployee_name
from theemployees
table. - In the WHERE clause, there’s a row subquery
(SELECT EXTRACT(MONTH FROM CURRENT_DATE), EXTRACT(DAY FROM CURRENT_DATE))
which retrieves the month and day of the current date. - The outer query filters employees based on whether their birth month and day match those of the current date.
Example 5: Filtering Orders Placed in the Last Month
SELECT order_id, order_date FROM orders WHERE (EXTRACT(MONTH FROM order_date), EXTRACT(YEAR FROM order_date)) = (SELECT EXTRACT(MONTH FROM DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)), EXTRACT(YEAR FROM DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)));
- Explanation:
- This query selects
order_id
andorder_date
from theorders
table. - In the WHERE clause, there’s a row subquery
(SELECT EXTRACT(MONTH FROM DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)), EXTRACT(YEAR FROM DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)))
which retrieves the month and year of the date one month ago. - The outer query filters orders based on whether their order month and year match those of the date one month ago.
3. Table Subquery
A table subquery returns a result set of rows and columns, just like a regular table, and can be used in places where a table or view is expected.
Example
SELECT column1 FROM (SELECT column1, column2 FROM table2) AS subquery_table WHERE column2 = 'value';
The subquery (SELECT column1, column2 FROM table2)
generates a temporary table subquery_table
, which is then used in the outer query to filter rows based on the condition column2 = 'value'
.
Example 1: Filtering Products with Sales Greater Than Average
SELECT product_id, product_name FROM products WHERE product_id IN (SELECT product_id FROM sales WHERE total_sales > (SELECT AVG(total_sales) FROM sales));
- Explanation:
- This query selects
product_id
andproduct_name
from theproducts
table. - In the WHERE clause, there’s a table subquery
(SELECT AVG(total_sales) FROM sales)
which calculates the average total sales across all products. - The outer query filters products based on whether their
product_id
is present in the result set of the inner subquery, where the total sales are greater than the average total sales.
Example 2: Retrieving Employees in Departments with More Than Two Employees
SELECT employee_id, employee_name FROM employees WHERE department_id IN (SELECT department_id FROM (SELECT department_id, COUNT(*) AS num_employees FROM employees GROUP BY department_id) AS dept_count WHERE num_employees > 2);
- Explanation:
- This query selects
employee_id
andemployee_name
from theemployees
table. - In the WHERE clause, there’s a table subquery
(SELECT department_id, COUNT(*) AS num_employees FROM employees GROUP BY department_id)
which calculates the number of employees in each department. - The outer query filters employees based on whether their
department_id
is present in the result set of the inner subquery, where the number of employees in the department is greater than 2.
Example 3: Finding Orders Placed by High-Value Customers
SELECT order_id, order_date FROM orders WHERE customer_id IN (SELECT customer_id FROM (SELECT customer_id, SUM(total_amount) AS total_spent FROM orders GROUP BY customer_id) AS customer_spend WHERE total_spent > 1000);
- Explanation:
- This query selects
order_id
andorder_date
from theorders
table. - In the WHERE clause, there’s a table subquery
(SELECT customer_id, SUM(total_amount) AS total_spent FROM orders GROUP BY customer_id)
which calculates the total amount spent by each customer. - The outer query filters orders based on whether their
customer_id
is present in the result set of the inner subquery, where the total amount spent by the customer is greater than 1000.
Example 4: Identifying Employees with No Assigned Tasks
SELECT employee_id, employee_name FROM employees WHERE employee_id NOT IN (SELECT DISTINCT employee_id FROM tasks);
- Explanation:
- This query selects
employee_id
andemployee_name
from theemployees
table. - In the WHERE clause, there’s a table subquery
(SELECT DISTINCT employee_id FROM tasks)
which retrieves the distinctemployee_id
values from thetasks
table, representing employees with assigned tasks. - The outer query filters employees based on whether their
employee_id
is not present in the result set of the inner subquery, indicating employees with no assigned tasks.
Example 5: Selecting Products with No Sales
SELECT product_id, product_name FROM products WHERE product_id NOT IN (SELECT DISTINCT product_id FROM sales);
- Explanation:
- This query selects
product_id
andproduct_name
from theproducts
table. - In the WHERE clause, there’s a table subquery
(SELECT DISTINCT product_id FROM sales)
which retrieves the distinctproduct_id
values from thesales
table, representing products with sales. - The outer query filters products based on whether their
product_id
is not present in the result set of the inner subquery, indicating products with no sales.
4. Correlated Subquery
A correlated subquery depends on the outer query for its values and executes once for each row processed by the outer query.
Example
SELECT column1 FROM table1 t1 WHERE column2 = (SELECT MAX(column2) FROM table1 WHERE t1.column3 = table1.column3);
In this correlated subquery, the inner query (SELECT MAX(column2) FROM table1 WHERE t1.column3 = table1.column3)
is executed for each row of table1
in the outer query.
Example 1: Finding Employees with Salaries Higher Than Average
SELECT employee_id, employee_name FROM employees e_outer WHERE salary > (SELECT AVG(salary) FROM employees e_inner WHERE e_outer.department_id = e_inner.department_id);
- Explanation:
- This query selects
employee_id
andemployee_name
from theemployees
table. - In the WHERE clause, there’s a correlated subquery
(SELECT AVG(salary) FROM employees e_inner WHERE e_outer.department_id = e_inner.department_id)
which calculates the average salary within each employee’s department. - The outer query filters employees based on whether their salary is higher than the average salary of their respective department, hence the correlation between the outer and inner queries.
Example 2: Selecting Orders Placed by High-Value Customers
SELECT order_id, order_date FROM orders o_outer WHERE total_amount > (SELECT AVG(total_amount) FROM orders o_inner WHERE o_outer.customer_id = o_inner.customer_id);
- Explanation:
- This query selects
order_id
andorder_date
from theorders
table. - In the WHERE clause, there’s a correlated subquery
(SELECT AVG(total_amount) FROM orders o_inner WHERE o_outer.customer_id = o_inner.customer_id)
which calculates the average total amount spent by each customer. - The outer query filters orders based on whether their total amount is higher than the average total amount spent by their respective customer, establishing a correlation between the outer and inner queries.
Example 3: Identifying Customers with More Orders Than Average
SELECT customer_id, customer_name FROM customers c_outer WHERE (SELECT COUNT(*) FROM orders WHERE customer_id = c_outer.customer_id) > (SELECT AVG(num_orders) FROM (SELECT COUNT(*) AS num_orders FROM orders GROUP BY customer_id) o_inner);
- Explanation:
- This query selects
customer_id
andcustomer_name
from thecustomers
table. - In the WHERE clause, there are two correlated subqueries:
(SELECT COUNT(*) FROM orders WHERE customer_id = c_outer.customer_id)
calculates the number of orders placed by each customer.(SELECT AVG(num_orders) FROM (SELECT COUNT(*) AS num_orders FROM orders GROUP BY customer_id) o_inner)
calculates the average number of orders across all customers.
- The outer query filters customers based on whether they have more orders than the average number of orders, establishing a correlation between the outer and inner queries.
Example 4: Selecting Employees with Higher Tenure Than Average
SELECT employee_id, employee_name FROM employees e_outer WHERE (SELECT DATEDIFF(CURDATE(), hire_date)) > (SELECT AVG(DATEDIFF(CURDATE(), hire_date)) FROM employees e_inner WHERE e_outer.department_id = e_inner.department_id);
- Explanation:
- This query selects
employee_id
andemployee_name
from theemployees
table. - In the WHERE clause, there’s a correlated subquery
(SELECT AVG(DATEDIFF(CURDATE(), hire_date)) FROM employees e_inner WHERE e_outer.department_id = e_inner.department_id)
which calculates the average tenure (in days) within each employee’s department. - The outer query filters employees based on whether their tenure (calculated as the difference between the current date and the hire date) is higher than the average tenure of their respective department, establishing a correlation between the outer and inner queries.
Example 5: Retrieving Orders with Total Amounts Higher Than the Average of the Previous Month
SELECT order_id, order_date FROM orders o_outer WHERE total_amount > (SELECT AVG(total_amount) FROM orders o_inner WHERE MONTH(o_outer.order_date) = MONTH(o_inner.order_date) - 1);
- Explanation:
- This query selects
order_id
andorder_date
from theorders
table. - In the WHERE clause, there’s a correlated subquery
(SELECT AVG(total_amount) FROM orders o_inner WHERE MONTH(o_outer.order_date) = MONTH(o_inner.order_date) - 1)
which calculates the average total amount of orders placed in the previous month. - The outer query filters orders based on whether their total amount is higher than the average total amount of orders placed in the previous month, establishing a correlation between the outer and inner queries.
Best Practices for Using SQL Subqueries
When working with subqueries, it’s essential to follow best practices to ensure efficiency and readability:
- Optimize Subqueries: Avoid nesting excessive subqueries and optimize them for performance by ensuring proper indexing and using appropriate join techniques.
- Use Meaningful Aliases: Assign meaningful aliases to subqueries and their columns to enhance readability and maintainability.
- Consider Performance Impact: Be mindful of the performance implications of using subqueries, especially in scenarios involving large datasets.
Conclusion
SQL subqueries serve as powerful tools for breaking down complex operations into manageable components, allowing for more flexible and expressive queries. By understanding the various types of subqueries and their practical applications, you can leverage their capabilities to extract valuable insights and manipulate data with ease. Remember to apply best practices and considerations when incorporating subqueries into your SQL statements to ensure optimal performance and maintainability.