fbpx
Skip to content

SQL Subqueries: Queries within Queries

    SQL Subqueries: Queries within Queries

    Last Updated on: 27th February 2024, 08:22 pm

    In the realm of SQL, subqueries, often referred to as nested queries, stand as versatile tools that allow you to nest one query within another. This capability empowers you to perform intricate data manipulations and extract valuable insights from your databases. Here we will delve into the world of SQL subqueries, exploring their syntax, types, and practical examples to unravel their true potential.

    What are SQL Subqueries

    An SQL subquery is a query nested within another query, enabling you to execute complex operations by breaking them down into smaller, more manageable components. These subqueries can appear in various parts of a SQL statement, such as the SELECT, FROM, WHERE, or HAVING clauses, depending on the specific use case.

    Types of SQL Subqueries

    Let’s explore the different types of subqueries and delve into each with examples:

    1. Scalar Subquery

    A scalar subquery returns a single value and can be used in places where an expression is expected to produce a single value.

    Examples

    SELECT column1, (SELECT MAX(column2) FROM table2) AS max_value
    FROM table1;

    In this query, the subquery (SELECT MAX(column2) FROM table2) calculates the maximum value of column2 from table2 and returns it as max_value.

    Example 1: Retrieving Customer Information with Total Orders
    SELECT customer_id, 
           customer_name, 
           (SELECT COUNT(*) FROM orders WHERE orders.customer_id = customers.customer_id) AS total_orders
    FROM customers;
    • Explanation:
    • This query selects customer_id and customer_name from the customers table.
    • Within the SELECT clause, there’s a scalar subquery (SELECT COUNT(*) FROM orders WHERE orders.customer_id = customers.customer_id) which calculates the total number of orders for each customer. It counts the rows in the orders table where the customer_id matches the customer_id in the outer query.
    • The result of the scalar subquery, i.e., the total number of orders for each customer, is aliased as total_orders.
    Example 2: Finding the Highest Salary for Each Employee
    SELECT employee_id, 
           employee_name, 
           (SELECT MAX(salary) FROM salaries WHERE salaries.employee_id = employees.employee_id) AS highest_salary
    FROM employees;
    • Explanation:
    • This query selects employee_id and employee_name from the employees table.
    • Within the SELECT clause, there’s a scalar subquery (SELECT MAX(salary) FROM salaries WHERE salaries.employee_id = employees.employee_id) which calculates the highest salary for each employee. It finds the maximum value of the salary column in the salaries table where the employee_id matches the employee_id in the outer query.
    • The result of the scalar subquery, i.e., the highest salary for each employee, is aliased as highest_salary.
    Example 3: Calculating the Age of Employees
    SELECT employee_id, 
           employee_name, 
           (SELECT YEAR(NOW()) - YEAR(date_of_birth)) AS age
    FROM employees;
    • Explanation:
    • This query selects employee_id and employee_name from the employees table.
    • Within the SELECT clause, there’s a scalar subquery (SELECT YEAR(NOW()) - YEAR(date_of_birth)) which calculates the age of each employee. It subtracts the birth year extracted from the date_of_birth column of the employees table from the current year (retrieved using the NOW() function).
    • The result of the scalar subquery, i.e., the age of each employee, is directly used in the main query.
    Example 4: Determining the Total Price of All Orders
    SELECT (SELECT SUM(price) FROM order_items) AS total_price;
    • Explanation:
    • This query selects the total price of all orders.
    • The scalar subquery (SELECT SUM(price) FROM order_items) calculates the sum of the price column in the order_items table, which represents the total price of all orders.
    • Since this query doesn’t involve a main query, the result of the scalar subquery directly represents the total price.
    Example 5: Counting Employees in Each Department
    SELECT department_id, 
           department_name, 
           (SELECT COUNT(*) FROM employees WHERE employees.department_id = departments.department_id) AS num_employees
    FROM departments;
    • Explanation:
    • This query selects department_id and department_name from the departments table.
    • Within the SELECT clause, there’s a scalar subquery (SELECT COUNT(*) FROM employees WHERE employees.department_id = departments.department_id) which calculates the number of employees in each department. It counts the rows in the employees table where the department_id matches the department_id in the outer query.
    • The result of the scalar subquery, i.e., the number of employees in each department, is aliased as num_employees.

    2. Row Subquery

    A row subquery returns a single row of data and can be used in places where a single row is expected.

    Example

    SELECT column1, column2
    FROM table1
    WHERE (column1, column2) = (SELECT column1, column2 FROM table2);

    Here, the subquery (SELECT column1, column2 FROM table2) returns a single row, which is then compared with each row of table1 to filter the results.

    Example 1: Finding Customers with Specific Orders
    SELECT customer_id, 
           customer_name 
    FROM customers
    WHERE (customer_id, customer_name) IN 
          (SELECT customer_id, customer_name FROM orders WHERE total_amount > 1000);
    • Explanation:
    • This query selects customer_id and customer_name from the customers table.
    • In the WHERE clause, there’s a row subquery (SELECT customer_id, customer_name FROM orders WHERE total_amount > 1000) which retrieves the customer_id and customer_name of customers who have placed orders with a total amount greater than 1000.
    • The outer query filters customers based on whether their customer_id and customer_name are present in the result set of the row subquery.
    Example 2: Selecting Employees with Highest Salaries
    SELECT employee_id, 
           employee_name 
    FROM employees 
    WHERE (employee_id, employee_name) = 
          (SELECT employee_id, employee_name FROM salaries ORDER BY salary DESC LIMIT 1);
    • Explanation:
    • This query selects employee_id and employee_name from the employees table.
    • In the WHERE clause, there’s a row subquery (SELECT employee_id, employee_name FROM salaries ORDER BY salary DESC LIMIT 1) which retrieves the employee_id and employee_name of the employee with the highest salary by sorting salaries in descending order and selecting the top record using LIMIT.
    • The outer query filters employees based on whether their employee_id and employee_name match those of the employee with the highest salary.
    Example 3: Retrieving Products with the Most Sales
    SELECT product_id, 
           product_name 
    FROM products 
    WHERE (product_id, product_name) = 
          (SELECT product_id, product_name FROM sales ORDER BY quantity_sold DESC LIMIT 1);
    • Explanation:
    • This query selects product_id and product_name from the products table.
    • In the WHERE clause, there’s a row subquery (SELECT product_id, product_name FROM sales ORDER BY quantity_sold DESC LIMIT 1) which retrieves the product_id and product_name of the product with the highest quantity sold by sorting sales in descending order of quantity sold and selecting the top record.
    • The outer query filters products based on whether their product_id and product_name match those of the product with the highest quantity sold.
    Example 4: Identifying Employees with Birthdays Today
    SELECT employee_id, 
           employee_name 
    FROM employees 
    WHERE (EXTRACT(MONTH FROM date_of_birth), EXTRACT(DAY FROM date_of_birth)) = 
          (SELECT EXTRACT(MONTH FROM CURRENT_DATE), EXTRACT(DAY FROM CURRENT_DATE));
    • Explanation:
    • This query selects employee_id and employee_name from the employees table.
    • In the WHERE clause, there’s a row subquery (SELECT EXTRACT(MONTH FROM CURRENT_DATE), EXTRACT(DAY FROM CURRENT_DATE)) which retrieves the month and day of the current date.
    • The outer query filters employees based on whether their birth month and day match those of the current date.
    Example 5: Filtering Orders Placed in the Last Month
    SELECT order_id, 
           order_date 
    FROM orders 
    WHERE (EXTRACT(MONTH FROM order_date), EXTRACT(YEAR FROM order_date)) = 
          (SELECT EXTRACT(MONTH FROM DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)), 
                  EXTRACT(YEAR FROM DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)));
    • Explanation:
    • This query selects order_id and order_date from the orders table.
    • In the WHERE clause, there’s a row subquery (SELECT EXTRACT(MONTH FROM DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)), EXTRACT(YEAR FROM DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH))) which retrieves the month and year of the date one month ago.
    • The outer query filters orders based on whether their order month and year match those of the date one month ago.

    3. Table Subquery

    A table subquery returns a result set of rows and columns, just like a regular table, and can be used in places where a table or view is expected.

    Example

    SELECT column1
    FROM (SELECT column1, column2 FROM table2) AS subquery_table
    WHERE column2 = 'value';

    The subquery (SELECT column1, column2 FROM table2) generates a temporary table subquery_table, which is then used in the outer query to filter rows based on the condition column2 = 'value'.

    Example 1: Filtering Products with Sales Greater Than Average
    SELECT product_id, 
           product_name 
    FROM products 
    WHERE product_id IN 
          (SELECT product_id FROM sales WHERE total_sales > (SELECT AVG(total_sales) FROM sales));
    • Explanation:
    • This query selects product_id and product_name from the products table.
    • In the WHERE clause, there’s a table subquery (SELECT AVG(total_sales) FROM sales) which calculates the average total sales across all products.
    • The outer query filters products based on whether their product_id is present in the result set of the inner subquery, where the total sales are greater than the average total sales.
    Example 2: Retrieving Employees in Departments with More Than Two Employees
    SELECT employee_id, 
           employee_name 
    FROM employees 
    WHERE department_id IN 
          (SELECT department_id FROM 
              (SELECT department_id, COUNT(*) AS num_employees FROM employees GROUP BY department_id) AS dept_count 
           WHERE num_employees > 2);
    • Explanation:
    • This query selects employee_id and employee_name from the employees table.
    • In the WHERE clause, there’s a table subquery (SELECT department_id, COUNT(*) AS num_employees FROM employees GROUP BY department_id) which calculates the number of employees in each department.
    • The outer query filters employees based on whether their department_id is present in the result set of the inner subquery, where the number of employees in the department is greater than 2.
    Example 3: Finding Orders Placed by High-Value Customers
    SELECT order_id, 
           order_date 
    FROM orders 
    WHERE customer_id IN 
          (SELECT customer_id FROM 
              (SELECT customer_id, SUM(total_amount) AS total_spent FROM orders GROUP BY customer_id) AS customer_spend 
           WHERE total_spent > 1000);
    • Explanation:
    • This query selects order_id and order_date from the orders table.
    • In the WHERE clause, there’s a table subquery (SELECT customer_id, SUM(total_amount) AS total_spent FROM orders GROUP BY customer_id) which calculates the total amount spent by each customer.
    • The outer query filters orders based on whether their customer_id is present in the result set of the inner subquery, where the total amount spent by the customer is greater than 1000.
    Example 4: Identifying Employees with No Assigned Tasks
    SELECT employee_id, 
           employee_name 
    FROM employees 
    WHERE employee_id NOT IN (SELECT DISTINCT employee_id FROM tasks);
    • Explanation:
    • This query selects employee_id and employee_name from the employees table.
    • In the WHERE clause, there’s a table subquery (SELECT DISTINCT employee_id FROM tasks) which retrieves the distinct employee_id values from the tasks table, representing employees with assigned tasks.
    • The outer query filters employees based on whether their employee_id is not present in the result set of the inner subquery, indicating employees with no assigned tasks.
    Example 5: Selecting Products with No Sales
    SELECT product_id, 
           product_name 
    FROM products 
    WHERE product_id NOT IN (SELECT DISTINCT product_id FROM sales);
    • Explanation:
    • This query selects product_id and product_name from the products table.
    • In the WHERE clause, there’s a table subquery (SELECT DISTINCT product_id FROM sales) which retrieves the distinct product_id values from the sales table, representing products with sales.
    • The outer query filters products based on whether their product_id is not present in the result set of the inner subquery, indicating products with no sales.

    4. Correlated Subquery

    A correlated subquery depends on the outer query for its values and executes once for each row processed by the outer query.

    Example

    SELECT column1
    FROM table1 t1
    WHERE column2 = (SELECT MAX(column2) FROM table1 WHERE t1.column3 = table1.column3);

    In this correlated subquery, the inner query (SELECT MAX(column2) FROM table1 WHERE t1.column3 = table1.column3) is executed for each row of table1 in the outer query.

    Example 1: Finding Employees with Salaries Higher Than Average
    SELECT employee_id, 
           employee_name 
    FROM employees e_outer 
    WHERE salary > 
          (SELECT AVG(salary) FROM employees e_inner WHERE e_outer.department_id = e_inner.department_id);
    • Explanation:
    • This query selects employee_id and employee_name from the employees table.
    • In the WHERE clause, there’s a correlated subquery (SELECT AVG(salary) FROM employees e_inner WHERE e_outer.department_id = e_inner.department_id) which calculates the average salary within each employee’s department.
    • The outer query filters employees based on whether their salary is higher than the average salary of their respective department, hence the correlation between the outer and inner queries.
    Example 2: Selecting Orders Placed by High-Value Customers
    SELECT order_id, 
           order_date 
    FROM orders o_outer 
    WHERE total_amount > 
          (SELECT AVG(total_amount) FROM orders o_inner WHERE o_outer.customer_id = o_inner.customer_id);
    • Explanation:
    • This query selects order_id and order_date from the orders table.
    • In the WHERE clause, there’s a correlated subquery (SELECT AVG(total_amount) FROM orders o_inner WHERE o_outer.customer_id = o_inner.customer_id) which calculates the average total amount spent by each customer.
    • The outer query filters orders based on whether their total amount is higher than the average total amount spent by their respective customer, establishing a correlation between the outer and inner queries.
    Example 3: Identifying Customers with More Orders Than Average
    SELECT customer_id, 
           customer_name 
    FROM customers c_outer 
    WHERE (SELECT COUNT(*) FROM orders WHERE customer_id = c_outer.customer_id) > 
          (SELECT AVG(num_orders) FROM (SELECT COUNT(*) AS num_orders FROM orders GROUP BY customer_id) o_inner);
    • Explanation:
    • This query selects customer_id and customer_name from the customers table.
    • In the WHERE clause, there are two correlated subqueries:
      • (SELECT COUNT(*) FROM orders WHERE customer_id = c_outer.customer_id) calculates the number of orders placed by each customer.
      • (SELECT AVG(num_orders) FROM (SELECT COUNT(*) AS num_orders FROM orders GROUP BY customer_id) o_inner) calculates the average number of orders across all customers.
    • The outer query filters customers based on whether they have more orders than the average number of orders, establishing a correlation between the outer and inner queries.
    Example 4: Selecting Employees with Higher Tenure Than Average
    SELECT employee_id, 
           employee_name 
    FROM employees e_outer 
    WHERE (SELECT DATEDIFF(CURDATE(), hire_date)) > 
          (SELECT AVG(DATEDIFF(CURDATE(), hire_date)) FROM employees e_inner WHERE e_outer.department_id = e_inner.department_id);
    • Explanation:
    • This query selects employee_id and employee_name from the employees table.
    • In the WHERE clause, there’s a correlated subquery (SELECT AVG(DATEDIFF(CURDATE(), hire_date)) FROM employees e_inner WHERE e_outer.department_id = e_inner.department_id) which calculates the average tenure (in days) within each employee’s department.
    • The outer query filters employees based on whether their tenure (calculated as the difference between the current date and the hire date) is higher than the average tenure of their respective department, establishing a correlation between the outer and inner queries.
    Example 5: Retrieving Orders with Total Amounts Higher Than the Average of the Previous Month
    SELECT order_id, 
           order_date 
    FROM orders o_outer 
    WHERE total_amount > 
          (SELECT AVG(total_amount) FROM orders o_inner WHERE MONTH(o_outer.order_date) = MONTH(o_inner.order_date) - 1);
    • Explanation:
    • This query selects order_id and order_date from the orders table.
    • In the WHERE clause, there’s a correlated subquery (SELECT AVG(total_amount) FROM orders o_inner WHERE MONTH(o_outer.order_date) = MONTH(o_inner.order_date) - 1) which calculates the average total amount of orders placed in the previous month.
    • The outer query filters orders based on whether their total amount is higher than the average total amount of orders placed in the previous month, establishing a correlation between the outer and inner queries.

    Best Practices for Using SQL Subqueries

    When working with subqueries, it’s essential to follow best practices to ensure efficiency and readability:

    • Optimize Subqueries: Avoid nesting excessive subqueries and optimize them for performance by ensuring proper indexing and using appropriate join techniques.
    • Use Meaningful Aliases: Assign meaningful aliases to subqueries and their columns to enhance readability and maintainability.
    • Consider Performance Impact: Be mindful of the performance implications of using subqueries, especially in scenarios involving large datasets.

    Conclusion

    SQL subqueries serve as powerful tools for breaking down complex operations into manageable components, allowing for more flexible and expressive queries. By understanding the various types of subqueries and their practical applications, you can leverage their capabilities to extract valuable insights and manipulate data with ease. Remember to apply best practices and considerations when incorporating subqueries into your SQL statements to ensure optimal performance and maintainability.

    Share this post on social!

    Comment on Post

    Your email address will not be published. Required fields are marked *