Complete MySQL Tutorial
Master MySQL database management with comprehensive explanations, practical examples, and professional SQL practices.
Introduction to MySQL
Why MySQL?
MySQL provides several advantages for database management:
- Open Source: Free to use with a large community support
- Cross-Platform: Runs on various operating systems
- High Performance: Optimized for speed and efficiency
- Scalability: Handles large amounts of data
- Security: Robust data protection and user management
Key Features of MySQL
MySQL offers a comprehensive set of features for database management:
1. Relational Database
MySQL organizes data into tables with rows and columns, allowing relationships between different data entities.
2. SQL Support
Full support for SQL standards, making it easy to learn and use with existing SQL knowledge.
3. ACID Compliance
Ensures database transactions are processed reliably (Atomicity, Consistency, Isolation, Durability).
4. Client/Server Architecture
Supports multiple clients connecting to a central database server simultaneously.
Basic MySQL Terminology
- Database: A collection of related tables
- Table: A structure that organizes data into rows and columns
- Column: A specific field in a table that holds a particular type of data
- Row: A single record in a table containing values for each column
- Primary Key: A unique identifier for each row in a table
- Foreign Key: A field that links to the primary key of another table
Select a topic from the left sidebar to dive deeper into specific MySQL concepts.
MySQL Installation & Setup
Prerequisites
Before installing MySQL, ensure your system meets these requirements:
- Operating System: Windows 10/11, macOS 10.14+, or Linux (Ubuntu 18.04+, CentOS 7+)
- RAM: Minimum 2GB (4GB recommended for development)
- Storage: At least 2GB free space
- Administrator/root access for installation
- Internet connection for downloading packages
Installation Methods by Platform
Choose the installation method that best fits your operating system:
| Platform | Recommended Method | Alternative Methods |
|---|---|---|
| Windows | MySQL Installer | ZIP Archive, Docker |
| macOS | DMG Package | Homebrew, Docker |
| Linux (Ubuntu/Debian) | APT Repository | Docker, Tarball |
| Linux (CentOS/RHEL) | YUM Repository | Docker, Tarball |
Step-by-Step Installation Guide
1. Windows Installation
Follow these detailed steps for Windows installation:
• Visit https://dev.mysql.com/downloads/installer/
• Download the MySQL Installer (recommended: web community version)
# Step 2: Run the Installer
• Double-click the downloaded .msi file
• Choose "Developer Default" for full development setup
• Click "Execute" to install required products
# Step 3: Product Configuration
• After installation, click "Next" to configure products
• For MySQL Server, choose "Standalone MySQL Server"
• Select "Config Type": Development Computer
• Use default TCP/IP port: 3306
# Step 4: Authentication Method
• Recommended: Use Strong Password Encryption
• Set root password (minimum 8 characters, mix of letters, numbers, symbols)
• Important: Remember this password!
# Step 5: Windows Service
• Configure MySQL as a Windows Service
• Service Name: MySQL80 (default)
• Start MySQL Server at System Startup
# Step 6: Apply Configuration
• Click "Execute" to apply all configurations
• Wait for all operations to complete successfully
• Click "Finish" to complete installation
2. macOS Installation
For macOS users, here's the installation process:
1. Download macOS DMG from MySQL website
2. Double-click the .dmg file to mount it
3. Run the .pkg installer and follow prompts
4. Remember the temporary root password shown at the end
5. Open Terminal and secure installation:
mysql_secure_installation
# Method 2: Homebrew (Alternative)
1. Install Homebrew if not already installed:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
2. Install MySQL:
brew install mysql
3. Start MySQL service:
brew services start mysql
4. Secure installation:
mysql_secure_installation
3. Linux Installation (Ubuntu)
For Ubuntu and other Debian-based systems:
sudo apt update
# Install MySQL Server
sudo apt install mysql-server
# Run security script
sudo mysql_secure_installation
# Secure installation prompts:
• Validate Password Plugin: Press y (recommended)
• Password Strength: Choose 2 (STRONG)
• Set root password and confirm
• Remove anonymous users: Press y
• Disallow root login remotely: Press y
• Remove test database: Press y
• Reload privilege tables: Press y
Post-Installation Verification
After installation, verify that MySQL is running correctly:
sudo systemctl status mysql
# Or on older systems:
sudo service mysql status
# Connect to MySQL server
mysql -u root -p
# Once connected, run these verification commands:
SELECT VERSION();
SHOW DATABASES;
STATUS;
+-----------+
| VERSION() |
+-----------+
| 8.0.33 |
+-----------+
# Expected databases:
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
Initial Configuration
Configure MySQL for optimal development use:
# Location depends on OS:
# Windows: C:\ProgramData\MySQL\MySQL Server 8.0\my.ini
# Linux: /etc/mysql/my.cnf
# macOS: /usr/local/mysql/my.cnf
# Add these basic settings for development:
[mysqld]
default_authentication_plugin=mysql_native_password
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
max_connections=100
[client]
default-character-set=utf8mb4
Installing MySQL Workbench (GUI Tool)
For a graphical interface, install MySQL Workbench:
# Windows:
• Download .msi installer and run it
• Choose complete installation
• Launch Workbench after installation
# macOS:
• Download .dmg file and drag to Applications
• Launch from Applications folder
# Linux (Ubuntu):
sudo apt install mysql-workbench
Troubleshooting Common Installation Issues
- Port 3306 already in use: Stop other MySQL instances or change port in configuration
- Access denied for root user: Use sudo on Linux/macOS or reset root password
- Service won't start: Check error logs in data directory
- Forgot root password: Use --skip-grant-tables option to reset
- Always run mysql_secure_installation after fresh install
- Use strong passwords for all user accounts
- Regularly update MySQL to latest version
- Configure firewall to restrict database access
- Create separate user accounts for different applications
Next Steps
After successful installation, you're ready to:
- Create your first database
- Learn basic SQL commands
- Set up user accounts and permissions
- Begin developing database-driven applications
Creating Databases in MySQL
Understanding Databases
Before creating databases, it's important to understand what they are:
- Container: A database is a container that holds multiple tables
- Organization: Helps organize related data together
- Security: Provides a security boundary - you can control access per database
- Backup Unit: Makes it easy to backup related data together
Prerequisites
Before creating databases, ensure you have:
- MySQL server installed and running
- Access to MySQL command line or GUI tool
- Appropriate privileges (usually as root user initially)
- Basic understanding of SQL syntax
Step 1: Connect to MySQL Server
First, you need to connect to your MySQL server:
mysql -u root -p
# Alternative: Connect with password in command (less secure)
mysql -u root -pYourPassword
# If successful, you'll see MySQL prompt:
mysql>
Your MySQL connection id is 15
Server version: 8.0.33 MySQL Community Server
mysql>
Step 2: View Existing Databases
Before creating new databases, let's see what already exists:
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
4 rows in set (0.00 sec)
• information_schema: Contains metadata about all databases
• mysql: Stores user privileges and system settings
• performance_schema: Performance monitoring data
• sys: Provides readable views of performance data
Don't modify these system databases directly!
Step 3: Create Your First Database
Now let's create a simple database. We'll use a school management system as an example:
Understanding the CREATE DATABASE Syntax
Let's break down the command:
- CREATE DATABASE - The SQL command to create a new database
- school_db - The name we're giving to our database
- ; - Required semicolon to end the SQL statement
Step 4: Verify Database Creation
Let's confirm our database was created successfully:
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| school_db |
| sys |
+--------------------+
5 rows in set (0.00 sec)
Step 5: Select the Database
Before working with tables, you need to tell MySQL which database to use:
Advanced Database Creation Options
CREATE DATABASE has several optional parameters for more control:
CREATE DATABASE company_db
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;
-- Create database only if it doesn't exist (prevents errors)
CREATE DATABASE IF NOT EXISTS inventory_db;
Understanding Character Sets and Collations
- Character Set: Defines what characters can be stored (utf8mb4 supports all Unicode characters)
- Collation: Defines how characters are sorted and compared
- utf8mb4_unicode_ci is recommended for most applications
More Practical Examples
Let's create databases for different types of applications:
CREATE DATABASE ecommerce_db
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;
-- Blog application database
CREATE DATABASE blog_db;
-- Employee management system
CREATE DATABASE employee_management;
-- Check all our databases
SHOW DATABASES;
| Database |
+------------------------+
| information_schema |
| blog_db |
| ecommerce_db |
| employee_management |
| mysql |
| performance_schema |
| school_db |
| sys |
+------------------------+
8 rows in set (0.00 sec)
Viewing Database Information
You can get detailed information about a specific database:
SHOW CREATE DATABASE school_db;
-- Check which database is currently selected
SELECT DATABASE();
| Database | Create Database |
+----------+-------------------------------------------------------------------+
| school_db| CREATE DATABASE `school_db` /*!40100 DEFAULT CHARACTER SET utf8mb4 */ |
+----------+-------------------------------------------------------------------+
1 row in set (0.00 sec)
+------------+
| DATABASE() |
+------------+
| school_db |
+------------+
1 row in set (0.00 sec)
Deleting Databases
To remove a database (use with extreme caution!):
DROP DATABASE blog_db;
-- Safer: Only delete if it exists
DROP DATABASE IF EXISTS temp_db;
The DROP DATABASE command permanently deletes the entire database and ALL its contents (tables, data, everything). There is NO UNDO for this operation. Always double-check the database name before executing this command!
Best Practices for Database Creation
- Use descriptive, meaningful names (school_db, not db1)
- Use lowercase letters and underscores (my_database, not MyDatabase)
- Avoid spaces and special characters
- Keep names reasonably short but descriptive
- Always specify CHARACTER SET and COLLATION for consistency
- Use utf8mb4 for international applications
- Plan your database structure before creation
- Create separate databases for different applications
Common Mistakes to Avoid
- Forgetting the semicolon (;) at the end of SQL statements
- Using reserved keywords as database names (like SELECT, TABLE, etc.)
- Not selecting a database with USE before creating tables
- Using DROP DATABASE without double-checking the name
- Creating too many databases when one would suffice
Practice Exercise
Try these exercises to practice what you've learned:
- Create a database named "library_system"
- Create a database named "hospital_management" with utf8mb4 character set
- Use SHOW DATABASES to verify both were created
- Select the "library_system" database
- Use SELECT DATABASE() to confirm it's selected
Creating Tables in MySQL
Understanding Table Structure
Before creating tables, let's understand the components:
- Table Name: Unique identifier for the table within a database
- Columns/Fields: Define the type of data stored (name, age, email, etc.)
- Rows/Records: Individual entries in the table
- Data Types: Specify what kind of data each column can hold
- Constraints: Rules that enforce data integrity
Prerequisites
Before creating tables, ensure you have:
- A database created (we'll use "school_db" from previous lesson)
- Selected the database using USE command
- Basic understanding of data types
- MySQL command line or GUI tool access
Step 1: Select Your Database
First, make sure you're working in the right database:
Step 2: Basic Table Creation Syntax
The basic CREATE TABLE statement follows this pattern:
column1_name data_type constraints,
column2_name data_type constraints,
column3_name data_type constraints,
...
);
Step 3: Create Your First Table - Students
Let's create a simple students table to track basic information:
student_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
date_of_birth DATE,
enrollment_date DATE DEFAULT (CURRENT_DATE)
);
Understanding Our Students Table Structure
Let's break down each column in a proper table format:
| Column Name | Data Type | Constraints | Description |
|---|---|---|---|
| student_id | INT | PRIMARY KEY AUTO_INCREMENT | Unique ID that automatically increases |
| first_name | VARCHAR(50) | NOT NULL | Student's first name (required) |
| last_name | VARCHAR(50) | NOT NULL | Student's last name (required) |
| VARCHAR(100) | UNIQUE | Email address (must be unique) | |
| date_of_birth | DATE | None | Birth date (optional) |
| enrollment_date | DATE | DEFAULT (CURRENT_DATE) | Auto-fills with today's date |
Step 4: Verify Table Creation
Let's confirm our table was created successfully:
SHOW TABLES;
-- View detailed structure of students table
DESCRIBE students;
| Tables_in_school_db |
+--------------------+
| students |
+--------------------+
1 row in set (0.00 sec)
+----------------+--------------+------+-----+------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+------------+----------------+
| student_id | int | NO | PRI | NULL | auto_increment |
| first_name | varchar(50) | NO | | NULL | |
| last_name | varchar(50) | NO | | NULL | |
| email | varchar(100) | YES | UNI | NULL | |
| date_of_birth | date | YES | | NULL | |
| enrollment_date| date | YES | | curdate() | |
+----------------+--------------+------+-----+------------+----------------+
More Practical Table Examples
Let's create additional tables for our school database:
Example 1: Courses Table
course_id INT PRIMARY KEY AUTO_INCREMENT,
course_name VARCHAR(100) NOT NULL,
course_code VARCHAR(10) UNIQUE NOT NULL,
credits INT DEFAULT 3,
description TEXT
);
Example 2: Teachers Table
teacher_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
phone VARCHAR(20),
hire_date DATE NOT NULL,
salary DECIMAL(10, 2)
);
Understanding Common Data Types
| Data Type | Description | Example |
|---|---|---|
| INT | Whole numbers | 25, -10, 1000 |
| VARCHAR(n) | Variable-length text (n = max characters) | "John", "Computer Science" |
| DECIMAL(p,s) | Exact decimal numbers (p=precision, s=scale) | 99999.99 (DECIMAL(7,2)) |
| TEXT | Large text data | Long descriptions, notes |
| BOOLEAN | True/False values | TRUE, FALSE |
Understanding Common Constraints
| Constraint | Description | Example |
|---|---|---|
| PRIMARY KEY | Uniquely identifies each row | student_id INT PRIMARY KEY |
| AUTO_INCREMENT | Automatically generates sequential numbers | id INT AUTO_INCREMENT |
| NOT NULL | Column must have a value | name VARCHAR(50) NOT NULL |
| UNIQUE | All values must be different | email VARCHAR(100) UNIQUE |
| DEFAULT | Sets default value if none provided | status VARCHAR(20) DEFAULT 'active' |
| FOREIGN KEY | Links to primary key in another table | FOREIGN KEY (course_id) REFERENCES courses(course_id) |
Creating Tables with Foreign Keys
Let's create a table that relates students to courses:
enrollment_id INT PRIMARY KEY AUTO_INCREMENT,
student_id INT NOT NULL,
course_id INT NOT NULL,
enrollment_date DATE DEFAULT (CURRENT_DATE),
grade CHAR(2),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id),
UNIQUE (student_id, course_id)
);
Foreign keys create relationships between tables. In this example:
• student_id in enrollments table references student_id in students table
• course_id in enrollments table references course_id in courses table
• This ensures we only enroll existing students in existing courses
Modifying Existing Tables
You can modify tables after creation using ALTER TABLE:
ALTER TABLE students
ADD COLUMN phone_number VARCHAR(20);
-- Modify an existing column
ALTER TABLE students
MODIFY COLUMN email VARCHAR(150) NOT NULL;
-- Drop a column
ALTER TABLE students
DROP COLUMN phone_number;
-- Add a constraint
ALTER TABLE students
ADD CONSTRAINT chk_email CHECK (email LIKE '%@%');
Viewing Table Information
Various ways to inspect your tables:
SHOW TABLES;
-- Show table structure
DESCRIBE students;
-- OR
SHOW COLUMNS FROM students;
-- Show how table was created
SHOW CREATE TABLE students;
Deleting Tables
To remove a table (use with caution!):
DROP TABLE enrollments;
-- Safer: Only drop if table exists
DROP TABLE IF EXISTS temp_table;
DROP TABLE permanently deletes the table and ALL its data. This action cannot be undone. Always backup important data before dropping tables.
Best Practices for Table Creation
- Use plural nouns for table names (students, courses)
- Use singular nouns for column names (first_name, not first_names)
- Use lowercase and underscores (student_id, not StudentID)
- Be consistent across your database
- Always define a primary key for each table
- Use appropriate data types for each column
- Add NOT NULL constraint for required fields
- Use VARCHAR instead of CHAR for variable-length text
- Plan relationships between tables before creating them
Common Mistakes to Avoid
- Forgetting to select a database before creating tables
- Using reserved keywords as table or column names
- Not specifying primary keys for tables
- Using wrong data types (VARCHAR for dates, etc.)
- Creating tables without planning relationships
- Forgetting commas between column definitions
Practice Exercise
Create a complete library management system with these tables:
- books table with: book_id, title, author, isbn, publication_year
- members table with: member_id, name, email, join_date, membership_type
- borrowings table with: borrowing_id, book_id, member_id, borrow_date, return_date
- Add appropriate data types and constraints
- Create foreign key relationships between borrowings and the other tables
Now that you can create tables, the next step is learning how to insert data into them. You'll learn how to add students, courses, and enrollment records to make your database useful.
MySQL Data Types
Why Data Types Matter
Selecting appropriate data types provides several benefits:
- Data Integrity: Prevents invalid data from being stored
- Storage Efficiency: Optimizes disk space usage
- Performance: Improves query speed and efficiency
- Validation: Automatically validates data format
- Memory Usage: Reduces memory consumption
Numeric Data Types
Used for storing numbers - integers, decimals, and floating-point values:
| Data Type | Storage | Range | Description | Example |
|---|---|---|---|---|
| TINYINT | 1 byte | -128 to 127 | Very small integer | age, status flags |
| SMALLINT | 2 bytes | -32,768 to 32,767 | Small integer | year, small counts |
| INT | 4 bytes | -2.1B to 2.1B | Standard integer | id, quantity, price |
| BIGINT | 8 bytes | ±9.2 quintillion | Large integer | population, big counts |
| DECIMAL(p,s) | Varies | Depends on precision | Exact decimal numbers | money, measurements |
| FLOAT | 4 bytes | ±3.4E+38 | Single-precision floating point | scientific data |
| DOUBLE | 8 bytes | ±1.8E+308 | Double-precision floating point | scientific calculations |
• DECIMAL(5,2) = 999.99 (5 total digits, 2 after decimal)
• DECIMAL(10,2) = 99,999,999.99 (good for currency)
• Use DECIMAL for money to avoid floating-point errors
String Data Types
Used for storing text and character data:
| Data Type | Max Size | Description | When to Use | Example |
|---|---|---|---|---|
| CHAR(n) | 255 chars | Fixed-length string | Fixed-size codes | country codes, status |
| VARCHAR(n) | 65,535 chars | Variable-length string | Most text data | names, addresses |
| TINYTEXT | 255 bytes | Small text data | Short descriptions | short notes |
| TEXT | 65,535 bytes | Standard text data | Long descriptions | articles, comments |
| MEDIUMTEXT | 16MB | Medium text data | Large documents | books, manuals |
| LONGTEXT | 4GB | Very large text data | Huge documents | encyclopedias |
• CHAR(10) always uses 10 characters storage
• VARCHAR(10) uses only needed characters + 1 byte
• Use CHAR for fixed-length data (like "US", "CA" for country codes)
• Use VARCHAR for variable-length data (like names, addresses)
Date and Time Data Types
Used for storing dates, times, and timestamps:
| Data Type | Format | Range | Description | Example |
|---|---|---|---|---|
| DATE | YYYY-MM-DD | 1000-01-01 to 9999-12-31 | Date only | birth_date, order_date |
| TIME | HH:MM:SS | -838:59:59 to 838:59:59 | Time only | duration, time_of_day |
| DATETIME | YYYY-MM-DD HH:MM:SS | 1000-01-01 00:00:00 to 9999-12-31 23:59:59 | Date and time | created_at, updated_at |
| TIMESTAMP | YYYY-MM-DD HH:MM:SS | 1970-01-01 00:00:01 to 2038-01-19 03:14:07 | Automatic timestamp | last_login, modified_date |
| YEAR | YYYY | 1901 to 2155 | Year value | graduation_year, founding_year |
• DATETIME - Larger range, no timezone conversion
• TIMESTAMP - Smaller range, automatic timezone conversion
• Use TIMESTAMP for auto-updating fields (like last_modified)
• Use DATETIME for future dates beyond 2038
Binary Data Types
Used for storing binary data like files, images, or encrypted data:
| Data Type | Max Size | Description | When to Use |
|---|---|---|---|
| BINARY(n) | 255 bytes | Fixed-length binary | Encrypted fixed data |
| VARBINARY(n) | 65,535 bytes | Variable-length binary | Encrypted variable data |
| TINYBLOB | 255 bytes | Small binary object | Very small files |
| BLOB | 65,535 bytes | Binary large object | Small images, documents |
| MEDIUMBLOB | 16MB | Medium binary object | Medium files, images |
| LONGBLOB | 4GB | Large binary object | Large files, videos |
Storing large files in databases is generally not recommended. Consider storing files on disk and saving file paths in the database instead. This improves performance and makes backups easier.
Other Special Data Types
MySQL also provides specialized data types for specific use cases:
| Data Type | Description | Example | Usage |
|---|---|---|---|
| ENUM | Predefined list of values | ENUM('small','medium','large') | Size categories, status |
| SET | Multiple choices from predefined list | SET('red','green','blue') | Tags, multiple categories |
| BOOLEAN | True/False values (alias for TINYINT(1)) | TRUE, FALSE | Flags, yes/no fields |
| JSON | JSON formatted data | {"name": "John", "age": 30} | Flexible schema data |
Practical Examples with Different Data Types
Let's create tables using various data types:
product_id INT PRIMARY KEY AUTO_INCREMENT,
product_name VARCHAR(100) NOT NULL,
description TEXT,
price DECIMAL(10, 2) NOT NULL,
quantity INT DEFAULT 0,
size ENUM('small', 'medium', 'large'),
is_available BOOLEAN DEFAULT TRUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
metadata JSON
);
user_id INT PRIMARY KEY,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
birth_date DATE,
phone VARCHAR(20),
bio TEXT,
last_login DATETIME,
login_count INT DEFAULT 0,
preferences JSON
);
Data Type Selection Guidelines
- For IDs: INT or BIGINT with AUTO_INCREMENT
- For names/titles: VARCHAR with appropriate length
- For money: DECIMAL to avoid floating-point errors
- For dates: DATE for dates only, DATETIME for date+time
- For true/false: BOOLEAN or TINYINT(1)
- For large text: TEXT or LONGTEXT
- For categories: ENUM for fixed choices
Common Data Type Mistakes
- Using VARCHAR for all text: Use CHAR for fixed-length codes
- Using FLOAT for money: Use DECIMAL to avoid rounding errors
- Oversizing VARCHAR: Choose appropriate max length
- Using TEXT for short strings: Use VARCHAR for better performance
- Storing dates as strings: Use DATE/DATETIME for date operations
- Using INT for small numbers: Use TINYINT or SMALLINT when appropriate
Practice Exercise
Create a table for a blog system with appropriate data types:
- posts table with: post_id, title, content, author_id, created_date, published_status, view_count
- categories table with: category_id, name, description, created_date
- tags table with: tag_id, name, color_code
- Choose the most appropriate data types for each column
- Consider storage efficiency and performance
MySQL Constraints
Why Constraints Matter
Using constraints provides several important benefits:
- Data Integrity: Prevents invalid or inconsistent data
- Data Accuracy: Ensures data follows business rules
- Relationship Management: Maintains links between related tables
- Application Reliability: Reduces application-level validation code
- Database Consistency: Ensures data follows defined rules
Types of Constraints in MySQL
| Constraint | Description | Usage Example | When to Use |
|---|---|---|---|
| PRIMARY KEY | Uniquely identifies each row in a table | student_id INT PRIMARY KEY | Every table should have one |
| FOREIGN KEY | Links to primary key in another table | FOREIGN KEY (course_id) REFERENCES courses(course_id) | When relating tables |
| UNIQUE | Ensures all values in a column are different | email VARCHAR(100) UNIQUE | Email, username, product codes |
| NOT NULL | Ensures a column cannot have NULL values | first_name VARCHAR(50) NOT NULL | Required fields |
| CHECK | Ensures values meet specific conditions | CHECK (age >= 18) | Business rules validation |
| DEFAULT | Sets a default value when no value is specified | status VARCHAR(20) DEFAULT 'active' | Common default values |
| AUTO_INCREMENT | Automatically generates sequential numbers | id INT AUTO_INCREMENT | Primary key columns |
PRIMARY KEY Constraint
Uniquely identifies each record in a table. A table can have only one primary key.
CREATE TABLE students (
student_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL
);
-- Composite primary key (multiple columns)
CREATE TABLE enrollments (
student_id INT,
course_id INT,
enrollment_date DATE,
PRIMARY KEY (student_id, course_id)
);
- Every table should have a primary key
- Use AUTO_INCREMENT for surrogate keys
- Keep primary keys simple and stable
- Use composite keys only when necessary
- Primary key values cannot be NULL
FOREIGN KEY Constraint
Creates a relationship between two tables and maintains referential integrity.
CREATE TABLE orders (
order_id INT PRIMARY KEY AUTO_INCREMENT,
customer_id INT NOT NULL,
order_date DATE,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
-- Foreign key with actions
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
FOREIGN KEY (order_id) REFERENCES orders(order_id)
ON DELETE CASCADE
ON UPDATE CASCADE
);
Foreign Key Actions
| Action | Description | When to Use |
|---|---|---|
| RESTRICT | Prevents deletion/update of parent record | Default behavior, safe option |
| CASCADE | Deletes/updates child records when parent changes | When child records depend completely on parent |
| SET NULL | Sets foreign key to NULL when parent is deleted | When relationship is optional |
| NO ACTION | Similar to RESTRICT (standard SQL) | Standard compliance |
UNIQUE Constraint
Ensures all values in a column are different from each other.
CREATE TABLE users (
user_id INT PRIMARY KEY AUTO_INCREMENT,
username VARCHAR(50) UNIQUE NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL
);
-- Composite unique constraint (multiple columns)
CREATE TABLE class_rosters (
class_id INT,
student_id INT,
semester VARCHAR(10),
UNIQUE (class_id, student_id, semester)
);
NOT NULL Constraint
Ensures a column cannot contain NULL values.
employee_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) NOT NULL,
phone VARCHAR(20), -- This can be NULL
hire_date DATE NOT NULL
);
CHECK Constraint
Ensures that values in a column satisfy a specific condition (MySQL 8.0+).
product_id INT PRIMARY KEY AUTO_INCREMENT,
product_name VARCHAR(100) NOT NULL,
price DECIMAL(10, 2) CHECK (price > 0),
quantity INT CHECK (quantity >= 0),
category VARCHAR(50) CHECK (category IN ('electronics', 'clothing', 'books'))
);
-- Named CHECK constraint
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
salary DECIMAL(10, 2),
CONSTRAINT chk_salary CHECK (salary >= 0)
);
DEFAULT Constraint
Sets a default value for a column when no value is specified.
order_id INT PRIMARY KEY AUTO_INCREMENT,
order_date DATE DEFAULT (CURRENT_DATE),
status VARCHAR(20) DEFAULT 'pending',
priority INT DEFAULT 1,
is_active BOOLEAN DEFAULT TRUE
);
Complete Example: E-commerce Database
Let's create a complete example using multiple constraints:
customer_id INT PRIMARY KEY AUTO_INCREMENT,
email VARCHAR(100) UNIQUE NOT NULL,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
phone VARCHAR(20),
registration_date DATE DEFAULT (CURRENT_DATE)
);
CREATE TABLE products (
product_id INT PRIMARY KEY AUTO_INCREMENT,
product_name VARCHAR(100) NOT NULL,
price DECIMAL(10, 2) CHECK (price > 0),
stock_quantity INT DEFAULT 0 CHECK (stock_quantity >= 0),
is_available BOOLEAN DEFAULT TRUE
);
CREATE TABLE orders (
order_id INT PRIMARY KEY AUTO_INCREMENT,
customer_id INT NOT NULL,
order_date DATETIME DEFAULT CURRENT_TIMESTAMP,
status ENUM('pending', 'confirmed', 'shipped', 'delivered') DEFAULT 'pending',
total_amount DECIMAL(10, 2) CHECK (total_amount >= 0),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
ON DELETE RESTRICT
ON UPDATE CASCADE
);
Adding Constraints to Existing Tables
You can add constraints to existing tables using ALTER TABLE:
ALTER TABLE students
ADD PRIMARY KEY (student_id);
-- Add foreign key
ALTER TABLE orders
ADD FOREIGN KEY (customer_id) REFERENCES customers(customer_id);
-- Add unique constraint
ALTER TABLE users
ADD CONSTRAINT uc_email UNIQUE (email);
-- Add check constraint
ALTER TABLE products
ADD CONSTRAINT chk_price CHECK (price > 0);
-- Add not null constraint
ALTER TABLE employees
MODIFY COLUMN email VARCHAR(100) NOT NULL;
Removing Constraints
Constraints can be removed when needed:
ALTER TABLE students
DROP PRIMARY KEY;
-- Drop foreign key
ALTER TABLE orders
DROP FOREIGN KEY orders_ibfk_1;
-- Drop unique constraint
ALTER TABLE users
DROP INDEX uc_email;
-- Drop check constraint
ALTER TABLE products
DROP CONSTRAINT chk_price;
Constraint Best Practices
- Always define primary keys for all tables
- Use foreign keys to maintain referential integrity
- Add NOT NULL constraints for required fields
- Use UNIQUE constraints for columns that should be unique
- Use CHECK constraints for business rule validation
- Set meaningful DEFAULT values where appropriate
- Name your constraints for easier management
Common Constraint Mistakes
- Not using primary keys: Every table needs a primary key
- Forgetting foreign keys: Leads to orphaned records
- Overusing CASCADE: Can accidentally delete important data
- Not using NOT NULL: Allows incomplete data
- Ignoring unique constraints: Allows duplicate data where it shouldn't be
- Complex check constraints: Can impact performance
Practice Exercise
Create a library management system with proper constraints:
- books table with: book_id (PK), isbn (UNIQUE), title (NOT NULL), publication_year (CHECK > 1450)
- members table with: member_id (PK), email (UNIQUE, NOT NULL), join_date (DEFAULT CURRENT_DATE)
- borrowings table with: borrowing_id (PK), book_id (FK), member_id (FK), borrow_date (DEFAULT), return_date (CHECK > borrow_date)
- Add appropriate foreign key relationships with CASCADE actions
- Add CHECK constraints for business rules
Inserting Data in MySQL
Prerequisites
Before inserting data, ensure you have:
- Tables created with proper structure
- Appropriate permissions to insert data
- Understanding of your table's columns and data types
- Selected the correct database using USE statement
Basic INSERT Syntax
The basic INSERT statement has several forms:
| Syntax Type | Description | When to Use |
|---|---|---|
| INSERT INTO table VALUES (...) | Insert values for all columns in order | When providing values for all columns |
| INSERT INTO table (columns) VALUES (...) | Insert values for specific columns | When providing values for some columns |
| INSERT INTO table SET column=value | Insert using SET syntax | Alternative syntax, less common |
| INSERT INTO table SELECT ... | Insert data from another table | When copying data between tables |
Step 1: Basic Single Row Insertion
Let's start with inserting a single record into our students table:
INSERT INTO students (
first_name, last_name, email, date_of_birth
) VALUES (
'John', 'Doe', 'john.doe@email.com', '2000-05-15'
);
Step 2: Insert Multiple Rows
You can insert multiple rows with a single INSERT statement:
first_name, last_name, email, date_of_birth
) VALUES
('Jane', 'Smith', 'jane.smith@email.com', '1999-08-22'),
('Bob', 'Johnson', 'bob.johnson@email.com', '2001-03-10'),
('Alice', 'Brown', 'alice.brown@email.com', '2000-11-30');
Records: 3 Duplicates: 0 Warnings: 0
Step 3: Verify Inserted Data
Let's check our inserted data using SELECT statement:
| student_id | first_name | last_name | email | date_of_birth | enrollment_date |
+------------+------------+-----------+-------------------------+---------------+----------------+
| 1 | John | Doe | john.doe@email.com | 2000-05-15 | 2023-10-25 |
| 2 | Jane | Smith | jane.smith@email.com | 1999-08-22 | 2023-10-25 |
| 3 | Bob | Johnson | bob.johnson@email.com | 2001-03-10 | 2023-10-25 |
| 4 | Alice | Brown | alice.brown@email.com | 2000-11-30 | 2023-10-25 |
+------------+------------+-----------+-------------------------+---------------+----------------+
4 rows in set (0.00 sec)
Different INSERT Syntax Methods
Method 1: Column List (Recommended)
product_name, price, category
) VALUES (
'Laptop', 999.99, 'Electronics'
);
Method 2: Without Column List
NULL, 'Smartphone', 699.99, 'Electronics', 50, TRUE
);
Method 3: SET Syntax
SET product_name = 'Tablet',
price = 299.99,
category = 'Electronics';
Handling Different Data Types
| Data Type | Insert Format | Example | Notes |
|---|---|---|---|
| VARCHAR/CHAR | Single quotes | 'John Doe' | Use single quotes for text |
| INT/DECIMAL | No quotes | 25, 99.99 | Numbers without quotes |
| DATE | Single quotes | '2023-10-25' | YYYY-MM-DD format |
| DATETIME | Single quotes | '2023-10-25 14:30:00' | YYYY-MM-DD HH:MM:SS |
| BOOLEAN | TRUE/FALSE or 1/0 | TRUE, 1, FALSE, 0 | Both formats work |
| NULL | NULL keyword | NULL | No quotes for NULL |
Working with AUTO_INCREMENT Columns
When a column has AUTO_INCREMENT, MySQL automatically generates the value:
INSERT INTO students (first_name, last_name)
VALUES ('Charlie', 'Wilson');
-- Explicitly set the ID (not recommended)
INSERT INTO students (student_id, first_name, last_name)
VALUES (100, 'David', 'Lee');
-- Get the last auto-generated ID
SELECT LAST_INSERT_ID();
Working with DEFAULT Values
Columns with DEFAULT values can be omitted from INSERT statements:
INSERT INTO orders (customer_id, total_amount)
VALUES (1, 99.99);
-- order_date will use DEFAULT CURRENT_TIMESTAMP
-- Explicitly use DEFAULT keyword
INSERT INTO products (product_name, price, stock_quantity)
VALUES ('Mouse', 29.99, DEFAULT);
-- Override the default value
INSERT INTO orders (customer_id, order_date, total_amount)
VALUES (2, '2023-10-20 10:00:00', 149.99);
Inserting Data with Foreign Keys
When inserting into tables with foreign keys, reference existing records:
INSERT INTO courses (course_name, course_code, credits)
VALUES
('Mathematics', 'MATH101', 3),
('Computer Science', 'CS101', 4);
-- Then, insert into child table with valid foreign keys
INSERT INTO enrollments (student_id, course_id, grade)
VALUES
(1, 1, 'A'),
(1, 2, 'B+'),
(2, 1, 'A-');
INSERT with SELECT (Copying Data)
You can insert data from one table into another:
CREATE TABLE students_backup (
student_id INT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50)
);
-- Copy data from students to backup
INSERT INTO students_backup (student_id, first_name, last_name)
SELECT student_id, first_name, last_name FROM students;
Handling Errors and Constraints
| Constraint | Potential Error | Solution |
|---|---|---|
| PRIMARY KEY | Duplicate entry error | Use AUTO_INCREMENT or ensure unique values |
| FOREIGN KEY | Cannot add or update child row | Ensure referenced record exists in parent table |
| UNIQUE | Duplicate entry for key | Use unique values or update existing record |
| NOT NULL | Column cannot be null | Provide value for required columns |
| CHECK | Check constraint violation | Ensure value meets condition |
INSERT IGNORE and ON DUPLICATE KEY UPDATE
Advanced techniques for handling duplicate entries:
INSERT IGNORE INTO students (student_id, first_name, last_name)
VALUES (1, 'John', 'Updated');
-- If student_id 1 exists, this insert is silently ignored
-- Update on duplicate key
INSERT INTO students (student_id, first_name, last_name)
VALUES (1, 'John', 'Updated')
ON DUPLICATE KEY UPDATE
first_name = 'John', last_name = 'Updated';
Best Practices for Inserting Data
- Always specify column names for clarity and safety
- Use multiple row inserts for better performance with large datasets
- Validate data before insertion when possible
- Use transactions for multiple related inserts
- Handle AUTO_INCREMENT columns properly
- Be mindful of foreign key constraints
- Use INSERT IGNORE or ON DUPLICATE KEY UPDATE for duplicate handling
Common INSERT Mistakes
- Forgetting column lists: Can cause errors if table structure changes
- Wrong data types: Inserting strings without quotes, etc.
- Violating constraints: Not checking foreign key relationships
- Date format errors: Using wrong date formats
- Missing required fields: Forgetting NOT NULL columns
- Single quotes in text: Not escaping special characters
Practice Exercise
Populate the library management system with sample data:
- Insert 5 books into the books table with different categories
- Insert 3 library members into the members table
- Insert borrowing records linking books to members
- Use multiple row insertion for books and members
- Verify all inserts worked by selecting the data
- Try inserting a duplicate ISBN and handle it with INSERT IGNORE
SELECT Queries in MySQL
Why SELECT Queries Matter
SELECT queries are essential for:
- Data Retrieval: Extracting specific information from databases
- Data Analysis: Generating reports and insights
- Application Development: Displaying data in applications
- Decision Making: Supporting business decisions with data
- Data Validation: Checking data quality and integrity
Basic SELECT Syntax
The SELECT statement has several components that can be combined:
| Clause | Purpose | Required |
|---|---|---|
| SELECT | Specifies columns to retrieve | Yes |
| FROM | Specifies table(s) to query | Yes (in most cases) |
| WHERE | Filters rows based on conditions | No |
| ORDER BY | Sorts the result set | No |
| GROUP BY | Groups rows with same values | No |
| HAVING | Filters grouped rows | No |
| LIMIT | Restricts number of rows returned | No |
Step 1: Basic SELECT Queries
Let's start with simple queries to retrieve data from our students table:
SELECT * FROM students;
-- Select specific columns
SELECT first_name, last_name, email FROM students;
-- Select with column aliases
SELECT
first_name AS 'First Name',
last_name AS 'Last Name',
email AS 'Email Address'
FROM students;
| First Name | Last Name | Email Address |
+------------+-----------+-------------------------+
| John | Doe | john.doe@email.com |
| Jane | Smith | jane.smith@email.com |
| Bob | Johnson | bob.johnson@email.com |
| Alice | Brown | alice.brown@email.com |
+------------+-----------+-------------------------+
4 rows in set (0.00 sec)
Step 2: Using DISTINCT for Unique Values
Get unique values from a column:
SELECT DISTINCT last_name FROM students;
-- Get unique combinations of first and last names
SELECT DISTINCT first_name, last_name FROM students;
Step 3: Basic Calculations and Expressions
Perform calculations in SELECT statements:
SELECT
first_name,
last_name,
date_of_birth,
YEAR(CURRENT_DATE) - YEAR(date_of_birth) AS age
FROM students;
-- Mathematical operations
SELECT
product_name,
price,
price * 0.9 AS discounted_price,
price * quantity AS total_value
FROM products;
Common SELECT Patterns
| Pattern | Example | Use Case |
|---|---|---|
| Select all columns | SELECT * FROM table | Quick data inspection |
| Select specific columns | SELECT col1, col2 FROM table | Performance optimization |
| Column aliases | SELECT col AS alias FROM table | Better column names |
| Calculated columns | SELECT price * quantity AS total | Dynamic calculations |
| Distinct values | SELECT DISTINCT col FROM table | Unique value analysis |
Working with Multiple Tables
Query data from multiple tables using JOINs:
SELECT
s.first_name,
s.last_name,
c.course_name,
e.grade
FROM students s
INNER JOIN enrollments e ON s.student_id = e.student_id
INNER JOIN courses c ON e.course_id = c.course_id;
-- Left join to include all students (even without enrollments)
SELECT
s.first_name,
s.last_name,
c.course_name
FROM students s
LEFT JOIN enrollments e ON s.student_id = e.student_id
LEFT JOIN courses c ON e.course_id = c.course_id;
JOIN Types Overview
| JOIN Type | Description | Result | Use Case |
|---|---|---|---|
| INNER JOIN | Returns matching rows from both tables | Intersection | Find related records |
| LEFT JOIN | Returns all rows from left table, matched rows from right | All left + matching right | Find records with optional relationships |
| RIGHT JOIN | Returns all rows from right table, matched rows from left | All right + matching left | Less common, similar to LEFT JOIN |
| FULL OUTER JOIN | Returns all rows when there's a match in either table | Union of both tables | MySQL doesn't support directly |
| CROSS JOIN | Returns Cartesian product of both tables | All combinations | Generate test data, combinations |
Using WHERE Clause for Filtering
Filter rows based on specific conditions:
SELECT * FROM students
WHERE last_name = 'Doe';
-- Multiple conditions with AND/OR
SELECT * FROM students
WHERE last_name = 'Doe' AND first_name = 'John';
-- Pattern matching with LIKE
SELECT * FROM students
WHERE email LIKE '%@gmail.com';
-- Range filtering
SELECT * FROM students
WHERE date_of_birth BETWEEN '2000-01-01' AND '2000-12-31';
-- IN operator for multiple values
SELECT * FROM students
WHERE last_name IN ('Doe', 'Smith', 'Johnson');
Common WHERE Clause Operators
| Operator | Description | Example |
|---|---|---|
| = | Equal to | WHERE age = 25 |
| != or <> | Not equal to | WHERE status != 'inactive' |
| >, <, >=, <= | Comparison operators | WHERE price > 100 |
| BETWEEN | Range inclusive | WHERE age BETWEEN 18 AND 65 |
| LIKE | Pattern matching | WHERE name LIKE 'J%' |
| IN | Multiple possible values | WHERE status IN ('active', 'pending') |
| IS NULL | Null check | WHERE email IS NULL |
| IS NOT NULL | Not null check | WHERE phone IS NOT NULL |
Sorting Results with ORDER BY
Control the order of your results:
SELECT * FROM students
ORDER BY last_name;
-- Sort by single column descending
SELECT * FROM students
ORDER BY enrollment_date DESC;
-- Sort by multiple columns
SELECT * FROM students
ORDER BY last_name ASC, first_name ASC;
-- Sort by calculated column
SELECT
first_name, last_name, date_of_birth,
YEAR(CURRENT_DATE) - YEAR(date_of_birth) AS age
FROM students
ORDER BY age DESC;
Limiting Results with LIMIT
Restrict the number of rows returned:
SELECT * FROM students
LIMIT 5;
-- Get 5 rows starting from row 3 (offset)
SELECT * FROM students
LIMIT 3, 5;
-- Alternative syntax (MySQL 8.0+)
SELECT * FROM students
LIMIT 5 OFFSET 3;
-- Top 3 oldest students
SELECT * FROM students
ORDER BY date_of_birth ASC
LIMIT 3;
Aggregate Functions
Perform calculations on multiple rows and return single values:
SELECT COUNT(*) AS total_students FROM students;
-- Count students with email
SELECT COUNT(email) AS students_with_email FROM students;
-- Average, minimum, maximum prices
SELECT
AVG(price) AS average_price,
MIN(price) AS min_price,
MAX(price) AS max_price,
SUM(price) AS total_value
FROM products;
Common Aggregate Functions
| Function | Description | Example |
|---|---|---|
| COUNT() | Counts number of rows | COUNT(*), COUNT(column) |
| SUM() | Calculates sum of values | SUM(price), SUM(quantity) |
| AVG() | Calculates average value | AVG(salary), AVG(score) |
| MIN() | Finds minimum value | MIN(price), MIN(age) |
| MAX() | Finds maximum value | MAX(price), MAX(age) |
| GROUP_CONCAT() | Concatenates values from group | GROUP_CONCAT(name) |
Best Practices for SELECT Queries
- Use specific column names instead of * for better performance
- Add indexes on columns used in WHERE and JOIN conditions
- Use LIMIT to restrict large result sets
- Avoid functions on indexed columns in WHERE clauses
- Use EXISTS instead of IN for subqueries when possible
- Test queries with EXPLAIN to understand performance
- Use appropriate data types for better indexing
Common SELECT Mistakes
- Using SELECT * in production: Can cause performance issues
- Missing JOIN conditions: Results in Cartesian products
- Confusing AND/OR logic: Use parentheses for complex conditions
- Forgetting NULL handling: NULL values behave differently
- Inefficient LIKE patterns: Leading wildcards can't use indexes
- Overusing subqueries: Consider JOINs for better performance
- Ignoring data types: Can cause implicit conversions
Practice Exercise
Practice these SELECT query scenarios:
- Retrieve all students born after 2000
- Find students with Gmail email addresses
- List students in alphabetical order by last name, then first name
- Count how many students enrolled each year
- Find the average age of students
- Get the top 5 most recently enrolled students
- Join students with their course enrollments and grades
- Find students who are not enrolled in any courses
WHERE Clause in MySQL
Why WHERE Clause Matters
The WHERE clause is essential for:
- Data Filtering: Extracting specific subsets of data
- Performance: Reducing data transfer by filtering at database level
- Data Analysis: Focusing on relevant data for reports
- Application Logic: Implementing business rules in queries
- Security: Restricting data access based on conditions
Basic WHERE Clause Syntax
The WHERE clause follows the FROM clause in a SELECT statement:
FROM table_name
WHERE condition;
Comparison Operators
Used to compare values in WHERE conditions:
| Operator | Description | Example | Result |
|---|---|---|---|
| = | Equal to | WHERE age = 25 | Rows where age is exactly 25 |
| <> or != | Not equal to | WHERE status <> 'inactive' | Rows where status is not 'inactive' |
| > | Greater than | WHERE price > 100 | Rows where price exceeds 100 |
| < | Less than | WHERE age < 18 | Rows where age is under 18 |
| >= | Greater than or equal to | WHERE score >= 80 | Rows where score is 80 or higher |
| <= | Less than or equal to | WHERE quantity <= 10 | Rows where quantity is 10 or less |
Step 1: Basic Comparison Examples
Let's start with simple comparison operations on our students table:
SELECT * FROM students
WHERE last_name = 'Doe';
-- Find students born before 2000
SELECT * FROM students
WHERE date_of_birth < '2000-01-01';
-- Find students with specific student_id
SELECT * FROM students
WHERE student_id >= 5;
| student_id | first_name | last_name | email | date_of_birth | enrollment_date |
+------------+------------+-----------+-----------------------+---------------+----------------+
| 1 | John | Doe | john.doe@email.com | 2000-05-15 | 2023-10-25 |
+------------+------------+-----------+-----------------------+---------------+----------------+
1 row in set (0.00 sec)
Logical Operators (AND, OR, NOT)
Combine multiple conditions in WHERE clauses:
| Operator | Description | Example | Result |
|---|---|---|---|
| AND | All conditions must be true | WHERE age > 18 AND status = 'active' | Rows where both conditions are true |
| OR | At least one condition must be true | WHERE city = 'NYC' OR city = 'LA' | Rows where either condition is true |
| NOT | Negates a condition | WHERE NOT status = 'inactive' | Rows where status is not 'inactive' |
Step 2: Combining Conditions with AND/OR
Use logical operators to create complex filters:
SELECT * FROM students
WHERE first_name = 'John' AND last_name = 'Doe';
-- Students with specific first name OR last name
SELECT * FROM students
WHERE first_name = 'John' OR last_name = 'Smith';
-- Complex combination with parentheses
SELECT * FROM students
WHERE (first_name = 'John' OR first_name = 'Jane')
AND date_of_birth > '1999-01-01';
BETWEEN Operator
Filter values within a specified range (inclusive):
SELECT * FROM students
WHERE date_of_birth BETWEEN '1999-01-01' AND '2000-12-31';
-- Equivalent using comparison operators
SELECT * FROM students
WHERE date_of_birth >= '1999-01-01'
AND date_of_birth <= '2000-12-31';
-- NOT BETWEEN (outside the range)
SELECT * FROM students
WHERE date_of_birth NOT BETWEEN '1999-01-01' AND '2000-12-31';
IN Operator
Check if a value matches any value in a list:
SELECT * FROM students
WHERE last_name IN ('Doe', 'Smith', 'Johnson');
-- Equivalent using multiple OR conditions
SELECT * FROM students
WHERE last_name = 'Doe'
OR last_name = 'Smith'
OR last_name = 'Johnson';
-- NOT IN (exclude specific values)
SELECT * FROM students
WHERE last_name NOT IN ('Doe', 'Smith');
LIKE Operator for Pattern Matching
Search for patterns in text data using wildcards:
| Wildcard | Description | Example | Matches |
|---|---|---|---|
| % | Matches any sequence of characters (0 or more) | WHERE name LIKE 'J%' | John, Jane, James, etc. |
| _ | Matches exactly one character | WHERE name LIKE 'J_n' | Jen, Jon, Jan (but not John) |
| [charlist] | Matches any single character in charlist | WHERE name LIKE '[JS]%' | John, Sarah (starts with J or S) |
| [^charlist] | Matches any single character not in charlist | WHERE name LIKE '[^JS]%' | Mark, David (doesn't start with J or S) |
SELECT * FROM students
WHERE first_name LIKE 'J%';
-- Students with email containing 'gmail'
SELECT * FROM students
WHERE email LIKE '%gmail%';
-- Students with 3-letter first names
SELECT * FROM students
WHERE first_name LIKE '___';
-- NOT LIKE (exclude pattern)
SELECT * FROM students
WHERE email NOT LIKE '%@gmail.com';
NULL Value Handling
Special handling for NULL values in WHERE clauses:
SELECT * FROM students
WHERE email IS NULL;
-- Students with email address (NOT NULL)
SELECT * FROM students
WHERE email IS NOT NULL;
-- ❌ WRONG: This won't work with NULL values
SELECT * FROM students
WHERE email = NULL; -- Always returns no rows
-- ❌ WRONG: This also won't work
SELECT * FROM students
WHERE email != NULL; -- Always returns no rows
Complex WHERE Clause Examples
Combine multiple operators for sophisticated filtering:
SELECT * FROM students
WHERE YEAR(date_of_birth) = 2000
AND email LIKE '%@gmail.com';
-- Students with specific names OR born after 2001
SELECT * FROM students
WHERE last_name IN ('Doe', 'Smith')
OR date_of_birth > '2001-12-31';
-- Students not named John or Jane, born before 2000
SELECT * FROM students
WHERE first_name NOT IN ('John', 'Jane')
AND date_of_birth < '2000-01-01'
AND email IS NOT NULL;
WHERE Clause with Functions
Use functions in WHERE conditions for dynamic filtering:
SELECT * FROM students
WHERE YEAR(enrollment_date) = YEAR(CURRENT_DATE);
-- Students with email domains of specific length
SELECT * FROM students
WHERE LENGTH(SUBSTRING_INDEX(email, '@', -1)) > 10;
-- Students born on weekend
SELECT * FROM students
WHERE DAYOFWEEK(date_of_birth) IN (1, 7);
Performance Considerations
| Practice | Impact | Recommendation |
|---|---|---|
| Using indexed columns | High performance | Filter on indexed columns when possible |
| Functions on columns | Prevents index usage | Avoid WHERE YEAR(date) = 2023 |
| Leading wildcards in LIKE | Full table scan | Avoid WHERE name LIKE '%son' |
| OR conditions | Can be slow | Consider UNION for better performance |
| NOT conditions | Often slow | Use positive conditions when possible |
Best Practices for WHERE Clauses
- Use parentheses to clarify complex AND/OR logic
- Always use IS NULL/IS NOT NULL for NULL checks
- Use IN instead of multiple OR conditions for better readability
- Avoid functions on indexed columns in WHERE clauses
- Use BETWEEN for range queries instead of separate comparisons
- Be careful with LIKE patterns - leading wildcards can't use indexes
- Test your conditions with sample data to ensure correct logic
Common WHERE Clause Mistakes
- Using = NULL instead of IS NULL
- Forgetting parentheses in complex AND/OR conditions
- Case sensitivity in string comparisons
- Date format mismatches
- Using functions that prevent index usage
- Overly complex conditions that are hard to read
- Not considering NULL values in logic
Practice Exercise
Practice these WHERE clause scenarios:
- Find all students born in the 1990s
- Find students with first names starting with 'A' or 'B'
- Find students without email addresses
- Find students enrolled in the last 30 days
- Find students with specific last names but exclude certain first names
- Find students with email addresses from educational institutions (.edu)
- Find students born on weekdays (Monday-Friday)
- Find students with odd-numbered student IDs
MySQL Operators
Why Operators Matter
Operators are essential for:
- Data Filtering: Creating conditions in WHERE clauses
- Data Calculation: Performing mathematical operations
- Data Comparison: Comparing values and expressions
- Logical Operations: Combining multiple conditions
- Pattern Matching: Searching for text patterns
Operator Categories Overview
| Category | Purpose | Common Operators |
|---|---|---|
| Arithmetic | Mathematical calculations | +, -, *, /, %, DIV |
| Comparison | Compare values and expressions | =, !=, <, >, <=, >= |
| Logical | Combine boolean expressions | AND, OR, NOT, &&, ||, ! |
| Bitwise | Manipulate bits in numbers | &, |, ^, ~, <<, >> |
| Assignment | Assign values to variables | :=, = |
| Special | Unique SQL operations | BETWEEN, IN, LIKE, IS NULL |
Arithmetic Operators
Used for mathematical calculations on numeric values:
| Operator | Description | Example | Result |
|---|---|---|---|
| + | Addition | SELECT 10 + 5; | 15 |
| - | Subtraction | SELECT 10 - 5; | 5 |
| * | Multiplication | SELECT 10 * 5; | 50 |
| / | Division | SELECT 10 / 3; | 3.3333 |
| DIV | Integer division | SELECT 10 DIV 3; | 3 |
| % or MOD | Modulo (remainder) | SELECT 10 % 3; | 1 |
SELECT
10 + 5 AS addition,
10 - 5 AS subtraction,
10 * 5 AS multiplication,
10 / 3 AS division,
10 DIV 3 AS integer_division,
10 % 3 AS modulo;
-- Using arithmetic with table data
SELECT
product_name,
price,
price * 0.9 AS discounted_price,
price * quantity AS total_value
FROM products;
| addition | subtraction | multiplication | division | integer_division | modulo |
+-------+-------------+----------------+--------------+-------------------+--------+
| 15 | 5 | 50 | 3.3333 | 3 | 1 |
+-------+-------------+----------------+--------------+-------------------+--------+
Comparison Operators
Used to compare values and return TRUE, FALSE, or NULL:
| Operator | Description | Example | Result |
|---|---|---|---|
| = | Equal to | SELECT 10 = 10; | 1 (TRUE) |
| != or <> | Not equal to | SELECT 10 != 5; | 1 (TRUE) |
| > | Greater than | SELECT 10 > 5; | 1 (TRUE) |
| < | Less than | SELECT 10 < 5; | 0 (FALSE) |
| >= | Greater than or equal to | SELECT 10 >= 10; | 1 (TRUE) |
| <= | Less than or equal to | SELECT 10 <= 5; | 0 (FALSE) |
| <=> | NULL-safe equal to | SELECT NULL <=> NULL; | 1 (TRUE) |
SELECT
10 = 10 AS equal,
10 != 5 AS not_equal,
10 > 5 AS greater_than,
10 < 5 AS less_than,
10 >= 10 AS greater_equal,
10 <= 5 AS less_equal;
-- NULL-safe comparison
SELECT
NULL = NULL AS null_equals_null, -- Returns NULL
NULL <=> NULL AS null_safe_equals; -- Returns 1 (TRUE)
Logical Operators
Combine boolean expressions and return TRUE, FALSE, or NULL:
| Operator | Description | Example | Result |
|---|---|---|---|
| AND or && | Logical AND | SELECT 1 AND 1; | 1 (TRUE) |
| OR or || | Logical OR | SELECT 1 OR 0; | 1 (TRUE) |
| NOT or ! | Logical NOT | SELECT NOT 0; | 1 (TRUE) |
| XOR | Logical XOR (exclusive OR) | SELECT 1 XOR 1; | 0 (FALSE) |
SELECT
1 AND 1 AS and_true,
1 AND 0 AS and_false,
1 OR 0 AS or_true,
0 OR 0 AS or_false,
NOT 1 AS not_true,
NOT 0 AS not_false,
1 XOR 0 AS xor_true,
1 XOR 1 AS xor_false;
-- Using logical operators in WHERE clauses
SELECT * FROM students
WHERE (age > 18 AND status = 'active')
OR (grade >= 90 AND NOT suspended);
Bitwise Operators
Perform operations on individual bits of integer values:
| Operator | Description | Example | Result (Binary) |
|---|---|---|---|
| & | Bitwise AND | SELECT 5 & 3; | 1 (101 & 011 = 001) |
| | | Bitwise OR | SELECT 5 | 3; | 7 (101 | 011 = 111) |
| ^ | Bitwise XOR | SELECT 5 ^ 3; | 6 (101 ^ 011 = 110) |
| ~ | Bitwise NOT | SELECT ~1; | 4294967294 (~0001 = 1110) |
| << | Left shift | SELECT 1 << 2; | 4 (0001 << 2 = 0100) |
| >> | Right shift | SELECT 8 >> 2; | 2 (1000 >> 2 = 0010) |
5 & 3 AS bitwise_and, -- 101 AND 011 = 001 (1)
5 | 3 AS bitwise_or, -- 101 OR 011 = 111 (7)
5 ^ 3 AS bitwise_xor, -- 101 XOR 011 = 110 (6)
~1 AS bitwise_not, -- NOT 0001 = 1110 (large number)
1 << 2 AS left_shift, -- 0001 << 2 = 0100 (4)
8 >> 2 AS right_shift; -- 1000 >> 2 = 0010 (2)
Special SQL Operators
Unique operators for specific SQL operations:
| Operator | Description | Syntax | Example |
|---|---|---|---|
| BETWEEN | Range check (inclusive) | value BETWEEN min AND max | WHERE age BETWEEN 18 AND 65 |
| IN | Check value in list | value IN (val1, val2, ...) | WHERE status IN ('active', 'pending') |
| LIKE | Pattern matching | string LIKE pattern | WHERE name LIKE 'J%' |
| IS NULL | Check for NULL value | column IS NULL | WHERE email IS NULL |
| IS NOT NULL | Check for non-NULL value | column IS NOT NULL | WHERE phone IS NOT NULL |
| REGEXP | Regular expression match | string REGEXP pattern | WHERE email REGEXP '^[A-Za-z]' |
SELECT * FROM students
WHERE age BETWEEN 18 AND 25;
-- IN example
SELECT * FROM students
WHERE department IN ('CS', 'MATH', 'PHYSICS');
-- LIKE example
SELECT * FROM students
WHERE email LIKE '%@gmail.com';
-- IS NULL example
SELECT * FROM students
WHERE middle_name IS NULL;
-- REGEXP example
SELECT * FROM students
WHERE name REGEXP '^J.*n$'; -- Names starting with J and ending with n
Operator Precedence
Order in which operators are evaluated (highest to lowest):
| Precedence | Operators | Description |
|---|---|---|
| 1 | ! | Logical NOT |
| 2 | - (unary), ~ | Unary minus, bitwise NOT |
| 3 | ^ | Bitwise XOR |
| 4 | *, /, DIV, %, MOD | Multiplication, division |
| 5 | +, - | Addition, subtraction |
| 6 | <<, >> | Bitwise shifts |
| 7 | & | Bitwise AND |
| 8 | | | Bitwise OR |
| 9 |
=, <=>, >=, >, <=, <, <>, != IS, LIKE, REGEXP, IN |
Comparison, pattern matching |
| 10 | BETWEEN, CASE, WHEN, THEN, ELSE | Special operators |
| 11 | NOT | Logical NOT |
| 12 | AND, && | Logical AND |
| 13 | XOR | Logical XOR |
| 14 | OR, || | Logical OR |
| 15 | := | Assignment |
SELECT 2 + 3 * 4; -- Result: 14 (not 20)
-- AND has higher precedence than OR
SELECT 1 OR 0 AND 0; -- Result: 1 (not 0)
-- Equivalent to: 1 OR (0 AND 0) = 1 OR 0 = 1
-- Use parentheses to control evaluation
SELECT (1 OR 0) AND 0; -- Result: 0
Best Practices for Using Operators
- Use parentheses to clarify complex expressions
- Use IS NULL/IS NOT NULL instead of = NULL/!= NULL
- Prefer IN over multiple OR conditions for better readability
- Use BETWEEN for range checks instead of separate comparisons
- Be mindful of operator precedence in complex expressions
- Use appropriate operators for the data type (string vs numeric)
- Test your expressions with sample data to ensure correct logic
Common Operator Mistakes
- Using = NULL instead of IS NULL
- Forgetting operator precedence in complex expressions
- Using bitwise operators instead of logical operators
- Mixing data types in comparisons
- Using AND when OR is needed (and vice versa)
- Forgetting that NULL values affect logical operations
- Using arithmetic operators on non-numeric data
Practice Exercise
Practice using different operators with these scenarios:
- Calculate the total price including 8% tax for products
- Find students with ages between 20 and 25 using BETWEEN
- Use IN operator to find students in specific departments
- Combine AND/OR to find active students with high GPA or honors status
- Use bitwise operators to check if a number is even or odd
- Create complex WHERE conditions using proper parentheses
- Use REGEXP to find email addresses with specific patterns
- Test operator precedence with mixed arithmetic and logical operations
UPDATE & DELETE Statements in MySQL
Why UPDATE & DELETE Matter
These statements are essential for database maintenance:
- Data Correction: Fix errors in existing records
- Data Maintenance: Keep information current and accurate
- Data Archiving: Remove obsolete or temporary data
- Business Logic: Implement status changes and updates
- Database Cleanup: Remove unwanted or duplicate records
UPDATE Statement Overview
The UPDATE statement modifies existing records:
| Clause | Purpose | Required |
|---|---|---|
| UPDATE | Specifies table to update | Yes |
| SET | Specifies columns and new values | Yes |
| WHERE | Specifies which records to update | No (but highly recommended) |
| ORDER BY | Specifies update order | No |
| LIMIT | Limits number of records updated | No |
Basic UPDATE Syntax
Let's start with basic UPDATE operations on our students table:
UPDATE students
SET email = 'john.new@email.com'
WHERE student_id = 1;
-- Update multiple columns for a student
UPDATE students
SET
first_name = 'Jonathan',
last_name = 'Doerson',
email = 'jonathan.doerson@email.com'
WHERE student_id = 1;
-- Update with expression
UPDATE products
SET price = price * 1.1 -- Increase price by 10%
WHERE category = 'Electronics';
Rows matched: 1 Changed: 1 Warnings: 0
Advanced UPDATE Operations
More complex UPDATE scenarios with multiple conditions and joins:
UPDATE students
SET status = 'alumni'
WHERE graduation_year < 2020
AND status = 'active';
-- Update with LIMIT (safe testing)
UPDATE students
SET email = CONCAT(username, '@university.edu')
WHERE email IS NULL
LIMIT 10; -- Update only 10 records
-- Update using subquery
UPDATE students
SET scholarship_amount = (
SELECT AVG(scholarship_amount)
FROM students
WHERE department = 'Computer Science'
)
WHERE student_id = 5;
UPDATE with JOIN
Update records based on values from related tables:
UPDATE students s
JOIN (
SELECT student_id, COUNT(*) AS course_count
FROM enrollments
GROUP BY student_id
) e ON s.student_id = e.student_id
SET s.status = 'full_time'
WHERE e.course_count >= 4;
-- Update product prices based on supplier costs
UPDATE products p
JOIN suppliers s ON p.supplier_id = s.supplier_id
SET p.price = s.cost_price * 1.3 -- 30% markup
WHERE p.category = 'Electronics';
DELETE Statement Overview
The DELETE statement removes records from a table:
| Clause | Purpose | Required |
|---|---|---|
| DELETE FROM | Specifies table to delete from | Yes |
| WHERE | Specifies which records to delete | No (but CRITICAL to include) |
| ORDER BY | Specifies deletion order | No |
| LIMIT | Limits number of records deleted | No |
Basic DELETE Syntax
Let's explore DELETE operations with proper safety measures:
DELETE FROM students
WHERE student_id = 10;
-- Delete students with specific criteria
DELETE FROM students
WHERE graduation_year < 2010
AND status = 'inactive';
-- Safe DELETE with LIMIT
DELETE FROM temp_logs
WHERE created_date < '2023-01-01'
LIMIT 1000; -- Delete in batches
DELETE Safety Practices
| Practice | Description | Example |
|---|---|---|
| Test with SELECT first | Verify records to be deleted | SELECT * FROM table WHERE condition |
| Use transactions | Allow rollback if needed | START TRANSACTION; DELETE...; ROLLBACK; |
| Use LIMIT | Delete in controlled batches | DELETE ... LIMIT 1000 |
| Backup data | Always have recent backups | mysqldump database table |
| Use soft deletes | Mark as deleted instead of removing | UPDATE SET is_deleted = 1 |
Safe DELETE Workflow
Always follow this safety procedure before deleting:
SELECT * FROM students
WHERE graduation_year < 2010
AND status = 'inactive';
-- Step 2: Use transaction for safety
START TRANSACTION;
-- Step 3: Perform DELETE within transaction
DELETE FROM students
WHERE graduation_year < 2010
AND status = 'inactive';
-- Step 4: Verify the results
SELECT ROW_COUNT(); -- Check how many rows were affected
-- Step 5: COMMIT if correct, ROLLBACK if mistake
COMMIT; -- OR ROLLBACK; to undo
DELETE with JOIN
Delete records based on relationships with other tables:
DELETE e FROM enrollments e
JOIN students s ON e.student_id = s.student_id
WHERE s.graduation_year < 2020;
-- Delete using subquery (alternative approach)
DELETE FROM enrollments
WHERE student_id IN (
SELECT student_id
FROM students
WHERE graduation_year < 2020
);
-- Delete orphaned records (no foreign key relationship)
DELETE e FROM enrollments e
LEFT JOIN students s ON e.student_id = s.student_id
WHERE s.student_id IS NULL;
TRUNCATE vs DELETE
Understanding the difference between TRUNCATE and DELETE:
| Aspect | DELETE | TRUNCATE |
|---|---|---|
| Operation Type | DML (Data Manipulation Language) | DDL (Data Definition Language) |
| WHERE Clause | Supports WHERE for conditional deletion | No WHERE clause - removes all rows |
| Performance | Slower for large tables (row-by-row) | Faster (drops and recreates table) |
| Auto-increment | Does not reset auto-increment counter | Resets auto-increment counter to 1 |
| Triggers | Fires DELETE triggers | Does not fire triggers |
| Rollback | Can be rolled back in transactions | Cannot be rolled back in some databases |
| Use Case | Selective deletion of specific records | Complete table cleanup |
TRUNCATE TABLE temp_logs;
-- ❌ TRUNCATE cannot use WHERE clause
-- This will cause an error:
-- TRUNCATE TABLE students WHERE status = 'inactive';
Handling Foreign Key Constraints
Special considerations when updating or deleting related records:
DELETE FROM students
WHERE student_id = 1;
-- Error: Cannot delete or update a parent row: a foreign key constraint fails
-- Solution 1: Delete child records first
DELETE FROM enrollments
WHERE student_id = 1;
DELETE FROM students
WHERE student_id = 1;
-- Solution 2: Use CASCADE delete (if defined)
-- When foreign key is defined with ON DELETE CASCADE
DELETE FROM students
WHERE student_id = 1;
-- Automatically deletes related enrollments
Best Practices for UPDATE & DELETE
- Always backup data before major UPDATE/DELETE operations
- Use transactions for multi-step operations
- Test WHERE clauses with SELECT first
- Use LIMIT for large operations to avoid locking
- Consider using soft deletes (is_deleted flag) instead of physical deletes
- Be mindful of foreign key constraints and CASCADE rules
- Keep audit logs of significant data changes
Common UPDATE & DELETE Mistakes
- Omitting WHERE clause: Updates/deletes all records
- Incorrect WHERE conditions: Affects wrong records
- Not testing with SELECT first: No verification
- Ignoring foreign key constraints: Causes errors
- Not using transactions: No rollback capability
- Forgetting about triggers: Unexpected side effects
- Not considering performance: Locks large tables
Practice Exercise
Practice these UPDATE and DELETE scenarios safely:
- Update all product prices by applying a 15% discount for clearance items
- Change the email domain for all students from @old-school.edu to @new-school.edu
- Delete temporary log records older than 30 days (use LIMIT for safety)
- Update student status to 'graduated' for those with graduation_year in the past
- Delete duplicate student records (keep the one with the latest enrollment_date)
- Use a transaction to safely update multiple related tables
- Practice the safe DELETE workflow with SELECT verification
- Update records using a JOIN with another table
JOIN Operations in MySQL
Why JOIN Operations Matter
JOIN operations are essential for:
- Data Relationships: Combining related data from multiple tables
- Normalized Databases: Working with properly structured databases
- Complex Queries: Answering business questions across multiple entities
- Reporting: Creating comprehensive reports from related data
- Data Analysis: Analyzing relationships between different data entities
Sample Data for JOIN Examples
Let's set up sample tables to demonstrate different JOIN types:
CREATE TABLE students (
student_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50),
last_name VARCHAR(50)
);
-- Courses table
CREATE TABLE courses (
course_id INT PRIMARY KEY AUTO_INCREMENT,
course_name VARCHAR(100)
);
-- Enrollments table (junction table)
CREATE TABLE enrollments (
enrollment_id INT PRIMARY KEY AUTO_INCREMENT,
student_id INT,
course_id INT,
grade CHAR(2),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
JOIN Types Overview
| JOIN Type | Description | Venn Diagram | Use Case |
|---|---|---|---|
| INNER JOIN | Returns matching rows from both tables | Intersection | Find related records that exist in both tables |
| LEFT JOIN | Returns all rows from left table + matching rows from right | Full left circle + intersection | Find all records from left table with optional related data |
| RIGHT JOIN | Returns all rows from right table + matching rows from left | Full right circle + intersection | Find all records from right table with optional related data |
| FULL OUTER JOIN | Returns all rows when there's a match in either table | Union of both circles | Find all records from both tables (MySQL workaround needed) |
| CROSS JOIN | Returns Cartesian product of both tables | All combinations | Generate all possible combinations |
| SELF JOIN | Join a table with itself | Same table | Find relationships within the same table |
INNER JOIN
Returns only the rows that have matching values in both tables:
SELECT
s.first_name, s.last_name,
c.course_name, e.grade
FROM students s
INNER JOIN enrollments e ON s.student_id = e.student_id
INNER JOIN courses c ON e.course_id = c.course_id;
-- INNER JOIN with WHERE clause
SELECT
s.first_name, s.last_name, c.course_name
FROM students s
INNER JOIN enrollments e ON s.student_id = e.student_id
INNER JOIN courses c ON e.course_id = c.course_id
WHERE c.course_name = 'Mathematics';
-- Traditional syntax (older style, not recommended)
SELECT s.first_name, s.last_name, c.course_name
FROM students s, enrollments e, courses c
WHERE s.student_id = e.student_id
AND e.course_id = c.course_id;
| first_name | last_name | course_name | grade |
+------------+-----------+-------------+-------+
| John | Doe | Mathematics | A |
| Jane | Smith | Science | B+ |
| Bob | Johnson | Mathematics | A- |
+------------+-----------+-------------+-------+
3 rows in set (0.00 sec)
LEFT JOIN
Returns all rows from the left table and matched rows from the right table. Unmatched rows from the right table will contain NULLs:
SELECT
s.first_name, s.last_name,
c.course_name, e.grade
FROM students s
LEFT JOIN enrollments e ON s.student_id = e.student_id
LEFT JOIN courses c ON e.course_id = c.course_id;
-- Find students with no enrollments
SELECT
s.first_name, s.last_name
FROM students s
LEFT JOIN enrollments e ON s.student_id = e.student_id
WHERE e.enrollment_id IS NULL;
-- LEFT JOIN with multiple conditions
SELECT
s.first_name, s.last_name,
c.course_name, e.grade
FROM students s
LEFT JOIN enrollments e ON s.student_id = e.student_id
AND e.grade > 'B'
LEFT JOIN courses c ON e.course_id = c.course_id;
RIGHT JOIN
Returns all rows from the right table and matched rows from the left table. Unmatched rows from the left table will contain NULLs:
SELECT
c.course_name,
s.first_name, s.last_name
FROM students s
RIGHT JOIN enrollments e ON s.student_id = e.student_id
RIGHT JOIN courses c ON e.course_id = c.course_id;
-- Find courses with no students enrolled
SELECT
c.course_name
FROM students s
RIGHT JOIN enrollments e ON s.student_id = e.student_id
RIGHT JOIN courses c ON e.course_id = c.course_id
WHERE s.student_id IS NULL;
FULL OUTER JOIN (MySQL Workaround)
MySQL doesn't support FULL OUTER JOIN directly, but we can simulate it using UNION:
SELECT
s.first_name, s.last_name,
c.course_name, e.grade
FROM students s
LEFT JOIN enrollments e ON s.student_id = e.student_id
LEFT JOIN courses c ON e.course_id = c.course_id
UNION
SELECT
s.first_name, s.last_name,
c.course_name, e.grade
FROM students s
RIGHT JOIN enrollments e ON s.student_id = e.student_id
RIGHT JOIN courses c ON e.course_id = c.course_id
WHERE s.student_id IS NULL;
CROSS JOIN
Returns the Cartesian product of both tables (every row from first table combined with every row from second table):
SELECT
s.first_name, s.last_name,
c.course_name
FROM students s
CROSS JOIN courses c;
-- CROSS JOIN with WHERE (similar to INNER JOIN)
SELECT
s.first_name, s.last_name,
c.course_name
FROM students s
CROSS JOIN courses c
WHERE s.department = c.department;
SELF JOIN
Join a table with itself to find relationships within the same table:
SELECT
a.first_name, a.last_name,
b.first_name AS same_last_name_student
FROM students a
INNER JOIN students b ON a.last_name = b.last_name
WHERE a.student_id != b.student_id;
-- Find hierarchical relationships (manager-employee)
SELECT
e.first_name AS employee_name,
m.first_name AS manager_name
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;
JOIN Conditions and Multiple Tables
| Scenario | JOIN Approach | Example |
|---|---|---|
| Two tables | Single JOIN condition | FROM A JOIN B ON A.id = B.a_id |
| Three tables | Two JOIN conditions | FROM A JOIN B ON A.id = B.a_id JOIN C ON B.id = C.b_id |
| Multiple conditions | AND in ON clause | FROM A JOIN B ON A.id = B.a_id AND A.type = B.type |
| Complex relationships | Multiple JOIN types | FROM A LEFT JOIN B ON... INNER JOIN C ON... |
| Self-referencing | SELF JOIN with aliases | FROM A a1 JOIN A a2 ON a1.parent = a2.id |
JOIN Performance Considerations
| Factor | Impact | Recommendation |
|---|---|---|
| Indexes | High impact | Index columns used in JOIN conditions |
| JOIN Order | Moderate impact | Put smaller tables first in JOIN sequence |
| JOIN Type | High impact | Use INNER JOIN when possible, avoid CROSS JOIN |
| WHERE Clause | High impact | Filter early with WHERE before JOINs |
| Number of Tables | High impact | Avoid joining too many tables in one query |
| Data Types | Moderate impact | Ensure JOIN columns have same data types |
Common JOIN Patterns
Real-world scenarios where different JOIN types are useful:
SELECT o.order_id, o.order_date, c.customer_name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id;
-- Pattern 2: Find products never ordered
SELECT p.product_name
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
WHERE oi.order_id IS NULL;
-- Pattern 3: Many-to-many relationship
SELECT s.first_name, s.last_name, c.course_name
FROM students s
INNER JOIN enrollments e ON s.student_id = e.student_id
INNER JOIN courses c ON e.course_id = c.course_id;
-- Pattern 4: Hierarchical data
SELECT e.employee_name, m.employee_name AS manager_name
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.employee_id;
JOIN Best Practices
- Always use explicit JOIN syntax instead of comma-separated tables
- Use table aliases for better readability with multiple tables
- Put filter conditions in WHERE clause rather than ON clause when possible
- Use INNER JOIN when you only want matching records
- Use LEFT JOIN when you want all records from the left table
- Index columns used in JOIN conditions for better performance
- Be consistent with your JOIN syntax throughout the application
- Test JOIN queries with EXPLAIN to understand performance
Common JOIN Mistakes
- Cartesian products: Forgetting JOIN conditions
- Wrong JOIN type: Using INNER JOIN when LEFT JOIN is needed
- Performance issues: Joining too many tables without indexes
- Ambiguous columns: Not qualifying column names with table aliases
- NULL handling: Not considering NULL values in JOIN conditions
- Condition placement: Putting filter conditions in wrong clause (ON vs WHERE)
- Data type mismatches: Joining columns with different data types
Practice Exercise
Practice these JOIN scenarios:
- Create INNER JOIN to find all students enrolled in 'Computer Science' courses
- Use LEFT JOIN to find all courses and count how many students are enrolled in each
- Write a query to find students who are not enrolled in any courses
- Create a SELF JOIN to find employees and their managers
- Use multiple JOINs to display student names, course names, and instructor names
- Write a query using RIGHT JOIN to find all departments and their employees
- Create a CROSS JOIN to generate all possible product-size combinations
- Use JOIN with aggregate functions to find the average grade per course
Aggregate Functions in MySQL
Why Aggregate Functions Matter
Aggregate functions are crucial for:
- Data Summarization: Creating summary reports from detailed data
- Statistical Analysis: Calculating averages, totals, and distributions
- Business Intelligence: Supporting decision-making with summarized data
- Performance Monitoring: Tracking metrics and KPIs
- Data Quality: Identifying data patterns and anomalies
Common Aggregate Functions Overview
| Function | Description | Data Types | NULL Handling |
|---|---|---|---|
| COUNT() | Counts the number of rows | All types | Ignores NULL in COUNT(column) |
| SUM() | Calculates the sum of values | Numeric | Ignores NULL values |
| AVG() | Calculates the average value | Numeric | Ignores NULL values |
| MIN() | Finds the minimum value | All comparable types | Ignores NULL values |
| MAX() | Finds the maximum value | All comparable types | Ignores NULL values |
| GROUP_CONCAT() | Concatenates values from a group | String | Ignores NULL values |
| STDDEV() | Calculates standard deviation | Numeric | Ignores NULL values |
| VARIANCE() | Calculates statistical variance | Numeric | Ignores NULL values |
Sample Data for Examples
Let's use a sample sales data table for our examples:
sale_id INT PRIMARY KEY AUTO_INCREMENT,
product_name VARCHAR(100),
category VARCHAR(50),
sale_date DATE,
quantity INT,
unit_price DECIMAL(10, 2),
region VARCHAR(50)
);
COUNT() Function
Counts the number of rows or non-NULL values:
SELECT COUNT(*) AS total_sales FROM sales;
-- Count non-NULL values in a specific column
SELECT COUNT(region) AS sales_with_region FROM sales;
-- Count distinct values
SELECT COUNT(DISTINCT category) AS unique_categories FROM sales;
-- Count with condition
SELECT COUNT(*) AS high_value_sales
FROM sales
WHERE unit_price > 100;
| total_sales |
+-------------+
| 150 |
+-------------+
1 row in set (0.00 sec)
SUM() Function
Calculates the sum of numeric values:
SELECT SUM(quantity) AS total_quantity FROM sales;
-- Total revenue (quantity * unit_price)
SELECT SUM(quantity * unit_price) AS total_revenue FROM sales;
-- Sum with condition
SELECT SUM(unit_price) AS electronics_revenue
FROM sales
WHERE category = 'Electronics';
-- Sum distinct values (rarely used)
SELECT SUM(DISTINCT unit_price) AS sum_unique_prices FROM sales;
AVG() Function
Calculates the average value of numeric columns:
SELECT AVG(unit_price) AS average_price FROM sales;
-- Average with ROUND for readability
SELECT ROUND(AVG(unit_price), 2) AS average_price FROM sales;
-- Average quantity per sale
SELECT AVG(quantity) AS avg_quantity FROM sales;
-- Average with condition
SELECT AVG(unit_price) AS avg_electronics_price
FROM sales
WHERE category = 'Electronics';
MIN() and MAX() Functions
Find the minimum and maximum values in a column:
SELECT
MIN(unit_price) AS min_price,
MAX(unit_price) AS max_price,
MAX(unit_price) - MIN(unit_price) AS price_range
FROM sales;
-- Date range of sales
SELECT
MIN(sale_date) AS first_sale,
MAX(sale_date) AS last_sale
FROM sales;
-- Minimum and maximum by category
SELECT
category,
MIN(unit_price) AS min_price,
MAX(unit_price) AS max_price
FROM sales
GROUP BY category;
GROUP_CONCAT() Function
Concatenates values from multiple rows into a single string:
SELECT
category,
GROUP_CONCAT(product_name) AS products
FROM sales
GROUP BY category;
-- With custom separator
SELECT
category,
GROUP_CONCAT(product_name SEPARATOR ' | ') AS products
FROM sales
GROUP BY category;
-- With DISTINCT and ORDER BY
SELECT
region,
GROUP_CONCAT(DISTINCT category ORDER BY category) AS categories
FROM sales
GROUP BY region;
-- Limited number of values
SELECT
category,
GROUP_CONCAT(product_name ORDER BY unit_price DESC LIMIT 3) AS top_products
FROM sales
GROUP BY category;
Statistical Aggregate Functions
Advanced statistical calculations:
SELECT
STDDEV(unit_price) AS price_std_dev,
VARIANCE(unit_price) AS price_variance
FROM sales;
-- Population vs sample standard deviation
SELECT
STDDEV_POP(unit_price) AS population_stddev,
STDDEV_SAMP(unit_price) AS sample_stddev
FROM sales;
Using Aggregate Functions with GROUP BY
Group data and calculate aggregates for each group:
SELECT
category,
COUNT(*) AS total_sales,
SUM(quantity) AS total_quantity,
SUM(quantity * unit_price) AS total_revenue,
AVG(unit_price) AS avg_price
FROM sales
GROUP BY category;
-- Monthly sales summary
SELECT
YEAR(sale_date) AS sale_year,
MONTH(sale_date) AS sale_month,
COUNT(*) AS total_sales,
SUM(quantity * unit_price) AS monthly_revenue
FROM sales
GROUP BY YEAR(sale_date), MONTH(sale_date)
ORDER BY sale_year, sale_month;
-- Multiple grouping columns
SELECT
region,
category,
COUNT(*) AS sales_count,
SUM(quantity * unit_price) AS region_category_revenue
FROM sales
GROUP BY region, category
ORDER BY region, category;
HAVING Clause with Aggregate Functions
Filter groups based on aggregate results:
SELECT
category,
COUNT(*) AS sales_count
FROM sales
GROUP BY category
HAVING COUNT(*) > 10;
-- Categories with average price over $50
SELECT
category,
AVG(unit_price) AS avg_price
FROM sales
GROUP BY category
HAVING AVG(unit_price) > 50;
-- Multiple HAVING conditions
SELECT
region,
SUM(quantity * unit_price) AS total_revenue,
COUNT(*) AS sales_count
FROM sales
GROUP BY region
HAVING SUM(quantity * unit_price) > 10000
AND COUNT(*) > 5;
Aggregate Function Behavior
| Scenario | Behavior | Example Result |
|---|---|---|
| Empty table with COUNT(*) | Returns 0 | COUNT(*) → 0 |
| All NULL values with SUM/AVG | Returns NULL | SUM(NULL_column) → NULL |
| Mixed NULL and values with SUM | Ignores NULL, sums values | SUM(10, NULL, 20) → 30 |
| MIN/MAX with all NULLs | Returns NULL | MAX(NULL_column) → NULL |
| COUNT(column) with all NULLs | Returns 0 | COUNT(NULL_column) → 0 |
| GROUP BY with no rows | Returns empty result set | No rows returned |
Advanced Aggregate Techniques
Combining multiple aggregate functions and techniques:
SELECT
category,
SUM(quantity * unit_price) AS category_revenue,
(SUM(quantity * unit_price) / (SELECT SUM(quantity * unit_price) FROM sales)) * 100 AS revenue_percentage
FROM sales
GROUP BY category;
-- Running totals using window functions (MySQL 8.0+)
SELECT
sale_date,
quantity * unit_price AS daily_revenue,
SUM(quantity * unit_price) OVER (ORDER BY sale_date) AS running_total
FROM sales
ORDER BY sale_date;
-- Conditional aggregates
SELECT
region,
SUM(CASE WHEN category = 'Electronics' THEN quantity * unit_price ELSE 0 END) AS electronics_revenue,
SUM(CASE WHEN category = 'Clothing' THEN quantity * unit_price ELSE 0 END) AS clothing_revenue
FROM sales
GROUP BY region;
Best Practices for Aggregate Functions
- Use COUNT(*) when you need to count all rows, including NULLs
- Use COUNT(column) when you want to count non-NULL values only
- Be aware of NULL handling in aggregate functions
- Use GROUP BY appropriately to avoid unexpected results
- Use HAVING for filtering groups, WHERE for filtering rows
- Consider indexing columns used in GROUP BY for better performance
- Use ROUND() with AVG() for more readable results
- Test your aggregate queries with sample data to verify calculations
Common Aggregate Function Mistakes
- Mixing aggregated and non-aggregated columns without GROUP BY
- Using WHERE instead of HAVING for aggregate conditions
- Forgetting NULL handling in calculations
- Incorrect GROUP BY columns leading to wrong groupings
- Performance issues with large datasets and no indexes
- Misunderstanding COUNT(*) vs COUNT(column) behavior
- Floating point precision issues with AVG()
Practice Exercise
Practice these aggregate function scenarios:
- Calculate total revenue, average sale amount, and number of sales
- Find the top 3 selling products by quantity
- Calculate monthly sales trends with revenue and average order value
- Use GROUP_CONCAT to list all customers who purchased each product
- Find categories with average price above overall average price
- Calculate running totals of sales using window functions
- Use conditional aggregates to compare sales across different regions
- Find products that have never been sold (using COUNT and LEFT JOIN)
GROUP BY & HAVING in MySQL
Why GROUP BY & HAVING Matter
These clauses are essential for:
- Data Summarization: Creating meaningful summaries from detailed data
- Category Analysis: Analyzing data by different categories or segments
- Statistical Reporting: Generating reports with totals, averages, and counts
- Data Segmentation: Breaking down data into meaningful groups
- Business Intelligence: Supporting decision-making with grouped insights
GROUP BY vs HAVING Comparison
| Aspect | GROUP BY | HAVING |
|---|---|---|
| Purpose | Groups rows into summary rows | Filters groups based on aggregate conditions |
| Operation Level | Works on individual rows before grouping | Works on grouped results after aggregation |
| Used With | Aggregate functions (COUNT, SUM, AVG, etc.) | Aggregate functions in conditions |
| Position in Query | After WHERE, before HAVING | After GROUP BY, before ORDER BY |
| Filter Type | Cannot use aggregate functions | Must use aggregate functions |
| Performance | Benefits from indexed columns | Applied after grouping is complete |
Sample Data for Examples
Let's use an enhanced sales dataset for our examples:
sale_id INT PRIMARY KEY AUTO_INCREMENT,
product_name VARCHAR(100),
category VARCHAR(50),
sale_date DATE,
quantity INT,
unit_price DECIMAL(10, 2),
region VARCHAR(50),
salesperson VARCHAR(50)
);
Basic GROUP BY Syntax
The fundamental structure of GROUP BY queries:
column1, column2, ..., aggregate_function(column)
FROM table_name
WHERE condition
GROUP BY column1, column2, ...
HAVING aggregate_condition
ORDER BY column;
Single Column GROUP BY
Grouping by a single column with various aggregate functions:
SELECT
category,
COUNT(*) AS sales_count
FROM sales
GROUP BY category
ORDER BY sales_count DESC;
-- Total revenue by region
SELECT
region,
SUM(quantity * unit_price) AS total_revenue
FROM sales
GROUP BY region
ORDER BY total_revenue DESC;
-- Average price by category
SELECT
category,
ROUND(AVG(unit_price), 2) AS average_price,
MIN(unit_price) AS min_price,
MAX(unit_price) AS max_price
FROM sales
GROUP BY category
ORDER BY average_price DESC;
| category | sales_count |
+-------------+-------------+
| Electronics | 45 |
| Clothing | 38 |
| Books | 27 |
+-------------+-------------+
3 rows in set (0.00 sec)
Multiple Column GROUP BY
Grouping by multiple columns for more detailed analysis:
SELECT
region,
category,
COUNT(*) AS sales_count,
SUM(quantity * unit_price) AS total_revenue
FROM sales
GROUP BY region, category
ORDER BY region, total_revenue DESC;
-- Monthly sales by category
SELECT
YEAR(sale_date) AS sale_year,
MONTH(sale_date) AS sale_month,
category,
SUM(quantity) AS total_quantity,
SUM(quantity * unit_price) AS monthly_revenue
FROM sales
GROUP BY YEAR(sale_date), MONTH(sale_date), category
ORDER BY sale_year, sale_month, category;
-- Salesperson performance by region and category
SELECT
salesperson,
region,
category,
COUNT(*) AS deals_closed,
SUM(quantity * unit_price) AS total_sales
FROM sales
GROUP BY salesperson, region, category
ORDER BY total_sales DESC;
GROUP BY with Expressions
Using expressions and functions in GROUP BY clauses:
SELECT
YEAR(sale_date) AS sale_year,
MONTH(sale_date) AS sale_month,
COUNT(*) AS sales_count
FROM sales
GROUP BY YEAR(sale_date), MONTH(sale_date)
ORDER BY sale_year, sale_month;
-- Group by price ranges
SELECT
CASE
WHEN unit_price < 50 THEN 'Budget'
WHEN unit_price < 100 THEN 'Mid-range'
ELSE 'Premium'
END AS price_category,
COUNT(*) AS product_count,
AVG(unit_price) AS avg_price
FROM sales
GROUP BY price_category
ORDER BY avg_price;
-- Group by quarter
SELECT
YEAR(sale_date) AS sale_year,
QUARTER(sale_date) AS sale_quarter,
SUM(quantity * unit_price) AS quarterly_revenue
FROM sales
GROUP BY YEAR(sale_date), QUARTER(sale_date)
ORDER BY sale_year, sale_quarter;
HAVING Clause Fundamentals
Using HAVING to filter groups based on aggregate conditions:
SELECT
category,
COUNT(*) AS sales_count
FROM sales
GROUP BY category
HAVING COUNT(*) > 10
ORDER BY sales_count DESC;
-- Regions with total revenue over $10,000
SELECT
region,
SUM(quantity * unit_price) AS total_revenue
FROM sales
GROUP BY region
HAVING SUM(quantity * unit_price) > 10000
ORDER BY total_revenue DESC;
-- Categories with average price above $50
SELECT
category,
AVG(unit_price) AS average_price
FROM sales
GROUP BY category
HAVING AVG(unit_price) > 50
ORDER BY average_price DESC;
Complex HAVING Conditions
Using multiple conditions and different aggregate functions in HAVING:
SELECT
region,
COUNT(*) AS sales_count,
SUM(quantity * unit_price) AS total_revenue
FROM sales
GROUP BY region
HAVING COUNT(*) > 5
AND SUM(quantity * unit_price) > 5000
ORDER BY total_revenue DESC;
-- HAVING with OR conditions
SELECT
category,
COUNT(*) AS sales_count,
AVG(unit_price) AS avg_price
FROM sales
GROUP BY category
HAVING COUNT(*) > 20
OR AVG(unit_price) > 100
ORDER BY sales_count DESC;
-- HAVING with different aggregate functions
SELECT
salesperson,
COUNT(*) AS deals,
AVG(unit_price) AS avg_deal_size,
SUM(quantity * unit_price) AS total_sales
FROM sales
GROUP BY salesperson
HAVING COUNT(*) >= 10
AND AVG(unit_price) > 50
AND SUM(quantity * unit_price) > 1000
ORDER BY total_sales DESC;
WHERE vs HAVING Comparison
| Scenario | Use WHERE | Use HAVING |
|---|---|---|
| Filter individual rows | ✓ BEFORE grouping | ✗ Not applicable |
| Filter based on aggregate results | ✗ Cannot use aggregates | ✓ AFTER grouping |
| Filter on non-aggregated columns | ✓ More efficient | ✓ Possible but less efficient |
| Performance impact | Reduces rows before grouping | Filters after grouping is done |
| Column aliases | ✗ Cannot use SELECT aliases | ✓ Can use SELECT aliases |
SELECT
category,
COUNT(*) AS sales_count
FROM sales
WHERE sale_date >= '2023-01-01'
GROUP BY category
HAVING COUNT(*) > 5;
-- Inefficient: HAVING filters after grouping (same result)
SELECT
category,
COUNT(*) AS sales_count
FROM sales
GROUP BY category
HAVING COUNT(*) > 5
AND MIN(sale_date) >= '2023-01-01';
GROUP BY with JOINs
Combining GROUP BY with table joins for complex analysis:
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
supplier_id INT
);
CREATE TABLE suppliers (
supplier_id INT PRIMARY KEY,
supplier_name VARCHAR(100)
);
-- Sales by supplier
SELECT
s.supplier_name,
COUNT(*) AS sales_count,
SUM(sa.quantity * sa.unit_price) AS total_revenue
FROM sales sa
JOIN products p ON sa.product_name = p.product_name
JOIN suppliers s ON p.supplier_id = s.supplier_id
GROUP BY s.supplier_name
HAVING SUM(sa.quantity * sa.unit_price) > 5000
ORDER BY total_revenue DESC;
Advanced GROUP BY Techniques
Rollup, grouping sets, and other advanced grouping features:
SELECT
region,
category,
SUM(quantity * unit_price) AS total_revenue
FROM sales
GROUP BY region, category WITH ROLLUP;
-- Filtering ROLLUP results
SELECT
COALESCE(region, 'All Regions') AS region,
COALESCE(category, 'All Categories') AS category,
SUM(quantity * unit_price) AS total_revenue
FROM sales
GROUP BY region, category WITH ROLLUP
HAVING total_revenue > 1000;
GROUP BY Best Practices
- Include all non-aggregated columns from SELECT in GROUP BY
- Use WHERE to filter rows before grouping for better performance
- Use HAVING only for conditions involving aggregate functions
- Index columns used in GROUP BY for large datasets
- Be mindful of NULL values in grouping columns
- Use meaningful column aliases for aggregate results
- Test your GROUP BY queries with sample data to verify groupings
- Consider the order of grouping columns for logical results
Common GROUP BY & HAVING Mistakes
- Omitting non-aggregated columns from GROUP BY clause
- Using HAVING instead of WHERE for non-aggregate conditions
- Confusing column order in multiple column GROUP BY
- Forgetting about NULL groups in results
- Performance issues with unindexed GROUP BY columns
- Incorrect aggregate calculations due to wrong groupings
- Misunderstanding WHERE vs HAVING execution order
Practice Exercise
Practice these GROUP BY and HAVING scenarios:
- Find the top 5 best-selling products by total quantity sold
- Calculate monthly revenue trends and identify months with revenue over $10,000
- Find salespeople who have closed more than 15 deals with average deal size over $75
- Identify regions where Electronics category sales exceed Clothing sales
- Calculate the percentage contribution of each category to total revenue
- Use WITH ROLLUP to create a sales summary with regional and category subtotals
- Find products that have been sold in all regions
- Identify seasonal patterns by grouping sales by quarter and year
Subqueries in MySQL
Why Subqueries Matter
Subqueries are essential for:
- Complex Filtering: Creating dynamic conditions based on other query results
- Data Comparison: Comparing values against calculated results
- Multi-step Analysis: Breaking complex problems into manageable steps
- Flexible Queries: Writing queries that adapt to changing data
- Advanced Reporting: Creating sophisticated business intelligence queries
Subquery Types Overview
| Type | Description | Common Use | Example |
|---|---|---|---|
| Scalar Subquery | Returns single value | Comparisons, calculations | WHERE price > (SELECT AVG(price) FROM products) |
| Row Subquery | Returns single row | Multiple column comparisons | WHERE (id, name) = (SELECT id, name FROM...) |
| Column Subquery | Returns single column | IN, ANY, ALL operators | WHERE id IN (SELECT product_id FROM orders) |
| Table Subquery | Returns multiple rows/columns | FROM clause, JOINs | SELECT * FROM (SELECT ...) AS subquery |
| Correlated Subquery | References outer query | Row-by-row processing | WHERE salary > (SELECT AVG(salary) FROM employees e2 WHERE e1.dept = e2.dept) |
Sample Data for Examples
Let's use employee and department data for our examples:
employee_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50),
last_name VARCHAR(50),
department VARCHAR(50),
salary DECIMAL(10, 2),
hire_date DATE
);
CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(50),
budget DECIMAL(12, 2)
);
CREATE TABLE projects (
project_id INT PRIMARY KEY,
project_name VARCHAR(100),
department VARCHAR(50),
budget DECIMAL(12, 2)
);
Subqueries in WHERE Clause
Using subqueries to create dynamic conditions:
SELECT first_name, last_name, salary
FROM employees
WHERE salary > (
SELECT AVG(salary) FROM employees
);
-- Employees in departments with high budget
SELECT first_name, last_name, department
FROM employees
WHERE department IN (
SELECT department_name FROM departments
WHERE budget > 1000000
);
-- Employees hired after the last hire in IT department
SELECT first_name, last_name, hire_date
FROM employees
WHERE hire_date > (
SELECT MAX(hire_date) FROM employees
WHERE department = 'IT'
);
| first_name | last_name | salary |
+------------+-----------+----------+
| John | Doe | 85000.00 |
| Jane | Smith | 92000.00 |
| Bob | Johnson | 78000.00 |
+------------+-----------+----------+
3 rows in set (0.00 sec)
Subqueries with Comparison Operators
| Operator | Description | Subquery Must Return | Example |
|---|---|---|---|
| =, !=, >, <, >=, <= | Single value comparison | Single value | WHERE salary > (SELECT AVG(salary) FROM...) |
| IN / NOT IN | Check list membership | Single column | WHERE department IN (SELECT name FROM depts) |
| ANY / SOME | Compare to any value in list | Single column | WHERE salary > ANY (SELECT salary FROM...) |
| ALL | Compare to all values in list | Single column | WHERE salary > ALL (SELECT salary FROM...) |
| EXISTS / NOT EXISTS | Check if subquery returns rows | Any result set | WHERE EXISTS (SELECT 1 FROM projects WHERE...) |
SELECT first_name, last_name, department
FROM employees
WHERE department IN (
SELECT department_name FROM departments
WHERE budget > 500000
);
-- ANY: Employees earning more than any IT employee
SELECT first_name, last_name, salary
FROM employees
WHERE salary > ANY (
SELECT salary FROM employees
WHERE department = 'IT'
);
-- ALL: Employees earning more than all Marketing employees
SELECT first_name, last_name, salary
FROM employees
WHERE salary > ALL (
SELECT salary FROM employees
WHERE department = 'Marketing'
);
EXISTS and NOT EXISTS
Checking for existence of related records:
SELECT department_name
FROM departments d
WHERE EXISTS (
SELECT 1 FROM employees e
WHERE e.department = d.department_name
);
-- Employees without projects
SELECT first_name, last_name
FROM employees e
WHERE NOT EXISTS (
SELECT 1 FROM projects p
WHERE p.department = e.department
);
-- Departments with high-budget projects
SELECT department_name
FROM departments d
WHERE EXISTS (
SELECT 1 FROM projects p
WHERE p.department = d.department_name
AND p.budget > 100000
);
Correlated Subqueries
Subqueries that reference columns from the outer query:
SELECT first_name, last_name, department, salary
FROM employees e1
WHERE salary > (
SELECT AVG(salary) FROM employees e2
WHERE e2.department = e1.department
);
-- Latest hire in each department
SELECT first_name, last_name, department, hire_date
FROM employees e1
WHERE hire_date = (
SELECT MAX(hire_date) FROM employees e2
WHERE e2.department = e1.department
);
-- Employees with salary in top 3 of their department
SELECT first_name, last_name, department, salary
FROM employees e1
WHERE (
SELECT COUNT(DISTINCT e2.salary)
FROM employees e2
WHERE e2.department = e1.department
AND e2.salary > e1.salary
) < 3;
Subqueries in SELECT Clause
Using subqueries to calculate additional columns:
SELECT
first_name, last_name, department, salary,
(
SELECT AVG(salary) FROM employees e2
WHERE e2.department = e1.department
) AS department_avg_salary,
salary - (
SELECT AVG(salary) FROM employees e2
WHERE e2.department = e1.department
) AS difference_from_avg
FROM employees e1
ORDER BY department, salary DESC;
-- Department info with employee count
SELECT
department_name,
budget,
(
SELECT COUNT(*) FROM employees e
WHERE e.department = d.department_name
) AS employee_count
FROM departments d;
Subqueries in FROM Clause (Derived Tables)
Using subqueries as temporary tables in the FROM clause:
SELECT
department,
avg_salary,
employee_count
FROM (
SELECT
department,
AVG(salary) AS avg_salary,
COUNT(*) AS employee_count
FROM employees
GROUP BY department
) AS dept_summary
WHERE avg_salary > 60000
ORDER BY avg_salary DESC;
-- Complex analysis with multiple derived tables
SELECT
d.department_name,
d.budget,
es.avg_salary,
es.employee_count
FROM departments d
JOIN (
SELECT
department,
AVG(salary) AS avg_salary,
COUNT(*) AS employee_count
FROM employees
GROUP BY department
) es ON d.department_name = es.department;
Subqueries in HAVING Clause
Using subqueries to filter grouped results:
SELECT
department,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > (
SELECT AVG(salary) FROM employees
);
-- Departments with more employees than the smallest department
SELECT
department,
COUNT(*) AS employee_count
FROM employees
GROUP BY department
HAVING COUNT(*) > (
SELECT MIN(emp_count) FROM (
SELECT COUNT(*) AS emp_count
FROM employees
GROUP BY department
) AS dept_counts
);
Subquery Performance Considerations
| Subquery Type | Performance Impact | Optimization Tips |
|---|---|---|
| Scalar Subquery | Low to Moderate | Ensure subquery returns single value quickly |
| Correlated Subquery | High (runs for each row) | Consider rewriting as JOIN where possible |
| IN Subquery | Moderate to High | Use EXISTS instead for better performance |
| Derived Table (FROM) | Moderate | Limit rows in subquery, add indexes |
| EXISTS Subquery | Good (stops at first match) | Use indexed columns in correlation |
Subquery vs JOIN Comparison
| Aspect | Subquery | JOIN |
|---|---|---|
| Readability | Often more intuitive for complex conditions | Can be cleaner for simple relationships |
| Performance | Correlated subqueries can be slow | Generally faster for large datasets |
| Flexibility | Better for complex calculations and conditions | Better for combining multiple tables |
| NULL Handling | EXISTS handles NULLs well | JOINs may exclude NULL relationships |
| Use Case | When you need to check conditions or calculate values | When you need to combine related data |
SELECT first_name, last_name
FROM employees
WHERE department IN (
SELECT department_name FROM departments
WHERE budget > 1000000
);
-- Using JOIN: Same result
SELECT DISTINCT e.first_name, e.last_name
FROM employees e
JOIN departments d ON e.department = d.department_name
WHERE d.budget > 1000000;
Common Subquery Patterns
Real-world scenarios where subqueries are particularly useful:
SELECT product_name, category, price
FROM products p1
WHERE price = (
SELECT MAX(price) FROM products p2
WHERE p2.category = p1.category
);
-- Pattern 2: Checking for existence of related records
SELECT customer_name
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date >= '2023-01-01'
);
-- Pattern 3: Comparing to aggregate values
SELECT employee_name, salary
FROM employees
WHERE salary > (
SELECT AVG(salary) FROM employees
);
Subquery Best Practices
- Use EXISTS instead of IN when checking for existence (better performance)
- Avoid correlated subqueries in large datasets when possible
- Use derived tables (FROM subqueries) for complex calculations
- Always test subqueries independently first
- Use meaningful aliases for subquery tables
- Consider rewriting correlated subqueries as JOINs for better performance
- Use appropriate indexes on columns used in subquery conditions
- Limit the number of rows returned by subqueries when possible
Common Subquery Mistakes
- Returning multiple rows when single value expected
- Using correlated subqueries unnecessarily on large tables
- Forgetting table aliases in correlated subqueries
- Performance issues with nested subqueries
- Incorrect operator usage (e.g., = instead of IN)
- NULL handling issues in comparison subqueries
- Circular references in complex subquery structures
Practice Exercise
Practice these subquery scenarios:
- Find employees who earn more than the average salary of their department
- List departments that have no employees using NOT EXISTS
- Find products that have never been ordered using a subquery
- Calculate each employee's salary as a percentage of their department's total salary
- Find the second highest salary in the company using a subquery
- List customers who have placed orders in the last 30 days using EXISTS
- Find departments where the average salary is greater than the company average
- Use a derived table to find monthly sales trends and compare to previous months
Indexes in MySQL
Why Indexes Matter
Indexes are crucial for:
- Query Performance: Dramatically speed up SELECT queries
- Data Retrieval: Enable fast lookups and range scans
- Sorting Efficiency: Improve ORDER BY and GROUP BY performance
- Join Operations: Accelerate table joins on indexed columns
- Constraint Enforcement: Support UNIQUE and PRIMARY KEY constraints
Index Types in MySQL
| Index Type | Description | Use Case | Syntax Example |
|---|---|---|---|
| PRIMARY KEY | Unique identifier for each row, automatically indexed | Every table should have one | PRIMARY KEY (id) |
| UNIQUE Index | Ensures all values are unique, allows NULL | Email, username, product codes | UNIQUE (email) |
| INDEX / KEY | Standard non-unique index for faster searches | Foreign keys, frequently searched columns | INDEX (last_name) |
| FULLTEXT | Specialized index for text searching | Content search, document retrieval | FULLTEXT (content) |
| SPATIAL | For spatial data types (GIS data) | Geographic data, mapping applications | SPATIAL (location) |
| Composite Index | Index on multiple columns | Multi-column searches, covering indexes | INDEX (last_name, first_name) |
How Indexes Work
Understanding the underlying mechanism:
SELECT * FROM employees WHERE last_name = 'Smith';
-- MySQL must check every row in the table
-- With index: Index lookup (like using book index)
CREATE INDEX idx_last_name ON employees(last_name);
SELECT * FROM employees WHERE last_name = 'Smith';
-- MySQL uses index to find matching rows quickly
Creating Indexes
Different ways to create indexes in MySQL:
CREATE TABLE employees (
employee_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50),
last_name VARCHAR(50),
email VARCHAR(100) UNIQUE,
department VARCHAR(50),
salary DECIMAL(10, 2),
hire_date DATE,
INDEX idx_last_name (last_name),
INDEX idx_department (department),
INDEX idx_hire_date (hire_date)
);
-- Add index to existing table
CREATE INDEX idx_salary ON employees(salary);
-- Create composite index
CREATE INDEX idx_name_department ON employees(last_name, department);
-- Create unique index
CREATE UNIQUE INDEX idx_unique_email ON employees(email);
-- Create fulltext index
CREATE FULLTEXT INDEX idx_ft_content ON articles(content);
When to Create Indexes
| Scenario | Recommendation | Example |
|---|---|---|
| Primary Key Columns | Always (automatically created) | PRIMARY KEY (id) |
| Foreign Key Columns | Highly recommended | INDEX (customer_id) |
| Frequently searched columns | Recommended | WHERE last_name = 'Smith' |
| Columns in WHERE clauses | Consider if selective enough | WHERE status = 'active' |
| JOIN conditions | Highly recommended | ON users.id = orders.user_id |
| ORDER BY / GROUP BY columns | Recommended for large tables | ORDER BY created_date DESC |
| Columns with high selectivity | Good candidate | Email, username, product_code |
| Columns with low selectivity | Avoid (e.g., gender, status flags) | WHERE gender = 'M' |
Composite Indexes
Indexes on multiple columns and their optimal usage:
CREATE INDEX idx_last_first ON employees(last_name, first_name);
-- These queries can use the composite index:
SELECT * FROM employees WHERE last_name = 'Smith';
SELECT * FROM employees WHERE last_name = 'Smith' AND first_name = 'John';
SELECT * FROM employees WHERE last_name LIKE 'S%';
-- These queries CANNOT use the composite index effectively:
SELECT * FROM employees WHERE first_name = 'John'; -- Index skip scan in MySQL 8.0+
SELECT * FROM employees WHERE last_name = 'Smith' OR first_name = 'John';
-- Composite index with included columns (MySQL 8.0+)
CREATE INDEX idx_customer_orders ON orders(customer_id, order_date) INCLUDE (total_amount);
Index Cardinality and Selectivity
| Metric | Definition | Impact on Index Effectiveness | Example |
|---|---|---|---|
| Cardinality | Number of unique values in column | Higher cardinality = better index | Email (high), Gender (low) |
| Selectivity | Ratio of unique values to total rows | Higher selectivity = better index | UserID: 100%, Status: 5% |
| Data Distribution | How values are spread across rows | Even distribution = better index | Dates (even), Status (skewed) |
Viewing Index Information
How to examine existing indexes and their usage:
SHOW INDEX FROM employees;
-- Show index information from information_schema
SELECT
TABLE_NAME,
INDEX_NAME,
SEQ_IN_INDEX,
COLUMN_NAME,
CARDINALITY
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'your_database'
AND TABLE_NAME = 'employees';
-- Check if index is used in a query
EXPLAIN SELECT * FROM employees WHERE last_name = 'Smith';
-- Show index size and statistics
SELECT
table_name,
index_name,
stat_value * @@innodb_page_size AS index_size_bytes
FROM mysql.innodb_index_stats
WHERE database_name = 'your_database'
AND table_name = 'employees';
EXPLAIN Command for Index Analysis
Using EXPLAIN to understand query execution and index usage:
EXPLAIN SELECT * FROM employees WHERE last_name = 'Smith';
-- EXPLAIN output analysis
EXPLAIN FORMAT=JSON SELECT * FROM employees WHERE department = 'IT';
-- Check index usage on JOIN queries
EXPLAIN
SELECT e.first_name, e.last_name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE e.salary > 50000;
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+-------------+---------+-------+------+----------+-------+
| 1 | SIMPLE | employees | NULL | ref | idx_last_name | idx_last_name| 203 | const | 1 | 100.00 | NULL |
+----+-------------+-----------+------------+------+---------------+-------------+---------+-------+------+----------+-------+
Index Maintenance and Optimization
Keeping indexes efficient and up-to-date:
ALTER TABLE employees ENGINE = InnoDB; -- Rebuilds all indexes
-- Update index statistics
ANALYZE TABLE employees;
-- Check for unused indexes
SELECT
OBJECT_SCHEMA,
OBJECT_NAME,
INDEX_NAME
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE INDEX_NAME IS NOT NULL
AND COUNT_STAR = 0
AND OBJECT_SCHEMA = 'your_database';
-- Monitor index usage
SELECT
TABLE_NAME,
INDEX_NAME,
ROWS_READ,
ROWS_INSERTED,
ROWS_UPDATED,
ROWS_DELETED
FROM performance_schema.table_io_waits_summary_by_index_usage;
Dropping and Modifying Indexes
Managing existing indexes:
DROP INDEX idx_last_name ON employees;
-- Rename an index (MySQL 8.0+)
ALTER TABLE employees RENAME INDEX idx_old_name TO idx_new_name;
-- Disable an index (MyISAM only)
ALTER TABLE employees DISABLE KEYS;
-- Bulk insert operations here
ALTER TABLE employees ENABLE KEYS;
-- Invisible indexes (MySQL 8.0+)
ALTER TABLE employees ALTER INDEX idx_test INVISIBLE; -- Query optimizer ignores
ALTER TABLE employees ALTER INDEX idx_test VISIBLE; -- Query optimizer uses
Index Costs and Trade-offs
| Benefit | Cost | Consideration |
|---|---|---|
| Faster SELECT queries | Slower INSERT/UPDATE/DELETE | Each index must be updated on write operations |
| Improved JOIN performance | Increased storage space | Indexes can be 10-30% of table size |
| Better sorting performance | Maintenance overhead | Regular ANALYZE TABLE needed for statistics |
| Efficient data retrieval | Query optimizer complexity | More indexes = more choices for optimizer |
Index Best Practices
- Index columns used in WHERE, JOIN, ORDER BY, and GROUP BY clauses
- Use composite indexes for multi-column searches
- Place the most selective columns first in composite indexes
- Avoid over-indexing - each index has maintenance costs
- Regularly monitor and remove unused indexes
- Use covering indexes when possible (include all needed columns)
- Consider index selectivity before creating indexes
- Use EXPLAIN to verify index usage in queries
- Rebuild indexes periodically for fragmented tables
Common Indexing Mistakes
- Over-indexing: Creating too many indexes on frequently updated tables
- Under-indexing: Missing indexes on foreign keys and search columns
- Wrong column order: Poor ordering in composite indexes
- Indexing low-selectivity columns: Gender, status flags, etc.
- Not monitoring index usage: Keeping unused indexes
- Ignoring index fragmentation: Not maintaining indexes over time
- Wrong index type: Using standard index where unique is needed
- Not testing with real data: Creating indexes without performance testing
Practice Exercise
Practice these indexing scenarios:
- Create appropriate indexes for a customers table with name, email, and phone searches
- Design composite indexes for an orders table with customer_id, order_date, and status queries
- Use EXPLAIN to analyze query performance before and after adding indexes
- Identify and remove unused indexes from an existing database
- Create covering indexes for frequently used reporting queries
- Monitor index usage statistics to identify performance bottlenecks
- Design indexes for a products table with category, price range, and brand searches
- Implement fulltext indexes for product descriptions and search functionality
Views in MySQL
Why Views Matter
Views are essential for:
- Data Security: Restrict access to specific columns or rows
- Query Simplification: Hide complex queries behind simple interfaces
- Data Abstraction: Provide consistent interface despite schema changes
- Logical Data Organization: Present data in business-friendly formats
- Code Reusability: Avoid repeating complex query logic
View Types in MySQL
| View Type | Description | Use Case | Limitations |
|---|---|---|---|
| Simple View | Based on a single table, no aggregates or GROUP BY | Data security, column restrictions | Can usually be updated |
| Complex View | Based on multiple tables, uses JOINs, aggregates | Reporting, data summarization | Often read-only |
| Updatable View | Allows INSERT, UPDATE, DELETE operations | Data modification through abstraction | Specific conditions required |
| Materialized View | Stores physical copy of data (MySQL doesn't natively support) | Performance optimization | Requires manual refresh |
| Inline View | Subquery in FROM clause (derived table) | Complex query organization | Temporary, not stored |
Sample Data for Examples
Let's use employee and department data for our view examples:
employee_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
phone VARCHAR(20),
salary DECIMAL(10, 2),
department_id INT,
hire_date DATE,
is_active BOOLEAN DEFAULT TRUE
);
CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(50) NOT NULL,
manager_id INT,
budget DECIMAL(12, 2)
);
CREATE TABLE projects (
project_id INT PRIMARY KEY,
project_name VARCHAR(100),
department_id INT,
start_date DATE,
end_date DATE,
budget DECIMAL(12, 2)
);
Creating Basic Views
Creating simple and complex views:
CREATE VIEW active_employees AS
SELECT employee_id, first_name, last_name, email, department_id
FROM employees
WHERE is_active = TRUE;
-- Complex view: Employee details with department info
CREATE VIEW employee_directory AS
SELECT
e.employee_id,
e.first_name,
e.last_name,
e.email,
e.phone,
d.department_name,
e.hire_date
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE e.is_active = TRUE;
-- View with calculated columns
CREATE VIEW employee_salary_summary AS
SELECT
department_id,
COUNT(*) AS employee_count,
AVG(salary) AS average_salary,
MAX(salary) AS max_salary,
MIN(salary) AS min_salary
FROM employees
WHERE is_active = TRUE
GROUP BY department_id;
Using Views
Querying views just like regular tables:
SELECT * FROM employee_directory;
-- Filter and sort view data
SELECT first_name, last_name, department_name
FROM employee_directory
WHERE department_name = 'Engineering'
ORDER BY last_name, first_name;
-- Join views with other tables or views
SELECT
ed.first_name,
ed.last_name,
ed.department_name,
p.project_name
FROM employee_directory ed
JOIN projects p ON ed.department_id = p.department_id;
-- Use aggregate functions on views
SELECT
department_name,
COUNT(*) AS employee_count
FROM employee_directory
GROUP BY department_name
ORDER BY employee_count DESC;
| first_name | last_name | department_name | email |
+------------+-----------+-----------------+---------------------+
| John | Doe | Engineering | john.doe@company.com|
| Jane | Smith | Marketing | jane.smith@company.com |
| Bob | Johnson | Engineering | bob.johnson@company.com |
+------------+-----------+-----------------+---------------------+
3 rows in set (0.00 sec)
View Options and Features
| Option | Description | Syntax | Use Case |
|---|---|---|---|
| WITH CHECK OPTION | Ensures data modifications through view satisfy view definition | CREATE VIEW ... WITH CHECK OPTION | Data integrity in updatable views |
| ALGORITHM | Specifies how MySQL processes the view | ALGORITHM = MERGE | TEMPTABLE | Performance optimization |
| DEFINER | Specifies the user who defined the view | DEFINER = user@host | Security and access control |
| SQL SECURITY | Specifies security context for view execution | SQL SECURITY DEFINER | INVOKER | Access control management |
CREATE VIEW high_salary_employees AS
SELECT employee_id, first_name, last_name, salary
FROM employees
WHERE salary > 80000
WITH CHECK OPTION;
-- View with specific algorithm
CREATE ALGORITHM = MERGE VIEW department_summary AS
SELECT d.department_name, COUNT(e.employee_id) AS employee_count
FROM departments d
LEFT JOIN employees e ON d.department_id = e.department_id
GROUP BY d.department_name;
-- View with security options
CREATE SQL SECURITY INVOKER VIEW employee_contacts AS
SELECT first_name, last_name, email, phone
FROM employees
WHERE is_active = TRUE;
Updatable Views
Views that allow data modification operations:
CREATE VIEW employee_basic_info AS
SELECT employee_id, first_name, last_name, email, department_id
FROM employees
WHERE is_active = TRUE;
-- Update through view
UPDATE employee_basic_info
SET email = 'new.email@company.com'
WHERE employee_id = 1;
-- Insert through view
INSERT INTO employee_basic_info
(first_name, last_name, email, department_id)
VALUES ('Sarah', 'Wilson', 'sarah.wilson@company.com', 2);
-- Delete through view
DELETE FROM employee_basic_info
WHERE employee_id = 5;
View Modification and Management
Altering, replacing, and dropping views:
CREATE OR REPLACE VIEW employee_directory AS
SELECT
e.employee_id,
e.first_name,
e.last_name,
e.email,
d.department_name,
e.hire_date,
e.salary -- Added salary column
FROM employees e
JOIN departments d ON e.department_id = d.department_id
WHERE e.is_active = TRUE;
-- Alter view (MySQL doesn't support direct ALTER VIEW for definition)
-- Use CREATE OR REPLACE instead
-- Drop a view
DROP VIEW IF EXISTS employee_basic_info;
-- Show all views in database
SHOW FULL TABLES WHERE TABLE_TYPE = 'VIEW';
-- Show view definition
SHOW CREATE VIEW employee_directory;
View Security and Access Control
| Security Aspect | Description | Implementation | Benefit |
|---|---|---|---|
| Column-level Security | Hide sensitive columns from users | Exclude columns from view SELECT | Protect salary, SSN, personal data |
| Row-level Security | Restrict rows based on conditions | Use WHERE clause in view definition | Department-specific data access |
| SQL SECURITY DEFINER | View runs with definer's privileges | CREATE SQL SECURITY DEFINER VIEW | Grant access through view without table permissions |
| SQL SECURITY INVOKER | View runs with invoker's privileges | CREATE SQL SECURITY INVOKER VIEW | Stricter security, user must have underlying permissions |
CREATE VIEW engineering_employees AS
SELECT employee_id, first_name, last_name, email, phone
FROM employees
WHERE department_id = 1 -- Engineering department
AND is_active = TRUE;
-- Public employee directory (hide sensitive info)
CREATE VIEW public_employee_directory AS
SELECT first_name, last_name, email, department_id
FROM employees
WHERE is_active = TRUE;
-- View with definer security
CREATE SQL SECURITY DEFINER VIEW sensitive_employee_data AS
SELECT employee_id, first_name, last_name, salary, hire_date
FROM employees
WHERE is_active = TRUE;
Performance Considerations
| Factor | Impact | Recommendation |
|---|---|---|
| View Complexity | Complex views with multiple JOINs and aggregates can be slow | Use simple views when possible, consider materialized patterns |
| Algorithm Choice | MERGE vs TEMPTABLE affects performance | Let MySQL choose automatically in most cases |
| Underlying Indexes | Views benefit from indexes on base tables | Ensure proper indexing on tables used in views |
| Nested Views | Views based on other views can compound performance issues | Avoid deep nesting of views |
| Data Volume | Large datasets in views can be slow | Use WHERE clauses to limit data in views |
View Limitations and Restrictions
Understanding what views cannot do:
-- This view is NOT updatable:
CREATE VIEW department_stats AS
SELECT d.department_name, stats.avg_salary
FROM departments d
JOIN (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) stats ON d.department_id = stats.department_id;
-- Cannot contain TEMPORARY tables
-- Cannot be created from temporary tables
-- Some aggregate views are not updatable
CREATE VIEW employee_count_by_dept AS
SELECT department_id, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id;
-- This view is read-only
Best Practices for Using Views
- Use views for security to restrict column and row access
- Simplify complex queries by encapsulating them in views
- Use meaningful, descriptive names for views
- Document view purposes and usage in comments
- Avoid nesting views too deeply (max 2-3 levels)
- Consider performance implications of complex views
- Use WITH CHECK OPTION for updatable views to maintain integrity
- Test views thoroughly with different user permissions
- Monitor view performance and optimize underlying queries
Common View Mistakes
- Overusing views: Creating views for simple queries that don't need abstraction
- Performance issues: Not considering the performance impact of complex views
- Security misconfiguration: Incorrect use of SQL SECURITY options
- Nested view complexity: Creating views that depend on other views excessively
- Assuming updatability: Not checking if a view is actually updatable
- Missing documentation: Not documenting view purposes and limitations
- Ignoring base table changes: Not updating views when underlying tables change
- Wrong algorithm choice: Manually specifying algorithm when not needed
Practice Exercise
Practice these view scenarios:
- Create a view that shows employee names and departments without revealing salaries
- Design a view for managers that only shows employees in their department
- Create a reporting view that summarizes project budgets by department
- Implement an updatable view for HR to modify basic employee information
- Create a view with WITH CHECK OPTION to ensure data integrity
- Design a view that combines employee, department, and project information
- Create a view that shows monthly hiring trends using date functions
- Implement security-focused views with SQL SECURITY DEFINER option
Stored Procedures in MySQL
Why Stored Procedures Matter
Stored procedures are essential for:
- Code Reusability: Write once, use multiple times
- Performance: Reduced network traffic and pre-compiled execution
- Security: Control data access through procedure permissions
- Maintainability: Centralize business logic in database
- Transaction Management: Complex operations in single transaction
Stored Procedure Advantages
| Advantage | Description | Benefit |
|---|---|---|
| Performance | Pre-compiled and stored on server | Faster execution than dynamic SQL |
| Security | Grant execute permission without table access | Better access control and data protection |
| Maintenance | Business logic centralized in database | Easier updates and bug fixes |
| Network Traffic | Single call instead of multiple queries | Reduced network overhead |
| Transaction Control | Complex operations in single transaction | Data integrity and consistency |
Basic Stored Procedure Syntax
Fundamental structure of stored procedures:
CREATE PROCEDURE procedure_name(IN param1 datatype, OUT param2 datatype)
BEGIN
-- SQL statements here
SELECT * FROM table_name;
END $$
DELIMITER ;
Sample Data for Examples
Let's use a banking system example for our procedures:
customer_id INT PRIMARY KEY AUTO_INCREMENT,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
phone VARCHAR(20),
created_date DATE DEFAULT (CURRENT_DATE)
);
CREATE TABLE accounts (
account_id INT PRIMARY KEY AUTO_INCREMENT,
customer_id INT,
account_type VARCHAR(20),
balance DECIMAL(15, 2) DEFAULT 0.00,
created_date DATE DEFAULT (CURRENT_DATE),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
CREATE TABLE transactions (
transaction_id INT PRIMARY KEY AUTO_INCREMENT,
account_id INT,
transaction_type ENUM('DEPOSIT', 'WITHDRAWAL', 'TRANSFER'),
amount DECIMAL(15, 2),
transaction_date DATETIME DEFAULT CURRENT_TIMESTAMP,
description VARCHAR(255),
FOREIGN KEY (account_id) REFERENCES accounts(account_id)
);
Creating Simple Stored Procedures
Basic procedures without parameters:
-- Procedure to get all customers
CREATE PROCEDURE GetAllCustomers()
BEGIN
SELECT customer_id, first_name, last_name, email, phone
FROM customers
ORDER BY last_name, first_name;
END $$
-- Procedure to get customer count
CREATE PROCEDURE GetCustomerCount()
BEGIN
SELECT COUNT(*) AS total_customers FROM customers;
END $$
-- Procedure with complex logic
CREATE PROCEDURE GetAccountSummary()
BEGIN
SELECT
c.first_name,
c.last_name,
a.account_type,
a.balance,
COUNT(t.transaction_id) AS transaction_count
FROM customers c
JOIN accounts a ON c.customer_id = a.customer_id
LEFT JOIN transactions t ON a.account_id = t.account_id
GROUP BY c.customer_id, a.account_id
ORDER BY a.balance DESC;
END $$
DELIMITER ;
Parameter Types in Stored Procedures
| Parameter Type | Description | Syntax | Use Case |
|---|---|---|---|
| IN | Input parameter (default if not specified) | IN param_name datatype | Pass values to procedure |
| OUT | Output parameter returns value to caller | OUT param_name datatype | Return calculated results |
| INOUT | Both input and output parameter | INOUT param_name datatype | Modify and return values |
Procedures with Parameters
Using different parameter types:
-- IN parameter: Get customer by ID
CREATE PROCEDURE GetCustomerById(IN customerId INT)
BEGIN
SELECT customer_id, first_name, last_name, email, phone
FROM customers
WHERE customer_id = customerId;
END $$
-- OUT parameter: Get total balance
CREATE PROCEDURE GetTotalBalance(OUT total DECIMAL(15, 2))
BEGIN
SELECT SUM(balance) INTO total FROM accounts;
END $$
-- INOUT parameter: Increment counter
CREATE PROCEDURE IncrementCounter(INOUT counter INT)
BEGIN
SET counter = counter + 1;
END $$
-- Multiple parameters
CREATE PROCEDURE GetCustomersByDateRange(
IN startDate DATE,
IN endDate DATE
)
BEGIN
SELECT customer_id, first_name, last_name, created_date
FROM customers
WHERE created_date BETWEEN startDate AND endDate
ORDER BY created_date DESC;
END $$
DELIMITER ;
Calling Stored Procedures
How to execute procedures with different parameter types:
CALL GetAllCustomers();
-- Call procedure with IN parameters
CALL GetCustomerById(1);
CALL GetCustomersByDateRange('2023-01-01', '2023-12-31');
-- Call procedure with OUT parameters
SET @total = 0;
CALL GetTotalBalance(@total);
SELECT @total AS total_balance;
-- Call procedure with INOUT parameters
SET @counter = 5;
CALL IncrementCounter(@counter);
SELECT @counter AS updated_counter;
| customer_id| first_name| last_name | email |
+------------+-----------+-----------+---------------------+
| 1 | John | Doe | john.doe@email.com |
+------------+-----------+-----------+---------------------+
1 row in set (0.00 sec)
Variables and Control Structures
Using variables, conditionals, and loops in procedures:
CREATE PROCEDURE CalculateAccountStats(IN accountId INT)
BEGIN
-- Variable declarations
DECLARE accountBalance DECIMAL(15, 2);
DECLARE transactionCount INT DEFAULT 0;
DECLARE avgTransaction DECIMAL(15, 2);
DECLARE accountStatus VARCHAR(20);
-- Get account balance
SELECT balance INTO accountBalance
FROM accounts WHERE account_id = accountId;
-- Get transaction count
SELECT COUNT(*) INTO transactionCount
FROM transactions WHERE account_id = accountId;
-- Calculate average transaction
IF transactionCount > 0 THEN
SELECT AVG(amount) INTO avgTransaction
FROM transactions WHERE account_id = accountId;
ELSE
SET avgTransaction = 0;
END IF;
-- Determine account status
CASE
WHEN accountBalance > 10000 THEN SET accountStatus = 'PREMIUM';
WHEN accountBalance > 1000 THEN SET accountStatus = 'STANDARD';
ELSE SET accountStatus = 'BASIC';
END CASE;
-- Return results
SELECT
accountBalance AS balance,
transactionCount AS transactions,
avgTransaction AS avg_transaction_amount,
accountStatus AS status;
END $$
DELIMITER ;
Error Handling and Transactions
Implementing robust error handling and transaction management:
CREATE PROCEDURE TransferFunds(
IN fromAccount INT,
IN toAccount INT,
IN transferAmount DECIMAL(15, 2),
OUT success BOOLEAN,
OUT message VARCHAR(255)
)
BEGIN
DECLARE fromBalance DECIMAL(15, 2);
DECLARE exit handler FOR sqlexception
BEGIN
ROLLBACK;
SET success = FALSE;
SET message = 'Transaction failed due to system error';
END;
-- Initialize variables
SET success = FALSE;
SET message = '';
-- Start transaction
START TRANSACTION;
-- Check if from account exists and has sufficient funds
SELECT balance INTO fromBalance
FROM accounts WHERE account_id = fromAccount FOR UPDATE;
IF fromBalance IS NULL THEN
SET message = 'Source account not found';
ROLLBACK;
LEAVE proc_exit;
END IF;
IF fromBalance < transferAmount THEN
SET message = 'Insufficient funds';
ROLLBACK;
LEAVE proc_exit;
END IF;
-- Perform the transfer
UPDATE accounts SET balance = balance - transferAmount
WHERE account_id = fromAccount;
UPDATE accounts SET balance = balance + transferAmount
WHERE account_id = toAccount;
-- Record transactions
INSERT INTO transactions (account_id, transaction_type, amount, description)
VALUES (fromAccount, 'WITHDRAWAL', transferAmount, 'Transfer to account ' + toAccount);
INSERT INTO transactions (account_id, transaction_type, amount, description)
VALUES (toAccount, 'DEPOSIT', transferAmount, 'Transfer from account ' + fromAccount);
-- Commit transaction
COMMIT;
SET success = TRUE;
SET message = 'Transfer completed successfully';
END $$
DELIMITER ;
Managing Stored Procedures
Viewing, modifying, and dropping procedures:
SHOW PROCEDURE STATUS WHERE Db = 'your_database';
-- Show procedure definition
SHOW CREATE PROCEDURE GetCustomerById;
-- Drop a procedure
DROP PROCEDURE IF EXISTS GetCustomerById;
-- Alter procedure (MySQL doesn't support direct ALTER PROCEDURE)
-- Use DROP and CREATE instead
DROP PROCEDURE IF EXISTS GetCustomerById;
CREATE PROCEDURE GetCustomerById(...) -- New definition
Stored Procedure Best Practices
- Use meaningful, descriptive names for procedures
- Include comprehensive error handling
- Use transactions for multi-step operations
- Document procedures with comments
- Validate input parameters
- Avoid using SELECT * - specify columns explicitly
- Consider performance implications of procedure logic
- Test procedures with various input scenarios
- Use consistent naming conventions
Common Stored Procedure Mistakes
- No error handling: Procedures fail without proper error messages
- Missing transactions: Data inconsistency in multi-step operations
- Poor naming conventions: Unclear procedure purposes
- No input validation: Procedures fail with invalid parameters
- Overly complex procedures: Difficult to maintain and debug
- No documentation: Hard for other developers to understand
- Performance issues: Not optimizing queries within procedures
- Security vulnerabilities: SQL injection risks with dynamic SQL
Practice Exercise
Practice these stored procedure scenarios:
- Create a procedure to open a new bank account for an existing customer
- Write a procedure that calculates interest for all accounts
- Create a procedure to generate monthly account statements
- Implement a procedure with transaction handling for account closures
- Write a procedure that finds customers with low account activity
- Create a procedure with error handling for fund withdrawals
- Implement a procedure that generates financial reports with parameters
- Write a procedure that updates customer information with validation
MySQL Triggers
What are Triggers?
Triggers are powerful database features that allow you to automatically execute SQL code when certain database events occur. They help maintain data integrity, enforce business rules, and automate repetitive tasks without requiring application-level code changes.
Trigger Timing and Events
Triggers can be defined with different timing and event combinations:
| Timing | Event | Description |
|---|---|---|
| BEFORE | INSERT | Executes before new data is inserted |
| AFTER | INSERT | Executes after new data is inserted |
| BEFORE | UPDATE | Executes before data is updated |
| AFTER | UPDATE | Executes after data is updated |
| BEFORE | DELETE | Executes before data is deleted |
| AFTER | DELETE | Executes after data is deleted |
Basic Trigger Syntax
trigger_time trigger_event
ON table_name FOR EACH ROW
BEGIN
-- trigger body
-- SQL statements
END;
Example 1: BEFORE INSERT Trigger
This trigger automatically sets the creation timestamp before inserting a new record:
BEFORE INSERT ON employees FOR EACH ROW
BEGIN
SET NEW.created_at = NOW();
SET NEW.updated_at = NOW();
END;
Example 2: AFTER INSERT Trigger
This trigger maintains an audit log after new employees are added:
CREATE TABLE employee_audit (
id INT PRIMARY KEY AUTO_INCREMENT,
employee_id INT,
action VARCHAR(50),
action_date DATETIME,
details TEXT
);
-- Create the trigger
CREATE TRIGGER after_employee_insert
AFTER INSERT ON employees FOR EACH ROW
BEGIN
INSERT INTO employee_audit (employee_id, action, action_date, details)
VALUES (NEW.id, 'INSERT', NOW(), CONCAT('New employee: ', NEW.first_name, ' ', NEW.last_name));
END;
Example 3: BEFORE UPDATE Trigger
This trigger automatically updates the modification timestamp and validates salary changes:
BEFORE UPDATE ON employees FOR EACH ROW
BEGIN
-- Update timestamp
SET NEW.updated_at = NOW();
-- Validate salary increase doesn't exceed 50%
IF NEW.salary > OLD.salary * 1.5 THEN
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Salary increase cannot exceed 50%';
END IF;
END;
Using OLD and NEW Keywords
In triggers, you can reference the old and new values of rows using OLD and NEW prefixes:
| Trigger Type | OLD Reference | NEW Reference |
|---|---|---|
| INSERT | NULL (no old values) | New row being inserted |
| UPDATE | Original row before update | New row after update |
| DELETE | Row being deleted | NULL (no new values) |
Managing Triggers
View, modify, and delete triggers using these commands:
SHOW TRIGGERS;
-- View triggers for a specific table
SHOW TRIGGERS LIKE 'employees';
-- Drop a trigger
DROP TRIGGER IF EXISTS before_employee_insert;
-- Create or replace a trigger
CREATE OR REPLACE TRIGGER trigger_name
-- trigger definition
Real-World Use Cases
- Audit Trail: Track all changes to sensitive data
- Data Validation: Enforce complex business rules
- Automatic Timestamps: Set created_at and updated_at fields
- Derived Columns: Calculate and store computed values
- Referential Integrity: Maintain consistency across related tables
- Notification Systems: Send alerts when specific conditions are met
- Keep trigger logic simple and efficient
- Document triggers thoroughly
- Avoid complex business logic in triggers
- Test triggers with various data scenarios
- Monitor trigger performance in production
MySQL Transactions
What are Transactions?
Transactions allow you to execute multiple SQL statements as a single unit. Either all operations within the transaction complete successfully, or if any operation fails, all changes are rolled back to maintain database consistency. This is crucial for maintaining data integrity in multi-step operations.
ACID Properties
Transactions in MySQL adhere to the ACID principles:
| Property | Description | MySQL Implementation |
|---|---|---|
| Atomicity | All operations in a transaction succeed or all fail | COMMIT and ROLLBACK statements |
| Consistency | Database remains in a consistent state before and after transaction | Constraints, triggers, and business rules |
| Isolation | Concurrent transactions don't interfere with each other | Transaction isolation levels |
| Durability | Committed transactions persist even after system failure | Write-ahead logging and crash recovery |
Transaction Control Statements
START TRANSACTION;
-- Or use BEGIN
BEGIN;
-- Execute your SQL operations
INSERT INTO accounts (account_id, balance) VALUES (1, 1000.00);
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
-- Make changes permanent
COMMIT;
-- Or undo all changes in the transaction
ROLLBACK;
Example 1: Bank Transfer Transaction
This example demonstrates a secure bank transfer between two accounts:
-- Deduct amount from sender
UPDATE accounts
SET balance = balance - 500.00
WHERE account_id = 123
AND balance >= 500.00;
-- Check if deduction was successful
IF ROW_COUNT() = 0 THEN
ROLLBACK;
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Insufficient funds';
END IF;
-- Add amount to receiver
UPDATE accounts
SET balance = balance + 500.00
WHERE account_id = 456;
-- Record the transaction
INSERT INTO transactions (from_account, to_account, amount, transaction_date)
VALUES (123, 456, 500.00, NOW());
-- Commit if all operations succeed
COMMIT;
Transaction Isolation Levels
MySQL supports different isolation levels to control transaction visibility:
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Performance |
|---|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible | Highest |
| READ COMMITTED | Not Possible | Possible | Possible | High |
| REPEATABLE READ | Not Possible | Not Possible | Possible | Medium |
| SERIALIZABLE | Not Possible | Not Possible | Not Possible | Lowest |
Setting Isolation Levels
SELECT @@transaction_isolation;
-- Set isolation level for current session
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Set isolation level for next transaction only
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Set global isolation level
SET GLOBAL TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Savepoints
Savepoints allow you to roll back to specific points within a transaction:
-- First operation
INSERT INTO orders (customer_id, total) VALUES (1, 100.00);
SET @order_id = LAST_INSERT_ID();
-- Create a savepoint
SAVEPOINT after_order_created;
-- Second operation that might fail
INSERT INTO order_items (order_id, product_id, quantity)
VALUES (@order_id, 101, 2);
-- If item insertion fails, rollback to savepoint
IF ROW_COUNT() = 0 THEN
ROLLBACK TO after_order_created;
-- Continue with alternative logic
END IF;
COMMIT;
-- Release a savepoint
RELEASE SAVEPOINT after_order_created;
Auto-Commit Mode
MySQL has an auto-commit mode that automatically commits each statement:
SELECT @@autocommit;
-- Disable auto-commit (0 = OFF, 1 = ON)
SET autocommit = 0;
-- With auto-commit OFF, you must explicitly commit
INSERT INTO table1 VALUES (1);
INSERT INTO table2 VALUES (2);
COMMIT;
-- Re-enable auto-commit
SET autocommit = 1;
Common Transaction Patterns
- Financial Transactions: Ensure money transfers are atomic
- Inventory Management: Prevent overselling with concurrent updates
- Order Processing: Create orders and update inventory atomically
- Data Migration: Move data between tables reliably
- Batch Operations: Process multiple records as a single unit
- Keep transactions short and focused
- Handle exceptions and implement proper rollback logic
- Choose appropriate isolation levels for your use case
- Test transactions with concurrent access scenarios
- Monitor transaction logs and performance metrics
- Use savepoints for complex multi-step transactions
MySQL Backup & Recovery
Why Backup is Essential
Regular backups are crucial for database management to protect against various risks including hardware failures, human errors, software bugs, security breaches, and natural disasters. A proper backup strategy ensures business continuity and data integrity.
Types of MySQL Backups
| Backup Type | Description | Advantages | Disadvantages |
|---|---|---|---|
| Logical Backup | SQL statements that recreate database objects and data | Portable, version-independent, readable | Slower for large databases, requires processing power |
| Physical Backup | Copy of actual database files and directories | Faster, more efficient for large databases | Not portable across versions, binary format |
| Hot Backup | Backup while database is running and accessible | No downtime required | More complex to implement |
| Cold Backup | Backup while database is shut down | Simple, consistent backup | Requires downtime |
| Incremental Backup | Only changed data since last backup | Faster, less storage required | More complex recovery process |
| Full Backup | Complete copy of entire database | Simple recovery process | Time-consuming, requires more storage |
Logical Backup with mysqldump
mysqldump is the most common tool for creating logical backups in MySQL:
mysqldump -u username -p database_name > backup_file.sql
# Backup all databases
mysqldump -u username -p --all-databases > full_backup.sql
# Backup specific tables
mysqldump -u username -p database_name table1 table2 > tables_backup.sql
# Backup with compression
mysqldump -u username -p database_name | gzip > backup_file.sql.gz
# Backup with consistent data (recommended for InnoDB)
mysqldump -u username -p --single-transaction database_name > backup.sql
# Backup structure only (no data)
mysqldump -u username -p --no-data database_name > structure.sql
# Backup data only (no structure)
mysqldump -u username -p --no-create-info database_name > data.sql
Restoring from mysqldump Backups
mysql -u username -p database_name < backup_file.sql
# Restore all databases
mysql -u username -p < full_backup.sql
# Restore compressed backup
gunzip < backup_file.sql.gz | mysql -u username -p database_name
# Restore to different database name
mysql -u username -p new_database_name < backup_file.sql
Physical Backup Methods
Physical backups copy actual database files for faster backup and recovery:
systemctl stop mysql
cp -r /var/lib/mysql /backup/mysql_backup
systemctl start mysql
# Using MySQL Enterprise Backup (commercial)
mysqlbackup --backup-dir=/backup/mysql backup
# Using Percona XtraBackup (open-source alternative)
xtrabackup --backup --target-dir=/backup/mysql
xtrabackup --prepare --target-dir=/backup/mysql
Binary Log Backup and Point-in-Time Recovery
Binary logs record all data changes, enabling point-in-time recovery:
[mysqld]
log-bin=mysql-bin
server-id=1
# View binary logs
mysql -u username -p -e "SHOW BINARY LOGS;"
# Backup binary logs
cp /var/lib/mysql/mysql-bin.* /backup/binary_logs/
# Point-in-time recovery process
# 1. Restore last full backup
mysql -u root -p < full_backup.sql
# 2. Apply binary logs up to specific time
mysqlbinlog --stop-datetime="2024-01-15 14:30:00" \
mysql-bin.000001 mysql-bin.000002 | mysql -u root -p
Automated Backup Strategies
Create automated backup scripts for consistent protection:
# Daily backup script
BACKUP_DIR="/backup/mysql"
DATE=$(date +%Y%m%d_%H%M%S)
DB_USER="backup_user"
DB_PASSWORD="secure_password"
# Create backup directory
mkdir -p $BACKUP_DIR/$DATE
# Backup each database separately
DATABASES=$(mysql -u $DB_USER -p$DB_PASSWORD -e "SHOW DATABASES;" | grep -Ev "(Database|information_schema|performance_schema)")
for DB in $DATABASES; do
echo "Backing up $DB..."
mysqldump -u $DB_USER -p$DB_PASSWORD --single-transaction --routines --triggers $DB | gzip > $BACKUP_DIR/$DATE/$DB.sql.gz
done
# Backup binary logs
mysql -u $DB_USER -p$DB_PASSWORD -e "FLUSH BINARY LOGS;"
cp /var/lib/mysql/mysql-bin.* $BACKUP_DIR/$DATE/
# Clean up old backups (keep 30 days)
find $BACKUP_DIR -type d -mtime +30 -exec rm -rf {} \;
echo "Backup completed: $BACKUP_DIR/$DATE"
Recovery Scenarios
| Scenario | Recovery Method | Steps |
|---|---|---|
| Accidental DELETE | Point-in-Time Recovery | Restore backup, apply binary logs up to before DELETE |
| Table Corruption | Table-level Recovery | Restore specific table from backup |
| Full Server Failure | Complete Restore | Restore all databases and binary logs |
| Single Database Loss | Database-level Recovery | Restore specific database only |
| Data Center Disaster | Disaster Recovery | Restore to alternate location from offsite backups |
MySQL Enterprise Backup
For enterprise environments, consider MySQL Enterprise Backup:
- Online hot backups with minimal performance impact
- Incremental and compressed backups
- Point-in-time recovery capabilities
- Backup encryption and verification
- Integration with MySQL Enterprise Monitor
Best Practices for Backup Strategy
- 3 copies of your data
- 2 different storage media
- 1 copy stored offsite
Monitoring and Maintenance
SHOW MASTER STATUS;
-- Monitor backup completion
SELECT TABLE_SCHEMA, TABLE_NAME, UPDATE_TIME
FROM information_schema.tables
ORDER BY UPDATE_TIME DESC;
-- Check database size for backup planning
SELECT TABLE_SCHEMA AS 'Database',
ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS 'Size (MB)'
FROM information_schema.tables
GROUP BY TABLE_SCHEMA;
MySQL Security & Users
MySQL Security Architecture
MySQL implements a multi-layered security model that includes user authentication, privilege verification, network security, and data encryption. Understanding this architecture is essential for implementing robust database security.
User Management
Creating and managing database users is the foundation of MySQL security:
CREATE USER 'username'@'hostname' IDENTIFIED BY 'strong_password';
-- Create user with specific authentication plugin
CREATE USER 'admin'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
-- View all users
SELECT user, host, authentication_string FROM mysql.user;
-- Rename a user
RENAME USER 'old_user'@'localhost' TO 'new_user'@'localhost';
-- Drop a user
DROP USER 'username'@'hostname';
-- Set password for user
ALTER USER 'username'@'localhost' IDENTIFIED BY 'new_password';
-- Lock and unlock user accounts
ALTER USER 'username'@'localhost' ACCOUNT LOCK;
ALTER USER 'username'@'localhost' ACCOUNT UNLOCK;
Privilege System
MySQL uses a granular privilege system to control user access:
| Privilege Level | Description | Common Use Cases |
|---|---|---|
| Global Privileges | Apply to all databases on server | Administrative users, DBAs |
| Database Privileges | Apply to all objects in a database | Application users, developers |
| Table Privileges | Apply to all columns in a table | Restricted application access |
| Column Privileges | Apply to specific columns | Compliance, sensitive data |
| Routine Privileges | Apply to stored procedures/functions | API-like access patterns |
Granting and Revoking Privileges
GRANT SELECT, INSERT, UPDATE, DELETE ON company_db.* TO 'app_user'@'localhost';
-- Grant all privileges on a database
GRANT ALL PRIVILEGES ON company_db.* TO 'dba'@'localhost';
-- Grant privileges on specific table
GRANT SELECT ON company_db.employees TO 'report_user'@'%';
-- Grant column-level privileges
GRANT SELECT (id, name, department) ON company_db.employees TO 'hr_user'@'localhost';
-- Grant execute privilege on stored procedures
GRANT EXECUTE ON PROCEDURE company_db.calculate_bonus TO 'app_user'@'localhost';
-- Revoke privileges
REVOKE DELETE ON company_db.* FROM 'app_user'@'localhost';
-- Show granted privileges for a user
SHOW GRANTS FOR 'username'@'hostname';
-- Apply privilege changes immediately
FLUSH PRIVILEGES;
Common Privilege Types
| Privilege | Description | Risk Level |
|---|---|---|
| SELECT | Read data from tables | Low |
| INSERT | Add new rows to tables | Medium |
| UPDATE | Modify existing data | Medium |
| DELETE | Remove rows from tables | High |
| CREATE | Create new databases/tables | High |
| DROP | Delete databases/tables | Critical |
| GRANT OPTION | Grant privileges to other users | Critical |
| PROCESS | View active queries | Medium |
| FILE | Read/write files on server | Critical |
Role-Based Access Control (RBAC)
MySQL 8.0+ supports roles for easier privilege management:
CREATE ROLE 'read_only', 'data_entry', 'db_admin';
-- Grant privileges to roles
GRANT SELECT ON company_db.* TO 'read_only';
GRANT SELECT, INSERT, UPDATE ON company_db.* TO 'data_entry';
GRANT ALL PRIVILEGES ON company_db.* TO 'db_admin';
-- Assign roles to users
GRANT 'read_only' TO 'report_user'@'localhost';
GRANT 'data_entry' TO 'clerk_user'@'localhost';
-- Set default roles
SET DEFAULT ROLE 'read_only' TO 'report_user'@'localhost';
-- Activate roles for current session
SET ROLE 'read_only';
-- View granted roles
SHOW GRANTS USING 'read_only';
Password Management and Policies
ALTER USER 'app_user'@'localhost' PASSWORD EXPIRE INTERVAL 90 DAY;
-- Require password change on first login
ALTER USER 'new_user'@'localhost' PASSWORD EXPIRE;
-- Set account with no password (insecure - avoid)
ALTER USER 'test_user'@'localhost' IDENTIFIED BY '';
-- Configure password validation policy
SET GLOBAL validate_password.policy = MEDIUM;
SET GLOBAL validate_password.length = 8;
SET GLOBAL validate_password.number_count = 1;
SET GLOBAL validate_password.special_char_count = 1;
-- Install password validation component (MySQL 8.0+)
INSTALL COMPONENT 'file://component_validate_password';
Network Security and Remote Access
bind-address = 127.0.0.1
# Or bind to specific IP
bind-address = 192.168.1.100
-- Create user with specific host restrictions
CREATE USER 'remote_user'@'192.168.1.%' IDENTIFIED BY 'password';
CREATE USER 'vpn_user'@'10.0.0.50' IDENTIFIED BY 'password';
-- Use SSL for secure connections
CREATE USER 'secure_user'@'%' REQUIRE SSL;
CREATE USER 'x509_user'@'%' REQUIRE X509;
-- Check connection information
SELECT user, host, db, command FROM information_schema.processlist;
Security Best Practices Example
CREATE USER 'web_app'@'192.168.1.100' IDENTIFIED BY 'StrongPass123!' PASSWORD EXPIRE INTERVAL 90 DAY;
CREATE USER 'backup_user'@'localhost' IDENTIFIED BY 'BackupPass456!' REQUIRE SSL;
-- Grant minimal required privileges
GRANT SELECT, INSERT, UPDATE ON ecommerce.* TO 'web_app'@'192.168.1.100';
GRANT SELECT, RELOAD, PROCESS, LOCK TABLES ON *.* TO 'backup_user'@'localhost';
-- Explicitly deny dangerous privileges
REVOKE DROP, CREATE, ALTER, GRANT OPTION ON ecommerce.* FROM 'web_app'@'192.168.1.100';
-- Create read-only user for reporting
CREATE ROLE 'reporting_role';
GRANT SELECT ON ecommerce.products, ecommerce.categories TO 'reporting_role';
GRANT 'reporting_role' TO 'analyst'@'%';
SET DEFAULT ROLE 'reporting_role' TO 'analyst'@'%';
FLUSH PRIVILEGES;
Security Auditing and Monitoring
SELECT * FROM information_schema.processlist;
-- Check user privileges
SELECT * FROM information_schema.user_privileges;
-- Monitor failed login attempts (requires audit plugin)
SELECT * FROM mysql.error_log WHERE error_code = 1045;
-- Check for users without passwords
SELECT user, host FROM mysql.user WHERE authentication_string = '';
-- Find users with administrative privileges
SELECT user, host FROM mysql.user WHERE Super_priv = 'Y' OR Grant_priv = 'Y';
Common Security Risks and Mitigations
| Risk | Description | Mitigation |
|---|---|---|
| Weak Passwords | Easy-to-guess or default passwords | Implement password policies, use strong passwords |
| Excessive Privileges | Users have more access than needed | Apply principle of least privilege |
| Network Exposure | Database accessible from untrusted networks | Use firewalls, bind to specific interfaces |
| SQL Injection | Malicious SQL execution through applications | Use prepared statements, input validation |
| Default Configurations | Using insecure default settings | Harden MySQL configuration, remove test databases |
- Change default root password immediately after installation
- Remove anonymous users and test databases
- Use strong, unique passwords for all users
- Implement network security and firewalls
- Regularly audit user privileges and access patterns
- Enable logging and monitoring for suspicious activities
- Keep MySQL updated with security patches
MySQL Performance Optimization
Performance Optimization Layers
MySQL performance optimization occurs at multiple levels, from query-level tuning to server-wide configuration. A systematic approach ensures comprehensive improvements across all layers.
Query Performance Analysis
Use EXPLAIN to analyze query execution plans and identify bottlenecks:
EXPLAIN SELECT * FROM employees WHERE department = 'Sales';
-- EXPLAIN with format options
EXPLAIN FORMAT=JSON SELECT * FROM orders WHERE total_amount > 1000;
-- ANALYZE TABLE to update statistics
ANALYZE TABLE employees, departments;
-- Check index usage
SHOW INDEX FROM employees;
-- Find slow queries (requires slow query log)
SELECT * FROM mysql.slow_log ORDER BY start_time DESC LIMIT 10;
EXPLAIN Output Interpretation
| Column | Description | Optimal Values |
|---|---|---|
| type | Join type | const, eq_ref, ref, range |
| possible_keys | Possible indexes | Should list relevant indexes |
| key | Actual index used | Should match query needs |
| rows | Estimated rows examined | Lower is better |
| Extra | Additional information | Avoid "Using filesort", "Using temporary" |
Index Optimization Strategies
Proper indexing is crucial for query performance:
CREATE INDEX idx_department ON employees(department);
-- Create composite index
CREATE INDEX idx_dept_salary ON employees(department, salary);
-- Create unique index
CREATE UNIQUE INDEX idx_email ON employees(email);
-- Create full-text index
CREATE FULLTEXT INDEX idx_description ON products(description);
-- Drop index
DROP INDEX idx_department ON employees;
-- Check index usage statistics
SELECT * FROM sys.schema_index_statistics WHERE table_schema = 'company_db';
Query Optimization Techniques
SELECT id, name, email FROM employees; -- Good
SELECT * FROM employees; -- Avoid
-- Use WHERE instead of HAVING for filtering
SELECT department, AVG(salary) FROM employees WHERE department = 'Sales'; -- Good
SELECT department, AVG(salary) FROM employees GROUP BY department HAVING department = 'Sales'; -- Avoid
-- Use EXISTS instead of IN for large datasets
SELECT name FROM departments d WHERE EXISTS (SELECT 1 FROM employees e WHERE e.department_id = d.id); -- Good
SELECT name FROM departments WHERE id IN (SELECT department_id FROM employees); -- Less efficient
-- Use LIMIT to restrict result sets
SELECT * FROM orders ORDER BY order_date DESC LIMIT 100;
Server Configuration Optimization
Key my.cnf configuration parameters for performance:
innodb_buffer_pool_size = 16G
innodb_buffer_pool_instances = 8
# Log Settings
innodb_log_file_size = 2G
innodb_log_buffer_size = 256M
# Connection Settings
max_connections = 200
thread_cache_size = 16
table_open_cache = 4000
# Query Cache (MySQL 5.7) - Disable in MySQL 8.0+
query_cache_type = 0
query_cache_size = 0
# Temporary Tables
tmp_table_size = 256M
max_heap_table_size = 256M
Performance Schema and Sys Schema
Use built-in monitoring to identify performance issues:
UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES';
-- Top 10 most time-consuming queries
SELECT query, db, exec_count, total_latency, avg_latency
FROM sys.statement_analysis
ORDER BY total_latency DESC LIMIT 10;
-- Check index usage
SELECT * FROM sys.schema_unused_indexes;
-- Monitor InnoDB metrics
SELECT * FROM sys.innodb_buffer_stats_by_schema;
-- Check table IO statistics
SELECT table_name, count_read, count_write
FROM sys.io_global_by_file_by_bytes
WHERE file LIKE '%ibd' ORDER BY total DESC LIMIT 10;
Connection and Thread Optimization
SHOW PROCESSLIST;
SELECT * FROM information_schema.processlist;
-- Kill long-running queries
KILL 1234;
-- Monitor connection statistics
SHOW STATUS LIKE 'Threads_%';
SHOW STATUS LIKE 'Aborted_%';
-- Check max used connections
SHOW STATUS LIKE 'Max_used_connections';
-- Analyze connection errors
SHOW STATUS LIKE 'Connection_errors_%';
Table Optimization and Maintenance
OPTIMIZE TABLE employees, orders;
-- Check table fragmentation
SELECT TABLE_NAME, DATA_FREE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'company_db' AND DATA_FREE > 0;
-- Check table statistics
SHOW TABLE STATUS LIKE 'employees';
-- Repair table (if corrupted)
REPAIR TABLE corrupted_table;
-- Convert table to different storage engine
ALTER TABLE my_table ENGINE = InnoDB;
Partitioning for Large Tables
Partitioning improves performance for large datasets:
CREATE TABLE sales (
sale_id INT,
sale_date DATE,
amount DECIMAL(10,2)
) PARTITION BY RANGE (YEAR(sale_date)) (
PARTITION p2020 VALUES LESS THAN (2021),
PARTITION p2021 VALUES LESS THAN (2022),
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION pfuture VALUES LESS THAN MAXVALUE
);
-- Query specific partition
SELECT * FROM sales PARTITION (p2023);
-- Add new partition
ALTER TABLE sales REORGANIZE PARTITION pfuture INTO (
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION pfuture VALUES LESS THAN MAXVALUE
);
Caching Strategies
| Cache Type | Purpose | Configuration |
|---|---|---|
| InnoDB Buffer Pool | Cache data and indexes | innodb_buffer_pool_size |
| Query Cache | Cache query results (deprecated) | query_cache_type, query_cache_size |
| Table Cache | Cache table descriptors | table_open_cache |
| Thread Cache | Cache worker threads | thread_cache_size |
| Application Cache | Application-level caching | Redis, Memcached |
Performance Monitoring Script
# MySQL Performance Health Check
echo "=== MySQL Performance Report ==="
echo "Generated: $(date)"
echo
# Check key metrics
mysql -e "SHOW GLOBAL STATUS LIKE 'Questions';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Threads_connected';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_hit_rate';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Slow_queries';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Table_locks_waited';"
# Check current processes
echo "=== Top 10 Long Running Queries ==="
mysql -e "SELECT id, user, host, db, command, time, state, info
FROM information_schema.processlist
WHERE command != 'Sleep'
ORDER BY time DESC LIMIT 10\G"
- Add indexes on frequently queried columns
- Optimize slow queries identified in slow query log
- Increase InnoDB buffer pool size (if RAM available)
- Use connection pooling in applications
- Regularly maintain tables with OPTIMIZE TABLE
- Monitor and kill long-running queries
- Use appropriate data types to reduce storage
Advanced MySQL Features
Window Functions
Window functions perform calculations across a set of table rows that are somehow related to the current row, without collapsing rows like GROUP BY does.
SELECT
employee_id,
name,
salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
-- RANK() and DENSE_RANK() - Handle ties differently
SELECT
employee_id,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_dense_rank
FROM employees;
-- Running totals with SUM() window function
SELECT
order_date,
daily_revenue,
SUM(daily_revenue) OVER (ORDER BY order_date) AS running_total
FROM daily_sales;
-- LAG and LEAD - Access previous/next row values
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
LEAD(revenue) OVER (ORDER BY month) AS next_month_revenue
FROM monthly_revenue;
Common Table Expressions (CTEs)
CTEs provide better readability and enable recursive queries:
WITH department_stats AS (
SELECT
department,
AVG(salary) AS avg_salary,
COUNT(*) AS employee_count
FROM employees
GROUP BY department
)
SELECT * FROM department_stats WHERE avg_salary > 50000;
-- Multiple CTEs in single query
WITH
high_earners AS (
SELECT employee_id, name, salary FROM employees WHERE salary > 80000
),
department_high_earners AS (
SELECT e.department, COUNT(*) AS high_earner_count
FROM employees e
JOIN high_earners h ON e.employee_id = h.employee_id
GROUP BY e.department
)
SELECT * FROM department_high_earners ORDER BY high_earner_count DESC;
-- Recursive CTE for hierarchical data
WITH RECURSIVE employee_hierarchy AS (
-- Anchor member: top-level managers
SELECT employee_id, name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive member: subordinates
SELECT e.employee_id, e.name, e.manager_id, eh.level + 1
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy ORDER BY level, employee_id;
JSON Support
MySQL provides extensive JSON data type support and functions:
CREATE TABLE products (
product_id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100) NOT NULL,
details JSON,
attributes JSON
);
-- Insert JSON data
INSERT INTO products (name, details, attributes) VALUES (
'Laptop',
'{"brand": "Dell", "specs": {"ram": "16GB", "storage": "512GB"}, "price": 999.99}',
'["portable", "gaming", "business"]'
);
-- Extract JSON values
SELECT
name,
details->'$.brand' AS brand,
details->'$.specs.ram' AS ram,
JSON_EXTRACT(details, '$.price') AS price
FROM products;
-- Query JSON data
SELECT name, details
FROM products
WHERE details->'$.specs.ram' = '"16GB"';
-- Modify JSON data
UPDATE products
SET details = JSON_SET(details, '$.price', 899.99)
WHERE product_id = 1;
-- JSON aggregation
SELECT
department,
JSON_ARRAYAGG(JSON_OBJECT('id', employee_id, 'name', name)) AS employees
FROM employees
GROUP BY department;
Full-Text Search
Advanced text searching capabilities in MySQL:
CREATE TABLE articles (
article_id INT PRIMARY KEY AUTO_INCREMENT,
title VARCHAR(255),
content TEXT,
FULLTEXT(title, content)
);
-- Natural language full-text search
SELECT title, MATCH(title, content) AGAINST('database optimization') AS relevance
FROM articles
WHERE MATCH(title, content) AGAINST('database optimization')
ORDER BY relevance DESC;
-- Boolean mode full-text search
SELECT title
FROM articles
WHERE MATCH(title, content) AGAINST('+MySQL -Oracle' IN BOOLEAN MODE);
-- Query expansion (MySQL 8.0+)
SELECT title
FROM articles
WHERE MATCH(title, content) AGAINST('database' WITH QUERY EXPANSION);
Geographic Data and Spatial Features
MySQL supports geographic data types and spatial functions:
CREATE TABLE locations (
location_id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100),
coordinates POINT NOT NULL,
SPATIAL INDEX(coordinates)
);
-- Insert spatial data
INSERT INTO locations (name, coordinates) VALUES
('Office', ST_GeomFromText('POINT(40.7128 -74.0060)')),
('Store', ST_GeomFromText('POINT(40.7589 -73.9851)'));
-- Calculate distance between points
SELECT
name,
ST_Distance_Sphere(
ST_GeomFromText('POINT(40.7128 -74.0060)'),
coordinates
) AS distance_meters
FROM locations
ORDER BY distance_meters;
-- Find points within radius
SELECT name
FROM locations
WHERE ST_Distance_Sphere(
ST_GeomFromText('POINT(40.7128 -74.0060)'),
coordinates
) <= 5000; -- 5km radius
Generated Columns
Automatically computed columns based on other columns:
CREATE TABLE products (
product_id INT PRIMARY KEY,
price DECIMAL(10,2),
quantity INT,
total_value DECIMAL(12,2) GENERATED ALWAYS AS (price * quantity) VIRTUAL
);
-- Stored generated column (computed on write)
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
first_name VARCHAR(50),
last_name VARCHAR(50),
full_name VARCHAR(101) GENERATED ALWAYS AS (CONCAT(first_name, ' ', last_name)) STORED
);
-- JSON extracted to generated column
CREATE TABLE products_json (
product_id INT PRIMARY KEY,
details JSON,
brand VARCHAR(50) GENERATED ALWAYS AS (details->'$.brand') VIRTUAL
);
Advanced Indexing Features
CREATE INDEX idx_salary_desc ON employees (salary DESC);
-- Invisible indexes (for testing index removal)
CREATE INDEX idx_test ON employees(department) INVISIBLE;
ALTER INDEX idx_test VISIBLE;
-- Functional indexes (MySQL 8.0+)
CREATE INDEX idx_name_lower ON employees ((LOWER(name)));
-- Multi-valued indexes for JSON arrays
CREATE INDEX idx_attributes ON products ( (CAST(attributes AS CHAR(50) ARRAY)) );
Enterprise Features
| Feature | Description | Use Case |
|---|---|---|
| Data Masking | Hide sensitive data from unauthorized users | GDPR compliance, development environments |
| Audit Logging | Comprehensive logging of database activities | Security compliance, troubleshooting |
| Thread Pool | Efficient connection handling for high concurrency | Web applications with many concurrent users |
| Enterprise Backup | Hot backups with minimal performance impact | Large production databases |
| Enterprise Monitor | Comprehensive monitoring and alerting | Production database management |
Performance Schema Advanced Usage
SELECT * FROM performance_schema.memory_summary_global_by_event_name
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC LIMIT 10;
-- Analyze table IO
SELECT OBJECT_NAME, COUNT_READ, COUNT_WRITE, SUM_NUMBER_OF_BYTES_READ
FROM performance_schema.table_io_waits_summary_by_table
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;
-- Monitor replication lag
SELECT CHANNEL_NAME, SERVICE_STATE, LAST_ERROR_NUMBER, LAST_ERROR_MESSAGE
FROM performance_schema.replication_connection_status;
- Use window functions for complex analytics and rankings
- Use CTEs for better query organization and recursive queries
- Use JSON for flexible schema and semi-structured data
- Use full-text search for text-heavy applications
- Use spatial features for location-based applications
- Use generated columns for computed values and data integrity