What is the InnoDB buffer pool and why is it important for MySQL performance?

The InnoDB buffer pool is an in-memory cache that holds frequently accessed table and index pages, reducing disk I/O. Its size (innodb_buffer_pool_size) is usually the single most important tuning parameter for InnoDB performance; a common starting point on a dedicated database server is 70-80% of available RAM. A properly sized buffer pool can speed up queries by orders of magnitude, because reads are served from RAM instead of disk.

What is the difference between InnoDB and MyISAM storage engines?

InnoDB is a transaction-safe storage engine with ACID compliance, row-level locking, crash recovery, and foreign key support, making it ideal for applications requiring data integrity. MyISAM uses table-level locking and lacks transactions, foreign keys, and automatic crash recovery, but is simpler and can be faster for read-heavy workloads. InnoDB is the default since MySQL 5.5 and recommended for most applications.

How does InnoDB implement MVCC (Multi-Version Concurrency Control)?

InnoDB implements MVCC by storing multiple versions of each row using hidden system columns (DB_TRX_ID and DB_ROLL_PTR) and maintaining old versions in undo logs. Each transaction gets a read view that determines which row version to show based on transaction IDs. This allows concurrent reads without blocking writes and vice versa, providing excellent concurrency.
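The visibility check described above can be sketched in a few lines of Python. This is a simplified model, not InnoDB's actual code: `RowVersion`, `ReadView`, and `visible_version` are hypothetical names, and the linked list of versions stands in for the undo-log records reached through DB_ROLL_PTR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RowVersion:
    trx_id: int                    # plays the role of DB_TRX_ID
    value: str
    older: Optional["RowVersion"]  # plays the role of DB_ROLL_PTR


@dataclass(frozen=True)
class ReadView:
    active_trx_ids: frozenset  # transactions still open when the view was created
    next_trx_id: int           # first transaction ID not yet assigned at that time
    own_trx_id: int

    def can_see(self, trx_id: int) -> bool:
        if trx_id == self.own_trx_id:
            return True        # a transaction always sees its own changes
        if trx_id >= self.next_trx_id:
            return False       # started after the view was created
        return trx_id not in self.active_trx_ids  # must have committed already


def visible_version(newest: Optional[RowVersion], view: ReadView) -> Optional[RowVersion]:
    """Walk from the newest version towards older ones until one is visible."""
    version = newest
    while version is not None and not view.can_see(version.trx_id):
        version = version.older
    return version


v1 = RowVersion(trx_id=10, value="alice@old.example", older=None)
v2 = RowVersion(trx_id=20, value="alice@new.example", older=v1)

# Transaction 15 took its read view while transaction 20 was still uncommitted.
view = ReadView(active_trx_ids=frozenset({20}), next_trx_id=21, own_trx_id=15)
print(visible_version(v2, view).value)  # alice@old.example
```

The reader is never blocked by the writer: it simply skips the uncommitted version and returns the older one.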

What is the leftmost prefix rule for composite indexes?

The leftmost prefix rule states that a composite index can only be used if the query includes the leftmost column(s) in the index definition. For an index on (category, subcategory, brand), queries can use the index for category alone, category+subcategory, or all three columns, but not for subcategory or brand alone. This is critical for designing efficient multi-column indexes.
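As a rough illustration of the rule, the Python sketch below computes which leading index columns a set of equality predicates can use. The function name `usable_prefix` is hypothetical, and the model covers equality predicates only; it ignores range conditions and optimizer features such as skip scan.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that the given equality predicates can use."""
    used = []
    for column in index_columns:
        if column not in equality_columns:
            break  # a gap ends the usable prefix
        used.append(column)
    return used


index = ["category", "subcategory", "brand"]

print(usable_prefix(index, {"category"}))                 # ['category']
print(usable_prefix(index, {"category", "subcategory"}))  # ['category', 'subcategory']
print(usable_prefix(index, {"subcategory", "brand"}))     # []
print(usable_prefix(index, {"category", "brand"}))        # ['category']
```

The last case shows a partial match: the index narrows the search on category, while brand has to be checked separately for each matching entry.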

What are the ACID properties and how does MySQL InnoDB implement them?

ACID stands for Atomicity (all-or-nothing transactions via the undo log), Consistency (enforced through constraints and triggers), Isolation (MVCC and locking keep concurrent transactions from interfering with each other), and Durability (committed transactions survive crashes via the redo log and doublewrite buffer). Together, these four properties ensure reliable transaction processing in InnoDB.

What transaction isolation levels does MySQL support?

MySQL supports four isolation levels: READ UNCOMMITTED (allows dirty reads), READ COMMITTED (prevents dirty reads), REPEATABLE READ (the default; prevents dirty and non-repeatable reads), and SERIALIZABLE (full isolation, preventing all anomalies). InnoDB serves plain SELECTs from MVCC snapshots under READ COMMITTED (a fresh snapshot per statement) and REPEATABLE READ (one snapshot per transaction); under SERIALIZABLE, plain SELECTs are implicitly converted to shared locking reads when autocommit is disabled.

What is the difference between statement-based and row-based replication?

Statement-based replication (SBR) logs the actual SQL statements and produces smaller logs but can cause inconsistencies with non-deterministic functions. Row-based replication (RBR) logs the actual row changes, is deterministic and safer, but generates larger logs for bulk operations. MIXED format logs statements by default and switches to row-based logging for statements that are unsafe to replicate as text. The default binlog_format is ROW since MySQL 5.7.7.
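A toy Python model makes the difference concrete. It assumes a hypothetical statement whose result depends on the server that executes it; `run_update` and the server names are illustrative only and do not correspond to any MySQL API.

```python
def run_update(rows, server_name):
    """Stand-in for: UPDATE t SET origin = <value that depends on the executing server>."""
    return {pk: server_name for pk in rows}


table = {1: None, 2: None}

# The primary executes the statement once.
primary = run_update(table, "db-primary")

# Statement-based: the replica re-executes the statement text itself.
replica_sbr = run_update(table, "db-replica")

# Row-based: the replica applies the row images produced on the primary.
replica_rbr = dict(primary)

print(replica_sbr == primary)  # False: the replica has drifted
print(replica_rbr == primary)  # True
```

The row-based copy is larger in a real binary log (one image per changed row instead of one statement), which is the size trade-off mentioned above.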

What do the type values in EXPLAIN output mean?

The type column in EXPLAIN shows the access method used, from best to worst: const (single row by primary/unique key), eq_ref (one row per join), ref (non-unique index lookup), range (index range scan), index (full index scan), and ALL (full table scan). Fewer rows examined and no ALL access on large tables generally indicate a more efficient query.

How does the MySQL query optimizer work?

The MySQL query optimizer is a cost-based optimizer that analyzes multiple execution plans and selects the one with the lowest estimated cost. It uses table statistics, index cardinality, and data distribution to evaluate different access methods (table scans, index scans, joins) and considers factors like rows to examine, I/O operations, and CPU cost to choose the most efficient plan.
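The core idea, estimate a cost for each candidate plan and keep the cheapest, can be sketched as follows. The `Plan` class, the plan names, and every cost constant here are made up for illustration; MySQL's real cost model is considerably more detailed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    rows_examined: int
    io_cost_per_row: float   # hypothetical: random index lookups cost more per row
    cpu_cost_per_row: float  # hypothetical constant

    @property
    def cost(self) -> float:
        return self.rows_examined * (self.io_cost_per_row + self.cpu_cost_per_row)


def choose_plan(plans):
    """Pick the candidate with the lowest estimated cost."""
    return min(plans, key=lambda plan: plan.cost)


selective = [
    Plan("full table scan", 1_000_000, 0.25, 0.1),
    Plan("range scan on idx_created", 50_000, 1.0, 0.1),
    Plan("ref lookup on idx_user", 200, 1.0, 0.1),
]
unselective = [
    Plan("full table scan", 1_000_000, 0.25, 0.1),
    Plan("ref lookup on idx_status", 900_000, 1.0, 0.1),
]

print(choose_plan(selective).name)    # ref lookup on idx_user
print(choose_plan(unselective).name)  # full table scan
```

The second case mirrors a common surprise: when an index matches most of the table, the sequential scan is estimated to be cheaper and the index is ignored.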

What is SELECT FOR UPDATE and when should you use it?

SELECT ... FOR UPDATE acquires exclusive locks on the selected rows, so other transactions cannot modify them or read them with a locking read (FOR UPDATE or FOR SHARE) until you commit or roll back; plain non-locking SELECTs are not blocked under the default isolation level. It's essential for preventing race conditions in scenarios like inventory management, seat booking, or financial transactions where you need to read data and ensure it doesn't change before your update.

How do you monitor MySQL replication lag?

Replication lag is monitored using SHOW SLAVE STATUS (or SHOW REPLICA STATUS in MySQL 8.0.22+) by checking the Seconds_Behind_Master field (Seconds_Behind_Source in the newer output), which estimates lag as the difference between the replica's current time and the timestamp of the event it is currently applying. Because that value can read 0 while the replica is still behind on receiving events, better metrics include GTID comparison and performance schema tables like replication_applier_status_by_worker for multi-threaded replication.

What are common high availability solutions for MySQL?

Common MySQL HA solutions include MySQL Replication (async/semi-sync with manual or automated failover), MySQL Group Replication (multi-master with built-in conflict detection), MySQL InnoDB Cluster (integrated HA stack with automatic failover), Galera Cluster (synchronous multi-master), and orchestrator tools like Orchestrator or MHA for automating traditional replication failover. Choice depends on RPO/RTO requirements, consistency needs, and operational expertise.

MySQL Interview Questions (Free Preview)

Q: What is the difference between asynchronous and semi-synchronous replication?

Asynchronous replication is MySQL's default where the primary commits immediately without waiting for replicas, providing maximum write performance but risking data loss if the primary crashes. Semi-synchronous replication requires at least one replica to acknowledge receiving the transaction before the primary commits, ensuring better durability at the cost of slightly higher latency.

Free sample of 15 from 53 questions available

Backup and Recovery

What is the difference between logical and physical backups?

The 30-Second Answer:

Logical backups export database structure and data as SQL statements (using tools like mysqldump), making them portable across platforms and MySQL versions but slower to restore. Physical backups copy the actual database files and directories (using tools like Percona XtraBackup or MySQL Enterprise Backup), offering faster backup and restore but generally requiring a compatible MySQL version and platform on the target server. Logical backups are ideal for smaller databases and migrations; physical backups are preferred for large production systems requiring minimal downtime.

The 2-Minute Answer (If They Want More):

Logical Backups create a representation of your database by extracting data and converting it into SQL statements. Tools like mysqldump generate CREATE TABLE and INSERT statements that can recreate your database. Advantages include:

Portability: Works across different MySQL versions, operating systems, and even other databases
Flexibility: Easy to edit, version control, and selectively restore specific tables or databases
Compression: Text-based output compresses well
Inspection: Human-readable format for verification

Disadvantages:

Slower: Restore must re-execute every INSERT statement and rebuild indexes
Locking: Without --single-transaction (or on non-InnoDB tables), mysqldump locks tables and can impact production during backup
Larger size: SQL statements are verbose before compression

Physical Backups copy the raw database files from disk, including data files, logs, and configuration. Tools like Percona XtraBackup, MySQL Enterprise Backup, or simple file system snapshots fall into this category. Advantages include:

Speed: Direct file copy is much faster for large databases
Minimal downtime: Hot backup tools can backup InnoDB without locking
Complete state: Captures exact database state including indexes
Point-in-time recovery: Combined with binary logs enables precise recovery

Disadvantages:

Platform dependency: Requires same MySQL version and architecture
Less portable: Can't easily migrate between different MySQL configurations
Storage engine specific: Some tools work only with specific engines (e.g., XtraBackup with InnoDB)
Binary format: Not human-readable, harder to verify

Best Practices: Use logical backups for development, testing, and smaller databases (< 100GB). Use physical backups for production systems, large databases, and scenarios requiring quick recovery. Implement both for comprehensive disaster recovery: physical for quick restoration, logical for long-term archival and migration flexibility.

Code Example:

# LOGICAL BACKUP EXAMPLE
# Backup entire database with mysqldump
# (--single-transaction gives a consistent InnoDB snapshot without locking tables)
mysqldump -u root -p \
  --single-transaction \
  --routines \
  --triggers \
  --events \
  my_database > my_database_backup.sql

# Backup specific tables
mysqldump -u root -p my_database users orders > tables_backup.sql

# Backup all databases
mysqldump -u root -p --all-databases > all_databases.sql

# Restore from logical backup
mysql -u root -p my_database < my_database_backup.sql

# PHYSICAL BACKUP EXAMPLE (using filesystem copy - requires shutdown)
# Stop MySQL server
sudo systemctl stop mysql

# Copy data directory (-a preserves ownership, permissions, and timestamps)
sudo cp -a /var/lib/mysql /backup/mysql_$(date +%Y%m%d)

# Start MySQL server
sudo systemctl start mysql

# PHYSICAL BACKUP (using Percona XtraBackup - hot backup)
# Full backup without stopping MySQL
xtrabackup --backup \
  --target-dir=/backup/full_backup \
  --user=root \
  --password=yourpassword

# Prepare backup for restore
xtrabackup --prepare --target-dir=/backup/full_backup

# Restore (requires MySQL to be stopped)
sudo systemctl stop mysql
sudo rm -rf /var/lib/mysql/*
xtrabackup --copy-back --target-dir=/backup/full_backup
sudo chown -R mysql:mysql /var/lib/mysql
sudo systemctl start mysql

# HYBRID APPROACH: Logical backup with compression
mysqldump -u root -p --single-transaction my_database | gzip > backup.sql.gz

# Restore compressed logical backup
gunzip < backup.sql.gz | mysql -u root -p my_database

What is Percona XtraBackup and how does it work?

The 30-Second Answer:

Percona XtraBackup is an open-source hot backup tool for MySQL that creates physical backups of InnoDB, XtraDB, and MyISAM tables; InnoDB tables are copied without blocking database operations. It works by copying InnoDB data files while also copying the redo log records written during the backup, then applies those records during the "prepare" phase to produce a consistent backup. Key advantages include no downtime for InnoDB backups, faster backup and restore than mysqldump on large datasets, incremental backup support, and compression. It is a widely used choice for backing up large MySQL databases in production.

The 2-Minute Answer (If They Want More):

Percona XtraBackup creates physical backups by copying actual database files rather than exporting SQL. The process works in several phases:

Backup Phase:

Copy Data Files: XtraBackup begins copying InnoDB data files (.ibd) to the backup directory while the database remains online
Monitor Changes: Simultaneously, it monitors the InnoDB redo log (transaction log) and copies all new entries to a file called xtrabackup_logfile
Lock for Non-InnoDB: Brief lock on non-InnoDB tables (MyISAM) to ensure consistency
Record LSN: Records the Log Sequence Number (LSN) representing the backup's point-in-time

Prepare Phase: After backup completes, you must "prepare" it before restoration:

Apply Redo Log: XtraBackup applies committed transactions from xtrabackup_logfile to the data files
Rollback Uncommitted: Rolls back any uncommitted transactions
Result: Creates a consistent point-in-time snapshot ready for restore
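The two phases can be modeled in miniature. The Python sketch below is a deliberately simplified picture of the prepare step, with rows standing in for pages; `RedoRecord` and `prepare` are hypothetical names and not part of XtraBackup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedoRecord:
    lsn: int
    trx: int
    key: str
    old: int  # before-image, what a rollback restores
    new: int  # after-image, what a roll-forward applies


def prepare(copied_rows, copied_lsn, redo, committed_trx):
    """Turn an inconsistent ("fuzzy") copy into a consistent snapshot."""
    rows = dict(copied_rows)
    lsn_of = dict(copied_lsn)
    # Redo pass: roll every row forward to the end of the copied log.
    for rec in sorted(redo, key=lambda r: r.lsn):
        if rec.lsn > lsn_of.get(rec.key, 0):
            rows[rec.key] = rec.new
            lsn_of[rec.key] = rec.lsn
    # Undo pass: revert changes of transactions that never committed, newest first.
    for rec in sorted(redo, key=lambda r: r.lsn, reverse=True):
        if rec.trx not in committed_trx:
            rows[rec.key] = rec.old
    return rows


# Row "a" was copied before transaction 7 changed it; row "b" was copied
# after transaction 8 changed it, but transaction 8 never committed.
copied_rows = {"a": 1, "b": 5}
copied_lsn = {"a": 100, "b": 120}
redo = [
    RedoRecord(lsn=110, trx=7, key="a", old=1, new=2),
    RedoRecord(lsn=120, trx=8, key="b", old=1, new=5),
]

print(prepare(copied_rows, copied_lsn, redo, committed_trx={7}))  # {'a': 2, 'b': 1}
```

The raw copy mixes states from different moments; only after both passes does it match what a reader would have seen at the end of the backup.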

Key Features:

Hot Backup: InnoDB/XtraDB tables backed up without locking or downtime
Incremental Backups: Only copies changed pages since last backup, saving time and space
Compression: Built-in compression reduces backup size and transfer time
Partial Backups: Backup specific databases or tables
Streaming: Direct backup to remote server without local storage
Encryption: Encrypt backups for security
Parallel Processing: Multi-threaded for faster operations
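Incremental backups rely on the fact that every InnoDB page carries the LSN of its last change. A minimal Python sketch of the idea, with hypothetical function names and page contents reduced to strings:

```python
def incremental_pages(pages, base_to_lsn):
    """Select pages changed since the base backup.

    pages maps page number -> (page_lsn, payload); base_to_lsn is the
    to_lsn value recorded by the previous backup.
    """
    return {no: page for no, page in pages.items() if page[0] > base_to_lsn}


def apply_incremental(base, delta):
    """Merge an incremental backup into a copy of the base backup."""
    merged = dict(base)
    merged.update(delta)
    return merged


base_backup = {1: (90, "users v1"), 2: (95, "orders v1"), 3: (80, "logs v1")}
live_pages = {1: (90, "users v1"), 2: (130, "orders v2"), 3: (80, "logs v1")}

delta = incremental_pages(live_pages, base_to_lsn=100)
print(delta)                                               # {2: (130, 'orders v2')}
print(apply_incremental(base_backup, delta) == live_pages)  # True
```

Only one of the three pages is copied, which is where the time and space savings come from.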

Advantages over mysqldump:

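Typically much faster for large databases (hundreds of GB to TBs), since files are copied rather than exported row by row
No table locks for InnoDB (zero downtime)
Backup size is close to the on-disk data size (indexes included) and can be reduced with built-in compression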
Faster restore (file copy vs. SQL execution)
Incremental backup capability

Limitations:

Requires same MySQL version for restore (or compatible)
More complex than mysqldump
Binary format not portable across platforms
Requires adequate disk space for data files plus redo log changes

Best Use Cases:

Production databases > 100GB
Databases requiring minimal downtime
Systems needing fast recovery time objectives (RTO)
Environments with regular incremental backup needs
Disaster recovery scenarios requiring point-in-time recovery

Code Example:

# INSTALLATION
# Ubuntu/Debian
wget https://repo.percona.com/apt/percona-release_latest.$(lsb_release -sc)_all.deb
sudo dpkg -i percona-release_latest.$(lsb_release -sc)_all.deb
sudo apt-get update
sudo apt-get install percona-xtrabackup-80

# RedHat/CentOS
sudo yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
sudo yum install percona-xtrabackup-80

# BASIC FULL BACKUP
# Create backup directory
mkdir -p /backup/full

# Perform full backup
xtrabackup --backup \
  --target-dir=/backup/full \
  --user=root \
  --password=yourpassword

# Alternative: read connection settings from a MySQL config file
# (--defaults-file must be the first option on the command line)
xtrabackup --defaults-file=/etc/mysql/my.cnf \
  --backup \
  --target-dir=/backup/full

# PREPARE BACKUP (required before restore)
xtrabackup --prepare --target-dir=/backup/full

# RESTORE BACKUP
# Stop MySQL
sudo systemctl stop mysql

# Clear datadir (DANGEROUS - ensure you have backup!)
sudo rm -rf /var/lib/mysql/*

# Copy backup to datadir
xtrabackup --copy-back --target-dir=/backup/full

# Fix permissions
sudo chown -R mysql:mysql /var/lib/mysql

# Start MySQL
sudo systemctl start mysql

# COMPRESSED BACKUP
# Compress during backup (saves space)
xtrabackup --backup \
  --compress \
  --compress-threads=4 \
  --target-dir=/backup/compressed \
  --user=root \
  --password=yourpassword

# Decompress before prepare
xtrabackup --decompress --target-dir=/backup/compressed

# Remove compressed files
find /backup/compressed -name "*.qp" -delete

# Then prepare
xtrabackup --prepare --target-dir=/backup/compressed

# INCREMENTAL BACKUP
# First, create full backup (base)
xtrabackup --backup \
  --target-dir=/backup/base \
  --user=root \
  --password=yourpassword

# Create first incremental backup
xtrabackup --backup \
  --target-dir=/backup/inc1 \
  --incremental-basedir=/backup/base \
  --user=root \
  --password=yourpassword

# Create second incremental backup (based on inc1)
xtrabackup --backup \
  --target-dir=/backup/inc2 \
  --incremental-basedir=/backup/inc1 \
  --user=root \
  --password=yourpassword

# PREPARE INCREMENTAL BACKUPS
# Prepare base backup (--apply-log-only skips the rollback phase so that
# later incrementals can still be applied)
xtrabackup --prepare --apply-log-only --target-dir=/backup/base

# Apply first incremental to base
xtrabackup --prepare --apply-log-only \
  --target-dir=/backup/base \
  --incremental-dir=/backup/inc1

# Apply second (last) incremental to base; omit --apply-log-only on the last one
# so that uncommitted transactions are rolled back and the backup is finalized
xtrabackup --prepare \
  --target-dir=/backup/base \
  --incremental-dir=/backup/inc2

# No further prepare step is needed
# Now restore from /backup/base as shown above

# STREAMING BACKUP TO REMOTE SERVER
# Stream backup via SSH
xtrabackup --backup --stream=xbstream --user=root --password=yourpassword | \
  ssh user@remotehost "xbstream -x -C /backup/remote"

# Stream and compress
xtrabackup --backup --stream=xbstream --compress --user=root --password=yourpassword | \
  ssh user@remotehost "xbstream -x -C /backup/remote"

# ENCRYPTED BACKUP
# Generate encryption key (printf avoids a trailing newline in the key file)
printf '%s' "$(openssl rand -base64 24)" > /root/.xtrabackup_encrypt_key
chmod 600 /root/.xtrabackup_encrypt_key

# Backup with encryption
xtrabackup --backup \
  --encrypt=AES256 \
  --encrypt-key-file=/root/.xtrabackup_encrypt_key \
  --target-dir=/backup/encrypted \
  --user=root \
  --password=yourpassword

# Decrypt before prepare
xtrabackup --decrypt=AES256 \
  --encrypt-key-file=/root/.xtrabackup_encrypt_key \
  --target-dir=/backup/encrypted

# Remove encrypted files
find /backup/encrypted -name "*.xbcrypt" -delete

# PARTIAL BACKUP (specific databases)
xtrabackup --backup \
  --databases="database1 database2" \
  --target-dir=/backup/partial \
  --user=root \
  --password=yourpassword

# PARALLEL PROCESSING (faster for large databases)
xtrabackup --backup \
  --parallel=4 \
  --target-dir=/backup/parallel \
  --user=root \
  --password=yourpassword

# THROTTLING (limit I/O impact)
# --throttle limits the number of 10 MB chunks copied per second,
# so --throttle=1 caps the copy at roughly 10 MB/s
xtrabackup --backup \
  --throttle=1 \
  --target-dir=/backup/throttled \
  --user=root \
  --password=yourpassword

# VERIFY BACKUP
# Take a backup to test
xtrabackup --backup --target-dir=/backup/verify --user=root --password=yourpassword

# Prepare and check for errors
xtrabackup --prepare --target-dir=/backup/verify

# A clean prepare is a basic validity check; a test restore is the stronger one

# AUTOMATED BACKUP SCRIPT (save as its own file; the shebang must be its first line)
#!/bin/bash
# Percona XtraBackup Automation Script

BACKUP_DIR="/backup/mysql"
FULL_BACKUP_DIR="$BACKUP_DIR/full"
INC_BACKUP_DIR="$BACKUP_DIR/incremental"
DATE=$(date +%Y%m%d_%H%M%S)
MYSQL_USER="root"
MYSQL_PASSWORD="yourpassword"
RETENTION_DAYS=7

# Create backup directories
mkdir -p "$FULL_BACKUP_DIR" "$INC_BACKUP_DIR"

# Determine if we need full or incremental backup
# Full backup on Sunday (or when no full backup exists yet), incremental rest of week
if [ "$(date +%u)" -eq 7 ] || [ ! -s "$BACKUP_DIR/latest_full" ]; then
    # Full backup
    echo "Starting full backup..."
    if xtrabackup --backup \
        --target-dir="$FULL_BACKUP_DIR/$DATE" \
        --user="$MYSQL_USER" \
        --password="$MYSQL_PASSWORD" \
        --compress \
        --compress-threads=4; then
        echo "Full backup completed: $DATE"
        # Mark as latest full backup
        echo "$DATE" > "$BACKUP_DIR/latest_full"
    else
        echo "ERROR: Full backup failed" >&2
        exit 1
    fi
else
    # Incremental backup (each one is taken against the latest full backup)
    LATEST_FULL=$(cat "$BACKUP_DIR/latest_full")
    echo "Starting incremental backup based on $LATEST_FULL..."

    if xtrabackup --backup \
        --target-dir="$INC_BACKUP_DIR/$DATE" \
        --incremental-basedir="$FULL_BACKUP_DIR/$LATEST_FULL" \
        --user="$MYSQL_USER" \
        --password="$MYSQL_PASSWORD" \
        --compress \
        --compress-threads=4; then
        echo "Incremental backup completed: $DATE"
    else
        echo "ERROR: Incremental backup failed" >&2
        exit 1
    fi
fi

# Cleanup old backups (only the per-run directories directly under each root)
find "$FULL_BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +"$RETENTION_DAYS" -exec rm -rf {} +
find "$INC_BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +"$RETENTION_DAYS" -exec rm -rf {} +

# MONITORING BACKUP PROGRESS
# In another terminal, monitor backup size
watch -n 1 'du -sh /backup/full'

# xtrabackup writes its progress log to stderr; to follow it, redirect stderr
# to a file when starting the backup (example path: 2> /backup/xtrabackup.log), then:
tail -f /backup/xtrabackup.log

# Check LSN (Log Sequence Number) in backup
cat /backup/full/xtrabackup_checkpoints

# OUTPUT:
# backup_type = full-backuped
# from_lsn = 0
# to_lsn = 2456789
# last_lsn = 2456789

Storage Engines

What is the difference between InnoDB and MyISAM?

The 30-Second Answer:

InnoDB and MyISAM are MySQL storage engines with fundamentally different architectures. InnoDB is transaction-safe with ACID compliance, row-level locking, crash recovery, and foreign key support - ideal for applications requiring data integrity. MyISAM is simpler and faster for read-heavy workloads but lacks transactions, uses table-level locking (causing concurrency issues), and has no crash recovery. InnoDB is the default since MySQL 5.5 and recommended for most applications.

The 2-Minute Answer (If They Want More):

InnoDB and MyISAM represent two different philosophies in database storage engine design:

InnoDB is a robust, transaction-safe storage engine designed for reliability and data integrity. It implements ACID properties (Atomicity, Consistency, Isolation, Durability), making it suitable for mission-critical applications. InnoDB uses row-level locking, which allows multiple transactions to modify different rows simultaneously, providing excellent concurrency for write-heavy workloads. It supports foreign key constraints for referential integrity, automatic crash recovery through the doublewrite buffer and redo logs, and MVCC (Multi-Version Concurrency Control) for non-blocking reads. InnoDB stores data in clustered indexes organized by primary key, which optimizes primary key lookups.

MyISAM is a simpler, older storage engine optimized for read-heavy workloads with minimal writes. It uses table-level locking, meaning the entire table is locked during write operations, which can cause significant performance bottlenecks in concurrent environments. MyISAM doesn't support transactions, foreign keys, or automatic crash recovery; a server crash can corrupt tables, requiring manual repair with myisamchk. However, MyISAM has advantages in specific scenarios: a smaller disk footprint, fast full-table scans, fast COUNT(*) without a WHERE clause (it stores an exact row count), and full-text indexing (though InnoDB added this in MySQL 5.6.4).

Key Differences Summary:

Feature	InnoDB	MyISAM
Transactions	Yes (ACID)	No
Locking	Row-level	Table-level
Foreign Keys	Yes	No
Crash Recovery	Automatic	Manual repair needed
Concurrency	Excellent	Poor for writes
Storage	More disk space	Less disk space
Use Case	OLTP, general purpose	Read-heavy, logging

Code Example:

-- Check storage engine of existing tables
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database';

-- Create table with InnoDB (default since MySQL 5.5)
CREATE TABLE users (
    id INT PRIMARY KEY AUTO_INCREMENT,
    username VARCHAR(50) NOT NULL,
    email VARCHAR(100) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;

-- Create table with MyISAM
CREATE TABLE access_logs (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    user_id INT,
    action VARCHAR(100),
    log_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=MyISAM;

-- Demonstrate InnoDB transaction support
START TRANSACTION;
INSERT INTO users (username, email) VALUES ('john_doe', 'john@example.com');
INSERT INTO users (username, email) VALUES ('jane_smith', 'jane@example.com');
COMMIT; -- Both inserts succeed or both fail

-- MyISAM doesn't support transactions
-- Each INSERT is immediately committed, no rollback possible

-- Convert MyISAM table to InnoDB
ALTER TABLE access_logs ENGINE=InnoDB;

-- Check InnoDB status and performance metrics
SHOW ENGINE InnoDB STATUS;

-- Compare row count performance
-- MyISAM: Fast (maintains count)
SELECT COUNT(*) FROM myisam_table; -- Instant

-- InnoDB: Slower (scans index)
SELECT COUNT(*) FROM innodb_table; -- Scans clustered index

-- Demonstrate row-level locking (InnoDB)
-- Session 1:
START TRANSACTION;
UPDATE users SET email = 'newemail@example.com' WHERE id = 1;
-- (don't commit yet)

-- Session 2 (can update different row concurrently):
UPDATE users SET email = 'another@example.com' WHERE id = 2; -- Works!

-- With MyISAM, Session 2 would wait for table lock

InnoDB Internals

What is the InnoDB redo log and how does it work?

The 30-Second Answer:

The redo log (also called the transaction log) is a circular write-ahead log that records all changes made to InnoDB data. When a transaction modifies data, the page is changed in the buffer pool and a redo record describing the change is written to the log buffer; redo records are flushed to disk with fast sequential writes (by default at every commit) and always before the modified pages themselves are written to the data files. This enables crash recovery: if MySQL crashes, it replays the redo log on restart so that no committed transaction is lost. The redo log is a fixed-capacity, circular set of files with a configurable size (innodb_redo_log_capacity in MySQL 8.0.30+).

The 2-Minute Answer (If They Want More):

The InnoDB redo log is critical for ACID durability and crash recovery:

How It Works:

Write-Ahead Logging (WAL): Every change to a buffer pool page is described by a redo record, and that record must reach the redo log on disk before the modified page is flushed to the data files
Fast Sequential Writes: Redo log writes are sequential, making them much faster than random disk writes
Circular Buffer: The redo log wraps around when it reaches the end; space is reused only for records whose page changes have already been flushed (those older than the last checkpoint)
Checkpoint Process: InnoDB periodically flushes dirty pages to disk and advances the checkpoint, freeing redo log space
Crash Recovery: On restart after a crash, InnoDB reads the redo log from the last checkpoint and reapplies the logged changes, then uses the undo log to roll back transactions that had not committed
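The interplay of appending, checkpointing, and recovery can be sketched with a toy model. The `RedoLog` class below is hypothetical and heavily simplified: each record has size 1, and replay only rolls forward (the rollback of uncommitted transactions via undo is left out).

```python
class RedoLog:
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []        # (lsn, key, value), oldest first
        self.current_lsn = 0
        self.checkpoint_lsn = 0

    def checkpoint_age(self):
        return self.current_lsn - self.checkpoint_lsn

    def append(self, key, value, size=1):
        # Space older than the checkpoint is reusable; newer space is not.
        if self.checkpoint_age() + size > self.capacity:
            raise RuntimeError("redo log full: flush dirty pages and checkpoint first")
        self.current_lsn += size
        self.records.append((self.current_lsn, key, value))
        return self.current_lsn

    def checkpoint(self, flushed_up_to):
        # Pages changed up to this LSN are on disk, so their redo is no longer needed.
        self.checkpoint_lsn = flushed_up_to
        self.records = [r for r in self.records if r[0] > flushed_up_to]

    def recover(self, data_files):
        # Replay everything logged after the last checkpoint.
        recovered = dict(data_files)
        for _lsn, key, value in self.records:
            recovered[key] = value
        return recovered


log = RedoLog(capacity=3)
disk = {}

log.append("a", 1)   # LSN 1
log.append("b", 2)   # LSN 2
disk["a"] = 1        # the page holding "a" is flushed to the data files
log.checkpoint(1)
log.append("a", 3)   # LSN 3

# Crash: the data files only contain {"a": 1}.
print(log.recover(disk))     # {'a': 3, 'b': 2}
print(log.checkpoint_age())  # 2
```

If writes outpace flushing, `append` eventually hits the capacity check, which is the model's version of a write stall caused by an undersized redo log.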

Redo Log Components:

Log Buffer (innodb_log_buffer_size): In-memory buffer for redo log entries
Redo Log Files: Physical files on disk (ib_logfile0, ib_logfile1, or #ib_redo files in MySQL 8.0.30+)
Log Sequence Number (LSN): Monotonically increasing counter tracking log position

Flush Behavior (controlled by innodb_flush_log_at_trx_commit):

0: Write to log buffer only, flush every second (fastest, least durable)
1: Write and flush to disk on every commit (slowest, fully durable - default)
2: Write to OS cache on commit, flush every second (middle ground)
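What each setting risks can be written down as a small lookup. The function below is a hypothetical summary, not a MySQL API; it answers whether a commit made since the last once-per-second flush is guaranteed to survive, and it assumes the storage honors flush requests.

```python
def guaranteed_durable(flush_setting, crash_kind):
    """Is a commit from the last second guaranteed to survive the crash?

    flush_setting: value of innodb_flush_log_at_trx_commit (0, 1, or 2)
    crash_kind: "mysqld" (process crash only) or "os" (OS crash or power loss)
    """
    if crash_kind not in ("mysqld", "os"):
        raise ValueError(f"unknown crash kind: {crash_kind}")
    if flush_setting == 1:
        return True                    # written and flushed at every commit
    if flush_setting == 2:
        return crash_kind == "mysqld"  # data sits in the OS cache until the next flush
    if flush_setting == 0:
        return False                   # data may still be in mysqld's own log buffer
    raise ValueError(f"unknown setting: {flush_setting}")


for setting in (0, 1, 2):
    for crash in ("mysqld", "os"):
        print(setting, crash, guaranteed_durable(setting, crash))
```

Setting 2 is the middle ground precisely because it survives the more common failure (a mysqld crash) while skipping the per-commit flush.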

Performance Trade-offs:

Larger redo log = fewer checkpoints, better write performance, longer recovery time
Smaller redo log = more frequent checkpoints, slower writes, faster recovery
Write-heavy workloads on fast storage often benefit from larger redo logs (several GB is a common starting point; size it from the observed redo write rate)

Code Example:

-- View current redo log configuration (MySQL 8.0.30+)
SELECT
    @@innodb_redo_log_capacity / 1024 / 1024 / 1024 as redo_log_gb;

-- For older MySQL versions (before 8.0.30)
SELECT
    @@innodb_log_file_size / 1024 / 1024 as log_file_size_mb,
    @@innodb_log_files_in_group as num_log_files,
    (@@innodb_log_file_size * @@innodb_log_files_in_group) / 1024 / 1024 as total_redo_log_mb;

-- Configure redo log size (MySQL 8.0.30+ - dynamic)
SET GLOBAL innodb_redo_log_capacity = 8589934592;  -- 8GB

-- For older versions (requires restart)
-- Add to my.cnf:
-- innodb_log_file_size = 2G
-- innodb_log_files_in_group = 2

-- View log buffer size
SELECT @@innodb_log_buffer_size / 1024 / 1024 as log_buffer_mb;

-- Configure log buffer (dynamic)
SET GLOBAL innodb_log_buffer_size = 67108864;  -- 64MB

-- Check flush behavior
SELECT @@innodb_flush_log_at_trx_commit as flush_policy;
-- 1 = full durability (default)
-- 0 = flush every second (performance)
-- 2 = OS cache (compromise)

-- Set flush policy (dynamic)
SET GLOBAL innodb_flush_log_at_trx_commit = 1;

-- Monitor redo log usage
SELECT
    VARIABLE_NAME,
    VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME LIKE 'Innodb_log%';

-- Key metrics to monitor
SELECT
    VARIABLE_NAME,
    VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME IN (
    'Innodb_log_waits',           -- Log buffer too small if > 0
    'Innodb_log_writes',          -- Number of log writes
    'Innodb_log_write_requests',  -- Number of log write requests
    'Innodb_os_log_written'       -- Bytes written to redo log
);

-- View redo log files (MySQL 8.0.30+)
SELECT
    FILE_ID,
    FILE_NAME,
    START_LSN,
    END_LSN
FROM performance_schema.innodb_redo_log_files;

-- Monitor checkpoint age (how far the checkpoint lags the current LSN; MySQL 8.0.30+)
SELECT
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_redo_log_current_lsn') -
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_redo_log_checkpoint_lsn')
    AS checkpoint_age_bytes;

-- Check if redo log writes are causing waits
-- High Innodb_log_waits means log buffer is too small
SELECT
    VARIABLE_VALUE as log_waits
FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_log_waits';
-- If > 0, increase innodb_log_buffer_size

-- Example: Simulate redo log activity
START TRANSACTION;

UPDATE users SET last_login = NOW() WHERE id = 1;
-- Change is written to redo log buffer

INSERT INTO audit_log (user_id, action, timestamp)
VALUES (1, 'login', NOW());
-- Another redo log entry

COMMIT;
-- With innodb_flush_log_at_trx_commit=1,
-- redo log is flushed to disk NOW

-- Configuration recommendations for different scenarios

-- High-performance scenario (SSD, acceptable 1-second data loss)
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
SET GLOBAL innodb_redo_log_capacity = 17179869184;  -- 16GB

-- Full durability scenario (financial data)
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
SET GLOBAL innodb_redo_log_capacity = 8589934592;  -- 8GB

-- Development/testing (maximum performance)
SET GLOBAL innodb_flush_log_at_trx_commit = 0;
SET GLOBAL innodb_redo_log_capacity = 4294967296;  -- 4GB

Replication

What is the difference between asynchronous and semi-synchronous replication?

The 30-Second Answer:

Asynchronous replication is MySQL's default mode where the primary doesn't wait for replicas to acknowledge transactions - it commits immediately and replicas catch up independently. Semi-synchronous replication requires at least one replica to acknowledge receiving the transaction before the primary commits, providing better durability at the cost of slightly higher latency. Asynchronous is faster but risks data loss if the primary crashes; semi-synchronous ensures at least one replica has the data but can impact write performance.

The 2-Minute Answer (If They Want More):

Asynchronous Replication (Default)

In asynchronous replication, the primary server commits transactions and returns to the client immediately without waiting for any replica confirmation. Replicas read the binary log events at their own pace, which means:

Advantages: Maximum write performance, no blocking on the primary, works well even with network latency or slow replicas
Disadvantages: No guarantee that replicas have received transactions, potential data loss if primary crashes before replicas catch up, possible replica lag during high load
Use case: Most production scenarios where performance is prioritized and some replication lag is acceptable

Semi-Synchronous Replication

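Semi-synchronous replication adds an acknowledgment step. With the default AFTER_SYNC wait point, the primary writes the transaction to its binary log and then, before committing it to the storage engine, waits for at least one replica to: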

Receive the binary log events
Write them to its relay log
Send an acknowledgment back to the primary

Only after receiving this acknowledgment does the primary return success to the client.

Advantages: Better durability - at least one replica has the transaction, reduces data loss risk during primary failures, provides stronger consistency guarantees
Disadvantages: Increased write latency (at least one network round trip per commit, commonly on the order of milliseconds), can fall back to asynchronous if no replicas acknowledge within timeout, requires plugin installation
Use case: High-availability scenarios where data durability is critical, financial systems, or when you need to minimize RPO (Recovery Point Objective)

Configuration Details:

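Semi-synchronous replication requires the rpl_semi_sync_master and rpl_semi_sync_slave plugins (MySQL 8.0.26 and later also provide them as rpl_semi_sync_source and rpl_semi_sync_replica, with correspondingly renamed variables). Key configuration parameters include: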

rpl_semi_sync_master_timeout: How long to wait for acknowledgment before falling back to asynchronous (default 10 seconds)
rpl_semi_sync_master_wait_for_slave_count: Number of replicas that must acknowledge (MySQL 5.7.3+)
rpl_semi_sync_master_wait_point: When to wait - AFTER_SYNC (default, safer) or AFTER_COMMIT

The Fallback Mechanism:

If no replicas acknowledge within the timeout period, semi-synchronous replication automatically falls back to asynchronous mode to prevent blocking the primary. It automatically switches back to semi-synchronous when replicas reconnect and catch up.
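The fallback behavior can be modeled as a small state machine. `SemiSyncPrimary` below is a hypothetical Python sketch, not the plugin's implementation; its counters are named after the Rpl_semi_sync_master_yes_tx and Rpl_semi_sync_master_no_tx status variables, and the timeout is in milliseconds like rpl_semi_sync_master_timeout.

```python
class SemiSyncPrimary:
    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms
        self.semi_sync_active = True
        self.yes_tx = 0  # commits acknowledged by a replica
        self.no_tx = 0   # commits that went through without an acknowledgment

    def commit(self, ack_delay_ms):
        """ack_delay_ms: time until the first replica ACK, or None if none arrives."""
        acked_in_time = ack_delay_ms is not None and ack_delay_ms <= self.timeout_ms
        if self.semi_sync_active and acked_in_time:
            self.yes_tx += 1
            return "acknowledged"
        # Timed out (or already degraded): stop waiting so the primary is not blocked.
        self.semi_sync_active = False
        self.no_tx += 1
        return "async"

    def replica_caught_up(self):
        self.semi_sync_active = True


primary = SemiSyncPrimary(timeout_ms=10000)
results = [
    primary.commit(5),     # ACK after 5 ms
    primary.commit(None),  # no ACK within the timeout: fall back
    primary.commit(5),     # still asynchronous until a replica catches up
]
primary.replica_caught_up()
results.append(primary.commit(3))

print(results)  # ['acknowledged', 'async', 'async', 'acknowledged']
print(primary.yes_tx / (primary.yes_tx + primary.no_tx))  # 0.5
```

A falling acknowledgment ratio in production is a signal that commits are regularly running without the durability guarantee semi-sync is meant to provide.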

Code Example:

-- Enable semi-synchronous replication on PRIMARY
-- First, install the plugin
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';

-- Enable semi-sync replication
SET GLOBAL rpl_semi_sync_master_enabled = 1;

-- Configure timeout (10 seconds)
SET GLOBAL rpl_semi_sync_master_timeout = 10000;

-- Configure wait point (AFTER_SYNC is safer - default in MySQL 5.7+)
SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC';

-- Require acknowledgment from at least 1 replica (MySQL 5.7.3+)
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 1;

-- Make settings persistent (add to my.cnf)
/*
[mysqld]
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=10000
rpl_semi_sync_master_wait_point=AFTER_SYNC
rpl_semi_sync_master_wait_for_slave_count=1
*/

-- Enable semi-synchronous replication on REPLICA
-- Install the plugin
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';

-- Enable semi-sync on replica
SET GLOBAL rpl_semi_sync_slave_enabled = 1;

-- Make persistent (add to my.cnf)
/*
[mysqld]
rpl_semi_sync_slave_enabled=1
*/

-- Restart replication IO thread to activate semi-sync
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;

-- Monitor semi-synchronous replication status on PRIMARY
SHOW STATUS LIKE 'Rpl_semi_sync_master%';
-- Key metrics:
-- Rpl_semi_sync_master_status: ON/OFF
-- Rpl_semi_sync_master_clients: Number of semi-sync replicas
-- Rpl_semi_sync_master_yes_tx: Transactions acknowledged
-- Rpl_semi_sync_master_no_tx: Transactions not acknowledged (fell back to async)
-- Rpl_semi_sync_master_wait_sessions: Current waiting sessions
-- Rpl_semi_sync_master_tx_wait_time: Total wait time

-- Monitor on REPLICA
SHOW STATUS LIKE 'Rpl_semi_sync_slave%';
-- Key metric:
-- Rpl_semi_sync_slave_status: ON/OFF

-- Example monitoring query
SELECT
    VARIABLE_NAME,
    VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME LIKE 'Rpl_semi_sync_master%'
ORDER BY VARIABLE_NAME;

-- Calculate acknowledgment ratio (NULL when no transactions have been counted yet)
SELECT
    yes_tx / NULLIF(yes_tx + no_tx, 0) AS ack_ratio
FROM (
    SELECT
        SUM(CASE WHEN VARIABLE_NAME = 'Rpl_semi_sync_master_yes_tx'
                 THEN VARIABLE_VALUE ELSE 0 END) AS yes_tx,
        SUM(CASE WHEN VARIABLE_NAME = 'Rpl_semi_sync_master_no_tx'
                 THEN VARIABLE_VALUE ELSE 0 END) AS no_tx
    FROM performance_schema.global_status
    WHERE VARIABLE_NAME IN ('Rpl_semi_sync_master_yes_tx',
                            'Rpl_semi_sync_master_no_tx')
) AS counters;

-- Testing semi-sync behavior
-- On primary, create a transaction and observe the wait
BEGIN;
INSERT INTO test_table (data) VALUES ('testing semi-sync');
COMMIT; -- This will wait for replica acknowledgment

-- If you stop all replicas, transactions will wait until timeout
-- then fall back to async mode

What is the binary log and how does it enable replication?

The 30-Second Answer:

The binary log (binlog) is a set of files that record all changes to the database - DDL and DML statements that modify data. It contains "events" describing database modifications as SQL statements (SBR), row changes (RBR), or both (MIXED). The binary log enables replication by allowing replicas to read and replay these events to maintain synchronized copies of the data. It also enables point-in-time recovery and is essential for many HA (High Availability) setups.

The 2-Minute Answer (If They Want More):

Purpose of the Binary Log:

Replication: Primary mechanism for data synchronization between primary and replica servers
Point-in-Time Recovery: Restore database to a specific moment by replaying binary logs after a backup
Auditing: Track all data changes for compliance and debugging
Change Data Capture (CDC): Extract database changes for data pipelines (tools like Debezium)

Binary Log Structure:

Binary Log Files: Numbered sequence files (mysql-bin.000001, mysql-bin.000002, etc.)
Binary Log Index: Index file tracking all binary log files (mysql-bin.index)
Events: Individual units of change recorded in the binary log
Log Rotation: New file created when current reaches max_binlog_size or on FLUSH LOGS

Types of Binary Log Events:

Query Events: DDL statements and DML in statement-based format
Row Events: Row changes in row-based format (Insert_rows, Update_rows, Delete_rows)
GTID Events: Global Transaction Identifiers when GTID mode is enabled
Format Description Events: Metadata about binary log format
Rotate Events: Indicate switching to a new binary log file
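XID Events: Transaction commit markers for transactional engines such as InnoDB (the XID links the binary log entry to the engine's internal two-phase commit)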

How Binary Log Enables Replication:

Primary Server:
- Executes client transactions
- Writes changes to binary log (binlog)
- Binary log is durable (survives crashes if sync_binlog=1)
Replica Server:
- IO Thread: Connects to primary, reads binary log events, writes to local relay log
- SQL Thread: Reads relay log, executes events to apply changes
- Maintains position (file/position or GTID) to track replication progress

Replication Flow:

PRIMARY:  Transaction → Binary Log → Network
                                        ↓
REPLICA:  Network → Relay Log → SQL Thread → Data Files
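
The flow above can be modeled in a few lines. The following Python sketch is a simplified illustration, not how MySQL is implemented: the Primary and Replica classes and their attribute names are hypothetical, events are plain (key, value) pairs, and positions are list indexes rather than byte offsets.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    binlog: list = field(default_factory=list)  # ordered list of events
    data: dict = field(default_factory=dict)

    def execute(self, key, value):
        # Apply the change locally and record it as a binlog event.
        self.data[key] = value
        self.binlog.append((key, value))


@dataclass
class Replica:
    relay_log: list = field(default_factory=list)
    data: dict = field(default_factory=dict)
    read_pos: int = 0  # how far the I/O thread has read the primary's binlog
    exec_pos: int = 0  # how far the SQL thread has applied the relay log

    def io_thread(self, primary):
        # Copy events not yet fetched from the primary into the relay log.
        new_events = primary.binlog[self.read_pos:]
        self.relay_log.extend(new_events)
        self.read_pos += len(new_events)

    def sql_thread(self):
        # Replay pending relay-log events in order.
        for key, value in self.relay_log[self.exec_pos:]:
            self.data[key] = value
        self.exec_pos = len(self.relay_log)


primary, replica = Primary(), Replica()
primary.execute("user:1", "alice")
primary.execute("user:2", "bob")
replica.io_thread(primary)
replica.sql_thread()
print(replica.data == primary.data)
```

Keeping read_pos and exec_pos separate mirrors why a replica can have received events it has not yet applied, which is what replication lag measures.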

Binary Log Configuration:

log_bin: Enable binary logging (path and base name)
binlog_format: STATEMENT, ROW, or MIXED
sync_binlog: Sync to disk frequency (1 = every commit, safest)
max_binlog_size: Maximum size before rotation (default 1GB)
binlog_expire_logs_seconds: Auto-purge old logs after N seconds (MySQL 8.0+)
expire_logs_days: Auto-purge old logs after N days (deprecated, use binlog_expire_logs_seconds)
binlog_do_db / binlog_ignore_db: Filter which databases to log

Performance Considerations:

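Binary logging adds overhead to write operations (figures of roughly 5-15% are commonly quoted; the actual cost depends on workload and sync_binlog)
sync_binlog=1 is safest but slowest (syncs after every commit)
sync_binlog=0 is fastest but risks losing transactions on crash (flushing is left to the operating system)
sync_binlog=N syncs after every N binary log commit groups (balance between safety and performance)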
Row-based replication generates more log data for bulk operations
Use SSDs for binary log storage to reduce I/O latency

Code Example:

-- Check if binary logging is enabled
SHOW VARIABLES LIKE 'log_bin';
-- ON means enabled

-- View binary log configuration
SHOW VARIABLES LIKE 'log_bin%';
SHOW VARIABLES LIKE 'binlog%';
SHOW VARIABLES LIKE 'sync_binlog';
SHOW VARIABLES LIKE 'max_binlog_size';

-- Enable binary logging (my.cnf configuration)
/*
[mysqld]
# Enable binary log
log_bin = /var/lib/mysql/mysql-bin
server_id = 1  # Required, must be unique in replication topology

# Binary log format
binlog_format = ROW  # or STATEMENT, MIXED

# Durability settings
sync_binlog = 1  # Safest: sync to disk after every commit
innodb_flush_log_at_trx_commit = 1  # Sync InnoDB logs too

# Size and retention
max_binlog_size = 1G
binlog_expire_logs_seconds = 604800  # 7 days in seconds

# Optional: GTID mode
gtid_mode = ON
enforce_gtid_consistency = ON
*/

-- View current binary logs
SHOW BINARY LOGS;
/*
+------------------+-----------+-----------+
| Log_name         | File_size | Encrypted |
+------------------+-----------+-----------+
| mysql-bin.000001 | 1073742   | No        |
| mysql-bin.000002 | 2048576   | No        |
| mysql-bin.000003 | 512000    | No        |
+------------------+-----------+-----------+
*/

-- View current binary log position
SHOW MASTER STATUS;
/*
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000003 | 512000   |              |                  | uuid:1-1000       |
+------------------+----------+--------------+------------------+-------------------+
*/

-- View events in a binary log
SHOW BINLOG EVENTS IN 'mysql-bin.000003';
-- Limit output for readability
SHOW BINLOG EVENTS IN 'mysql-bin.000003' LIMIT 10;
SHOW BINLOG EVENTS IN 'mysql-bin.000003' FROM 1024 LIMIT 5;

/*
+------------------+-----+----------------+-----------+-------------+--------------------------------------+
| Log_name         | Pos | Event_type     | Server_id | End_log_pos | Info                                 |
+------------------+-----+----------------+-----------+-------------+--------------------------------------+
| mysql-bin.000003 | 4   | Format_desc    | 1         | 124         | Server ver: 8.0.32-MySQL, Binlog...  |
| mysql-bin.000003 | 124 | Previous_gtids | 1         | 155         | uuid:1-999                           |
| mysql-bin.000003 | 155 | Gtid           | 1         | 234         | SET @@SESSION.GTID_NEXT= 'uuid:1000' |
| mysql-bin.000003 | 234 | Query          | 1         | 315         | BEGIN                                |
| mysql-bin.000003 | 315 | Table_map      | 1         | 378         | table_id: 108 (mydb.users)          |
| mysql-bin.000003 | 378 | Write_rows     | 1         | 438         | table_id: 108 flags: STMT_END_F      |
| mysql-bin.000003 | 438 | Xid            | 1         | 469         | COMMIT /* xid=1234 */                |
+------------------+-----+----------------+-----------+-------------+--------------------------------------+
*/

-- Manually rotate binary log (create new file)
FLUSH BINARY LOGS;

-- Purge old binary logs
-- Purge logs before specific log file
PURGE BINARY LOGS TO 'mysql-bin.000003';

-- Purge logs before specific date/time
PURGE BINARY LOGS BEFORE '2025-12-20 00:00:00';

-- Purge logs older than N days
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);

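-- DO NOT PURGE logs that a replica still needs!
-- SHOW SLAVE HOSTS lists registered replicas but not their positions;
-- on each replica, check Master_Log_File / Relay_Master_Log_File in SHOW SLAVE STATUS first
SHOW SLAVE HOSTS;  -- See connected replicas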

-- Reset binary logs (DELETE ALL - dangerous!)
-- Only use on fresh setup or after all replicas are reconfigured
RESET MASTER;

-- View binary log on replica
SHOW BINARY LOGS;  -- Replica's own binary log (if log_slave_updates=ON)

-- View relay log (replica's copy of primary's binary log)
SHOW RELAYLOG EVENTS;

-- Monitor binary log disk usage
-- MySQL exposes no information_schema table for binary logs; SHOW BINARY LOGS
-- is the supported source. Sum its File_size column client-side, for example:
-- mysql -N -e 'SHOW BINARY LOGS' | awk '{sum += $2} END {printf "%.2f MB\n", sum/1024/1024}'
SHOW BINARY LOGS;

-- Read binary log from command line using mysqlbinlog utility
-- Statement-based format:
-- mysqlbinlog /var/lib/mysql/mysql-bin.000003

-- Row-based format (verbose to see row data):
-- mysqlbinlog --verbose --base64-output=DECODE-ROWS /var/lib/mysql/mysql-bin.000003

-- Point-in-time recovery example
-- Restore backup taken at 2025-12-24 00:00:00
-- Then replay binary logs from that point until 2025-12-24 10:30:00
-- mysqlbinlog --start-datetime="2025-12-24 00:00:00" \
--             --stop-datetime="2025-12-24 10:30:00" \
--             mysql-bin.000003 mysql-bin.000004 | mysql -u root -p

-- Or use positions instead of datetime
-- mysqlbinlog --start-position=512000 \
--             --stop-position=1024000 \
--             mysql-bin.000003 | mysql -u root -p

-- Check binary log encryption status (MySQL 8.0.14+)
SHOW VARIABLES LIKE 'binlog_encryption';

-- Enable binary log encryption (my.cnf)
/*
[mysqld]
binlog_encryption = ON
*/

-- Monitor binary log performance impact
-- Check binary log write performance
SHOW GLOBAL STATUS LIKE 'Binlog%';
/*
Key metrics:
- Binlog_cache_disk_use: Transactions too large for binlog_cache_size
- Binlog_cache_use: Transactions using binlog cache
- Binlog_stmt_cache_disk_use: Statement cache disk usage
- Binlog_stmt_cache_use: Statement cache usage
*/

-- Optimize binlog cache size if many disk uses
SET GLOBAL binlog_cache_size = 32768;  -- Default, increase if needed
SET GLOBAL binlog_stmt_cache_size = 32768;

-- Check commit rate vs binlog size growth
-- (Com_commit counts explicit COMMIT statements; autocommitted statements are not included)
SELECT
    VARIABLE_VALUE AS transactions
FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Com_commit';

-- Example: Create a transaction and verify it's in binlog
CREATE TABLE binlog_test (
    id INT PRIMARY KEY AUTO_INCREMENT,
    data VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO binlog_test (data) VALUES ('test data');

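-- Check the last few binlog events
-- FROM must be the exact start position of an event; use the Position reported
-- by SHOW MASTER STATUS before the test rather than an arbitrary offset
SHOW BINLOG EVENTS IN 'mysql-bin.000003' FROM 512000 LIMIT 20;
-- Should see events for CREATE TABLE and INSERT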

-- Replication setup using binary log position
-- On PRIMARY:
SHOW MASTER STATUS;
-- Note File and Position

-- On REPLICA:
CHANGE MASTER TO
    MASTER_HOST = '192.168.1.10',
    MASTER_USER = 'repl_user',
    MASTER_PASSWORD = 'password',
    MASTER_LOG_FILE = 'mysql-bin.000003',  -- From SHOW MASTER STATUS
    MASTER_LOG_POS = 512000;                -- From SHOW MASTER STATUS

START SLAVE;

-- Monitor replication using binary log positions
SHOW SLAVE STATUS\G
/*
Key fields:
- Master_Log_File: Current binary log file being read from primary
- Read_Master_Log_Pos: Position in that file
- Relay_Master_Log_File: Binary log file being executed
- Exec_Master_Log_Pos: Position being executed
- Seconds_Behind_Master: Replication lag
*/

-- Binary log filtering (use with caution)
-- In my.cnf:
/*
[mysqld]
# Only log specific databases
binlog_do_db = mydb
binlog_do_db = another_db

# Ignore specific databases (NOT recommended, breaks point-in-time recovery)
binlog_ignore_db = test
binlog_ignore_db = temp
*/

High Availability

What is the difference between ProxySQL and MySQL Router?

The 30-Second Answer:

ProxySQL is a feature-rich, open-source proxy with advanced query routing, caching, query rewriting, and connection pooling, designed for complex database architectures. MySQL Router is Oracle's lightweight routing solution specifically optimized for MySQL InnoDB Cluster with automatic configuration and simpler setup. ProxySQL offers more features and flexibility but requires more configuration, while Router provides easier integration with InnoDB Cluster but fewer advanced features.

The 2-Minute Answer (If They Want More):

ProxySQL and MySQL Router serve similar purposes but differ significantly in capabilities and use cases:

ProxySQL Advantages:

Query Analysis and Routing: Can route queries based on content (SELECT to replicas, writes to primary), regex patterns, and query digests
Query Caching: Built-in result set caching to reduce database load
Query Rewriting: Ability to modify queries on-the-fly for optimization or compatibility
Advanced Connection Pooling: Sophisticated multiplexing reduces backend connections significantly
Traffic Mirroring: Send duplicate traffic to test environments
Firewall Capabilities: Block queries matching specific patterns
Extensive Monitoring: Rich statistics and performance metrics through admin interface
Scheduler: Built-in job scheduler for maintenance tasks
Flexibility: Works with any MySQL-compatible database, not just InnoDB Cluster

MySQL Router Advantages:

InnoDB Cluster Integration: Automatic discovery and configuration with InnoDB Cluster
Simplicity: Bootstrap mode eliminates most manual configuration
Metadata-Driven: Automatically tracks cluster topology changes through metadata
Official Support: Backed by Oracle as part of the MySQL ecosystem
X Protocol Support: Native support for MySQL X Protocol
Lower Resource Usage: Lightweight design with minimal memory footprint
Easier Upgrades: Synchronized with MySQL version releases

ProxySQL Disadvantages:

Requires manual configuration and rule setup
Steeper learning curve
Configuration complexity for advanced features
More resource intensive
Requires separate monitoring setup

MySQL Router Disadvantages:

Basic routing only, traditionally a read-write split by port (Router 8.2 and later add optional transparent read/write splitting)
No query-based routing or caching
Less flexible with non-InnoDB Cluster setups
Fewer monitoring and statistics features
No query rewriting or firewall capabilities
Limited customization options

Use Case Recommendations:

Choose ProxySQL for: Complex routing requirements, query caching needs, legacy replication setups, multi-cluster environments, need for query analysis and rewriting
Choose MySQL Router for: InnoDB Cluster deployments, simple read-write splitting, preference for official Oracle solutions, minimal configuration requirements
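
To make the idea of content-based routing concrete, here is a minimal Python sketch of first-match rule evaluation. It is a simplified model rather than ProxySQL's implementation: the RULES list, DEFAULT_HOSTGROUP, and route function are hypothetical names, and real ProxySQL matches against query digests and supports many more rule fields.

```python
import re

# (rule_id, pattern, destination_hostgroup); illustrative values for a
# typical read/write split, not taken from any real configuration.
RULES = [
    (1, r"^SELECT.*FOR UPDATE", 0),  # locking reads go to the primary
    (2, r"^SELECT", 1),              # all other reads go to the replicas
]
DEFAULT_HOSTGROUP = 0  # writes and anything unmatched go to the primary


def route(query: str) -> int:
    """Return the hostgroup chosen by the first matching rule."""
    for _rule_id, pattern, hostgroup in sorted(RULES):
        if re.search(pattern, query, flags=re.IGNORECASE):
            return hostgroup  # first match wins; later rules are skipped
    return DEFAULT_HOSTGROUP


print(route("SELECT * FROM orders WHERE id = 7 FOR UPDATE"))
print(route("select name from users"))
print(route("UPDATE users SET name = 'x' WHERE id = 1"))
```

Because evaluation stops at the first match, the narrower FOR UPDATE rule has to come before the generic SELECT rule. MySQL Router has no equivalent step: the application picks the read-write or read-only port itself.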

Code Example:

-- ProxySQL Configuration
-- Connect to ProxySQL admin interface
-- mysql -h 127.0.0.1 -P 6032 -u admin -p

-- Add MySQL servers
INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES
  (0, 'primary.example.com', 3306),
  (1, 'replica1.example.com', 3306),
  (1, 'replica2.example.com', 3306);

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

-- Configure users
INSERT INTO mysql_users (username, password, default_hostgroup, transaction_persistent)
VALUES ('app_user', 'password', 0, 1);

LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;

-- Query routing rules (reads to replicas, writes to primary)
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES
  (1, 1, '^SELECT.*FOR UPDATE', 0, 1),
  (2, 1, '^SELECT', 1, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

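-- Enable query caching (cache_ttl is in milliseconds)
-- Rules are evaluated in ascending rule_id order and stop at the first match
-- with apply=1, so this rule only fires if its rule_id is lower than that of
-- any broader '^SELECT' rule
INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply)
VALUES (10, 1, '^SELECT COUNT', 60000, 1);

-- Query rewriting example (the same rule_id ordering caveat applies)
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, replace_pattern, apply)
VALUES (20, 1, '^SELECT \* FROM users$', 'SELECT id, name, email FROM users', 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;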

-- Configure connection pooling
UPDATE global_variables SET variable_value='1000'
WHERE variable_name='mysql-max_connections';

UPDATE global_variables SET variable_value='200'
WHERE variable_name='mysql-default_max_latency_ms';

LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;

-- Monitor statistics
SELECT * FROM stats_mysql_query_digest ORDER BY sum_time DESC LIMIT 10;
SELECT * FROM stats_mysql_connection_pool;
SELECT * FROM stats_mysql_commands_counters;

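-- Health check configuration
UPDATE mysql_servers SET max_replication_lag=10 WHERE hostgroup_id=1;
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

UPDATE global_variables SET variable_value='2000'
WHERE variable_name='mysql-monitor_connect_interval';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;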

-- --- MySQL Router Configuration ---
-- Router is configured primarily through mysqlrouter.conf
-- Bootstrap creates configuration automatically

-- View router status (MySQL Shell, JavaScript mode)
-- var cluster = dba.getCluster();
-- cluster.listRouters();

-- Check router REST API (if enabled; served over HTTPS and requires a REST account)
-- curl -k -u REST_USER https://localhost:8443/api/20190715/routes

-- Monitor connections through router
-- netstat -an | grep :6446  # Read-write port
-- netstat -an | grep :6447  # Read-only port

-- Application connection comparison
-- ProxySQL: All traffic through a single port (6033 by default)
-- mysql -h proxysql-host -P 6033 -u app_user -p

-- MySQL Router: Different ports for read-write vs read-only
-- mysql -h router-host -P 6446 -u app_user -p  # Read-write
-- mysql -h router-host -P 6447 -u app_user -p  # Read-only

-- ProxySQL monitoring queries
SELECT hostgroup, srv_host, status, Queries, Bytes_data_sent, Bytes_data_recv
FROM stats_mysql_connection_pool;

SELECT digest_text, count_star, sum_time, min_time, max_time
FROM stats_mysql_query_digest
ORDER BY sum_time DESC LIMIT 20;

-- ProxySQL health check
SELECT * FROM monitor.mysql_server_ping_log ORDER BY time_start_us DESC LIMIT 10;
SELECT * FROM monitor.mysql_server_replication_lag_log ORDER BY time_start_us DESC LIMIT 10;

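-- Advanced ProxySQL features
-- Traffic mirroring (hostgroup 2 must be defined in mysql_servers, and the rule
-- must be evaluated before any broader '^SELECT' rule that has apply=1)
INSERT INTO mysql_query_rules (rule_id, active, match_digest, mirror_hostgroup, apply)
VALUES (100, 1, '^SELECT.*FROM orders', 2, 1);

-- Query firewall
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, error_msg, apply)
VALUES (200, 1, '.*DROP TABLE.*', 'DROP TABLE not allowed', 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;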

-- Scheduler for automated tasks
INSERT INTO scheduler (active, interval_ms, filename, arg1)
VALUES (1, 300000, '/var/lib/proxysql/check_readonly.sh', '1');

LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;

-- ProxySQL query analysis
SELECT hostgroup, schemaname, username, digest_text, count_star
FROM stats_mysql_query_digest
WHERE digest_text LIKE '%JOIN%'
ORDER BY sum_time DESC;

-- Connection pool efficiency
SELECT hostgroup,
       SUM(ConnUsed) as used_connections,
       SUM(ConnFree) as free_connections,
       SUM(Queries) as total_queries
FROM stats_mysql_connection_pool
GROUP BY hostgroup;

Performance Tuning

What is Performance Schema?

The 30-Second Answer:

Performance Schema is MySQL's built-in instrumentation framework that collects real-time performance metrics at low overhead. It monitors server execution at runtime, tracking statement execution, table I/O, locks, memory usage, and more through in-memory tables. Unlike the slow query log, it provides detailed timing breakdowns, waits analysis, and is queryable like regular tables, making it essential for performance troubleshooting and monitoring.

The 2-Minute Answer (If They Want More):

Performance Schema is a feature for monitoring MySQL server execution at a low level, introduced in MySQL 5.5 and significantly enhanced in later versions. It operates as a storage engine with special in-memory tables that capture performance data.

Key Characteristics:

Data Collection:

Instruments server code to collect timing and execution metrics
Measures statement execution, I/O operations, locks, memory allocation
Tracks metadata operations, table I/O, index usage
Records connection activity, prepared statements, stages
Monitors replication performance

Design Philosophy:

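Low overhead with the default instrumentation (overhead grows as more instruments and consumers are enabled; figures around 5-10% are often quoted but are workload-dependent)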
In-memory tables (data lost on restart)
Queryable with standard SQL
Configurable instrumentation levels
No need to restart server for most changes

Key Table Categories:

Setup Tables (setup_*): Configure what to monitor
Instance Tables (*_instances): Objects being monitored
Event Tables (events_*): Current and historical events
Summary Tables (*_summary_*): Aggregated statistics
Status Variables (status_by_*): Status variable summaries
Connection Tables: Current and historical connections
Replication Tables: Replication performance metrics

Common Use Cases:

Identify slow queries with detailed timing breakdown
Find tables with most I/O operations
Analyze wait events (what's blocking queries)
Monitor memory usage by user/thread
Track index usage and missing indexes
Debug locking and deadlock issues
Analyze prepared statement performance
Monitor replication lag and throughput

Advantages Over Slow Query Log:

Real-time, queryable data
Detailed execution stage timing
Wait event analysis
No need for log file parsing
Can aggregate and filter with SQL
Lower overhead than table-based slow log

Configuration Considerations:

Enabled by default in MySQL 5.6.6+
Many instruments disabled by default for performance
Can enable/disable instruments dynamically
Memory usage configurable
Use sys schema for easier querying

When to Use:

Performance troubleshooting and optimization
Monitoring production workloads
Capacity planning and trend analysis
Identifying resource bottlenecks
Comparing performance before/after changes

Code Example:

-- Check if Performance Schema is enabled
SHOW VARIABLES LIKE 'performance_schema';

-- View memory usage by Performance Schema
SELECT * FROM performance_schema.memory_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'memory/performance_schema%'
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC;

-- List all setup/configuration tables
SHOW TABLES FROM performance_schema LIKE 'setup%';

-- View enabled instruments
SELECT NAME, ENABLED, TIMED
FROM performance_schema.setup_instruments
WHERE ENABLED = 'YES'
LIMIT 20;

-- Enable specific instruments
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'statement/%';

-- Enable wait event monitoring
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'wait/%';

-- View consumers (where data goes)
SELECT * FROM performance_schema.setup_consumers;

-- Enable statement history
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME LIKE '%statement%';

-- Find slowest queries currently executing
SELECT
    THREAD_ID,
    EVENT_NAME,
    TRUNCATE(TIMER_WAIT/1000000000000, 2) AS duration_sec,
    TRUNCATE(LOCK_TIME/1000000000000, 2) AS lock_time_sec,
    SQL_TEXT,
    CURRENT_SCHEMA,
    ROWS_EXAMINED,
    ROWS_SENT
FROM performance_schema.events_statements_current
WHERE TIMER_WAIT IS NOT NULL
ORDER BY TIMER_WAIT DESC
LIMIT 10;

-- Historical statement analysis (last 10 statements per thread by default,
-- set by performance_schema_events_statements_history_size)
SELECT
    TRUNCATE(TIMER_WAIT/1000000000000, 2) AS duration_sec,
    SQL_TEXT,
    ROWS_EXAMINED,
    ROWS_SENT,
    CREATED_TMP_TABLES,
    CREATED_TMP_DISK_TABLES
FROM performance_schema.events_statements_history
ORDER BY TIMER_WAIT DESC
LIMIT 20;

-- Summary of statements by digest (grouped similar queries)
SELECT
    SCHEMA_NAME,
    DIGEST_TEXT,
    COUNT_STAR AS exec_count,
    TRUNCATE(AVG_TIMER_WAIT/1000000000000, 2) AS avg_sec,
    TRUNCATE(MAX_TIMER_WAIT/1000000000000, 2) AS max_sec,
    TRUNCATE(SUM_TIMER_WAIT/1000000000000, 2) AS total_sec,
    SUM_ROWS_EXAMINED AS total_rows_examined,
    SUM_ROWS_SENT AS total_rows_sent
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME IS NOT NULL
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

-- Table I/O statistics
SELECT
    OBJECT_SCHEMA,
    OBJECT_NAME,
    COUNT_READ,
    COUNT_WRITE,
    COUNT_FETCH,
    COUNT_INSERT,
    COUNT_UPDATE,
    COUNT_DELETE
FROM performance_schema.table_io_waits_summary_by_table
WHERE OBJECT_SCHEMA NOT IN ('mysql', 'performance_schema', 'information_schema')
ORDER BY COUNT_READ + COUNT_WRITE DESC
LIMIT 10;

-- Index usage statistics
SELECT
    OBJECT_SCHEMA,
    OBJECT_NAME,
    INDEX_NAME,
    COUNT_FETCH,
    COUNT_INSERT,
    COUNT_UPDATE,
    COUNT_DELETE
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA NOT IN ('mysql', 'performance_schema')
AND INDEX_NAME IS NOT NULL
ORDER BY COUNT_FETCH DESC
LIMIT 10;

-- Find unused indexes (counters reset at server restart, so judge over a representative period)
SELECT
    OBJECT_SCHEMA,
    OBJECT_NAME,
    INDEX_NAME
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA NOT IN ('mysql', 'performance_schema', 'information_schema')
AND INDEX_NAME IS NOT NULL
AND INDEX_NAME != 'PRIMARY'
AND COUNT_STAR = 0
ORDER BY OBJECT_SCHEMA, OBJECT_NAME;

-- Wait events analysis (what's causing delays)
SELECT
    EVENT_NAME,
    COUNT_STAR AS count,
    TRUNCATE(SUM_TIMER_WAIT/1000000000000, 2) AS total_sec,
    TRUNCATE(AVG_TIMER_WAIT/1000000000000, 6) AS avg_sec,
    TRUNCATE(MAX_TIMER_WAIT/1000000000000, 2) AS max_sec
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE COUNT_STAR > 0
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

-- Memory usage by thread
SELECT
    THREAD_ID,
    EVENT_NAME,
    CURRENT_NUMBER_OF_BYTES_USED / 1024 / 1024 AS current_mb,
    HIGH_NUMBER_OF_BYTES_USED / 1024 / 1024 AS high_mb
FROM performance_schema.memory_summary_by_thread_by_event_name
WHERE CURRENT_NUMBER_OF_BYTES_USED > 0
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC
LIMIT 10;

-- Current connections
SELECT
    PROCESSLIST_ID,
    PROCESSLIST_USER,
    PROCESSLIST_HOST,
    PROCESSLIST_DB,
    PROCESSLIST_COMMAND,
    PROCESSLIST_TIME,
    PROCESSLIST_STATE
FROM performance_schema.threads
WHERE TYPE = 'FOREGROUND'
ORDER BY PROCESSLIST_TIME DESC;

-- Reset/truncate collected statistics
TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;
TRUNCATE TABLE performance_schema.events_statements_history;

-- Configuration in my.cnf
# [mysqld]
# performance_schema = ON
# performance_schema_max_table_instances = 12500
# performance_schema_events_statements_history_size = 10
# performance_schema_events_statements_history_long_size = 10000

Indexing

What is a clustered index in InnoDB?

The 30-Second Answer:

A clustered index in InnoDB is the primary key index that determines the physical storage order of table data. The table data itself is stored in the leaf nodes of the clustered index B-tree. If no primary key is defined, InnoDB uses the first unique non-null index, or creates a hidden 6-byte row ID. Each InnoDB table has exactly one clustered index.

The 2-Minute Answer (If They Want More):

The clustered index is fundamental to InnoDB's storage architecture. Unlike secondary indexes that store only key values and primary key references, the clustered index stores the actual row data in its leaf nodes. This has several important implications:

Physical Order: Rows are stored in clustered index order within each page, and leaf pages are linked in key order, making range scans on the primary key very efficient.
No Separate Lookup: Querying by primary key requires only traversing the clustered index B-tree - no additional lookup needed since the data is right there.
Choice Matters: Your primary key choice affects all queries. Sequential primary keys (like AUTO_INCREMENT) avoid page splits and fragmentation, while UUIDs can cause performance issues due to random insertions.
Secondary Index Impact: All secondary indexes store the clustered index key value as their "pointer" to the row, so a large primary key makes all indexes larger.
Implicit in Queries: When you query by primary key, you're using the clustered index even without explicitly creating an index.

Best practices: Use a small, sequential primary key when possible. For most tables, an AUTO_INCREMENT integer is ideal. Avoid large composite primary keys or UUIDs unless you have specific requirements.
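
The effect of key order on page splits can be illustrated with a toy model. The Python sketch below is a rough simplification and not InnoDB's actual algorithm: the rows_moved function, the tiny page capacity, and the rule that an append past the last full page opens a fresh page without moving rows are all assumptions made for illustration.

```python
import bisect
import random


def rows_moved(keys, page_capacity=4):
    """Count rows relocated by page splits while inserting keys in order."""
    pages = [[]]  # leaf pages in key order, each a sorted list of keys
    moved = 0
    for key in keys:
        # Locate the page whose key range should hold this key.
        firsts = [page[0] for page in pages[1:]]
        i = bisect.bisect_right(firsts, key)
        page = pages[i]
        if len(page) < page_capacity:
            bisect.insort(page, key)
            continue
        if i == len(pages) - 1 and key > page[-1]:
            # Append past the end of the last page: open a new page, move nothing.
            pages.append([key])
            continue
        # Full page and the key lands inside it: split the page in half.
        mid = page_capacity // 2
        right = page[mid:]
        del page[mid:]
        moved += len(right)
        pages.insert(i + 1, right)
        bisect.insort(page if key < right[0] else right, key)
    return moved


sequential = list(range(1, 1001))
shuffled = sequential[:]
random.Random(42).shuffle(shuffled)  # stands in for random UUID-like keys
print("sequential keys, rows moved:", rows_moved(sequential))
print("random keys, rows moved:    ", rows_moved(shuffled))
```

In this model sequential keys only ever append, while random keys keep landing in full pages and force rows to be relocated, which is the intuition behind preferring AUTO_INCREMENT keys.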

Code Example:

-- Table with explicit primary key (becomes clustered index)
CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255),
    created_at TIMESTAMP
);

-- This query uses the clustered index efficiently
SELECT * FROM users WHERE user_id = 12345;

-- Range query benefits from physical ordering
SELECT * FROM users WHERE user_id BETWEEN 1000 AND 2000;

-- View index information
SHOW INDEX FROM users;
-- PRIMARY key will show as the clustered index

-- Table without primary key - InnoDB creates hidden clustered index
CREATE TABLE logs (
    message TEXT,
    created_at TIMESTAMP
);
-- InnoDB internally creates a 6-byte hidden row ID as clustered index

-- Composite primary key (becomes clustered index)
CREATE TABLE order_items (
    order_id INT,
    item_id INT,
    quantity INT,
    PRIMARY KEY (order_id, item_id)
);
-- Data is physically stored ordered by order_id, then item_id
-- Queries on order_id benefit from clustering
SELECT * FROM order_items WHERE order_id = 100;

What is a secondary index and how does it differ from a clustered index?

The 30-Second Answer:

A secondary index (also called non-clustered index) is any index other than the clustered index. In InnoDB, secondary index leaf nodes contain the indexed column values plus the primary key value, not the full row data. Querying via secondary index requires two lookups: first to find the primary key in the secondary index, then to retrieve the full row from the clustered index. This is called a "bookmark lookup" or "clustered index lookup."

The 2-Minute Answer (If They Want More):

Secondary indexes differ from clustered indexes in several critical ways:

Storage Structure: Secondary index leaf nodes store only the indexed columns and the primary key value. The actual row data remains in the clustered index.
Two-Step Lookup: When you query using a secondary index, InnoDB:
- First traverses the secondary index B-tree to find matching entries
- Extracts the primary key value from each match
- Uses that primary key to look up the full row in the clustered index
This double lookup is why primary key queries are faster than secondary index queries.
Multiple Allowed: A table can have multiple secondary indexes, but only one clustered index.
Index Size: Because secondary indexes include the primary key value, a large primary key increases the size of every secondary index. This is why keeping primary keys small is important.
Covering Index Optimization: If your query only needs columns that are in the secondary index (including the primary key), MySQL can skip the second lookup - this is called an "index-only scan" or "covering index."
Maintenance Cost: Secondary indexes must be updated on INSERT, UPDATE, and DELETE operations, adding write overhead. Each additional index slows down writes.
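
The two-step lookup and the covering-index shortcut can be modeled with two dictionaries. This Python sketch is only an analogy, assuming a hypothetical products table: real InnoDB indexes are B-trees, and the names clustered, idx_category, select_all, and select_ids are invented for illustration.

```python
# Clustered index: primary key -> full row (the row lives in the leaf).
clustered = {
    1: {"product_id": 1, "name": "Laptop", "category": "Electronics", "price": 999.0},
    2: {"product_id": 2, "name": "Desk", "category": "Furniture", "price": 250.0},
    3: {"product_id": 3, "name": "Phone", "category": "Electronics", "price": 599.0},
}

# Secondary index on category: indexed value -> primary keys only.
idx_category = {}
for pk, row in clustered.items():
    idx_category.setdefault(row["category"], []).append(pk)


def select_all(category):
    """Model of SELECT * ... WHERE category = ? (two steps)."""
    pks = idx_category.get(category, [])   # step 1: secondary index
    return [clustered[pk] for pk in pks]   # step 2: clustered index lookup


def select_ids(category):
    """Model of SELECT product_id, category ... (covered by the index)."""
    # Everything requested is already in the secondary index entry.
    return [(pk, category) for pk in idx_category.get(category, [])]


print([row["name"] for row in select_all("Electronics")])
print(select_ids("Electronics"))
```

The second function never touches the clustered dictionary, which is the saving a covering index provides.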

Code Example:

CREATE TABLE products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,  -- Clustered index
    name VARCHAR(255),
    category VARCHAR(100),
    price DECIMAL(10,2),
    created_at TIMESTAMP,
    INDEX idx_category (category),              -- Secondary index
    INDEX idx_price (price),                    -- Secondary index
    INDEX idx_name (name)                       -- Secondary index
);

-- Query using secondary index (two-step lookup)
SELECT * FROM products WHERE category = 'Electronics';
-- Step 1: Find primary keys in idx_category where category='Electronics'
-- Step 2: For each primary key, fetch full row from clustered index

-- Covering index query (single lookup)
SELECT product_id, category FROM products WHERE category = 'Electronics';
-- Only uses idx_category - no clustered index lookup needed
-- idx_category contains both category (indexed column) and product_id (primary key)

-- Composite secondary index
CREATE INDEX idx_category_price ON products (category, price);

-- This query uses the composite index efficiently
SELECT * FROM products WHERE category = 'Electronics' AND price < 500;

-- View how queries use indexes
EXPLAIN SELECT * FROM products WHERE category = 'Electronics'\G
-- Typically shows: type: ref, key: idx_category (or idx_category_price); Extra varies by version and is often empty on 8.0 for a plain equality lookup

EXPLAIN SELECT product_id, category FROM products WHERE category = 'Electronics'\G
-- Shows: Extra: Using index (covering index - no table access needed)

-- Impact of large primary key on secondary indexes
CREATE TABLE bad_example (
    uuid CHAR(36) PRIMARY KEY,  -- 36 bytes
    data VARCHAR(100),
    INDEX idx_data (data)
);
-- Each idx_data entry stores: data value (up to 100 characters) + uuid (36 bytes) = large index
-- Better: use AUTO_INCREMENT INT (4 bytes) and add uuid as a secondary index

Query Optimization

What is filesort and how do you avoid it?

The 30-Second Answer:

Filesort appears in EXPLAIN output's Extra column when MySQL must perform an additional sorting pass because results aren't returned in the required order from the index. Despite the name, it may use memory (up to sort_buffer_size) or disk. Avoid filesort by: creating indexes that match the ORDER BY clause, ensuring WHERE and ORDER BY use the same index, ordering by indexed columns in index order, and keeping sort_buffer_size adequate. Filesort isn't always bad - it's only problematic for large result sets.

The 2-Minute Answer (If They Want More):

Filesort is MySQL's external sorting operation, triggered when the optimizer cannot retrieve rows in the required sort order using an index. Despite its name, filesort can use:

In-memory sorting: When dataset fits in sort_buffer_size (faster)
Disk-based sorting: When data exceeds memory buffer (slower, uses temporary files)

When Filesort Occurs:

No Index on ORDER BY columns

SELECT * FROM users ORDER BY last_login;  -- No index on last_login

ORDER BY uses columns from multiple tables

SELECT * FROM t1 JOIN t2 ON t1.id = t2.id ORDER BY t1.a, t2.b;

ORDER BY direction mismatch with index

-- Index: (a ASC, b ASC)
SELECT * FROM t ORDER BY a ASC, b DESC;  -- Can't use this index; 8.0+ can avoid the sort only with an index on (a ASC, b DESC)

WHERE and ORDER BY use different indexes

SELECT * FROM t WHERE col_a = 5 ORDER BY col_b;  -- Different columns

ORDER BY on expression or function

SELECT * FROM t ORDER BY UPPER(name);  -- Function prevents index use

Types of Filesort (visible in optimizer trace):

Modified quicksort: Default algorithm for small datasets
Merge sort: For larger datasets
Priority queue: For LIMIT queries (optimizes for top N rows)
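
To see why the priority queue helps LIMIT queries, compare sorting everything with keeping only the best N rows in a bounded heap. The Python sketch below is an illustration of the idea, not MySQL's implementation; the function names and sample rows are hypothetical.

```python
import heapq

# Hypothetical rows: (user_id, last_login as an ISO date string).
rows = [
    (1, "2024-01-05"),
    (2, "2024-03-01"),
    (3, "2023-12-31"),
    (4, "2024-02-10"),
]


def top_n_full_sort(rows, n, key):
    """Sort every row, then discard all but the first n."""
    return sorted(rows, key=key, reverse=True)[:n]


def top_n_priority_queue(rows, n, key):
    """Keep at most n candidates in a heap while scanning the rows once."""
    return heapq.nlargest(n, rows, key=key)


def by_last_login(row):
    return row[1]
```

Both functions return the same rows for ORDER BY last_login DESC LIMIT n, but the heap version never holds more than n candidates, which is why a small LIMIT keeps a filesort cheap in memory.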

Avoidance Strategies:

Create Covering Index: Index includes ORDER BY columns

CREATE INDEX idx_covering ON users(status, last_login, username);
SELECT username FROM users WHERE status = 'active' ORDER BY last_login;

Match Index Column Order: ORDER BY matches index prefix

-- Index: (country, city, created_at)
SELECT * FROM users
WHERE country = 'USA'
ORDER BY city, created_at;  -- Uses index

Use Descending Indexes (8.0+): Match mixed sort directions

CREATE INDEX idx_mixed ON orders(customer_id ASC, order_date DESC);
SELECT * FROM orders
WHERE customer_id = 123
ORDER BY customer_id ASC, order_date DESC;  -- No filesort

Increase sort_buffer_size: Keep sorting in memory

SET SESSION sort_buffer_size = 2097152;  -- 2MB

Reduce Selected Columns: Smaller rows fit better in sort buffer

SELECT id, name FROM users ORDER BY name;  -- Better than SELECT *
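
The index-matching rules behind these strategies can be approximated with a small checker. This Python sketch is a deliberately simplified model and not the optimizer's real logic: it only handles equality columns at the front of the index, and the function name index_satisfies_order and its arguments are hypothetical.

```python
def index_satisfies_order(index, equality_cols, order_by):
    """Return True if scanning `index` can deliver rows in `order_by` order.

    index and order_by are lists of (column, "ASC" | "DESC").
    equality_cols is the set of columns pinned by `col = constant` in WHERE.
    """
    # Leading index columns fixed by equality do not affect ordering: skip them.
    pos = 0
    while pos < len(index) and index[pos][0] in equality_cols:
        pos += 1
    remaining = index[pos:]

    # ORDER BY columns that are constant can be ignored as well.
    wanted = [(col, direction) for col, direction in order_by
              if col not in equality_cols]
    if len(wanted) > len(remaining):
        return False

    pairs = list(zip(remaining, wanted))
    # Column names must line up position by position.
    if any(icol != ocol for (icol, _), (ocol, _) in pairs):
        return False

    # Either every direction matches (forward scan)
    # or every direction is reversed (backward scan).
    forward = all(idir == odir for (_, idir), (_, odir) in pairs)
    backward = all(idir != odir for (_, idir), (_, odir) in pairs)
    return forward or backward
```

Under this model an index on (a ASC, b ASC) serves ORDER BY a DESC, b DESC through a backward scan but not ORDER BY a ASC, b DESC, which is the case descending indexes were added to solve.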

When Filesort is Acceptable:

Small result sets (hundreds/few thousand rows)
Infrequent queries
Result set already small after WHERE filtering
Query with LIMIT (uses priority queue optimization)

Monitoring Impact:

Check Sort_merge_passes status variable - non-zero indicates disk-based sorting occurred.

Code Example:

-- Example 1: Problem - filesort on large table
CREATE TABLE users (
  user_id INT PRIMARY KEY,
  username VARCHAR(50),
  email VARCHAR(100),
  created_at DATETIME,
  last_login DATETIME,
  status ENUM('active', 'inactive')
);

-- Bad: Full filesort
EXPLAIN SELECT * FROM users ORDER BY last_login DESC LIMIT 10;
-- Extra: Using filesort

-- Solution 1: Add index
CREATE INDEX idx_last_login ON users(last_login DESC);

EXPLAIN SELECT * FROM users ORDER BY last_login DESC LIMIT 10;
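-- The index is read in the requested order, so Extra shows no filesort
-- (8.0+ reports "Backward index scan" only when an ASC index is read in reverse; 5.7 parses DESC but ignores it)
-- No filesort!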

-- Example 2: Compound WHERE and ORDER BY
-- Bad: Different columns for WHERE and ORDER BY
EXPLAIN SELECT * FROM users
WHERE status = 'active'
ORDER BY last_login DESC;
-- Extra: Using where; Using filesort

-- Solution 2: Composite index covering both
CREATE INDEX idx_status_login ON users(status, last_login DESC);

EXPLAIN SELECT * FROM users
WHERE status = 'active'
ORDER BY last_login DESC;
-- Extra: Using where (filesort eliminated)
-- type: ref (uses index)

-- Example 3: Multi-column sort optimization
-- Bad: Sorting by multiple columns without index
EXPLAIN SELECT * FROM users
ORDER BY status, last_login DESC, username;
-- Extra: Using filesort

-- Solution 3: Composite index matching ORDER BY
CREATE INDEX idx_multi_sort ON users(status, last_login DESC, username);

EXPLAIN SELECT * FROM users
ORDER BY status, last_login DESC, username;
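-- Can use the index and skip filesort (8.0+); without LIMIT the optimizer may still prefer a table scan plus filesort for SELECT *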

-- Example 4: Mixed ASC/DESC before MySQL 8.0
-- Problem in MySQL 5.7: Can't use index
EXPLAIN SELECT * FROM users
ORDER BY status ASC, last_login DESC;
-- Extra: Using filesort (MySQL 5.7)

-- Solution 4: Descending index (MySQL 8.0+)
CREATE INDEX idx_mixed_sort ON users(status ASC, last_login DESC);

EXPLAIN SELECT * FROM users
ORDER BY status ASC, last_login DESC;
-- Uses index, no filesort (MySQL 8.0+)

-- Example 5: Function in ORDER BY
-- Bad: Function prevents index use
EXPLAIN SELECT * FROM users
ORDER BY YEAR(created_at), username;
-- Extra: Using filesort

-- Solution 5a: Generated column (5.7+; 8.0.13+ can also index the expression directly with a functional index)
ALTER TABLE users
ADD COLUMN created_year INT AS (YEAR(created_at)) STORED;

CREATE INDEX idx_year_username ON users(created_year, username);

EXPLAIN SELECT * FROM users
ORDER BY created_year, username;
-- Uses index, no filesort

-- Solution 5b: Reorganize query if possible
CREATE INDEX idx_created_username ON users(created_at, username);

EXPLAIN SELECT * FROM users
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'
ORDER BY created_at, username;
-- Uses index for both WHERE and ORDER BY

-- Example 6: Covering index eliminates filesort
-- Bad: Retrieves all columns
EXPLAIN SELECT * FROM users
WHERE status = 'active'
ORDER BY last_login
LIMIT 100;
-- Extra: Using where; Using filesort

-- Solution 6: Covering index for specific columns
CREATE INDEX idx_covering ON users(status, last_login, user_id, username);

EXPLAIN SELECT user_id, username FROM users
WHERE status = 'active'
ORDER BY last_login
LIMIT 100;
-- Extra: Using where; Using index
-- No filesort, no table access!

-- Example 7: Join with ORDER BY
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT,
  order_date DATETIME,
  total_amount DECIMAL(10,2),
  INDEX idx_customer(customer_id)
);

-- Bad: ORDER BY from joined table
EXPLAIN SELECT u.username, o.order_date, o.total_amount
FROM users u
JOIN orders o ON u.user_id = o.customer_id
ORDER BY o.order_date DESC;
-- Extra: Using filesort

-- Solution 7: Index on order_date, optimize JOIN order
CREATE INDEX idx_order_date ON orders(order_date DESC);

EXPLAIN SELECT u.username, o.order_date, o.total_amount
FROM orders o
JOIN users u ON o.customer_id = u.user_id
ORDER BY o.order_date DESC;
-- Starts with orders table, uses index for ORDER BY
-- May eliminate or reduce filesort

-- Example 8: Monitoring filesort performance
-- Check sort buffer usage
SHOW VARIABLES LIKE 'sort_buffer_size';  -- Default: 262144 (256KB)

-- Monitor filesort statistics
SHOW SESSION STATUS LIKE 'Sort%';
/*
Sort_merge_passes: 0      -- If > 0, sorts spilling to disk
Sort_range: X             -- Sorts done via range access
Sort_rows: X              -- Total rows sorted
Sort_scan: X              -- Sorts done via table scan
*/

-- Reset counters
FLUSH STATUS;

-- Run query
SELECT * FROM users ORDER BY last_login DESC LIMIT 1000;

-- Check if disk sorting occurred
SHOW SESSION STATUS LIKE 'Sort_merge_passes';
-- If > 0, consider increasing sort_buffer_size or adding index

-- Increase sort buffer for session
SET SESSION sort_buffer_size = 524288;  -- 512KB

-- Example 9: LIMIT optimization with filesort
-- Priority queue optimization (good even with filesort)
EXPLAIN SELECT * FROM users
ORDER BY last_login DESC
LIMIT 10;
-- Extra: Using filesort
-- But uses priority queue (memory-efficient for small LIMIT)

-- Much worse without LIMIT
EXPLAIN SELECT * FROM users
ORDER BY last_login DESC;
-- Extra: Using filesort
-- Must sort ALL rows

-- Example 10: Optimizer trace showing filesort decision
SET optimizer_trace='enabled=on';

SELECT user_id, username FROM users
WHERE status = 'active'
ORDER BY last_login DESC
LIMIT 20;

-- Filesort details are nested under join_execution, so search the whole trace with $**
SELECT JSON_EXTRACT(TRACE, '$**.filesort_summary')
FROM information_schema.OPTIMIZER_TRACE\G

/*
Shows:
- sort_mode: packed or row_id mode
- sort_algorithm: quicksort, merge, priority queue
- Memory usage
- Whether filesort is needed
*/

SET optimizer_trace='enabled=off';

Security

What is caching_sha2_password?

The 30-Second Answer:

caching_sha2_password is MySQL 8.0's default authentication plugin that uses SHA-256 hashing for password storage and verification. It improves security over the legacy mysql_native_password (SHA-1) while maintaining performance through server-side password caching. It requires either an encrypted connection (SSL/TLS) or RSA key-pair encryption for the initial password exchange, making it more secure but requiring proper client support.

The 2-Minute Answer (If They Want More):

caching_sha2_password was introduced in MySQL 8.0 as the default authentication plugin to address security limitations of mysql_native_password:

Security Improvements:

SHA-256 Hashing: Uses SHA-256 instead of SHA-1 (which has known vulnerabilities)
Salt Mechanism: Includes per-user salt values to prevent rainbow table attacks
Secure Password Exchange: Requires encrypted connection or RSA encryption for password transmission
FIPS Compliance: Meets federal security standards for cryptographic modules

How It Works:

Initial Authentication:

Client connects to server
Server sends authentication challenge with salt
Client must send password either:
- Over encrypted SSL/TLS connection, OR
- Encrypted with server's RSA public key
Server verifies password and caches the result

Subsequent Authentications:

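If the account's password hash is already in the cache, authentication takes a fast path that needs no TLS-protected or RSA-encrypted password exchange
Cache entries are cleared by FLUSH PRIVILEGES, by a password change or an account rename/drop, and by server restart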
Combines security with performance
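
The fast path can be sketched as a challenge-response check against a cached double hash. The Python below is an illustrative model only: client_scramble follows the scramble computation used by open-source clients such as PyMySQL, while the AuthCache class and its method names are hypothetical stand-ins for server internals.

```python
import hashlib
import os


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def client_scramble(password: bytes, nonce: bytes) -> bytes:
    """Client side: XOR(SHA256(pw), SHA256(SHA256(SHA256(pw)) + nonce))."""
    digest1 = sha256(password)
    digest2 = sha256(digest1)
    return xor(digest1, sha256(digest2 + nonce))


class AuthCache:
    """Hypothetical model of the server-side cache, keyed by account."""

    def __init__(self):
        self._entries = {}  # account -> SHA256(SHA256(password))

    def store(self, account: str, password: bytes) -> None:
        # Assumed to run after a full authentication over TLS or RSA succeeds.
        self._entries[account] = sha256(sha256(password))

    def fast_auth(self, account: str, scramble: bytes, nonce: bytes) -> bool:
        cached = self._entries.get(account)
        if cached is None:
            return False  # cache miss: full authentication would be required
        candidate = xor(scramble, sha256(cached + nonce))
        return sha256(candidate) == cached

    def flush(self) -> None:
        # Models FLUSH PRIVILEGES or a restart emptying the cache.
        self._entries.clear()
```

The password never crosses the wire on the fast path; only the nonce-dependent scramble does, and the server needs nothing beyond the cached double hash to verify it.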

Performance vs. Security:

Slightly slower than mysql_native_password on first connection
Comparable performance on subsequent connections due to caching
Much more secure cryptographically

Client Requirements:

Client must support caching_sha2_password plugin
MySQL 8.0+ clients, MySQL Connector/J 8.0+, etc.
Older clients may need to use mysql_native_password compatibility

Migration Considerations: When upgrading from MySQL 5.7 to 8.0:

Existing users keep their authentication plugin
New users get caching_sha2_password by default
Applications using old connectors may need updates
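Can set default_authentication_plugin for compatibility (deprecated as of 8.0.27 in favor of authentication_policy; mysql_native_password itself is deprecated in later 8.0 releases and removed in 9.0)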

Code Example:

-- Check current default authentication plugin
SHOW VARIABLES LIKE 'default_authentication_plugin';

-- Create user with caching_sha2_password (default in MySQL 8.0)
CREATE USER 'secure_user'@'localhost'
IDENTIFIED WITH caching_sha2_password
BY 'StrongPassword123!';

-- Create user with legacy plugin for compatibility
CREATE USER 'legacy_app'@'localhost'
IDENTIFIED WITH mysql_native_password
BY 'Password123!';

-- Check user's authentication plugin
SELECT user, host, plugin
FROM mysql.user
WHERE user = 'secure_user';

-- Convert existing user to caching_sha2_password
ALTER USER 'existing_user'@'localhost'
IDENTIFIED WITH caching_sha2_password
BY 'NewSecurePassword123!';

-- Set default plugin for backward compatibility (my.cnf/my.ini)
-- [mysqld]
-- default_authentication_plugin=mysql_native_password

-- Check if SSL/TLS is available for secure password exchange
-- (have_ssl is deprecated as of MySQL 8.0.26 and absent from newer releases)
SHOW VARIABLES LIKE 'have_ssl';

-- Require SSL for specific user
ALTER USER 'secure_user'@'localhost' REQUIRE SSL;

-- Get the server's RSA public key for client-side encryption (if not using SSL)
SHOW STATUS LIKE 'Caching_sha2_password_rsa_public_key';
-- Or let the client request it at connect time:
-- mysql -u secure_user -p --get-server-public-key

-- Clear the authentication cache
FLUSH PRIVILEGES;  -- clears all caching_sha2_password cache entries
-- A server restart also empties the cache

-- Monitor authentication plugin usage
SELECT plugin, COUNT(*) as user_count
FROM mysql.user
GROUP BY plugin;

-- Create user with require SSL and modern auth
CREATE USER 'api_user'@'%'
IDENTIFIED WITH caching_sha2_password BY 'ApiKey123!'
REQUIRE SSL;

-- Connection example with SSL (from command line)
-- mysql -u secure_user -p \
--   --ssl-mode=REQUIRED \
--   --ssl-ca=/path/to/ca.pem

-- Check connection encryption status
SHOW STATUS LIKE 'Ssl_cipher';

-- View current user's authentication plugin
SELECT CURRENT_USER(),
       plugin
FROM mysql.user
WHERE user = SUBSTRING_INDEX(CURRENT_USER(), '@', 1)
  AND host = SUBSTRING_INDEX(CURRENT_USER(), '@', -1);

-- Programmatic client example (Python with mysql-connector-python)
-- import mysql.connector
--
-- config = {
--     'user': 'secure_user',
--     'password': 'StrongPassword123!',
--     'host': 'localhost',
--     'database': 'mydb',
--     'ssl_disabled': False,  # Enable SSL
--     'auth_plugin': 'caching_sha2_password'
-- }
--
-- connection = mysql.connector.connect(**config)

MySQL Architecture

What is the data dictionary in MySQL 8.0?

The 30-Second Answer:

The data dictionary in MySQL 8.0 is a transactional, centralized metadata repository stored in InnoDB tables that replaced the old file-based system (.frm, .par, .opt files). It stores information about database objects (tables, columns, indexes, foreign keys) in a consistent, crash-safe format. This change enables atomic DDL operations, better performance, simplified architecture, and makes MySQL more reliable and easier to manage.

The 2-Minute Answer (If They Want More):

MySQL 8.0 introduced a fundamental architectural change by moving metadata from the file system to InnoDB:

Key Improvements Over Legacy System:

1. Atomic DDL Operations:

DDL operations (CREATE, ALTER, DROP) are now atomic and crash-safe
If a DDL operation fails midway, changes are rolled back completely
No more orphaned files or inconsistent metadata
Example: Dropping a partitioned table is now a single atomic operation instead of individual file deletions

2. Centralized Storage:

All metadata stored in InnoDB tables in the mysql schema
Tables like mysql.tables, mysql.columns, mysql.indexes, mysql.foreign_keys
Eliminates .frm (table format), .par (partition), and db.opt (database options) files
Metadata is transactional and benefits from InnoDB's ACID properties

3. Improved Performance:

Faster metadata lookups using InnoDB indexes instead of file system operations
Better concurrency with InnoDB's MVCC
Reduced I/O for information_schema queries
Cached in InnoDB buffer pool like regular data

4. Better Consistency:

Single source of truth for metadata
No synchronization issues between server cache and files
Consistent across replication topology

5. Enhanced INFORMATION_SCHEMA:

INFORMATION_SCHEMA views now query data dictionary tables directly
Many views implemented as actual views on data dictionary tables
Faster and more efficient than old implementation
Some new tables like INFORMATION_SCHEMA.ST_GEOMETRY_COLUMNS

System Tables:

Data dictionary tables (tables, columns, indexes, foreign_keys, dd_properties, and others) live in the mysql schema and are stored in the mysql.ibd tablespace
These tables are hidden and not directly accessible
Access metadata through INFORMATION_SCHEMA or SHOW commands
Protected by MySQL internal access control

Migration from MySQL 5.7:

MySQL 8.0 upgrade process automatically migrates from .frm files
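The server itself converts file-based metadata to the data dictionary on the first 8.0 startup; mysql_upgrade only handled the remaining system tables and is deprecated as of 8.0.16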
Old file formats are no longer created or used

Code Example:

-- List visible tables in the mysql schema (the data dictionary tables themselves are hidden and will not appear)
SELECT TABLE_NAME, ENGINE, TABLE_ROWS, DATA_LENGTH
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'mysql' AND TABLE_NAME LIKE '%tables%';

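-- Data dictionary enables atomic DDL
-- Each DDL statement is atomic on its own but still commits implicitly,
-- so wrapping DDL in START TRANSACTION ... COMMIT does not roll statements back as a group
CREATE TABLE test_atomic (
    id INT PRIMARY KEY,
    data VARCHAR(100)
);
-- If this fails or the server crashes midway, the ALTER is rolled back and no orphaned files are left behind
ALTER TABLE test_atomic ADD COLUMN created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP;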

-- INFORMATION_SCHEMA now queries data dictionary tables efficiently
SELECT
    TABLE_SCHEMA,
    TABLE_NAME,
    COLUMN_NAME,
    DATA_TYPE,
    IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'ecommerce'
    AND TABLE_NAME = 'orders';

-- View indexes stored in data dictionary
SELECT
    TABLE_NAME,
    INDEX_NAME,
    INDEX_TYPE,
    NON_UNIQUE,
    SEQ_IN_INDEX,
    COLUMN_NAME
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = 'ecommerce'
ORDER BY TABLE_NAME, INDEX_NAME, SEQ_IN_INDEX;

-- Check foreign key constraints from data dictionary
SELECT
    CONSTRAINT_NAME,
    TABLE_NAME,
    COLUMN_NAME,
    REFERENCED_TABLE_NAME,
    REFERENCED_COLUMN_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = 'ecommerce'
    AND REFERENCED_TABLE_NAME IS NOT NULL;

-- View table statistics from data dictionary
SELECT
    TABLE_SCHEMA,
    TABLE_NAME,
    TABLE_ROWS,
    AVG_ROW_LENGTH,
    DATA_LENGTH / 1024 / 1024 AS data_size_mb,
    INDEX_LENGTH / 1024 / 1024 AS index_size_mb,
    CREATE_TIME,
    UPDATE_TIME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
ORDER BY DATA_LENGTH DESC;

-- Atomic DDL: Drop database with all tables atomically
DROP DATABASE IF EXISTS old_project;
-- In MySQL 5.7, this could leave orphaned files if interrupted
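-- In MySQL 8.0, it's atomic when every table uses an atomic-DDL engine such as InnoDB (the schema directory is removed last, outside the atomic step)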

-- View partitions information (now in data dictionary)
SELECT
    TABLE_NAME,
    PARTITION_NAME,
    PARTITION_METHOD,
    PARTITION_EXPRESSION,
    TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'analytics'
    AND PARTITION_NAME IS NOT NULL;

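-- Data dictionary tables (including the version info in dd_properties) cannot be queried directly
-- SELECT * FROM mysql.dd_properties;  -- access is rejected on standard (non-debug) builds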

-- Verify no .frm files in data directory (MySQL 8.0)
-- Run from shell:
-- ls -la /var/lib/mysql/mydb/*.frm  # Should return no results

-- Performance comparison: Metadata queries are faster
SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES;
-- Typically faster in MySQL 8.0 than 5.7, most noticeably on servers with many tables

Transactions and Locking

What is a phantom read and how does InnoDB prevent it?

The 30-Second Answer:

A phantom read occurs when a transaction re-executes a query and finds different rows than before due to another transaction's INSERT or DELETE. InnoDB prevents phantom reads at REPEATABLE READ level using next-key locks - a combination of row locks and gap locks that lock both existing rows and the gaps between them, preventing other transactions from inserting rows in the locked range.

The 2-Minute Answer (If They Want More):

What is a Phantom Read?

A phantom read happens when:

Transaction A executes a query with a WHERE clause
Transaction B inserts or deletes rows that match that WHERE clause
Transaction A re-executes the same query and sees different rows ("phantoms")

Example scenario:

T1: SELECT COUNT(*) FROM users WHERE age > 25;  -- Returns 10
T2: INSERT INTO users (name, age) VALUES ('Alice', 30);
T2: COMMIT;
T1: SELECT COUNT(*) FROM users WHERE age > 25;  -- Returns 11 (phantom!)

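The SQL standard tolerates phantoms at REPEATABLE READ, but InnoDB's REPEATABLE READ is stricter: it aims to give the transaction a consistent view of the data, so this anomaly must be blocked.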

How InnoDB Prevents Phantoms:

InnoDB uses next-key locking - a combination of:

Record Locks - Lock individual index records
Gap Locks - Lock the space between index records
Next-Key Locks - Lock both a record and the gap before it

When you use a locking read (SELECT ... FOR UPDATE or SELECT ... FOR SHARE) with a range condition, InnoDB locks:

All matching rows
All gaps where new matching rows could be inserted

This prevents other transactions from inserting phantoms into the locked range.

Important notes:

Gap locking only occurs at REPEATABLE READ and SERIALIZABLE levels
Gap locks are taken for range scans, for searches on non-unique indexes, and for unique-index searches that match no row
Unique index equality searches only use record locks (no gap lock needed)
At READ COMMITTED level, gap locking is disabled (phantoms are possible)
MVCC provides phantom protection for regular SELECTs without locking

InnoDB's default REPEATABLE READ with MVCC means regular (non-locking) SELECTs don't see phantoms because they read from a consistent snapshot. Next-key locks prevent phantoms for locking reads and write operations.
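
The effect of next-key locking on inserts can be approximated with a sorted list of index values. This Python sketch is a simplified model under stated assumptions: it assumes an index exists on the searched column, treats the locked region as one open interval, and ignores version-specific boundary behavior and duplicate values. The function names locked_interval and insert_blocks are hypothetical.

```python
import bisect


def locked_interval(index_values, lo, hi):
    """Open interval (lower, upper) covered by a locking range read lo <= v <= hi.

    The interval stretches from the nearest index value below the range to the
    nearest index value above it. None means unbounded on that side.
    """
    values = sorted(index_values)
    first = bisect.bisect_left(values, lo)    # position of first value >= lo
    after = bisect.bisect_right(values, hi)   # position of first value > hi
    lower = values[first - 1] if first > 0 else None
    upper = values[after] if after < len(values) else None
    return lower, upper


def insert_blocks(interval, value):
    """True if inserting `value` would land inside the locked interval."""
    lower, upper = interval
    above_lower = lower is None or value > lower
    below_upper = upper is None or value < upper
    return above_lower and below_upper


# Hypothetical indexed prices; the locking read covers 100..400.
prices = [20, 35, 100, 300, 500]
interval = locked_interval(prices, 100, 400)
```

With these values the locked interval runs from 35 to 500, so an insert of 250 or 50 would wait while an insert of 600 would proceed. The gap below the first matching row is locked too, which often surprises people.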

Code Example:

-- Create a test table
CREATE TABLE products (
    id INT PRIMARY KEY AUTO_INCREMENT,
    category VARCHAR(50),
    price DECIMAL(10,2),
    INDEX idx_category (category)
) ENGINE=InnoDB;

INSERT INTO products (category, price) VALUES
('Electronics', 100),
('Electronics', 300),
('Electronics', 500),
('Books', 20),
('Books', 35);

-- Demonstrate phantom read prevention
-- Session 1: Start transaction with locking read
START TRANSACTION;
SELECT * FROM products
WHERE category = 'Electronics'
FOR UPDATE;
-- This locks all 'Electronics' rows AND the gaps around them

-- Session 2: Try to insert into locked range
INSERT INTO products (category, price)
VALUES ('Electronics', 200);
-- This will BLOCK because it tries to insert into a locked gap

-- Session 1: The same query still returns same results (no phantoms)
SELECT * FROM products WHERE category = 'Electronics' FOR UPDATE;
COMMIT;

-- Now Session 2's INSERT completes

-- Example with range query
-- Session 1:
START TRANSACTION;
SELECT * FROM products
WHERE price BETWEEN 100 AND 400
FOR UPDATE;
-- products has no index on price, so InnoDB scans the clustered index and next-key locks every row it reads (effectively the whole table here)

-- Session 2:
INSERT INTO products (category, price) VALUES ('Electronics', 250);
-- BLOCKS - trying to insert into locked gap

-- View locks in Performance Schema (MySQL 8.0+)
SELECT
    lock_type,
    lock_mode,
    lock_status,
    lock_data
FROM performance_schema.data_locks
WHERE object_name = 'products';

-- REPEATABLE READ with non-locking read (MVCC prevents phantoms)
-- (row counts below assume only the five original rows are present)
-- Session 1:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT COUNT(*) FROM products WHERE category = 'Electronics';  -- Returns 3

-- Session 2:
INSERT INTO products (category, price) VALUES ('Electronics', 450);
COMMIT;

-- Session 1:
SELECT COUNT(*) FROM products WHERE category = 'Electronics';  -- Still returns 3 (no phantom)
COMMIT;

-- Now a new transaction sees the inserted row
SELECT COUNT(*) FROM products WHERE category = 'Electronics';  -- Returns 4

-- READ COMMITTED allows phantoms
-- Session 1:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT COUNT(*) FROM products WHERE category = 'Books';  -- Returns 2

-- Session 2:
INSERT INTO products (category, price) VALUES ('Books', 45);
COMMIT;

-- Session 1:
SELECT COUNT(*) FROM products WHERE category = 'Books';  -- Returns 3 (phantom read!)
COMMIT;

Partitioning

What is table partitioning in MySQL?

The 30-Second Answer:

Table partitioning is a database design technique that divides a large table into smaller, more manageable pieces called partitions, while logically maintaining it as a single table. Each partition can be stored, indexed, and managed independently, improving query performance and maintenance operations. MySQL supports partitioning at the storage engine level, allowing you to distribute data across multiple physical files based on defined rules.

The 2-Minute Answer (If They Want More):

Table partitioning in MySQL is a method of decomposing large tables into smaller physical segments (partitions) that are transparent to applications. The table remains a single logical entity for queries, but internally MySQL can optimize operations by accessing only relevant partitions.

Key benefits include:

Performance Improvement: Queries that access a subset of data can use partition pruning to scan only relevant partitions instead of the entire table
Easier Maintenance: You can perform maintenance operations (backup, restore, rebuild indexes) on individual partitions
Bulk Data Management: Efficiently add or remove large amounts of data by adding/dropping partitions
Improved Archival: Old data can be archived by simply dropping or archiving specific partitions

Partitioning works by using a partitioning function on one or more columns (the partitioning key) to determine which partition stores each row. Common use cases include time-series data (partitioned by date), geographical data (partitioned by region), or any large dataset that has a natural division criterion.

MySQL evaluates the partitioning expression for each row and routes it to the appropriate partition. When querying, if the WHERE clause includes the partitioning key, MySQL can eliminate irrelevant partitions from the search (partition pruning), dramatically reducing I/O and improving performance.
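
Row routing and partition pruning for a RANGE scheme keyed on the year can be sketched with a binary search over the partition bounds. This Python model is illustrative only; the partition names mirror the example table in this answer, and the functions route and prune are hypothetical, not MySQL APIs.

```python
import bisect
from datetime import date

# (partition name, exclusive upper bound on YEAR(sale_date)); None models MAXVALUE.
PARTITIONS = [
    ("p2020", 2021),
    ("p2021", 2022),
    ("p2022", 2023),
    ("p2023", 2024),
    ("p_future", None),
]
BOUNDS = [bound for _, bound in PARTITIONS if bound is not None]


def route(sale_date: date) -> str:
    """Pick the first partition whose VALUES LESS THAN bound exceeds the year."""
    position = bisect.bisect_right(BOUNDS, sale_date.year)
    return PARTITIONS[position][0]


def prune(start: date, end: date) -> list:
    """Partitions that can contain rows with start <= sale_date <= end."""
    first = bisect.bisect_right(BOUNDS, start.year)
    last = bisect.bisect_right(BOUNDS, end.year)
    return [name for name, _ in PARTITIONS[first:last + 1]]
```

A query restricted to 2023 maps to a single partition in this model, so the other four are never read. Without a condition on the partitioning key there is nothing to search on and every partition must be scanned.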

Code Example:

-- Create a partitioned table by RANGE (common for time-series data)
CREATE TABLE sales (
    id INT NOT NULL AUTO_INCREMENT,
    sale_date DATE NOT NULL,
    amount DECIMAL(10, 2),
    region VARCHAR(50),
    PRIMARY KEY (id, sale_date)
) PARTITION BY RANGE (YEAR(sale_date)) (
    PARTITION p2020 VALUES LESS THAN (2021),
    PARTITION p2021 VALUES LESS THAN (2022),
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION p2023 VALUES LESS THAN (2024),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- View partition information
SELECT
    PARTITION_NAME,
    PARTITION_EXPRESSION,
    TABLE_ROWS,
    DATA_LENGTH
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'sales';

-- Add a new partition for 2024 by splitting p_future
ALTER TABLE sales
REORGANIZE PARTITION p_future INTO (
    PARTITION p2024 VALUES LESS THAN (2025),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Drop old data efficiently by removing a partition
ALTER TABLE sales DROP PARTITION p2020;

-- Query with partition pruning (only scans p2023 partition)
-- EXPLAIN PARTITIONS was removed in MySQL 8.0; plain EXPLAIN reports a partitions column
EXPLAIN
SELECT * FROM sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31';

-- Check which partitions are accessed
EXPLAIN SELECT * FROM sales
WHERE sale_date = '2023-06-15';
-- The partitions column lists only p2023
