MySQL Interview Questions (Free Preview)
Free sample of 15 from 53 questions available
Backup and Recovery
What is the difference between logical and physical backups?
The 30-Second Answer:
Logical backups export database structure and data as SQL statements (using tools like mysqldump), making them portable across platforms and MySQL versions but slower to restore. Physical backups copy the actual database files and directories (using tools like Percona XtraBackup or MySQL Enterprise Backup), offering faster backup and restore but generally requiring a compatible MySQL version and platform on the restore side. Logical backups are ideal for smaller databases and migrations; physical backups are preferred for large production systems requiring minimal downtime.
The 2-Minute Answer (If They Want More):
Logical Backups create a representation of your database by extracting data and converting it into SQL statements. Tools like mysqldump generate CREATE TABLE and INSERT statements that can recreate your database. Advantages include:
- Portability: Works across different MySQL versions, operating systems, and even other databases
- Flexibility: Easy to edit, version control, and selectively restore specific tables or databases
- Compression: Text-based output compresses well
- Inspection: Human-readable format for verification
Disadvantages:
- Slower: Must execute all INSERT statements during restore
- Table locking: Can block writes during the backup unless --single-transaction is used (which only helps for transactional tables such as InnoDB)
- Larger size: SQL statements are verbose before compression
Physical Backups copy the raw database files from disk, including data files, logs, and configuration. Tools like Percona XtraBackup, MySQL Enterprise Backup, or simple file system snapshots fall into this category. Advantages include:
- Speed: Direct file copy is much faster for large databases
- Minimal downtime: Hot backup tools can backup InnoDB without locking
- Complete state: Captures exact database state including indexes
- Point-in-time recovery: Combined with binary logs enables precise recovery
Disadvantages:
- Platform dependency: Requires the same or a compatible MySQL version and a compatible architecture
- Less portable: Can't easily migrate between different MySQL configurations
- Storage engine specific: Some tools work only with specific engines (e.g., XtraBackup with InnoDB)
- Binary format: Not human-readable, harder to verify
Best Practices: Use logical backups for development, testing, and smaller databases (< 100GB). Use physical backups for production systems, large databases, and scenarios requiring quick recovery. Implement both for comprehensive disaster recovery: physical for quick restoration, logical for long-term archival and migration flexibility.
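To make "a logical backup is a set of SQL statements" concrete, here is a minimal Python sketch. It uses the standard-library sqlite3 module as a stand-in for MySQL, which is an assumption made purely so the example runs anywhere; mysqldump output differs in detail. The table and rows are hypothetical.

```python
import sqlite3

def logical_dump(conn):
    """Return the database as a list of SQL statements (a logical backup)."""
    return list(conn.iterdump())

def restore(statements):
    """Rebuild a fresh database by replaying the dumped statements."""
    target = sqlite3.connect(":memory:")
    target.executescript("\n".join(statements))
    return target

# Hypothetical source database standing in for a MySQL schema.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)")
source.executemany(
    "INSERT INTO users (username) VALUES (?)",
    [("john_doe",), ("jane_smith",)],
)
source.commit()

dump = logical_dump(source)   # CREATE TABLE / INSERT statements as text
restored = restore(dump)      # restore = re-execute every statement
rows = restored.execute("SELECT id, username FROM users ORDER BY id").fetchall()
print(rows)
```

Restoring means re-executing every statement, which is why logical restores get slow as data grows, while the text format stays easy to inspect and edit.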
Code Example:
# LOGICAL BACKUP EXAMPLE
# Back up an entire database with mysqldump
# (--single-transaction takes a consistent InnoDB snapshot without locking tables)
mysqldump -u root -p \
  --single-transaction \
  --routines \
  --triggers \
  --events \
  my_database > my_database_backup.sql
# Back up specific tables
mysqldump -u root -p my_database users orders > tables_backup.sql
# Back up all databases
mysqldump -u root -p --all-databases > all_databases.sql
# Restore from logical backup (the target database must already exist)
mysql -u root -p my_database < my_database_backup.sql
# PHYSICAL BACKUP EXAMPLE (using filesystem copy - requires shutdown)
# Stop MySQL server
sudo systemctl stop mysql
# Copy data directory, preserving ownership and permissions
sudo cp -a /var/lib/mysql /backup/mysql_$(date +%Y%m%d)
# Start MySQL server
sudo systemctl start mysql
# PHYSICAL BACKUP (using Percona XtraBackup - hot backup)
# Full backup without stopping MySQL
xtrabackup --backup \
  --target-dir=/backup/full_backup \
  --user=root \
  --password=yourpassword
# Prepare backup for restore
xtrabackup --prepare --target-dir=/backup/full_backup
# Restore (requires MySQL to be stopped and an empty data directory)
sudo systemctl stop mysql
# DANGEROUS: this deletes the current data directory contents
sudo rm -rf /var/lib/mysql/*
xtrabackup --copy-back --target-dir=/backup/full_backup
sudo chown -R mysql:mysql /var/lib/mysql
sudo systemctl start mysql
# COMPRESSED LOGICAL BACKUP
mysqldump -u root -p --single-transaction my_database | gzip > backup.sql.gz
# Restore compressed logical backup
gunzip < backup.sql.gz | mysql -u root -p my_database
What is Percona XtraBackup and how does it work?
The 30-Second Answer:
Percona XtraBackup is an open-source hot backup tool for MySQL that creates physical backups of InnoDB, XtraDB, and MyISAM tables. InnoDB and XtraDB tables are backed up without blocking database operations; MyISAM tables need a brief lock. It works by copying InnoDB data files while also copying the redo log records generated during the backup, then applies those records during the "prepare" phase to create a consistent backup. Key advantages include no downtime for InnoDB backups, faster backup and restore than mysqldump on large datasets, incremental backup support, and compression capabilities. It is a common choice for backing up large MySQL databases in production.
The 2-Minute Answer (If They Want More):
Percona XtraBackup creates physical backups by copying actual database files rather than exporting SQL. The process works in several phases:
Backup Phase:
- Copy Data Files: XtraBackup begins copying InnoDB data files (.ibd) to the backup directory while the database remains online
- Monitor Changes: Simultaneously, it monitors the InnoDB redo log (transaction log) and copies all new entries to a file called xtrabackup_logfile
- Lock for Non-InnoDB: Brief lock on non-InnoDB tables (MyISAM) to ensure consistency
- Record LSN: Records the Log Sequence Number (LSN) representing the backup's point-in-time
Prepare Phase: After backup completes, you must "prepare" it before restoration:
- Apply Redo Log: XtraBackup applies committed transactions from xtrabackup_logfile to the data files
- Rollback Uncommitted: Rolls back any uncommitted transactions
- Result: Creates a consistent point-in-time snapshot ready for restore
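The backup and prepare phases above can be sketched with a toy model. This is not XtraBackup's implementation: the ToyServer class, the one-LSN-per-change numbering, and keeping the old value inside each redo record (real InnoDB rolls back through undo logs) are all simplifying assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Redo:
    lsn: int
    page: str
    old: str   # kept only so the toy can roll back; real InnoDB uses undo logs
    new: str
    txn: int

class ToyServer:
    def __init__(self, pages):
        self.pages = {p: (v, 0) for p, v in pages.items()}  # page -> (value, page_lsn)
        self.log = []
        self.committed = set()
        self.lsn = 0

    def write(self, txn, page, value):
        self.lsn += 1
        old, _ = self.pages[page]
        self.log.append(Redo(self.lsn, page, old, value, txn))
        self.pages[page] = (value, self.lsn)

    def commit(self, txn):
        self.committed.add(txn)

def prepare(copied_pages, copied_log, committed):
    pages = dict(copied_pages)
    # Redo phase: apply every logged change newer than the copied page.
    for r in copied_log:
        if r.lsn > pages[r.page][1]:
            pages[r.page] = (r.new, r.lsn)
    # Rollback phase: undo changes of transactions that never committed.
    for r in reversed(copied_log):
        if r.txn not in committed:
            pages[r.page] = (r.old, r.lsn)
    return {p: v for p, (v, _) in pages.items()}

server = ToyServer({"A": "a0", "B": "b0"})
copied = {"A": server.pages["A"]}   # page A is copied first
server.write(1, "A", "a1")          # txn 1 changes A after it was copied
server.write(1, "B", "b1")
server.commit(1)
copied["B"] = server.pages["B"]     # page B is copied later
server.write(2, "B", "b2")          # txn 2 is still open when the backup ends

raw_copy = {p: v for p, (v, _) in copied.items()}
prepared = prepare(copied, list(server.log), set(server.committed))
print(raw_copy, prepared)
```

The raw copy mixes states from different moments (A before transaction 1, B after it), which is why an unprepared backup cannot be restored directly; after prepare, both pages reflect transaction 1 and none of transaction 2.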
Key Features:
- Hot Backup: InnoDB/XtraDB tables backed up without locking or downtime
- Incremental Backups: Only copies changed pages since last backup, saving time and space
- Compression: Built-in compression reduces backup size and transfer time
- Partial Backups: Backup specific databases or tables
- Streaming: Direct backup to remote server without local storage
- Encryption: Encrypt backups for security
- Parallel Processing: Multi-threaded for faster operations
Advantages over mysqldump:
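- Often much faster for large databases (TBs); figures such as 10-100x depend heavily on data size and hardware
- No table locks for InnoDB (zero downtime)
- Binary format avoids the verbosity of SQL text, although it includes index pages, so its size relative to a dump varies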
- Faster restore (file copy vs. SQL execution)
- Incremental backup capability
Limitations:
- Requires same MySQL version for restore (or compatible)
- More complex than mysqldump
- Binary format not portable across platforms
- Requires adequate disk space for data files plus redo log changes
Best Use Cases:
- Production databases > 100GB
- Databases requiring minimal downtime
- Systems needing fast recovery time objectives (RTO)
- Environments with regular incremental backup needs
- Disaster recovery scenarios requiring point-in-time recovery
Code Example:
# INSTALLATION
# Ubuntu/Debian
wget https://repo.percona.com/apt/percona-release_latest.$(lsb_release -sc)_all.deb
sudo dpkg -i percona-release_latest.$(lsb_release -sc)_all.deb
# Enable the repository that ships XtraBackup
sudo percona-release enable-only tools release
sudo apt-get update
sudo apt-get install percona-xtrabackup-80
# RedHat/CentOS
sudo yum install https://repo.percona.com/yum/percona-release-latest.noarch.rpm
sudo percona-release enable-only tools release
sudo yum install percona-xtrabackup-80
# BASIC FULL BACKUP
# Create backup directory
mkdir -p /backup/full
# Perform full backup
xtrabackup --backup \
  --target-dir=/backup/full \
  --user=root \
  --password=yourpassword
# Alternative: using MySQL config file
# (--defaults-file must be the first option on the command line)
xtrabackup --defaults-file=/etc/mysql/my.cnf \
  --backup \
  --target-dir=/backup/full
# PREPARE BACKUP (required before restore)
xtrabackup --prepare --target-dir=/backup/full
# RESTORE BACKUP
# Stop MySQL
sudo systemctl stop mysql
# Clear datadir (DANGEROUS - ensure you have backup!)
sudo rm -rf /var/lib/mysql/*
# Copy backup to datadir
xtrabackup --copy-back --target-dir=/backup/full
# Fix permissions
sudo chown -R mysql:mysql /var/lib/mysql
# Start MySQL
sudo systemctl start mysql
# COMPRESSED BACKUP
# Compress during backup (saves space)
xtrabackup --backup \
  --compress \
  --compress-threads=4 \
  --target-dir=/backup/compressed \
  --user=root \
  --password=yourpassword
# Decompress before prepare
xtrabackup --decompress --target-dir=/backup/compressed
# Remove compressed files
find /backup/compressed -name "*.qp" -delete
# Then prepare
xtrabackup --prepare --target-dir=/backup/compressed
# INCREMENTAL BACKUP
# First, create full backup (base)
xtrabackup --backup \
  --target-dir=/backup/base \
  --user=root \
  --password=yourpassword
# Create first incremental backup
xtrabackup --backup \
  --target-dir=/backup/inc1 \
  --incremental-basedir=/backup/base \
  --user=root \
  --password=yourpassword
# Create second incremental backup (based on inc1)
xtrabackup --backup \
  --target-dir=/backup/inc2 \
  --incremental-basedir=/backup/inc1 \
  --user=root \
  --password=yourpassword
# PREPARE INCREMENTAL BACKUPS
# Prepare base backup (--apply-log-only skips the rollback phase so incrementals can still be applied)
xtrabackup --prepare --apply-log-only --target-dir=/backup/base
# Apply first incremental to base
xtrabackup --prepare --apply-log-only \
  --target-dir=/backup/base \
  --incremental-dir=/backup/inc1
# Apply second incremental to base (no --apply-log-only on last)
xtrabackup --prepare \
  --target-dir=/backup/base \
  --incremental-dir=/backup/inc2
# Final prepare (optional: the previous step already ran the rollback phase)
xtrabackup --prepare --target-dir=/backup/base
# Now restore from /backup/base as shown above
# STREAMING BACKUP TO REMOTE SERVER
# Stream backup via SSH
xtrabackup --backup --stream=xbstream --user=root --password=yourpassword | \
  ssh user@remotehost "xbstream -x -C /backup/remote"
# Stream and compress (decompress on the remote side before prepare)
xtrabackup --backup --stream=xbstream --compress --user=root --password=yourpassword | \
  ssh user@remotehost "xbstream -x -C /backup/remote"
# ENCRYPTED BACKUP
# Generate encryption key (strip the trailing newline so the file holds only the key)
openssl rand -base64 24 | tr -d '\n' > /root/.xtrabackup_encrypt_key
chmod 600 /root/.xtrabackup_encrypt_key
# Backup with encryption
xtrabackup --backup \
  --encrypt=AES256 \
  --encrypt-key-file=/root/.xtrabackup_encrypt_key \
  --target-dir=/backup/encrypted \
  --user=root \
  --password=yourpassword
# Decrypt before prepare
xtrabackup --decrypt=AES256 \
  --encrypt-key-file=/root/.xtrabackup_encrypt_key \
  --target-dir=/backup/encrypted
# Remove encrypted files
find /backup/encrypted -name "*.xbcrypt" -delete
# PARTIAL BACKUP (specific databases)
xtrabackup --backup \
  --databases="database1 database2" \
  --target-dir=/backup/partial \
  --user=root \
  --password=yourpassword
# PARALLEL PROCESSING (faster for large databases)
xtrabackup --backup \
  --parallel=4 \
  --target-dir=/backup/parallel \
  --user=root \
  --password=yourpassword
# THROTTLING (limit I/O impact)
# --throttle limits the number of 10 MB chunks copied per second,
# so --throttle=1 caps the copy at roughly 10 MB/s
xtrabackup --backup \
  --throttle=1 \
  --target-dir=/backup/throttled \
  --user=root \
  --password=yourpassword
# VERIFY BACKUP
# Take a backup, then prepare it; a clean prepare is the basic validity check
xtrabackup --backup --target-dir=/backup/verify --user=root --password=yourpassword
# Prepare and check for errors
xtrabackup --prepare --target-dir=/backup/verify
# If the output ends with "completed OK!" and shows no errors, the backup is usable
# AUTOMATED BACKUP SCRIPT (save as a separate file so the shebang is its first line)
#!/bin/bash
# Percona XtraBackup Automation Script
set -euo pipefail
BACKUP_DIR="/backup/mysql"
FULL_BACKUP_DIR="$BACKUP_DIR/full"
INC_BACKUP_DIR="$BACKUP_DIR/incremental"
DATE=$(date +%Y%m%d_%H%M%S)
MYSQL_USER="root"
MYSQL_PASSWORD="yourpassword"
RETENTION_DAYS=7
# Create backup directories
mkdir -p "$FULL_BACKUP_DIR" "$INC_BACKUP_DIR"
# Determine if we need full or incremental backup
# Full backup on Sunday (date +%u prints 7), incremental rest of week
if [ "$(date +%u)" -eq 7 ]; then
  # Full backup
  echo "Starting full backup..."
  if xtrabackup --backup \
    --target-dir="$FULL_BACKUP_DIR/$DATE" \
    --user="$MYSQL_USER" \
    --password="$MYSQL_PASSWORD" \
    --compress \
    --compress-threads=4; then
    echo "Full backup completed: $DATE"
    # Mark as latest full backup
    echo "$DATE" > "$BACKUP_DIR/latest_full"
  else
    echo "ERROR: Full backup failed"
    exit 1
  fi
else
  # Incremental backup (each one is based on the latest full backup)
  if [ ! -f "$BACKUP_DIR/latest_full" ]; then
    echo "ERROR: No full backup recorded in $BACKUP_DIR/latest_full"
    exit 1
  fi
  LATEST_FULL=$(cat "$BACKUP_DIR/latest_full")
  echo "Starting incremental backup based on $LATEST_FULL..."
  if xtrabackup --backup \
    --target-dir="$INC_BACKUP_DIR/$DATE" \
    --incremental-basedir="$FULL_BACKUP_DIR/$LATEST_FULL" \
    --user="$MYSQL_USER" \
    --password="$MYSQL_PASSWORD" \
    --compress \
    --compress-threads=4; then
    echo "Incremental backup completed: $DATE"
  else
    echo "ERROR: Incremental backup failed"
    exit 1
  fi
fi
# Cleanup old backups (top-level backup directories only)
find "$FULL_BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +"$RETENTION_DAYS" -exec rm -rf {} +
find "$INC_BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +"$RETENTION_DAYS" -exec rm -rf {} +
# MONITORING BACKUP PROGRESS
# In another terminal, monitor backup size
watch -n 1 'du -sh /backup/full'
# Monitor XtraBackup log output (xtrabackup logs to stderr; this assumes
# it was redirected to a file, e.g. with 2> /backup/xtrabackup.log)
tail -f /backup/xtrabackup.log
# Check LSN (Log Sequence Number) in backup
cat /backup/full/xtrabackup_checkpoints
# OUTPUT:
# backup_type = full-backuped
# from_lsn = 0
# to_lsn = 2456789
# last_lsn = 2456789
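The LSN fields shown in xtrabackup_checkpoints are what link a full backup to its incrementals: each incremental should start at the LSN where its base ended. The following Python sketch checks that property on the file contents. The helper names and the incremental's LSN values are hypothetical, and it assumes the simple "key = value" layout shown above.

```python
def parse_checkpoints(text):
    """Parse 'key = value' lines as found in xtrabackup_checkpoints."""
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            info[key.strip()] = value.strip()
    return info

def chain_is_contiguous(checkpoint_texts):
    """True if each backup starts at the LSN where the previous one ended."""
    backups = [parse_checkpoints(t) for t in checkpoint_texts]
    return all(
        int(prev["to_lsn"]) == int(cur["from_lsn"])
        for prev, cur in zip(backups, backups[1:])
    )

full = (
    "backup_type = full-backuped\n"
    "from_lsn = 0\n"
    "to_lsn = 2456789\n"
    "last_lsn = 2456789\n"
)
# Hypothetical incremental taken on top of the full backup above.
inc1 = (
    "backup_type = incremental\n"
    "from_lsn = 2456789\n"
    "to_lsn = 2460000\n"
    "last_lsn = 2460000\n"
)

print(parse_checkpoints(full)["to_lsn"])
print(chain_is_contiguous([full, inc1]))
```

A check like this can run before the prepare step to catch an incremental that was taken against the wrong base directory.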
Storage Engines
What is the difference between InnoDB and MyISAM?
The 30-Second Answer:
InnoDB and MyISAM are MySQL storage engines with fundamentally different architectures. InnoDB is transaction-safe with ACID compliance, row-level locking, crash recovery, and foreign key support - ideal for applications requiring data integrity. MyISAM is simpler and faster for read-heavy workloads but lacks transactions, uses table-level locking (causing concurrency issues), and has no crash recovery. InnoDB is the default since MySQL 5.5 and recommended for most applications.
The 2-Minute Answer (If They Want More):
InnoDB and MyISAM represent two different philosophies in database storage engine design:
InnoDB is a robust, transaction-safe storage engine designed for reliability and data integrity. It implements ACID properties (Atomicity, Consistency, Isolation, Durability), making it suitable for mission-critical applications. InnoDB uses row-level locking, which allows multiple transactions to modify different rows simultaneously, providing excellent concurrency for write-heavy workloads. It supports foreign key constraints for referential integrity, automatic crash recovery through the doublewrite buffer and redo logs, and MVCC (Multi-Version Concurrency Control) for non-blocking reads. InnoDB stores data in clustered indexes organized by primary key, which optimizes primary key lookups.
MyISAM is a simpler, older storage engine optimized for read-heavy workloads with minimal writes. It uses table-level locking, meaning the entire table is locked during write operations, which can cause significant performance bottlenecks in concurrent environments. MyISAM doesn't support transactions, foreign keys, or automatic crash recovery - a server crash can corrupt tables requiring manual repair with myisamchk. However, MyISAM has advantages in specific scenarios: smaller disk footprint, faster for full-table scans and COUNT(*) operations (maintains row count), and supports full-text indexing (though InnoDB added this in MySQL 5.6.4).
Key Differences Summary:
| Feature | InnoDB | MyISAM |
|---|---|---|
| Transactions | Yes (ACID) | No |
| Locking | Row-level | Table-level |
| Foreign Keys | Yes | No |
| Crash Recovery | Automatic | Manual repair needed |
| Concurrency | Excellent | Poor for writes |
| Storage | More disk space | Less disk space |
| Use Case | OLTP, general purpose | Read-heavy, logging |
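The Locking and Concurrency rows of the table can be illustrated with a toy model. It is only a sketch of lock granularity: real engines also have lock modes, gap locks, and wait queues. The session names and row ids are hypothetical.

```python
from itertools import combinations

def blocked_pairs(writes, granularity):
    """writes: list of (session, table, row_id).
    Return the pairs of sessions whose writes need the same lock."""
    def lock_key(table, row):
        # Table-level locking: one lock per table.
        # Row-level locking: one lock per (table, row).
        return table if granularity == "table" else (table, row)

    pairs = set()
    for (s1, t1, r1), (s2, t2, r2) in combinations(writes, 2):
        if s1 != s2 and lock_key(t1, r1) == lock_key(t2, r2):
            pairs.add((s1, s2))
    return pairs

writes = [
    ("session1", "users", 1),
    ("session2", "users", 2),
    ("session3", "users", 1),
]
row_level = blocked_pairs(writes, "row")      # InnoDB-style
table_level = blocked_pairs(writes, "table")  # MyISAM-style
print(row_level)
print(table_level)
```

With row-level locks only the two sessions touching row 1 contend; with a table-level lock every pair of writers contends, which is the write-concurrency gap the table summarizes.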
Code Example:
-- Check storage engine of existing tables
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database';
-- Create table with InnoDB (default since MySQL 5.5)
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
username VARCHAR(50) NOT NULL,
email VARCHAR(100) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;
-- Create table with MyISAM
CREATE TABLE access_logs (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
user_id INT,
action VARCHAR(100),
log_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=MyISAM;
-- Demonstrate InnoDB transaction support
START TRANSACTION;
INSERT INTO users (username, email) VALUES ('john_doe', 'john@example.com');
INSERT INTO users (username, email) VALUES ('jane_smith', 'jane@example.com');
COMMIT; -- Both inserts succeed or both fail
-- MyISAM doesn't support transactions
-- Each INSERT is immediately committed, no rollback possible
-- Convert MyISAM table to InnoDB
ALTER TABLE access_logs ENGINE=InnoDB;
-- Check InnoDB status and performance metrics
SHOW ENGINE InnoDB STATUS;
-- Compare row count performance
-- MyISAM: Fast (maintains count)
SELECT COUNT(*) FROM myisam_table; -- Instant
-- InnoDB: Slower (no stored row count, so it must scan an index)
SELECT COUNT(*) FROM innodb_table; -- Scans the smallest secondary index, or the clustered index if there is none
-- Demonstrate row-level locking (InnoDB)
-- Session 1:
START TRANSACTION;
UPDATE users SET email = 'newemail@example.com' WHERE id = 1;
-- (don't commit yet)
-- Session 2 (can update different row concurrently):
UPDATE users SET email = 'another@example.com' WHERE id = 2; -- Works!
-- With MyISAM, Session 2 would wait for table lock
InnoDB Internals
What is the InnoDB redo log and how does it work?
The 30-Second Answer:
The redo log (also called the transaction log) is a fixed-capacity, circularly reused write-ahead log that records changes made to InnoDB pages. When a transaction modifies data, the page is changed in the buffer pool and a matching redo record is written to the redo log (sequential writes, very fast); the modified page itself is flushed to the data files later. This enables crash recovery: if MySQL crashes, it replays the redo log on restart to restore changes that had not yet reached the data files. The redo log has a configurable size (innodb_redo_log_capacity in MySQL 8.0.30+).
The 2-Minute Answer (If They Want More):
The InnoDB redo log is critical for ACID durability and crash recovery:
How It Works:
- Write-Ahead Logging (WAL): Changes are made to pages in the buffer pool, and the redo records describing them must reach the redo log on disk before the modified pages are flushed to the data files
- Fast Sequential Writes: Redo log writes are sequential, making them much faster than random disk writes
- Circular Buffer: The redo log wraps around when it reaches the end; space is reused only after the checkpoint has advanced past it, that is, once the corresponding dirty pages are on disk
- Checkpoint Process: InnoDB periodically flushes dirty pages to disk and advances the checkpoint, freeing redo log space
- Crash Recovery: On restart after a crash, InnoDB reads the redo log from the last checkpoint and reapplies the logged changes; changes from uncommitted transactions are then rolled back using the undo logs
Redo Log Components:
- Log Buffer (innodb_log_buffer_size): In-memory buffer for redo log entries
- Redo Log Files: Physical files on disk (ib_logfile0, ib_logfile1, or #ib_redo files in MySQL 8.0.30+)
- Log Sequence Number (LSN): Monotonically increasing counter tracking log position
Flush Behavior (controlled by innodb_flush_log_at_trx_commit):
- 0: Write to log buffer only, flush every second (fastest, least durable)
- 1: Write and flush to disk on every commit (slowest, fully durable - default)
- 2: Write to OS cache on commit, flush every second (middle ground)
Performance Trade-offs:
- Larger redo log = fewer checkpoints, better write performance, longer recovery time
- Smaller redo log = more frequent checkpoints, slower writes, faster recovery
- Write-heavy systems on modern SSDs often benefit from larger redo logs (8GB or more is a common starting point; size it against your write rate)
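The interaction between capacity, checkpoints, and recovery can be sketched with a toy model. It is an illustration under simplifying assumptions: one LSN per change (real LSNs are byte offsets), a hypothetical ToyRedoLog class, and no distinction between committed and uncommitted work.

```python
class ToyRedoLog:
    def __init__(self, capacity):
        self.capacity = capacity
        self.current_lsn = 0
        self.checkpoint_lsn = 0
        self.records = {}  # lsn -> change, kept only for lsn > checkpoint_lsn

    def checkpoint_age(self):
        return self.current_lsn - self.checkpoint_lsn

    def append(self, change):
        """Log a change, or return False if the log has no reusable space."""
        if self.checkpoint_age() >= self.capacity:
            return False  # the writer must wait for a checkpoint
        self.current_lsn += 1
        self.records[self.current_lsn] = change
        return True

    def checkpoint(self, flushed_up_to):
        """Dirty pages up to this LSN are on disk; their redo space can be reused."""
        flushed_up_to = min(flushed_up_to, self.current_lsn)
        for lsn in range(self.checkpoint_lsn + 1, flushed_up_to + 1):
            del self.records[lsn]
        self.checkpoint_lsn = max(self.checkpoint_lsn, flushed_up_to)

    def recover(self):
        """Changes that crash recovery would replay, in LSN order."""
        return [self.records[lsn] for lsn in sorted(self.records)]

log = ToyRedoLog(capacity=3)
accepted = [log.append(f"change-{i}") for i in range(1, 5)]  # the 4th write stalls
log.checkpoint(flushed_up_to=2)   # flushing pages frees space for LSN 1 and 2
retry = log.append("change-4")    # now the write fits
replayed = log.recover()          # what a crash at this point would replay
print(accepted, retry, replayed)
```

A larger capacity delays the stall but lets more un-checkpointed changes accumulate, which is the recovery-time side of the trade-off listed above.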
Code Example:
-- View current redo log configuration (MySQL 8.0.30+)
SELECT
@@innodb_redo_log_capacity / 1024 / 1024 / 1024 as redo_log_gb;
-- For older MySQL versions (before 8.0.30)
SELECT
@@innodb_log_file_size / 1024 / 1024 as log_file_size_mb,
@@innodb_log_files_in_group as num_log_files,
(@@innodb_log_file_size * @@innodb_log_files_in_group) / 1024 / 1024 as total_redo_log_mb;
-- Configure redo log size (MySQL 8.0.30+ - dynamic)
SET GLOBAL innodb_redo_log_capacity = 8589934592; -- 8GB
-- For older versions (requires restart)
-- Add to my.cnf:
-- innodb_log_file_size = 2G
-- innodb_log_files_in_group = 2
-- View log buffer size
SELECT @@innodb_log_buffer_size / 1024 / 1024 as log_buffer_mb;
-- Configure log buffer (dynamic in MySQL 8.0+; older versions need a restart)
SET GLOBAL innodb_log_buffer_size = 67108864; -- 64MB
-- Check flush behavior
SELECT @@innodb_flush_log_at_trx_commit as flush_policy;
-- 1 = full durability (default)
-- 0 = flush every second (performance)
-- 2 = OS cache (compromise)
-- Set flush policy (dynamic)
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
-- Monitor redo log usage
SELECT
VARIABLE_NAME,
VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME LIKE 'Innodb_log%';
-- Key metrics to monitor
SELECT
VARIABLE_NAME,
VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME IN (
'Innodb_log_waits', -- Log buffer too small if > 0
'Innodb_log_writes', -- Number of log writes
'Innodb_log_write_requests', -- Number of log write requests
'Innodb_os_log_written' -- Bytes written to redo log
);
-- View redo log files (MySQL 8.0.30+)
SELECT
FILE_ID,
FILE_NAME,
START_LSN,
END_LSN
FROM performance_schema.innodb_redo_log_files;
-- Monitor checkpoint age (how far behind checkpointing is; MySQL 8.0.30+)
SELECT
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_redo_log_current_lsn') -
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_redo_log_checkpoint_lsn')
AS checkpoint_age_bytes;
-- Check if redo log writes are causing waits
-- High Innodb_log_waits means log buffer is too small
SELECT
VARIABLE_VALUE as log_waits
FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_log_waits';
-- If > 0, increase innodb_log_buffer_size
-- Example: Simulate redo log activity
START TRANSACTION;
UPDATE users SET last_login = NOW() WHERE id = 1;
-- Change is written to redo log buffer
INSERT INTO audit_log (user_id, action, timestamp)
VALUES (1, 'login', NOW());
-- Another redo log entry
COMMIT;
-- With innodb_flush_log_at_trx_commit=1,
-- redo log is flushed to disk NOW
-- Configuration recommendations for different scenarios
-- High-performance scenario (SSD, acceptable 1-second data loss)
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
SET GLOBAL innodb_redo_log_capacity = 17179869184; -- 16GB
-- Full durability scenario (financial data)
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
SET GLOBAL innodb_redo_log_capacity = 8589934592; -- 8GB
-- Development/testing (maximum performance)
SET GLOBAL innodb_flush_log_at_trx_commit = 0;
SET GLOBAL innodb_redo_log_capacity = 4294967296; -- 4GB
Replication
What is the difference between asynchronous and semi-synchronous replication?
The 30-Second Answer:
Asynchronous replication is MySQL's default mode where the primary doesn't wait for replicas to acknowledge transactions - it commits immediately and replicas catch up independently. Semi-synchronous replication requires at least one replica to acknowledge receiving the transaction before the primary commits, providing better durability at the cost of slightly higher latency. Asynchronous is faster but risks data loss if the primary crashes; semi-synchronous ensures at least one replica has the data but can impact write performance.
The 2-Minute Answer (If They Want More):
Asynchronous Replication (Default)
In asynchronous replication, the primary server commits transactions and returns to the client immediately without waiting for any replica confirmation. Replicas read the binary log events at their own pace, which means:
- Advantages: Maximum write performance, no blocking on the primary, works well even with network latency or slow replicas
- Disadvantages: No guarantee that replicas have received transactions, potential data loss if primary crashes before replicas catch up, possible replica lag during high load
- Use case: Most production scenarios where performance is prioritized and some replication lag is acceptable
Semi-Synchronous Replication
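Semi-synchronous replication adds an acknowledgment step. After writing a transaction to its binary log (and, with the default AFTER_SYNC wait point, before committing it in the storage engine), the primary waits for at least one replica to: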
- Receive the binary log events
- Write them to its relay log
- Send an acknowledgment back to the primary
Only after receiving this acknowledgment does the primary return success to the client.
- Advantages: Better durability - at least one replica has the transaction, reduces data loss risk during primary failures, provides stronger consistency guarantees
- Disadvantages: Increased write latency (typically 1-10ms per transaction depending on network), can fall back to asynchronous if no replicas acknowledge within timeout, requires plugin installation
- Use case: High-availability scenarios where data durability is critical, financial systems, or when you need to minimize RPO (Recovery Point Objective)
Configuration Details:
Semi-synchronous replication requires the rpl_semi_sync_master and rpl_semi_sync_slave plugins (MySQL 8.0.26+ also provides them under the names rpl_semi_sync_source and rpl_semi_sync_replica). Key configuration parameters include:
- rpl_semi_sync_master_timeout: How long to wait for acknowledgment before falling back to asynchronous, in milliseconds (default 10000, i.e. 10 seconds)
- rpl_semi_sync_master_wait_for_slave_count: Number of replicas that must acknowledge (MySQL 5.7.3+)
- rpl_semi_sync_master_wait_point: When to wait - AFTER_SYNC (default, safer) or AFTER_COMMIT
The Fallback Mechanism:
If no replicas acknowledge within the timeout period, semi-synchronous replication automatically falls back to asynchronous mode to prevent blocking the primary. It automatically switches back to semi-synchronous when replicas reconnect and catch up.
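The timeout and fallback rule can be expressed as a small decision function. This is a toy sketch, not MySQL's implementation: commit_mode is a hypothetical helper, its parameters mirror rpl_semi_sync_master_timeout and rpl_semi_sync_master_wait_for_slave_count, and the acknowledgment delays are made-up inputs.

```python
def commit_mode(ack_delays_ms, timeout_ms=10000, wait_for_count=1):
    """ack_delays_ms: time each replica takes to acknowledge (None = never).
    Return (mode, wait_ms) as seen by the committing client."""
    # Only acknowledgments that arrive within the timeout count.
    acks = sorted(d for d in ack_delays_ms if d is not None and d <= timeout_ms)
    if len(acks) >= wait_for_count:
        # The primary waits until the required number of replicas has acknowledged.
        return ("semi-sync", acks[wait_for_count - 1])
    # Not enough acknowledgments: wait out the timeout, then continue asynchronously.
    return ("async-fallback", timeout_ms)

print(commit_mode([5, 40]))
print(commit_mode([5, 40], wait_for_count=2))
print(commit_mode([None, 20000]))
```

The sketch shows why a long timeout protects durability at the cost of a long stall when replicas disappear, while a short timeout falls back to asynchronous behavior quickly.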
Code Example:
-- Enable semi-synchronous replication on PRIMARY
-- First, install the plugin
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
-- Enable semi-sync replication
SET GLOBAL rpl_semi_sync_master_enabled = 1;
-- Configure timeout (10 seconds)
SET GLOBAL rpl_semi_sync_master_timeout = 10000;
-- Configure wait point (AFTER_SYNC is safer - default in MySQL 5.7+)
SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC';
-- Require acknowledgment from at least 1 replica (MySQL 5.7.3+)
SET GLOBAL rpl_semi_sync_master_wait_for_slave_count = 1;
-- Make settings persistent (add to my.cnf)
/*
[mysqld]
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=10000
rpl_semi_sync_master_wait_point=AFTER_SYNC
rpl_semi_sync_master_wait_for_slave_count=1
*/
-- Enable semi-synchronous replication on REPLICA
-- Install the plugin
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
-- Enable semi-sync on replica
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
-- Make persistent (add to my.cnf)
/*
[mysqld]
rpl_semi_sync_slave_enabled=1
*/
-- Restart replication IO thread to activate semi-sync
-- (MySQL 8.0.22+ prefers STOP REPLICA IO_THREAD / START REPLICA IO_THREAD)
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
-- Monitor semi-synchronous replication status on PRIMARY
SHOW STATUS LIKE 'Rpl_semi_sync_master%';
-- Key metrics:
-- Rpl_semi_sync_master_status: ON/OFF
-- Rpl_semi_sync_master_clients: Number of semi-sync replicas
-- Rpl_semi_sync_master_yes_tx: Transactions acknowledged
-- Rpl_semi_sync_master_no_tx: Transactions not acknowledged (fell back to async)
-- Rpl_semi_sync_master_wait_sessions: Current waiting sessions
-- Rpl_semi_sync_master_tx_wait_time: Total wait time
-- Monitor on REPLICA
SHOW STATUS LIKE 'Rpl_semi_sync_slave%';
-- Key metric:
-- Rpl_semi_sync_slave_status: ON/OFF
-- Example monitoring query
SELECT
VARIABLE_NAME,
VARIABLE_VALUE
FROM performance_schema.global_status
WHERE VARIABLE_NAME LIKE 'Rpl_semi_sync_master%'
ORDER BY VARIABLE_NAME;
-- Calculate acknowledgment ratio (NULL if no transactions have been counted yet)
SELECT
  yes_tx / NULLIF(yes_tx + no_tx, 0) AS ack_ratio
FROM (
  SELECT
    MAX(CASE WHEN VARIABLE_NAME = 'Rpl_semi_sync_master_yes_tx' THEN VARIABLE_VALUE + 0 END) AS yes_tx,
    MAX(CASE WHEN VARIABLE_NAME = 'Rpl_semi_sync_master_no_tx' THEN VARIABLE_VALUE + 0 END) AS no_tx
  FROM performance_schema.global_status
  WHERE VARIABLE_NAME IN ('Rpl_semi_sync_master_yes_tx', 'Rpl_semi_sync_master_no_tx')
) AS semi_sync_counts;
-- Testing semi-sync behavior
-- On primary, create a transaction and observe the wait
BEGIN;
INSERT INTO test_table (data) VALUES ('testing semi-sync');
COMMIT; -- This will wait for replica acknowledgment
-- If you stop all replicas, transactions will wait until timeout
-- then fall back to async mode
What is the binary log and how does it enable replication?
The 30-Second Answer:
The binary log (binlog) is a set of files that record all changes to the database - DDL and DML statements that modify data. It contains "events" describing database modifications as SQL statements (SBR), row changes (RBR), or both (MIXED). The binary log enables replication by allowing replicas to read and replay these events to maintain synchronized copies of the data. It also enables point-in-time recovery and is essential for many HA (High Availability) setups.
The 2-Minute Answer (If They Want More):
Purpose of the Binary Log:
- Replication: Primary mechanism for data synchronization between primary and replica servers
- Point-in-Time Recovery: Restore database to a specific moment by replaying binary logs after a backup
- Auditing: Track all data changes for compliance and debugging
- Change Data Capture (CDC): Extract database changes for data pipelines (tools like Debezium)
Binary Log Structure:
- Binary Log Files: Numbered sequence files (mysql-bin.000001, mysql-bin.000002, etc.)
- Binary Log Index: Index file tracking all binary log files (mysql-bin.index)
- Events: Individual units of change recorded in the binary log
- Log Rotation: New file created when current reaches max_binlog_size or on FLUSH LOGS
Types of Binary Log Events:
- Query Events: DDL statements and DML in statement-based format
- Row Events: Row changes in row-based format (Insert_rows, Update_rows, Delete_rows)
- GTID Events: Global Transaction Identifiers when GTID mode is enabled
- Format Description Events: Metadata about binary log format
- Rotate Events: Indicate switching to a new binary log file
- XID Events: Transaction commit markers for transactional engines such as InnoDB (the XID ties the binary log entry to the engine's internal two-phase commit)
How Binary Log Enables Replication:
Primary Server:
- Executes client transactions
- Writes changes to binary log (binlog)
- Binary log is durable (survives crashes if sync_binlog=1)
Replica Server:
- IO Thread: Connects to primary, reads binary log events, writes to local relay log
- SQL Thread: Reads relay log, executes events to apply changes
- Maintains position (file/position or GTID) to track replication progress
Replication Flow:
PRIMARY: Transaction → Binary Log → Network
↓
REPLICA: Network → Relay Log → SQL Thread → Data Files
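The flow above can be sketched as a toy model in Python. It is a simplification for illustration: the Primary and Replica classes are hypothetical, events are plain tuples rather than real binlog events, and positions are list indexes rather than file offsets or GTIDs.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.binlog = []  # ordered list of change events

    def set(self, key, value):
        self.data[key] = value
        self.binlog.append(("set", key, value))

class Replica:
    def __init__(self):
        self.data = {}
        self.relay_log = []
        self.read_pos = 0     # next binlog position the IO thread will fetch
        self.applied_pos = 0  # next relay log position the SQL thread will apply

    def io_thread(self, primary, max_events=None):
        """Copy new binlog events into the local relay log."""
        events = primary.binlog[self.read_pos:]
        if max_events is not None:
            events = events[:max_events]
        self.relay_log.extend(events)
        self.read_pos += len(events)

    def sql_thread(self):
        """Apply relay log events that have not been executed yet."""
        for op, key, value in self.relay_log[self.applied_pos:]:
            if op == "set":
                self.data[key] = value
        self.applied_pos = len(self.relay_log)

primary, replica = Primary(), Replica()
primary.set("a", 1)
primary.set("b", 2)
replica.io_thread(primary, max_events=1)  # replica has fetched only one event so far
replica.sql_thread()
lagging = dict(replica.data)              # replica is behind the primary here
primary.set("a", 3)
replica.io_thread(primary)                # resume from the saved position
replica.sql_thread()
print(lagging, replica.data, replica.read_pos)
```

Because the replica records how far it has read and applied, it can fall behind and later resume from its saved position, which is the role file/position or GTID tracking plays in real replication.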
Binary Log Configuration:
- log_bin: Enable binary logging (path and base name)
- binlog_format: STATEMENT, ROW, or MIXED
- sync_binlog: Sync to disk frequency (1 = every commit, safest)
- max_binlog_size: Maximum size before rotation (default 1GB)
- binlog_expire_logs_seconds: Auto-purge old logs after N seconds (MySQL 8.0+)
- expire_logs_days: Auto-purge old logs after N days (deprecated, use binlog_expire_logs_seconds)
- binlog_do_db / binlog_ignore_db: Filter which databases to log
Performance Considerations:
- Binary logging adds overhead to write operations (roughly 5-15%, depending on workload and durability settings)
- sync_binlog=1 is safest but slowest (syncs after every commit)
- sync_binlog=0 is fastest but risks losing transactions on crash
- sync_binlog=N syncs after N commits (balance between safety and performance)
- Row-based replication generates more log data for bulk operations
- Use SSDs for binary log storage to reduce I/O latency
Code Example:
-- Check if binary logging is enabled
SHOW VARIABLES LIKE 'log_bin';
-- ON means enabled
-- View binary log configuration
SHOW VARIABLES LIKE 'log_bin%';
SHOW VARIABLES LIKE 'binlog%';
SHOW VARIABLES LIKE 'sync_binlog';
SHOW VARIABLES LIKE 'max_binlog_size';
-- Enable binary logging (my.cnf configuration)
/*
[mysqld]
# Enable binary log
log_bin = /var/lib/mysql/mysql-bin
server_id = 1 # Required, must be unique in replication topology
# Binary log format
binlog_format = ROW # or STATEMENT, MIXED
# Durability settings
sync_binlog = 1 # Safest: sync to disk after every commit
innodb_flush_log_at_trx_commit = 1 # Sync InnoDB logs too
# Size and retention
max_binlog_size = 1G
binlog_expire_logs_seconds = 604800 # 7 days in seconds
# Optional: GTID mode
gtid_mode = ON
enforce_gtid_consistency = ON
*/
-- View current binary logs
SHOW BINARY LOGS;
/*
+------------------+-----------+-----------+
| Log_name | File_size | Encrypted |
+------------------+-----------+-----------+
| mysql-bin.000001 | 1073742 | No |
| mysql-bin.000002 | 2048576 | No |
| mysql-bin.000003 | 512000 | No |
+------------------+-----------+-----------+
*/
-- View current binary log position
SHOW MASTER STATUS;
/*
+------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000003 | 512000 | | | uuid:1-1000 |
+------------------+----------+--------------+------------------+-------------------+
*/
-- View events in a binary log
SHOW BINLOG EVENTS IN 'mysql-bin.000003';
-- Limit output for readability
SHOW BINLOG EVENTS IN 'mysql-bin.000003' LIMIT 10;
SHOW BINLOG EVENTS IN 'mysql-bin.000003' FROM 1024 LIMIT 5;
/*
+------------------+-----+----------------+-----------+-------------+--------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------------+-----+----------------+-----------+-------------+--------------------------------------+
| mysql-bin.000003 | 4 | Format_desc | 1 | 124 | Server ver: 8.0.32-MySQL, Binlog... |
| mysql-bin.000003 | 124 | Previous_gtids | 1 | 155 | uuid:1-999 |
| mysql-bin.000003 | 155 | Gtid | 1 | 234 | SET @@SESSION.GTID_NEXT= 'uuid:1000' |
| mysql-bin.000003 | 234 | Query | 1 | 315 | BEGIN |
| mysql-bin.000003 | 315 | Table_map | 1 | 378 | table_id: 108 (mydb.users) |
| mysql-bin.000003 | 378 | Write_rows | 1 | 438 | table_id: 108 flags: STMT_END_F |
| mysql-bin.000003 | 438 | Xid | 1 | 469 | COMMIT /* xid=1234 */ |
+------------------+-----+----------------+-----------+-------------+--------------------------------------+
*/
-- Manually rotate binary log (create new file)
FLUSH BINARY LOGS;
-- Purge old binary logs
-- Purge logs before specific log file
PURGE BINARY LOGS TO 'mysql-bin.000003';
-- Purge logs before specific date/time
PURGE BINARY LOGS BEFORE '2025-12-20 00:00:00';
-- Purge logs older than N days
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);
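-- DO NOT PURGE if replicas are reading from those logs!
-- Check each replica first: run SHOW REPLICA STATUS (SHOW SLAVE STATUS before 8.0.22)
-- on the replica and confirm it has read past the file you plan to purge
SHOW SLAVE HOSTS; -- Lists registered replicas (SHOW REPLICAS in 8.0.22+); does not show their positions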
-- Reset binary logs (DELETE ALL - dangerous!)
-- Only use on fresh setup or after all replicas are reconfigured
RESET MASTER;
-- View binary log on replica
SHOW BINARY LOGS; -- Replica's own binary log (if log_slave_updates=ON)
-- View relay log (replica's copy of primary's binary log)
SHOW RELAYLOG EVENTS;
-- Monitor binary log disk usage
-- MySQL has no information_schema table for binary logs; sum the File_size
-- column of SHOW BINARY LOGS in the client, or check the files on disk
SHOW BINARY LOGS;
-- du -ch /var/lib/mysql/mysql-bin.* | tail -1
-- Read binary log from command line using mysqlbinlog utility
-- Statement-based format:
-- mysqlbinlog /var/lib/mysql/mysql-bin.000003
-- Row-based format (verbose to see row data):
-- mysqlbinlog --verbose --base64-output=DECODE-ROWS /var/lib/mysql/mysql-bin.000003
-- Point-in-time recovery example
-- Restore backup taken at 2025-12-24 00:00:00
-- Then replay binary logs from that point until 2025-12-24 10:30:00
-- mysqlbinlog --start-datetime="2025-12-24 00:00:00" \
-- --stop-datetime="2025-12-24 10:30:00" \
-- mysql-bin.000003 mysql-bin.000004 | mysql -u root -p
-- Or use positions instead of datetime
-- mysqlbinlog --start-position=512000 \
-- --stop-position=1024000 \
-- mysql-bin.000003 | mysql -u root -p
-- Check binary log encryption status (MySQL 8.0.14+)
SHOW VARIABLES LIKE 'binlog_encryption';
-- Enable binary log encryption (my.cnf)
/*
[mysqld]
binlog_encryption = ON
*/
-- Monitor binary log performance impact
-- Check binary log write performance
SHOW GLOBAL STATUS LIKE 'Binlog%';
/*
Key metrics:
- Binlog_cache_disk_use: Transactions too large for binlog_cache_size
- Binlog_cache_use: Transactions using binlog cache
- Binlog_stmt_cache_disk_use: Statement cache disk usage
- Binlog_stmt_cache_use: Statement cache usage
*/
-- Optimize binlog cache size if many disk uses
SET GLOBAL binlog_cache_size = 32768; -- Default, increase if needed
SET GLOBAL binlog_stmt_cache_size = 32768;
-- Check transaction rate vs binlog size growth
SELECT
VARIABLE_VALUE AS transactions
FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Com_commit';
-- Example: Create a transaction and verify it's in binlog
CREATE TABLE binlog_test (
id INT PRIMARY KEY AUTO_INCREMENT,
data VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO binlog_test (data) VALUES ('test data');
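-- Check the last few binlog events
-- (FROM must be the start position of an event, for example a Position value
-- noted from SHOW MASTER STATUS before running the statements above)
SHOW BINLOG EVENTS IN 'mysql-bin.000003' FROM 500000 LIMIT 20;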
-- Should see events for CREATE TABLE and INSERT
-- Replication setup using binary log position
-- On PRIMARY:
SHOW MASTER STATUS;
-- Note File and Position
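-- On REPLICA (older syntax shown; MySQL 8.0.23+ prefers CHANGE REPLICATION SOURCE TO,
-- and 8.0.22+ prefers START REPLICA / SHOW REPLICA STATUS):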
CHANGE MASTER TO
MASTER_HOST = '192.168.1.10',
MASTER_USER = 'repl_user',
MASTER_PASSWORD = 'password',
MASTER_LOG_FILE = 'mysql-bin.000003', -- From SHOW MASTER STATUS
MASTER_LOG_POS = 512000; -- From SHOW MASTER STATUS
START SLAVE;
-- Monitor replication using binary log positions
SHOW SLAVE STATUS\G
/*
Key fields:
- Master_Log_File: Current binary log file being read from primary
- Read_Master_Log_Pos: Position in that file
- Relay_Master_Log_File: Binary log file being executed
- Exec_Master_Log_Pos: Position being executed
- Seconds_Behind_Master: Replication lag
*/
-- Binary log filtering (use with caution)
-- In my.cnf:
/*
[mysqld]
# Only log specific databases
binlog_do_db = mydb
binlog_do_db = another_db
# Ignore specific databases (NOT recommended, breaks point-in-time recovery)
binlog_ignore_db = test
binlog_ignore_db = temp
*/
High Availability
What is the difference between ProxySQL and MySQL Router?
The 30-Second Answer:
ProxySQL is a feature-rich, open-source proxy with advanced query routing, caching, query rewriting, and connection pooling, designed for complex database architectures. MySQL Router is Oracle's lightweight routing solution specifically optimized for MySQL InnoDB Cluster with automatic configuration and simpler setup. ProxySQL offers more features and flexibility but requires more configuration, while Router provides easier integration with InnoDB Cluster but fewer advanced features.
The 2-Minute Answer (If They Want More):
ProxySQL and MySQL Router serve similar purposes but differ significantly in capabilities and use cases:
ProxySQL Advantages:
- Query Analysis and Routing: Can route queries based on content (SELECT to replicas, writes to primary), regex patterns, and query digests
- Query Caching: Built-in result set caching to reduce database load
- Query Rewriting: Ability to modify queries on-the-fly for optimization or compatibility
- Advanced Connection Pooling: Sophisticated multiplexing reduces backend connections significantly
- Traffic Mirroring: Send duplicate traffic to test environments
- Firewall Capabilities: Block queries matching specific patterns
- Extensive Monitoring: Rich statistics and performance metrics through admin interface
- Scheduler: Built-in job scheduler for maintenance tasks
- Flexibility: Works with any MySQL-compatible database, not just InnoDB Cluster
MySQL Router Advantages:
- InnoDB Cluster Integration: Automatic discovery and configuration with InnoDB Cluster
- Simplicity: Bootstrap mode eliminates most manual configuration
- Metadata-Driven: Automatically tracks cluster topology changes through metadata
- Official Support: Backed by Oracle as part of the MySQL ecosystem
- X Protocol Support: Native support for MySQL X Protocol
- Lower Resource Usage: Lightweight design with minimal memory footprint
- Easier Upgrades: Synchronized with MySQL version releases
ProxySQL Disadvantages:
- Requires manual configuration and rule setup
- Steeper learning curve
- Configuration complexity for advanced features
- More resource intensive
- Requires separate monitoring setup
MySQL Router Disadvantages:
- Limited to basic routing (read-write split by port; transparent read/write splitting was only added in MySQL Router 8.2)
- No query-based routing or caching
- Less flexible with non-InnoDB Cluster setups
- Fewer monitoring and statistics features
- No query rewriting or firewall capabilities
- Limited customization options
Use Case Recommendations:
- Choose ProxySQL for: Complex routing requirements, query caching needs, legacy replication setups, multi-cluster environments, need for query analysis and rewriting
- Choose MySQL Router for: InnoDB Cluster deployments, simple read-write splitting, preference for official Oracle solutions, minimal configuration requirements
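The content-based routing that sets ProxySQL apart can be pictured as a first-match walk over an ordered rule list. The Python below is a minimal sketch under stated assumptions, not ProxySQL's implementation: the route function and RULES table are hypothetical, matching is assumed case-insensitive, and real query rules carry many more fields (digests, flags, users, schemas). Hostgroup 0 stands for the primary and hostgroup 1 for the replicas.

```python
import re

# (rule_id, regex, destination_hostgroup); evaluated in ascending rule_id order.
RULES = [
    (1, r"^SELECT.*FOR UPDATE", 0),  # locking reads must go to the primary
    (2, r"^SELECT", 1),              # all other reads go to the replicas
]
DEFAULT_HOSTGROUP = 0  # anything unmatched (writes, DDL) goes to the primary


def route(query, rules=RULES, default=DEFAULT_HOSTGROUP):
    """Return the hostgroup of the first rule whose regex matches the query."""
    text = query.strip()
    for _rule_id, pattern, hostgroup in sorted(rules):
        if re.search(pattern, text, flags=re.IGNORECASE):
            return hostgroup  # first match wins, like a rule with apply=1
    return default


if __name__ == "__main__":
    print(route("SELECT * FROM users WHERE id = 1"))             # replicas
    print(route("SELECT * FROM users WHERE id = 1 FOR UPDATE"))  # primary
    print(route("UPDATE users SET name = 'x' WHERE id = 1"))     # primary
```

Because the first match wins, the more specific FOR UPDATE rule has to come before the generic SELECT rule. MySQL Router has no equivalent step: the application picks the read-write or read-only port itself.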
Code Example:
-- ProxySQL Configuration
-- Connect to ProxySQL admin interface
-- mysql -h 127.0.0.1 -P 6032 -u admin -p
-- Add MySQL servers
INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES
(0, 'primary.example.com', 3306),
(1, 'replica1.example.com', 3306),
(1, 'replica2.example.com', 3306);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
-- Configure users
INSERT INTO mysql_users (username, password, default_hostgroup, transaction_persistent)
VALUES ('app_user', 'password', 0, 1);
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;
-- Query routing rules (reads to replicas, writes to primary)
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES
(1, 1, '^SELECT.*FOR UPDATE', 0, 1),
(2, 1, '^SELECT', 1, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
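-- Enable query caching (cache_ttl is in milliseconds, so 60000 = 60 seconds)
-- Rules are evaluated in ascending rule_id order and apply=1 stops evaluation, so if the
-- generic '^SELECT' rule (rule_id 2) is also loaded, this rule needs a lower rule_id to be reached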
INSERT INTO mysql_query_rules (rule_id, active, match_digest, cache_ttl, apply)
VALUES (10, 1, '^SELECT COUNT', 60000, 1);
-- Query rewriting example
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, replace_pattern, apply)
VALUES (20, 1, '^SELECT \* FROM users$', 'SELECT id, name, email FROM users', 1);
-- Configure connection pooling
UPDATE global_variables SET variable_value='1000'
WHERE variable_name='mysql-max_connections';
UPDATE global_variables SET variable_value='200'
WHERE variable_name='mysql-default_max_latency_ms';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
-- Monitor statistics
SELECT * FROM stats_mysql_query_digest ORDER BY sum_time DESC LIMIT 10;
SELECT * FROM stats_mysql_connection_pool;
SELECT * FROM stats_mysql_commands_counters;
-- Health check configuration
UPDATE mysql_servers SET max_replication_lag=10 WHERE hostgroup_id=1;
UPDATE global_variables SET variable_value='2000'
WHERE variable_name='mysql-monitor_connect_interval';
-- --- MySQL Router Configuration ---
-- Router is configured primarily through mysqlrouter.conf
-- Bootstrap creates configuration automatically
-- View router status (MySQL Shell)
// var cluster = dba.getCluster();
// cluster.listRouters();
-- Check router REST API (if enabled; a bootstrapped Router serves it over HTTPS with authentication)
-- curl -k -u <user> https://localhost:8443/api/20190715/routes
-- Monitor connections through router
-- netstat -an | grep :6446 # Read-write port
-- netstat -an | grep :6447 # Read-only port
-- Application connection comparison
-- ProxySQL: All traffic through a single port (6033 by default)
-- mysql -h proxysql-host -P 6033 -u app_user -p
-- MySQL Router: Different ports for read-write vs read-only
-- mysql -h router-host -P 6446 -u app_user -p # Read-write
-- mysql -h router-host -P 6447 -u app_user -p # Read-only
-- ProxySQL monitoring queries
SELECT hostgroup, srv_host, status, Queries, Bytes_data_sent, Bytes_data_recv
FROM stats_mysql_connection_pool;
SELECT digest_text, count_star, sum_time, min_time, max_time
FROM stats_mysql_query_digest
ORDER BY sum_time DESC LIMIT 20;
-- ProxySQL health check (monitor logs live in the monitor schema; timestamps are in microseconds)
SELECT * FROM monitor.mysql_server_ping_log ORDER BY time_start_us DESC LIMIT 10;
SELECT * FROM monitor.mysql_server_replication_lag_log ORDER BY time_start_us DESC LIMIT 10;
-- Advanced ProxySQL features
-- Traffic mirroring
INSERT INTO mysql_query_rules (rule_id, active, match_digest, mirror_hostgroup, apply)
VALUES (100, 1, '^SELECT.*FROM orders', 2, 1);
-- Query firewall
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, error_msg, apply)
VALUES (200, 1, '.*DROP TABLE.*', 'DROP TABLE not allowed', 1);
-- Scheduler for automated tasks
INSERT INTO scheduler (active, interval_ms, filename, arg1)
VALUES (1, 300000, '/var/lib/proxysql/check_readonly.sh', '1');
LOAD SCHEDULER TO RUNTIME;
SAVE SCHEDULER TO DISK;
-- ProxySQL query analysis
SELECT hostgroup, schemaname, username, digest_text, count_star
FROM stats_mysql_query_digest
WHERE digest_text LIKE '%JOIN%'
ORDER BY sum_time DESC;
-- Connection pool efficiency
SELECT hostgroup,
SUM(ConnUsed) as used_connections,
SUM(ConnFree) as free_connections,
SUM(Queries) as total_queries
FROM stats_mysql_connection_pool
GROUP BY hostgroup;
Performance Tuning
What is Performance Schema?
The 30-Second Answer:
Performance Schema is MySQL's built-in instrumentation framework that collects real-time performance metrics at low overhead. It monitors server execution at runtime, tracking statement execution, table I/O, locks, memory usage, and more through in-memory tables. Unlike the slow query log, it provides detailed timing breakdowns, waits analysis, and is queryable like regular tables, making it essential for performance troubleshooting and monitoring.
The 2-Minute Answer (If They Want More):
Performance Schema is a feature for monitoring MySQL server execution at a low level, introduced in MySQL 5.5 and significantly enhanced in later versions. It operates as a storage engine with special in-memory tables that capture performance data.
Key Characteristics:
Data Collection:
- Instruments server code to collect timing and execution metrics
- Measures statement execution, I/O operations, locks, memory allocation
- Tracks metadata operations, table I/O, index usage
- Records connection activity, prepared statements, stages
- Monitors replication performance
Design Philosophy:
- Low overhead (typically 5-10% when fully enabled)
- In-memory tables (data lost on restart)
- Queryable with standard SQL
- Configurable instrumentation levels
- No need to restart server for most changes
Key Table Categories:
- Setup Tables (setup_*): Configure what to monitor
- Instance Tables (*_instances): Objects being monitored
- Event Tables (events_*): Current and historical events
- Summary Tables (*_summary_*): Aggregated statistics
- Status Variables (status_by_*): Status variable summaries
- Connection Tables: Current and historical connections
- Replication Tables: Replication performance metrics
Common Use Cases:
- Identify slow queries with detailed timing breakdown
- Find tables with most I/O operations
- Analyze wait events (what's blocking queries)
- Monitor memory usage by user/thread
- Track index usage and missing indexes
- Debug locking and deadlock issues
- Analyze prepared statement performance
- Monitor replication lag and throughput
Advantages Over Slow Query Log:
- Real-time, queryable data
- Detailed execution stage timing
- Wait event analysis
- No need for log file parsing
- Can aggregate and filter with SQL
- Lower overhead than table-based slow log
Configuration Considerations:
- Enabled by default in MySQL 5.6.6+
- Many instruments disabled by default for performance
- Can enable/disable instruments dynamically
- Memory usage configurable
- Use the sys schema for easier querying
When to Use:
- Performance troubleshooting and optimization
- Monitoring production workloads
- Capacity planning and trend analysis
- Identifying resource bottlenecks
- Comparing performance before/after changes
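Two ideas behind the statement summary tables are easy to miss: timers are reported in picoseconds, and similar statements are grouped under one normalized "digest". The Python below is a rough sketch of both, assuming a regex-based normalizer as a stand-in for the server's parser-based digest. The names normalize and summarize are hypothetical.

```python
import re
from collections import defaultdict

PICOSECONDS_PER_SECOND = 10**12  # Performance Schema timers are in picoseconds


def normalize(sql):
    """Crude stand-in for a statement digest: replace literals with '?'."""
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)   # quoted string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)   # numeric literals
    return re.sub(r"\s+", " ", sql).strip()        # collapse whitespace


def summarize(events):
    """Group (sql_text, timer_wait_ps) pairs by digest.

    Returns {digest_text: (exec_count, total_seconds)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for sql, timer_wait_ps in events:
        entry = totals[normalize(sql)]
        entry[0] += 1
        entry[1] += timer_wait_ps
    return {
        digest: (count, total_ps / PICOSECONDS_PER_SECOND)
        for digest, (count, total_ps) in totals.items()
    }


if __name__ == "__main__":
    sample = [
        ("SELECT * FROM users WHERE id = 1", 2 * 10**12),
        ("SELECT * FROM users  WHERE id = 42", 1 * 10**12),
        ("SELECT * FROM users WHERE name = 'bob'", 5 * 10**11),
    ]
    for digest, stats in summarize(sample).items():
        print(digest, stats)
```

The division by 10**12 here is the same conversion as the division by 1000000000000 in the queries in the code example that follows.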
Code Example:
-- Check if Performance Schema is enabled
SHOW VARIABLES LIKE 'performance_schema';
-- View memory usage by Performance Schema
SELECT * FROM performance_schema.memory_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'memory/performance_schema%'
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC;
-- List all setup/configuration tables
SHOW TABLES FROM performance_schema LIKE 'setup%';
-- View enabled instruments
SELECT NAME, ENABLED, TIMED
FROM performance_schema.setup_instruments
WHERE ENABLED = 'YES'
LIMIT 20;
-- Enable specific instruments
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'statement/%';
-- Enable wait event monitoring
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'wait/%';
-- View consumers (where data goes)
SELECT * FROM performance_schema.setup_consumers;
-- Enable statement history
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME LIKE '%statement%';
-- Find slowest queries currently executing
SELECT
THREAD_ID,
EVENT_NAME,
TRUNCATE(TIMER_WAIT/1000000000000, 2) AS duration_sec,
TRUNCATE(LOCK_TIME/1000000000000, 2) AS lock_time_sec,
SQL_TEXT,
CURRENT_SCHEMA,
ROWS_EXAMINED,
ROWS_SENT
FROM performance_schema.events_statements_current
WHERE TIMER_WAIT IS NOT NULL
ORDER BY TIMER_WAIT DESC
LIMIT 10;
-- Historical statement analysis (the last 10 statements per thread by default;
-- see performance_schema_events_statements_history_size)
SELECT
TRUNCATE(TIMER_WAIT/1000000000000, 2) AS duration_sec,
SQL_TEXT,
ROWS_EXAMINED,
ROWS_SENT,
CREATED_TMP_TABLES,
CREATED_TMP_DISK_TABLES
FROM performance_schema.events_statements_history
ORDER BY TIMER_WAIT DESC
LIMIT 20;
-- Summary of statements by digest (grouped similar queries)
SELECT
SCHEMA_NAME,
DIGEST_TEXT,
COUNT_STAR AS exec_count,
TRUNCATE(AVG_TIMER_WAIT/1000000000000, 2) AS avg_sec,
TRUNCATE(MAX_TIMER_WAIT/1000000000000, 2) AS max_sec,
TRUNCATE(SUM_TIMER_WAIT/1000000000000, 2) AS total_sec,
SUM_ROWS_EXAMINED AS total_rows_examined,
SUM_ROWS_SENT AS total_rows_sent
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME IS NOT NULL
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
-- Table I/O statistics
SELECT
OBJECT_SCHEMA,
OBJECT_NAME,
COUNT_READ,
COUNT_WRITE,
COUNT_FETCH,
COUNT_INSERT,
COUNT_UPDATE,
COUNT_DELETE
FROM performance_schema.table_io_waits_summary_by_table
WHERE OBJECT_SCHEMA NOT IN ('mysql', 'performance_schema', 'information_schema')
ORDER BY COUNT_READ + COUNT_WRITE DESC
LIMIT 10;
-- Index usage statistics
SELECT
OBJECT_SCHEMA,
OBJECT_NAME,
INDEX_NAME,
COUNT_FETCH,
COUNT_INSERT,
COUNT_UPDATE,
COUNT_DELETE
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA NOT IN ('mysql', 'performance_schema')
AND INDEX_NAME IS NOT NULL
ORDER BY COUNT_FETCH DESC
LIMIT 10;
-- Find unused indexes
SELECT
OBJECT_SCHEMA,
OBJECT_NAME,
INDEX_NAME
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE OBJECT_SCHEMA NOT IN ('mysql', 'performance_schema', 'information_schema')
AND INDEX_NAME IS NOT NULL
AND COUNT_STAR = 0
ORDER BY OBJECT_SCHEMA, OBJECT_NAME;
-- Wait events analysis (what's causing delays)
SELECT
EVENT_NAME,
COUNT_STAR AS count,
TRUNCATE(SUM_TIMER_WAIT/1000000000000, 2) AS total_sec,
TRUNCATE(AVG_TIMER_WAIT/1000000000000, 6) AS avg_sec,
TRUNCATE(MAX_TIMER_WAIT/1000000000000, 2) AS max_sec
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE COUNT_STAR > 0
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
-- Memory usage by thread
SELECT
THREAD_ID,
EVENT_NAME,
CURRENT_NUMBER_OF_BYTES_USED / 1024 / 1024 AS current_mb,
HIGH_NUMBER_OF_BYTES_USED / 1024 / 1024 AS high_mb
FROM performance_schema.memory_summary_by_thread_by_event_name
WHERE CURRENT_NUMBER_OF_BYTES_USED > 0
ORDER BY CURRENT_NUMBER_OF_BYTES_USED DESC
LIMIT 10;
-- Current connections
SELECT
PROCESSLIST_ID,
PROCESSLIST_USER,
PROCESSLIST_HOST,
PROCESSLIST_DB,
PROCESSLIST_COMMAND,
PROCESSLIST_TIME,
PROCESSLIST_STATE
FROM performance_schema.threads
WHERE TYPE = 'FOREGROUND'
ORDER BY PROCESSLIST_TIME DESC;
-- Reset/truncate collected statistics
TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;
TRUNCATE TABLE performance_schema.events_statements_history;
-- Configuration in my.cnf
# [mysqld]
# performance_schema = ON
# performance_schema_max_table_instances = 12500
# performance_schema_events_statements_history_size = 10
# performance_schema_events_statements_history_long_size = 10000
Indexing
What is a clustered index in InnoDB?
The 30-Second Answer:
A clustered index in InnoDB is the primary key index that determines the physical storage order of table data. The table data itself is stored in the leaf nodes of the clustered index B-tree. If no primary key is defined, InnoDB uses the first unique non-null index, or creates a hidden 6-byte row ID. Each InnoDB table has exactly one clustered index.
The 2-Minute Answer (If They Want More):
The clustered index is fundamental to InnoDB's storage architecture. Unlike secondary indexes that store only key values and primary key references, the clustered index stores the actual row data in its leaf nodes. This has several important implications:
Physical Order: Rows are physically stored on disk in clustered index order, making range scans on the primary key extremely efficient.
No Separate Lookup: Querying by primary key requires only traversing the clustered index B-tree - no additional lookup needed since the data is right there.
Choice Matters: Your primary key choice affects all queries. Sequential primary keys (like AUTO_INCREMENT) avoid page splits and fragmentation, while UUIDs can cause performance issues due to random insertions.
Secondary Index Impact: All secondary indexes store the clustered index key value as their "pointer" to the row, so a large primary key makes all indexes larger.
Implicit in Queries: When you query by primary key, you're using the clustered index even without explicitly creating an index.
Best practices: Use a small, sequential primary key when possible. For most tables, an AUTO_INCREMENT integer is ideal. Avoid large composite primary keys or UUIDs unless you have specific requirements.
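The effect of key order on page splits can be seen in a small simulation. The Python below is a toy model and not InnoDB's real page format or split algorithm: pages hold a fixed number of keys (PAGE_CAPACITY is an arbitrary illustrative value), an insert at the right edge of the last page starts a fresh page, and any other overflow splits the page in half. The function names are hypothetical.

```python
import bisect
import random

PAGE_CAPACITY = 16  # keys per page; illustrative only


def insert_key(pages, key):
    """Insert key into an ordered list of sorted pages, splitting on overflow."""
    if not pages:
        pages.append([key])
        return
    # Target page: the last page whose first key is <= key (or the first page).
    first_keys = [page[0] for page in pages]
    idx = max(bisect.bisect_right(first_keys, key) - 1, 0)
    page = pages[idx]
    if len(page) < PAGE_CAPACITY:
        bisect.insort(page, key)
        return
    if idx == len(pages) - 1 and key > page[-1]:
        # Sequential insert at the right edge: leave the full page untouched.
        pages.append([key])
    else:
        # Insert into the middle of a full page: split it into two halves.
        bisect.insort(page, key)
        mid = len(page) // 2
        pages[idx:idx + 1] = [page[:mid], page[mid:]]


def page_count(keys):
    """Number of pages needed after inserting keys in the given order."""
    pages = []
    for key in keys:
        insert_key(pages, key)
    return len(pages)


if __name__ == "__main__":
    sequential = list(range(1, 1001))
    shuffled = sequential[:]
    random.Random(42).shuffle(shuffled)  # stands in for random keys such as UUIDs
    print("sequential pages:", page_count(sequential))
    print("random-order pages:", page_count(shuffled))
```

In this model the same 1,000 keys pack into completely full pages when inserted in order, while random order leaves many pages partly empty after mid-page splits. That wasted space is the fragmentation cost the text attributes to UUID keys.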
Code Example:
-- Table with explicit primary key (becomes clustered index)
CREATE TABLE users (
user_id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255),
created_at TIMESTAMP
);
-- This query uses the clustered index efficiently
SELECT * FROM users WHERE user_id = 12345;
-- Range query benefits from physical ordering
SELECT * FROM users WHERE user_id BETWEEN 1000 AND 2000;
-- View index information
SHOW INDEX FROM users;
-- PRIMARY key will show as the clustered index
-- Table without primary key - InnoDB creates hidden clustered index
CREATE TABLE logs (
message TEXT,
created_at TIMESTAMP
);
-- InnoDB internally creates a 6-byte hidden row ID as clustered index
-- Composite primary key (becomes clustered index)
CREATE TABLE order_items (
order_id INT,
item_id INT,
quantity INT,
PRIMARY KEY (order_id, item_id)
);
-- Data is physically stored ordered by order_id, then item_id
-- Queries on order_id benefit from clustering
SELECT * FROM order_items WHERE order_id = 100;
What is a secondary index and how does it differ from a clustered index?
The 30-Second Answer:
A secondary index (also called non-clustered index) is any index other than the clustered index. In InnoDB, secondary index leaf nodes contain the indexed column values plus the primary key value, not the full row data. Querying via secondary index requires two lookups: first to find the primary key in the secondary index, then to retrieve the full row from the clustered index. This is called a "bookmark lookup" or "clustered index lookup."
The 2-Minute Answer (If They Want More):
Secondary indexes differ from clustered indexes in several critical ways:
Storage Structure: Secondary index leaf nodes store only the indexed columns and the primary key value. The actual row data remains in the clustered index.
Two-Step Lookup: When you query using a secondary index, InnoDB:
- First traverses the secondary index B-tree to find matching entries
- Extracts the primary key value from each match
- Uses that primary key to look up the full row in the clustered index
This double lookup is why primary key queries are faster than secondary index queries.
Multiple Allowed: A table can have multiple secondary indexes, but only one clustered index.
Index Size: Because secondary indexes include the primary key value, a large primary key increases the size of every secondary index. This is why keeping primary keys small is important.
Covering Index Optimization: If your query only needs columns that are in the secondary index (including the primary key), MySQL can skip the second lookup - this is called an "index-only scan" or "covering index."
Maintenance Cost: Secondary indexes must be updated on INSERT, UPDATE, and DELETE operations, adding write overhead. Each additional index slows down writes.
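The two-step lookup and the covering-index shortcut can be modeled with two maps. The Python below is a conceptual sketch, assuming dictionaries in place of B-trees. The ProductsTable class and its method names are hypothetical and only mirror the idea: the secondary index stores primary keys, and the clustered index stores full rows.

```python
import bisect


class ProductsTable:
    """Toy table with a clustered index and one secondary index on category."""

    def __init__(self):
        self.clustered = {}         # primary key -> full row
        self.idx_category = {}      # category -> sorted list of primary keys
        self.clustered_lookups = 0  # how often the second step was needed

    def insert(self, product_id, name, category):
        self.clustered[product_id] = {
            "product_id": product_id, "name": name, "category": category,
        }
        # Every write also has to maintain the secondary index.
        bisect.insort(self.idx_category.setdefault(category, []), product_id)

    def select_all_by_category(self, category):
        """SELECT *: find primary keys in the secondary index, then fetch rows."""
        rows = []
        for pk in self.idx_category.get(category, []):
            self.clustered_lookups += 1  # second step: clustered index lookup
            rows.append(self.clustered[pk])
        return rows

    def select_ids_by_category(self, category):
        """SELECT product_id, category: answered from the secondary index alone."""
        return [(pk, category) for pk in self.idx_category.get(category, [])]


if __name__ == "__main__":
    table = ProductsTable()
    table.insert(1, "Laptop", "Electronics")
    table.insert(2, "Desk", "Furniture")
    table.insert(3, "Phone", "Electronics")
    print(table.select_ids_by_category("Electronics"), table.clustered_lookups)
    print(len(table.select_all_by_category("Electronics")), table.clustered_lookups)
```

The covering query never touches the clustered map, while the SELECT * path pays one extra lookup per matching row.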
Code Example:
CREATE TABLE products (
product_id INT AUTO_INCREMENT PRIMARY KEY, -- Clustered index
name VARCHAR(255),
category VARCHAR(100),
price DECIMAL(10,2),
created_at TIMESTAMP,
INDEX idx_category (category), -- Secondary index
INDEX idx_price (price), -- Secondary index
INDEX idx_name (name) -- Secondary index
);
-- Query using secondary index (two-step lookup)
SELECT * FROM products WHERE category = 'Electronics';
-- Step 1: Find primary keys in idx_category where category='Electronics'
-- Step 2: For each primary key, fetch full row from clustered index
-- Covering index query (single lookup)
SELECT product_id, category FROM products WHERE category = 'Electronics';
-- Only uses idx_category - no clustered index lookup needed
-- idx_category contains both category (indexed column) and product_id (primary key)
-- Composite secondary index
CREATE INDEX idx_category_price ON products (category, price);
-- This query uses the composite index efficiently
SELECT * FROM products WHERE category = 'Electronics' AND price < 500;
-- View how queries use indexes
EXPLAIN SELECT * FROM products WHERE category = 'Electronics'\G
-- Shows: type: ref, key: idx_category, Extra: Using index condition
EXPLAIN SELECT product_id, category FROM products WHERE category = 'Electronics'\G
-- Shows: Extra: Using index (covering index - no table access needed)
-- Impact of large primary key on secondary indexes
CREATE TABLE bad_example (
uuid CHAR(36) PRIMARY KEY, -- 36 bytes
data VARCHAR(100),
INDEX idx_data (data)
);
-- idx_data stores: data value (100 bytes) + uuid (36 bytes) = large index
-- Better: use AUTO_INCREMENT INT (4 bytes) and add uuid as a secondary index
Query Optimization
What is filesort and how do you avoid it?
The 30-Second Answer:
Filesort appears in EXPLAIN output's Extra column when MySQL must perform an additional sorting pass because results aren't returned in the required order from the index. Despite the name, it may use memory (up to sort_buffer_size) or disk. Avoid filesort by: creating indexes that match the ORDER BY clause, ensuring WHERE and ORDER BY use the same index, ordering by indexed columns in index order, and keeping sort_buffer_size adequate. Filesort isn't always bad - it's only problematic for large result sets.
The 2-Minute Answer (If They Want More):
Filesort is MySQL's external sorting operation, triggered when the optimizer cannot retrieve rows in the required sort order using an index. Despite its name, filesort can use:
- In-memory sorting: When dataset fits in sort_buffer_size (faster)
- Disk-based sorting: When data exceeds memory buffer (slower, uses temporary files)
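The difference between the two modes can be sketched as a textbook external merge sort. The Python below illustrates the general technique, not MySQL's actual filesort code: the buffer is measured in rows rather than bytes, the "runs" stay in memory instead of being written to temporary files, and the function name external_sort is hypothetical.

```python
import heapq


def external_sort(rows, buffer_rows):
    """Sort rows using a buffer that holds at most buffer_rows rows at a time.

    Returns (sorted_rows, number_of_runs). A single run means everything fit
    in the buffer; more runs mean sorted chunks had to be merged afterwards.
    """
    if buffer_rows < 1:
        raise ValueError("buffer_rows must be >= 1")
    # Phase 1: sort each buffer-sized chunk on its own (one "run" per chunk).
    runs = [
        sorted(rows[start:start + buffer_rows])
        for start in range(0, len(rows), buffer_rows)
    ]
    # Phase 2: merge the sorted runs into one ordered stream.
    return list(heapq.merge(*runs)), len(runs)


if __name__ == "__main__":
    data = [5, 3, 8, 1, 9, 2]
    print(external_sort(data, buffer_rows=10))  # fits in the buffer: one run
    print(external_sort(data, buffer_rows=2))   # spills: three runs to merge
```

A larger buffer means fewer runs and less merge work, which is the reasoning behind keeping sort_buffer_size adequate for queries that must sort.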
When Filesort Occurs:
- No Index on ORDER BY columns
SELECT * FROM users ORDER BY last_login; -- No index on last_login
- ORDER BY uses columns from multiple tables
SELECT * FROM t1 JOIN t2 ON t1.id = t2.id ORDER BY t1.a, t2.b;
- ORDER BY direction mismatch with index
-- Index: (a ASC, b ASC)
SELECT * FROM t ORDER BY a ASC, b DESC; -- Can't use this index; 8.0+ can use a matching (a ASC, b DESC) index
- WHERE and ORDER BY use different indexes
SELECT * FROM t WHERE col_a = 5 ORDER BY col_b; -- Different columns
- ORDER BY on expression or function
SELECT * FROM t ORDER BY UPPER(name); -- Function prevents index use
Types of Filesort (visible in optimizer trace):
- Modified quicksort: Default algorithm for small datasets
- Merge sort: For larger datasets
- Priority queue: For LIMIT queries (optimizes for top N rows)
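The priority-queue idea for LIMIT queries can be shown with a bounded heap: instead of sorting every row, keep only the best N seen so far. The Python below is a sketch of that general technique using the standard heapq module, not MySQL's implementation. The function name top_n_by_last_login and the sample rows are made up for illustration.

```python
import heapq


def top_n_by_last_login(rows, n):
    """Equivalent of ORDER BY last_login DESC LIMIT n without a full sort.

    heapq.nlargest keeps a heap of at most n rows while scanning the input,
    so memory stays proportional to n rather than to the table size.
    """
    return heapq.nlargest(n, rows, key=lambda row: row["last_login"])


if __name__ == "__main__":
    users = [
        {"user_id": 1, "last_login": "2025-01-03"},
        {"user_id": 2, "last_login": "2025-03-15"},
        {"user_id": 3, "last_login": "2024-11-30"},
        {"user_id": 4, "last_login": "2025-02-01"},
    ]
    for row in top_n_by_last_login(users, 2):
        print(row)
```

This is why a filesort with a small LIMIT is often cheap: the work scales with the scan plus a heap of N rows, not with sorting the entire result set.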
Avoidance Strategies:
- Create Covering Index: Index includes ORDER BY columns
CREATE INDEX idx_covering ON users(status, last_login, username);
SELECT username FROM users WHERE status = 'active' ORDER BY last_login;
- Match Index Column Order: ORDER BY matches index prefix
-- Index: (country, city, created_at)
SELECT * FROM users WHERE country = 'USA' ORDER BY city, created_at; -- Uses index
- Use Descending Indexes (8.0+): Match mixed sort directions
CREATE INDEX idx_mixed ON orders(customer_id ASC, order_date DESC);
SELECT * FROM orders WHERE customer_id = 123 ORDER BY customer_id ASC, order_date DESC; -- No filesort
- Increase sort_buffer_size: Keep sorting in memory
SET SESSION sort_buffer_size = 2097152; -- 2MB
- Reduce Selected Columns: Smaller rows fit better in sort buffer
SELECT id, name FROM users ORDER BY name; -- Better than SELECT *
When Filesort is Acceptable:
- Small result sets (hundreds/few thousand rows)
- Infrequent queries
- Result set already small after WHERE filtering
- Query with LIMIT (uses priority queue optimization)
Monitoring Impact:
Check Sort_merge_passes status variable - non-zero indicates disk-based sorting occurred.
Code Example:
-- Example 1: Problem - filesort on large table
CREATE TABLE users (
user_id INT PRIMARY KEY,
username VARCHAR(50),
email VARCHAR(100),
created_at DATETIME,
last_login DATETIME,
status ENUM('active', 'inactive')
);
-- Bad: Full filesort
EXPLAIN SELECT * FROM users ORDER BY last_login DESC LIMIT 10;
-- Extra: Using filesort
-- Solution 1: Add index
CREATE INDEX idx_last_login ON users(last_login DESC);
EXPLAIN SELECT * FROM users ORDER BY last_login DESC LIMIT 10;
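-- Extra: no "Using filesort"; the DESC index is read in forward order
-- (an ASC index on last_login also avoids the sort; 8.0+ reports it as "Backward index scan")
-- Before 8.0 the DESC keyword in CREATE INDEX is parsed but ignored
-- No filesort!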
-- Example 2: Compound WHERE and ORDER BY
-- Bad: Different columns for WHERE and ORDER BY
EXPLAIN SELECT * FROM users
WHERE status = 'active'
ORDER BY last_login DESC;
-- Extra: Using where; Using filesort
-- Solution 2: Composite index covering both
CREATE INDEX idx_status_login ON users(status, last_login DESC);
EXPLAIN SELECT * FROM users
WHERE status = 'active'
ORDER BY last_login DESC;
-- Extra: Using where (filesort eliminated)
-- type: ref (uses index)
-- Example 3: Multi-column sort optimization
-- Bad: Sorting by multiple columns without index
EXPLAIN SELECT * FROM users
ORDER BY status, last_login DESC, username;
-- Extra: Using filesort
-- Solution 3: Composite index matching ORDER BY
CREATE INDEX idx_multi_sort ON users(status, last_login DESC, username);
EXPLAIN SELECT * FROM users
ORDER BY status, last_login DESC, username;
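-- Can use the index and skip filesort; without a LIMIT the optimizer may still
-- prefer a table scan plus filesort for SELECT *, so confirm with EXPLAIN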
-- Example 4: Mixed ASC/DESC before MySQL 8.0
-- Problem in MySQL 5.7: Can't use index
EXPLAIN SELECT * FROM users
ORDER BY status ASC, last_login DESC;
-- Extra: Using filesort (MySQL 5.7)
-- Solution 4: Descending index (MySQL 8.0+)
CREATE INDEX idx_mixed_sort ON users(status ASC, last_login DESC);
EXPLAIN SELECT * FROM users
ORDER BY status ASC, last_login DESC;
-- Uses index, no filesort (MySQL 8.0+)
-- Example 5: Function in ORDER BY
-- Bad: Function prevents index use
EXPLAIN SELECT * FROM users
ORDER BY YEAR(created_at), username;
-- Extra: Using filesort
-- Solution 5a: Generated column (5.7+)
ALTER TABLE users
ADD COLUMN created_year INT AS (YEAR(created_at)) STORED;
CREATE INDEX idx_year_username ON users(created_year, username);
EXPLAIN SELECT * FROM users
ORDER BY created_year, username;
-- Uses index, no filesort
-- Solution 5b: Reorganize query if possible
CREATE INDEX idx_created_username ON users(created_at, username);
EXPLAIN SELECT * FROM users
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'
ORDER BY created_at, username;
-- Uses index for both WHERE and ORDER BY
-- Example 6: Covering index eliminates filesort
-- Bad: Retrieves all columns
EXPLAIN SELECT * FROM users
WHERE status = 'active'
ORDER BY last_login
LIMIT 100;
-- Extra: Using where; Using filesort
-- Solution 6: Covering index for specific columns
CREATE INDEX idx_covering ON users(status, last_login, user_id, username);
EXPLAIN SELECT user_id, username FROM users
WHERE status = 'active'
ORDER BY last_login
LIMIT 100;
-- Extra: Using where; Using index
-- No filesort, no table access!
-- Example 7: Join with ORDER BY
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATETIME,
total_amount DECIMAL(10,2),
INDEX idx_customer(customer_id)
);
-- Bad: ORDER BY from joined table
EXPLAIN SELECT u.username, o.order_date, o.total_amount
FROM users u
JOIN orders o ON u.user_id = o.customer_id
ORDER BY o.order_date DESC;
-- Extra: Using filesort
-- Solution 7: Index on order_date, optimize JOIN order
CREATE INDEX idx_order_date ON orders(order_date DESC);
EXPLAIN SELECT u.username, o.order_date, o.total_amount
FROM orders o
JOIN users u ON o.customer_id = u.user_id
ORDER BY o.order_date DESC;
-- Starts with orders table, uses index for ORDER BY
-- May eliminate or reduce filesort
-- Example 8: Monitoring filesort performance
-- Check sort buffer usage
SHOW VARIABLES LIKE 'sort_buffer_size'; -- Default: 262144 (256KB)
-- Monitor filesort statistics
SHOW SESSION STATUS LIKE 'Sort%';
/*
Sort_merge_passes: 0 -- If > 0, sorts spilling to disk
Sort_range: X -- Sorts done via range access
Sort_rows: X -- Total rows sorted
Sort_scan: X -- Sorts done via table scan
*/
-- Reset counters
FLUSH STATUS;
-- Run query
SELECT * FROM users ORDER BY last_login DESC LIMIT 1000;
-- Check if disk sorting occurred
SHOW SESSION STATUS LIKE 'Sort_merge_passes';
-- If > 0, consider increasing sort_buffer_size or adding index
-- Increase sort buffer for session
SET SESSION sort_buffer_size = 524288; -- 512KB
-- Example 9: LIMIT optimization with filesort
-- Priority queue optimization (good even with filesort)
EXPLAIN SELECT * FROM users
ORDER BY last_login DESC
LIMIT 10;
-- Extra: Using filesort
-- But uses priority queue (memory-efficient for small LIMIT)
-- Much worse without LIMIT
EXPLAIN SELECT * FROM users
ORDER BY last_login DESC;
-- Extra: Using filesort
-- Must sort ALL rows
-- Example 10: Optimizer trace showing filesort decision
SET optimizer_trace='enabled=on';
SELECT user_id, username FROM users
WHERE status = 'active'
ORDER BY last_login DESC
LIMIT 20;
-- '$**' matches the key at any depth; the filesort entries sit below the top-level steps
SELECT JSON_EXTRACT(TRACE, '$**.filesort_information')
FROM information_schema.OPTIMIZER_TRACE\G
/*
Shows:
- sort_mode: packed or row_id mode
- sort_algorithm: quicksort, merge, priority queue
- Memory usage
- Whether filesort is needed
*/
SET optimizer_trace='enabled=off';
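The priority-queue behaviour noted in Example 9 can be illustrated outside MySQL. The Python sketch below is an analogy only, not MySQL's implementation; the function name top_n_desc is hypothetical. It shows why ORDER BY ... LIMIT N only needs to keep N candidates in memory instead of sorting every row.

```python
import heapq


def top_n_desc(values, n):
    """Return the n largest values in descending order using a bounded min-heap."""
    if n <= 0:
        return []
    heap = []  # min-heap holding the n largest values seen so far
    for v in values:
        if len(heap) < n:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # v beats the smallest kept candidate, so swap it in
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)


last_logins = [5, 1, 9, 3, 7, 8]
newest_three = top_n_desc(last_logins, 3)  # [9, 8, 7]
```

Memory stays proportional to N rather than to the table size, which is why a small LIMIT is cheap even when EXPLAIN still reports a filesort.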
Security
What is caching_sha2_password?
The 30-Second Answer:
caching_sha2_password is MySQL 8.0's default authentication plugin that uses SHA-256 hashing for password storage and verification. It improves security over the legacy mysql_native_password (SHA-1) while maintaining performance through server-side password caching. It requires either an encrypted connection (SSL/TLS) or RSA key-pair encryption for the initial password exchange, making it more secure but requiring proper client support.
The 2-Minute Answer (If They Want More):
caching_sha2_password was introduced in MySQL 8.0 as the default authentication plugin to address security limitations of mysql_native_password:
Security Improvements:
- SHA-256 Hashing: Uses SHA-256 instead of SHA-1 (which has known vulnerabilities)
- Salt Mechanism: Includes per-user salt values to prevent rainbow table attacks
- Secure Password Exchange: Requires encrypted connection or RSA encryption for password transmission
- FIPS Compliance: Meets federal security standards for cryptographic modules
How It Works:
Initial Authentication:
- Client connects to server
- Server sends authentication challenge with salt
- Client must send password either:
- Over encrypted SSL/TLS connection, OR
- Encrypted with server's RSA public key
- Server verifies password and caches the result
Subsequent Authentications:
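- If the account's password hash is already in the server cache, authentication takes the fast challenge-response path
- Cache entries are cleared on server restart, by FLUSH PRIVILEGES, and when the account's password changes or the account is renamed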
- Combines security with performance
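The cache-then-fast-path flow above can be sketched as a toy model in Python. This is not the wire protocol: the class ToyAuthServer and its methods are hypothetical, PBKDF2 is only a stand-in for the salted multi-round hash stored in mysql.user, and the scramble formula is an assumption based on the scheme described in MySQL's protocol documentation.

```python
import hashlib
import hmac
import os


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def client_scramble(password: str, nonce: bytes) -> bytes:
    """Fast-path response: SHA256(password) masked by a nonce-dependent value."""
    stage1 = sha256(password.encode())
    stage2 = sha256(stage1)
    return xor_bytes(stage1, sha256(stage2 + nonce))


class ToyAuthServer:
    """Hypothetical model of the server side; names are illustrative only."""

    def __init__(self):
        self.accounts = {}  # account -> (salt, slow hash)
        self.cache = {}     # account -> SHA256(SHA256(password))

    @staticmethod
    def _slow_hash(password: str, salt: bytes) -> bytes:
        # Stand-in for the stored credential; deliberately expensive.
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 5000)

    def create_user(self, account: str, password: str) -> None:
        salt = os.urandom(20)
        self.accounts[account] = (salt, self._slow_hash(password, salt))

    def fast_auth(self, account: str, nonce: bytes, scramble: bytes):
        """Return True/False on a cache hit, or None on a cache miss."""
        cached = self.cache.get(account)
        if cached is None:
            return None
        candidate = xor_bytes(scramble, sha256(cached + nonce))
        return hmac.compare_digest(sha256(candidate), cached)

    def full_auth(self, account: str, password: str) -> bool:
        """Slow path: the password arrived over TLS or RSA-encrypted."""
        salt, expected = self.accounts[account]
        ok = hmac.compare_digest(self._slow_hash(password, salt), expected)
        if ok:
            self.cache[account] = sha256(sha256(password.encode()))
        return ok

    def flush(self) -> None:
        self.cache.clear()
```

On a cache miss the client has to fall back to the full exchange, which is where the TLS or RSA requirement comes from; later logins only need the cheap hash comparison.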
Performance vs. Security:
- Slightly slower than mysql_native_password on first connection
- Comparable performance on subsequent connections due to caching
- Much more secure cryptographically
Client Requirements:
- Client must support the caching_sha2_password plugin
- MySQL 8.0+ clients, MySQL Connector/J 8.0+, etc.
- Accounts used by older clients may need to stay on mysql_native_password for compatibility
Migration Considerations: When upgrading from MySQL 5.7 to 8.0:
- Existing users keep their authentication plugin
- New users get caching_sha2_password by default
- Applications using old connectors may need updates
- Can set default_authentication_plugin for compatibility
Code Example:
-- Check current default authentication plugin
SHOW VARIABLES LIKE 'default_authentication_plugin';
-- Create user with caching_sha2_password (default in MySQL 8.0)
CREATE USER 'secure_user'@'localhost'
IDENTIFIED WITH caching_sha2_password
BY 'StrongPassword123!';
-- Create user with legacy plugin for compatibility
CREATE USER 'legacy_app'@'localhost'
IDENTIFIED WITH mysql_native_password
BY 'Password123!';
-- Check user's authentication plugin
SELECT user, host, plugin
FROM mysql.user
WHERE user = 'secure_user';
-- Convert existing user to caching_sha2_password
ALTER USER 'existing_user'@'localhost'
IDENTIFIED WITH caching_sha2_password
BY 'NewSecurePassword123!';
-- Set default plugin for backward compatibility (my.cnf/my.ini)
-- [mysqld]
-- default_authentication_plugin=mysql_native_password
-- Check if SSL/TLS is available for secure password exchange
SHOW VARIABLES LIKE 'have_ssl';
-- Require SSL for specific user
ALTER USER 'secure_user'@'localhost' REQUIRE SSL;
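-- Get RSA public key for client-side encryption (if not using SSL)
-- The server exposes its public key as a status variable:
SHOW STATUS LIKE 'Caching_sha2_password_rsa_public_key';
-- Or let the client request the key at connect time:
-- mysql -u secure_user -p --get-server-public-key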
-- Clear authentication cache
FLUSH PRIVILEGES; -- clears the caching_sha2_password cache for all accounts
-- Changing or renaming an account clears that account's entry; a server restart also empties the cache
-- Monitor authentication plugin usage
SELECT plugin, COUNT(*) as user_count
FROM mysql.user
GROUP BY plugin;
-- Create user with require SSL and modern auth
CREATE USER 'api_user'@'%'
IDENTIFIED WITH caching_sha2_password BY 'ApiKey123!'
REQUIRE SSL;
-- Connection example with SSL (from command line)
-- mysql -u secure_user -p \
-- --ssl-mode=REQUIRED \
-- --ssl-ca=/path/to/ca.pem
-- Check connection encryption status
SHOW STATUS LIKE 'Ssl_cipher';
-- View current user's authentication plugin
SELECT CURRENT_USER(),
plugin
FROM mysql.user
WHERE user = SUBSTRING_INDEX(CURRENT_USER(), '@', 1)
AND host = SUBSTRING_INDEX(CURRENT_USER(), '@', -1);
-- Programmatic client example (Python with mysql-connector-python)
-- import mysql.connector
--
-- config = {
-- 'user': 'secure_user',
-- 'password': 'StrongPassword123!',
-- 'host': 'localhost',
-- 'database': 'mydb',
-- 'ssl_disabled': False, # Enable SSL
-- 'auth_plugin': 'caching_sha2_password'
-- }
--
-- connection = mysql.connector.connect(**config)
MySQL Architecture
What is the data dictionary in MySQL 8.0?
The 30-Second Answer:
The data dictionary in MySQL 8.0 is a transactional, centralized metadata repository stored in InnoDB tables that replaced the old file-based system (.frm, .par, .opt files). It stores information about database objects (tables, columns, indexes, foreign keys) in a consistent, crash-safe format. This change enables atomic DDL operations, better performance, simplified architecture, and makes MySQL more reliable and easier to manage.
The 2-Minute Answer (If They Want More):
MySQL 8.0 introduced a fundamental architectural change by moving metadata from the file system to InnoDB:
Key Improvements Over Legacy System:
1. Atomic DDL Operations:
- DDL operations (CREATE, ALTER, DROP) are now atomic and crash-safe
- If a DDL operation fails midway, changes are rolled back completely
- No more orphaned files or inconsistent metadata
- Example: Dropping a partitioned table is now a single atomic operation instead of individual file deletions
2. Centralized Storage:
- All metadata stored in InnoDB tables in the mysql schema
- Tables like mysql.tables, mysql.columns, mysql.indexes, mysql.foreign_keys
- Eliminates .frm (table format), .par (partition), and db.opt (database options) files
- Metadata is transactional and benefits from InnoDB's ACID properties
3. Improved Performance:
- Faster metadata lookups using InnoDB indexes instead of file system operations
- Better concurrency with InnoDB's MVCC
- Reduced I/O for information_schema queries
- Cached in InnoDB buffer pool like regular data
4. Better Consistency:
- Single source of truth for metadata
- No synchronization issues between server cache and files
- Consistent across replication topology
5. Enhanced INFORMATION_SCHEMA:
- INFORMATION_SCHEMA views now query data dictionary tables directly
- Many views implemented as actual views on data dictionary tables
- Faster and more efficient than old implementation
- Some new tables like INFORMATION_SCHEMA.ST_GEOMETRY_COLUMNS
System Tables:
- Data dictionary tables live in the mysql schema (for example mysql.tables and mysql.dd_properties)
- These tables are hidden and not directly accessible
- Access metadata through INFORMATION_SCHEMA or SHOW commands
- Protected by MySQL internal access control
Migration from MySQL 5.7:
- MySQL 8.0 upgrade process automatically migrates from .frm files
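- On first startup the 8.0 server builds the data dictionary from the existing .frm metadata; mysql_upgrade (needed before 8.0.16) handles the remaining system-table upgrades
- Old file formats are no longer created or used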
Code Example:
-- View data dictionary tables (indirectly through INFORMATION_SCHEMA)
SELECT TABLE_NAME, ENGINE, TABLE_ROWS, DATA_LENGTH
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'mysql' AND TABLE_NAME LIKE '%tables%';
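-- Data dictionary enables atomic DDL
-- Each DDL statement is atomic on its own. DDL still commits implicitly,
-- so wrapping it in START TRANSACTION ... COMMIT does not make it roll back.
CREATE TABLE test_atomic (
id INT PRIMARY KEY,
data VARCHAR(100)
);
-- If this fails midway, the statement is rolled back and no orphaned files are left behind
ALTER TABLE test_atomic ADD COLUMN created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP;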
-- INFORMATION_SCHEMA now queries data dictionary tables efficiently
SELECT
TABLE_SCHEMA,
TABLE_NAME,
COLUMN_NAME,
DATA_TYPE,
IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'ecommerce'
AND TABLE_NAME = 'orders';
-- View indexes stored in data dictionary
SELECT
TABLE_NAME,
INDEX_NAME,
INDEX_TYPE,
NON_UNIQUE,
SEQ_IN_INDEX,
COLUMN_NAME
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = 'ecommerce'
ORDER BY TABLE_NAME, INDEX_NAME, SEQ_IN_INDEX;
-- Check foreign key constraints from data dictionary
SELECT
CONSTRAINT_NAME,
TABLE_NAME,
COLUMN_NAME,
REFERENCED_TABLE_NAME,
REFERENCED_COLUMN_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = 'ecommerce'
AND REFERENCED_TABLE_NAME IS NOT NULL;
-- View table statistics from data dictionary
SELECT
TABLE_SCHEMA,
TABLE_NAME,
TABLE_ROWS,
AVG_ROW_LENGTH,
DATA_LENGTH / 1024 / 1024 AS data_size_mb,
INDEX_LENGTH / 1024 / 1024 AS index_size_mb,
CREATE_TIME,
UPDATE_TIME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
ORDER BY DATA_LENGTH DESC;
-- Atomic DDL: Drop database with all tables atomically
DROP DATABASE IF EXISTS old_project;
-- In MySQL 5.7, this could leave orphaned files if interrupted
-- In MySQL 8.0, it's atomic when all tables use an engine that supports atomic DDL (such as InnoDB)
-- View partitions information (now in data dictionary)
SELECT
TABLE_NAME,
PARTITION_NAME,
PARTITION_METHOD,
PARTITION_EXPRESSION,
TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'analytics'
AND PARTITION_NAME IS NOT NULL;
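-- Check data dictionary version
-- mysql.dd_properties is a hidden data dictionary table: in a standard (non-debug) build
-- a direct SELECT is rejected with an access error
-- SELECT * FROM mysql.dd_properties;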
-- Verify no .frm files in data directory (MySQL 8.0)
-- Run from shell:
-- ls -la /var/lib/mysql/mydb/*.frm # Should return no results
-- Performance comparison: Metadata queries are faster
SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES;
-- Typically faster in MySQL 8.0 than 5.7, because the view reads indexed dictionary tables instead of opening .frm files
Transactions and Locking
What is a phantom read and how does InnoDB prevent it?
The 30-Second Answer:
A phantom read occurs when a transaction re-executes a query and finds different rows than before due to another transaction's INSERT or DELETE. InnoDB prevents phantom reads at REPEATABLE READ level using next-key locks - a combination of row locks and gap locks that lock both existing rows and the gaps between them, preventing other transactions from inserting rows in the locked range.
The 2-Minute Answer (If They Want More):
What is a Phantom Read?
A phantom read happens when:
- Transaction A executes a query with a WHERE clause
- Transaction B inserts or deletes rows that match that WHERE clause
- Transaction A re-executes the same query and sees different rows ("phantoms")
Example scenario:
T1: SELECT COUNT(*) FROM users WHERE age > 25; -- Returns 10
T2: INSERT INTO users (name, age) VALUES ('Alice', 30);
T2: COMMIT;
T1: SELECT COUNT(*) FROM users WHERE age > 25; -- Returns 11 (phantom!)
This breaks the expectation that a transaction sees the same data when it repeats a query. The SQL standard permits phantoms at REPEATABLE READ, but InnoDB's REPEATABLE READ avoids them.
How InnoDB Prevents Phantoms:
InnoDB uses next-key locking - a combination of:
- Record Locks - Lock individual index records
- Gap Locks - Lock the space between index records
- Next-Key Locks - Lock both a record and the gap before it
When you use a locking read (SELECT ... FOR UPDATE or SELECT ... FOR SHARE) with a range condition, InnoDB locks:
- All matching rows
- All gaps where new matching rows could be inserted
This prevents other transactions from inserting phantoms into the locked range.
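As a rough illustration, the Python toy model below treats each next-key lock as a half-open interval (previous key, current key] over a sorted index. It is a sketch under simplifying assumptions, not InnoDB's algorithm: the function names are hypothetical, and the exact lock boundaries in a real server depend on the index type, the query, and the MySQL version.

```python
import bisect
import math


def next_key_locks(index_keys, low, high):
    """Intervals (prev_key, key] locked by a locking range scan over [low, high]."""
    keys = sorted(index_keys)
    locks = []
    start = bisect.bisect_left(keys, low)
    for i in range(start, len(keys)):
        prev = keys[i - 1] if i > 0 else -math.inf
        locks.append((prev, keys[i]))
        if keys[i] > high:
            break  # the first record past the range ends the scan
    else:
        # The scan ran off the end of the index: lock the final gap as well
        prev = keys[-1] if keys else -math.inf
        locks.append((prev, math.inf))
    return locks


def insert_blocks(locks, value):
    """True if inserting value would land inside a locked record or gap."""
    return any(lo < value <= hi for lo, hi in locks)


locks = next_key_locks([100, 300, 500], low=100, high=400)
# locks == [(-inf, 100), (100, 300), (300, 500)]
```

In this model an insert of 250 falls in the locked gap (100, 300] and has to wait, while an insert of 600 lies beyond the last locked interval and proceeds.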
Important notes:
- Gap locking only occurs at REPEATABLE READ and SERIALIZABLE levels
- Gap locks are taken for searches on non-unique indexes and for range scans
- Unique index equality searches only use record locks (no gap lock needed)
- At READ COMMITTED level, gap locking is disabled (phantoms are possible)
- MVCC provides phantom protection for regular SELECTs without locking
InnoDB's default REPEATABLE READ with MVCC means regular (non-locking) SELECTs don't see phantoms because they read from a consistent snapshot. Next-key locks prevent phantoms for locking reads and write operations.
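The difference between a snapshot fixed at the first read and a fresh snapshot per statement can be shown with a small Python toy. It is a deliberately simplified sketch: a single commit counter stands in for InnoDB's read views and undo logs, and the classes ToyTable and ToyTransaction are hypothetical.

```python
class ToyTable:
    """Rows tagged with the commit sequence number that created them."""

    def __init__(self):
        self.rows = []        # list of (commit_seq, value)
        self.commit_seq = 0

    def insert_and_commit(self, value):
        self.commit_seq += 1
        self.rows.append((self.commit_seq, value))

    def visible(self, snapshot_seq):
        return [v for seq, v in self.rows if seq <= snapshot_seq]


class ToyTransaction:
    def __init__(self, table, isolation):
        self.table = table
        self.isolation = isolation
        self.snapshot = None

    def select_count(self):
        # READ COMMITTED takes a fresh snapshot for every statement;
        # REPEATABLE READ keeps the snapshot from its first read.
        if self.isolation == "READ COMMITTED" or self.snapshot is None:
            self.snapshot = self.table.commit_seq
        return len(self.table.visible(self.snapshot))


table = ToyTable()
for price in (100, 300, 500):
    table.insert_and_commit(price)

rr = ToyTransaction(table, "REPEATABLE READ")
rc = ToyTransaction(table, "READ COMMITTED")
before = (rr.select_count(), rc.select_count())   # (3, 3)
table.insert_and_commit(450)                       # another session commits
after = (rr.select_count(), rc.select_count())    # (3, 4)
```

The REPEATABLE READ transaction keeps counting 3 rows, while the READ COMMITTED transaction sees the newly committed row as a phantom.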
Code Example:
-- Create a test table
CREATE TABLE products (
id INT PRIMARY KEY AUTO_INCREMENT,
category VARCHAR(50),
price DECIMAL(10,2),
INDEX idx_category (category)
) ENGINE=InnoDB;
INSERT INTO products (category, price) VALUES
('Electronics', 100),
('Electronics', 300),
('Electronics', 500),
('Books', 20),
('Books', 35);
-- Demonstrate phantom read prevention
-- Session 1: Start transaction with locking read
START TRANSACTION;
SELECT * FROM products
WHERE category = 'Electronics'
FOR UPDATE;
-- This locks all 'Electronics' rows AND the gaps around them
-- Session 2: Try to insert into locked range
INSERT INTO products (category, price)
VALUES ('Electronics', 200);
-- This will BLOCK because it tries to insert into a locked gap
-- Session 1: The same query still returns same results (no phantoms)
SELECT * FROM products WHERE category = 'Electronics' FOR UPDATE;
COMMIT;
-- Now Session 2's INSERT completes
-- Example with range query
-- Session 1:
START TRANSACTION;
SELECT * FROM products
WHERE price BETWEEN 100 AND 400
FOR UPDATE;
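-- price has no index, so InnoDB scans the primary key and locks every row it reads plus the gaps between them
-- (with an index on price, only the 100-400 range and its neighbouring gaps would be locked)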
-- Session 2:
INSERT INTO products (category, price) VALUES ('Electronics', 250);
-- BLOCKS - trying to insert into locked gap
-- View locks in the Performance Schema (MySQL 8.0+)
SELECT
lock_type,
lock_mode,
lock_status,
lock_data
FROM performance_schema.data_locks
WHERE object_name = 'products';
-- REPEATABLE READ with non-locking read (MVCC prevents phantoms)
-- Session 1:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT COUNT(*) FROM products WHERE category = 'Electronics'; -- Returns 3
-- Session 2:
INSERT INTO products (category, price) VALUES ('Electronics', 450);
COMMIT;
-- Session 1:
SELECT COUNT(*) FROM products WHERE category = 'Electronics'; -- Still returns 3 (no phantom)
COMMIT;
-- Now a new transaction sees the inserted row
SELECT COUNT(*) FROM products WHERE category = 'Electronics'; -- Returns 4
-- READ COMMITTED allows phantoms
-- Session 1:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT COUNT(*) FROM products WHERE category = 'Books'; -- Returns 2
-- Session 2:
INSERT INTO products (category, price) VALUES ('Books', 45);
COMMIT;
-- Session 1:
SELECT COUNT(*) FROM products WHERE category = 'Books'; -- Returns 3 (phantom read!)
COMMIT;
Partitioning
What is table partitioning in MySQL?
The 30-Second Answer:
Table partitioning is a database design technique that divides a large table into smaller, more manageable pieces called partitions, while logically maintaining it as a single table. Each partition can be stored, indexed, and managed independently, improving query performance and maintenance operations. MySQL supports partitioning at the storage engine level, allowing you to distribute data across multiple physical files based on defined rules.
The 2-Minute Answer (If They Want More):
Table partitioning in MySQL is a method of decomposing large tables into smaller physical segments (partitions) that are transparent to applications. The table remains a single logical entity for queries, but internally MySQL can optimize operations by accessing only relevant partitions.
Key benefits include:
- Performance Improvement: Queries that access a subset of data can use partition pruning to scan only relevant partitions instead of the entire table
- Easier Maintenance: You can perform maintenance operations (backup, restore, rebuild indexes) on individual partitions
- Bulk Data Management: Efficiently add or remove large amounts of data by adding/dropping partitions
- Improved Archival: Old data can be archived by simply dropping or archiving specific partitions
Partitioning works by using a partitioning function on one or more columns (the partitioning key) to determine which partition stores each row. Common use cases include time-series data (partitioned by date), geographical data (partitioned by region), or any large dataset that has a natural division criterion.
MySQL evaluates the partitioning expression for each row and routes it to the appropriate partition. When querying, if the WHERE clause includes the partitioning key, MySQL can eliminate irrelevant partitions from the search (partition pruning), dramatically reducing I/O and improving performance.
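Routing and pruning for a RANGE scheme can be sketched in a few lines of Python. This is an illustration of the idea rather than MySQL's internal logic: the bounds mirror the RANGE (YEAR(sale_date)) layout used in this answer's SQL example, and the helper names are hypothetical.

```python
import bisect
from datetime import date

# Upper bounds from the VALUES LESS THAN clauses; the last partition is MAXVALUE.
UPPER_BOUNDS = [2021, 2022, 2023, 2024]
PARTITION_NAMES = ["p2020", "p2021", "p2022", "p2023", "p_future"]


def partition_index(year: int) -> int:
    """Index of the first partition whose upper bound is greater than year."""
    return bisect.bisect_right(UPPER_BOUNDS, year)


def partition_for(sale_date: date) -> str:
    """Route a row to its partition by evaluating YEAR(sale_date)."""
    return PARTITION_NAMES[partition_index(sale_date.year)]


def prune(start: date, end: date) -> list:
    """Partitions that can contain rows with start <= sale_date <= end."""
    first = partition_index(start.year)
    last = partition_index(end.year)
    return PARTITION_NAMES[first:last + 1]


row_partition = partition_for(date(2023, 6, 15))           # "p2023"
scanned = prune(date(2023, 1, 1), date(2023, 12, 31))      # ["p2023"]
```

A query without a condition on the partitioning key gives the optimizer nothing to prune with, so every partition has to be scanned.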
Code Example:
-- Create a partitioned table by RANGE (common for time-series data)
CREATE TABLE sales (
id INT NOT NULL AUTO_INCREMENT,
sale_date DATE NOT NULL,
amount DECIMAL(10, 2),
region VARCHAR(50),
PRIMARY KEY (id, sale_date)
) PARTITION BY RANGE (YEAR(sale_date)) (
PARTITION p2020 VALUES LESS THAN (2021),
PARTITION p2021 VALUES LESS THAN (2022),
PARTITION p2022 VALUES LESS THAN (2023),
PARTITION p2023 VALUES LESS THAN (2024),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- View partition information
SELECT
PARTITION_NAME,
PARTITION_EXPRESSION,
TABLE_ROWS,
DATA_LENGTH
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'sales';
-- Add a new partition for 2024
ALTER TABLE sales
REORGANIZE PARTITION p_future INTO (
PARTITION p2024 VALUES LESS THAN (2025),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- Drop old data efficiently by removing a partition
ALTER TABLE sales DROP PARTITION p2020;
-- Query with partition pruning (only scans p2023 partition)
-- (EXPLAIN PARTITIONS was removed in MySQL 8.0; plain EXPLAIN includes a partitions column)
EXPLAIN
SELECT * FROM sales
WHERE sale_date BETWEEN '2023-01-01' AND '2023-12-31';
-- Check which partitions are accessed
SELECT * FROM sales
WHERE sale_date = '2023-06-15';
-- MySQL will only scan the p2023 partition