SQL Interview Questions (Free Preview)
Free sample of 15 from 105 questions available
SQL Interview Questions - Fundamentals
What is the difference between SQL and NoSQL databases?
| Aspect | SQL Databases | NoSQL Databases |
|---|---|---|
| Structure | Structured, table-based with rows and columns | Flexible: document, key-value, column-family, or graph |
| Schema | Fixed schema, predefined structure | Dynamic/flexible schema |
| Query Language | Standardized SQL | Varies by database type |
| ACID Properties | Strong ACID compliance | Eventually consistent (BASE properties) |
| Scalability | Vertical scaling (scale-up) | Horizontal scaling (scale-out) |
| Relationships | Complex relationships via JOINs | Denormalized data, embedded documents |
| Use Cases | Financial systems, CRM, ERP | Big data, real-time analytics, content management |
Examples:
- SQL: MySQL, PostgreSQL, Oracle, SQL Server
- NoSQL: MongoDB (document), Redis (key-value), Cassandra (column-family), Neo4j (graph)
References:
↑ Back to topWhat is a primary key and why is it important?
A primary key is a column (or combination of columns) that uniquely identifies each record in a table. It serves as the main identifier for rows and ensures data integrity.
Key Characteristics:
- Uniqueness: No two records can have the same primary key value
- Non-null: Primary key values cannot be NULL
- Immutable: Primary key values should not change once assigned
- Minimal: Should contain the minimum number of columns necessary for uniqueness
-- Single column primary key
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
email VARCHAR(100)
);
-- Composite primary key (multiple columns)
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id)
);
Why Primary Keys Are Important:
- Data Integrity: Prevents duplicate records
- Indexing: Automatically creates a clustered index for fast data retrieval
- Relationships: Serves as a reference point for foreign keys
- Replication: Essential for database replication and synchronization
- Performance: Optimizes query execution and JOIN operations
-- Foreign key relationship using primary key
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
Reference: Primary Key Constraints
↑ Back to topBasic Queries
What is the difference between WHERE and HAVING clauses?
The key difference between WHERE and HAVING lies in when they are applied during query execution and what they can filter.
| Aspect | WHERE | HAVING |
|---|---|---|
| Applied | Before grouping | After grouping |
| Filters | Individual rows | Groups of rows |
| Can use aggregates | No | Yes |
| Performance | Generally faster | Can be slower |
| Execution order | Earlier in query processing | Later in query processing |
Here's a practical example demonstrating both clauses:
-- Using WHERE clause (filters individual rows before grouping)
SELECT department, COUNT(*), AVG(salary)
FROM employees
WHERE hire_date > '2020-01-01' -- Filter individual employees
GROUP BY department
HAVING COUNT(*) > 5; -- Filter departments with more than 5 employees
-- This query:
-- 1. Filters employees hired after 2020-01-01 (WHERE)
-- 2. Groups remaining employees by department
-- 3. Calculates count and average salary for each department
-- 4. Only shows departments with more than 5 employees (HAVING)
-- Example showing what each clause can and cannot do
SELECT department, AVG(salary) as avg_salary
FROM employees
WHERE salary > 50000 -- ✅ Valid: filters individual rows
GROUP BY department
HAVING AVG(salary) > 80000; -- ✅ Valid: filters groups using aggregate
-- WHERE AVG(salary) > 80000 -- ❌ Invalid: cannot use aggregates in WHERE
-- HAVING salary > 50000 -- ❌ Inefficient: should use WHERE instead
When to use each:
- Use
WHEREto filter individual records before expensive grouping operations - Use
HAVINGto filter groups based on aggregate calculations - Combine both for optimal performance and flexibility
Joins and Relationships
What are the different types of JOINs in SQL?
SQL provides several types of JOIN operations to combine data from multiple tables based on related columns. The main types of JOINs are:
| JOIN Type | Description | Returns |
|---|---|---|
INNER JOIN |
Returns only matching records | Records that exist in both tables |
LEFT JOIN (LEFT OUTER JOIN) |
Returns all records from left table | All left table records + matching right table records |
RIGHT JOIN (RIGHT OUTER JOIN) |
Returns all records from right table | All right table records + matching left table records |
FULL OUTER JOIN |
Returns all records from both tables | All records from both tables, matched where possible |
CROSS JOIN |
Cartesian product of both tables | Every combination of rows from both tables |
SELF JOIN |
Table joined with itself | Records from the same table based on a relationship |
Here's a visual representation of the most common JOINs:
-- Example tables for demonstration
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(50),
department_id INT
);
CREATE TABLE departments (
id INT PRIMARY KEY,
department_name VARCHAR(50)
);
This foundational understanding helps developers choose the appropriate JOIN type based on the specific data retrieval requirements.
References:
↑ Back to topAggregate Functions and Grouping
What are aggregate functions? Name the most common ones.
Aggregate functions are SQL functions that perform calculations on a set of values and return a single value. They operate on multiple rows of data to produce summary statistics and are essential for data analysis and reporting. These functions ignore NULL values (except for COUNT(*)).
Most Common Aggregate Functions:
| Function | Purpose | Example Usage |
|---|---|---|
COUNT() |
Counts the number of rows | COUNT(*), COUNT(column_name) |
SUM() |
Calculates the total sum | SUM(salary) |
AVG() |
Calculates the average value | AVG(age) |
MIN() |
Finds the minimum value | MIN(price) |
MAX() |
Finds the maximum value | MAX(score) |
GROUP_CONCAT() |
Concatenates values from multiple rows | GROUP_CONCAT(name) (MySQL) |
STRING_AGG() |
Concatenates string values | STRING_AGG(name, ',') (PostgreSQL) |
Example Implementation:
-- Basic aggregate functions example
SELECT
COUNT(*) as total_employees,
SUM(salary) as total_payroll,
AVG(salary) as average_salary,
MIN(salary) as lowest_salary,
MAX(salary) as highest_salary
FROM employees;
This query demonstrates how aggregate functions summarize data from the entire employees table, providing key statistics about employee salaries in a single result row.
References:
↑ Back to topSchema Design Interview Questions - Professional Answers
What is database normalization and why is it important?
Database normalization is a systematic process of organizing data in a relational database to minimize redundancy and improve data integrity. It involves decomposing tables into smaller, related tables and defining relationships between them using foreign keys.
Why normalization is important:
| Benefit | Description |
|---|---|
| Data Integrity | Eliminates inconsistent data by storing each piece of information in only one place |
| Storage Efficiency | Reduces data redundancy, saving storage space |
| Update Anomalies Prevention | Prevents inconsistencies when updating data across multiple locations |
| Insertion Anomalies Prevention | Eliminates the need to enter redundant data when inserting new records |
| Deletion Anomalies Prevention | Prevents loss of important data when deleting records |
-- Example of unnormalized data (problematic)
CREATE TABLE customer_orders (
order_id INT,
customer_name VARCHAR(100),
customer_email VARCHAR(100),
customer_phone VARCHAR(20),
product_name VARCHAR(100),
product_price DECIMAL(10,2)
);
-- Normalized approach (better)
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
customer_email VARCHAR(100),
customer_phone VARCHAR(20)
);
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
product_price DECIMAL(10,2)
);
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
product_id INT,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
The normalized approach eliminates redundancy by separating customer and product information into their own tables, creating cleaner relationships and preventing data inconsistencies.
References:
↑ Back to topWhat is a foreign key and how does it enforce referential integrity?
A foreign key is a column or combination of columns in one table that refers to the primary key of another table. It establishes a link between two tables and enforces referential integrity by ensuring that the relationship between tables remains consistent.
How foreign keys enforce referential integrity:
-- Parent table (referenced table)
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100),
email VARCHAR(100)
);
-- Child table (referencing table)
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
total_amount DECIMAL(10,2),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
This foreign key constraint ensures that every customer_id in the orders table must correspond to an existing customer_id in the customers table.
Referential integrity rules enforced:
| Rule | Description | Example |
|---|---|---|
| Insert Rule | Cannot insert a record with a foreign key value that doesn't exist in parent table | Cannot create order for non-existent customer |
| Update Rule | Cannot update foreign key to a value that doesn't exist in parent table | Cannot change order's customer_id to invalid customer |
| Delete Rule | Cannot delete a parent record if child records reference it | Cannot delete customer who has orders (without handling children) |
-- These operations will FAIL due to referential integrity:
-- 1. Insert order for non-existent customer
INSERT INTO orders (order_id, customer_id, order_date, total_amount)
VALUES (101, 999, '2024-01-15', 150.00); -- ERROR: customer_id 999 doesn't exist
-- 2. Update order to reference non-existent customer
UPDATE orders SET customer_id = 888 WHERE order_id = 101; -- ERROR: customer_id 888 doesn't exist
-- 3. Delete customer who has orders
DELETE FROM customers WHERE customer_id = 1; -- ERROR: orders reference this customer
Handling referential integrity with cascading actions:
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
total_amount DECIMAL(10,2),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
ON DELETE CASCADE -- Delete orders when customer is deleted
ON UPDATE CASCADE -- Update orders when customer_id changes
);
-- Alternative cascading options:
-- ON DELETE SET NULL -- Set foreign key to NULL when parent is deleted
-- ON DELETE RESTRICT -- Prevent deletion of parent if children exist
-- ON UPDATE SET NULL -- Set foreign key to NULL when parent key changes
-- ON UPDATE RESTRICT -- Prevent update of parent key if children exist
The CASCADE option automatically handles child records when parent records are modified, maintaining referential integrity while allowing necessary operations.
Benefits of foreign keys:
- Data Consistency: Prevents orphaned records and invalid references
- Data Quality: Ensures relationships between tables remain valid
- Documentation: Explicitly defines table relationships in schema
- Query Optimization: Database can use foreign keys for join optimization
References:
↑ Back to topWhat is the difference between a clustered and non-clustered index?
Indexes are database structures that improve query performance by creating shortcuts to data. The fundamental difference between clustered and non-clustered indexes lies in how they store and organize the actual table data.
Clustered Index
A clustered index determines the physical storage order of data in the table. The table data is stored in the same order as the clustered index key.
-- Creating a clustered index (usually on primary key)
CREATE TABLE employees (
emp_id INT PRIMARY KEY, -- Automatically creates clustered index
last_name VARCHAR(50),
first_name VARCHAR(50),
department_id INT,
salary DECIMAL(10,2)
);
-- Explicit clustered index creation (SQL Server syntax)
CREATE CLUSTERED INDEX IX_employees_emp_id ON employees(emp_id);
Non-Clustered Index
A non-clustered index is a separate structure that contains pointers to the actual data rows. The table data remains in its original physical order.
-- Creating non-clustered indexes for better query performance
CREATE INDEX IX_employees_lastname ON employees(last_name);
CREATE INDEX IX_employees_dept_salary ON employees(department_id, salary);
-- Index with included columns (covering index)
CREATE INDEX IX_employees_covering
ON employees(department_id)
INCLUDE (first_name, last_name, salary);
The covering index includes additional columns that can satisfy queries without accessing the base table, further improving performance.
Key Differences Comparison
| Aspect | Clustered Index | Non-Clustered Index |
|---|---|---|
| Data Storage | Data pages stored in index order | Separate structure with pointers to data |
| Physical Order | Determines table's physical data order | Doesn't affect data storage order |
| Per Table Limit | Only one per table | Multiple allowed (typically 999+ depending on DBMS) |
| Storage Space | No additional space (data IS the index) | Additional storage required for index structure |
| Insert Performance | Slower (may require page splits) | Faster (just add pointer) |
| Range Queries | Excellent (sequential reads) | Good (but may require random access) |
| Key Lookups | Direct data access | Requires bookmark lookup to get full row |
Performance Implications
-- Query benefiting from clustered index (range scan)
SELECT * FROM employees
WHERE emp_id BETWEEN 1000 AND 2000
ORDER BY emp_id;
-- Very efficient: data is physically ordered by emp_id
-- Query benefiting from non-clustered index
SELECT emp_id, first_name, last_name
FROM employees
WHERE last_name = 'Smith';
-- Uses IX_employees_lastname index to quickly find matching rows
-- Query using covering index (no key lookup needed)
SELECT first_name, last_name, salary
FROM employees
WHERE department_id = 10;
-- IX_employees_covering provides all needed data without accessing base table
Choosing Between Index Types
Use Clustered Index for:
- Primary key columns (default behavior)
- Columns frequently used in range queries
- Columns used in ORDER BY clauses
- Columns with high selectivity and frequent access
Use Non-Clustered Index for:
- Columns frequently used in WHERE clauses
- Foreign key columns used in joins
- Columns used in GROUP BY operations
- Supporting multiple query patterns
-- Example: Optimal indexing strategy for an orders table
CREATE TABLE orders (
order_id INT PRIMARY KEY, -- Clustered index (automatic)
customer_id INT,
order_date DATE,
status VARCHAR(20),
total_amount DECIMAL(10,2)
);
-- Non-clustered indexes for common query patterns
CREATE INDEX IX_orders_customer ON orders(customer_id);
CREATE INDEX IX_orders_date ON orders(order_date);
CREATE INDEX IX_orders_status ON orders(status);
-- Covering index for reporting queries
CREATE INDEX IX_orders_reporting
ON orders(order_date, status)
INCLUDE (customer_id, total_amount);
This indexing strategy supports various query patterns efficiently while maintaining reasonable storage overhead and insert performance.
References:
↑ Back to topSubqueries and CTEs
What is a subquery and what are the different types?
A subquery (also called a nested query or inner query) is a SQL query that is embedded within another SQL query. The subquery is executed first, and its result is used by the outer query to complete the operation.
Types of Subqueries
| Type | Description | Usage Context |
|---|---|---|
| Scalar Subquery | Returns a single value (one row, one column) | Used in SELECT, WHERE, or HAVING clauses |
| Row Subquery | Returns a single row with multiple columns | Used with row constructors |
| Column Subquery | Returns multiple rows with a single column | Used with IN, ANY, ALL operators |
| Table Subquery | Returns multiple rows and columns | Used in FROM clause |
Examples of Different Subquery Types
-- Scalar subquery: Find employees earning more than average salary
SELECT employee_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Column subquery: Find employees in specific departments
SELECT employee_name
FROM employees
WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'New York');
-- Table subquery: Use subquery as a table source
SELECT emp.employee_name, dept_stats.avg_salary
FROM employees emp
JOIN (
SELECT department_id, AVG(salary) as avg_salary
FROM employees
GROUP BY department_id
) dept_stats ON emp.department_id = dept_stats.department_id;
These examples demonstrate how subqueries can be used to break down complex queries into manageable parts, making them more readable and maintainable.
↑ Back to topWindow Functions
What are window functions and how do they differ from aggregate functions?
Window functions are advanced SQL features that perform calculations across a set of rows related to the current row within a result set, without collapsing the rows into a single output row. They operate on a "window" of data and return a value for each row in the original result set.
Key differences from aggregate functions:
| Aspect | Window Functions | Aggregate Functions |
|---|---|---|
| Row preservation | Preserve all original rows | Collapse rows into groups |
| Output | One result per input row | One result per group |
| GROUP BY requirement | No GROUP BY needed | Requires GROUP BY for multiple rows |
| Calculation scope | Over a defined window of rows | Over entire group or table |
| Syntax | Uses OVER() clause | Standalone function calls |
Example demonstrating the difference:
-- Sample data: employee salaries by department
SELECT
employee_name,
department,
salary,
-- Window function: shows individual rows with additional calculated column
AVG(salary) OVER (PARTITION BY department) as dept_avg_salary,
-- This would require GROUP BY and collapse rows:
-- AVG(salary) as dept_avg -- This won't work without GROUP BY
FROM employees;
This query shows each employee with their department's average salary, preserving all individual rows. An aggregate function would require grouping and would only show one row per department.
References:
↑ Back to topAdvanced Data Manipulation
What is the difference between DELETE, DROP, and TRUNCATE?
These three SQL commands serve different purposes for data and structure removal, each with distinct characteristics and use cases.
Detailed Comparison
| Aspect | DELETE | TRUNCATE | DROP |
|---|---|---|---|
| Purpose | Remove specific rows | Remove all rows | Remove entire table/structure |
| Structure Impact | Preserves table structure | Preserves table structure | Removes table completely |
| WHERE Clause | Supported | Not supported | Not applicable |
| Transaction Log | Logs each row deletion | Minimal logging | Logs structure removal |
| Rollback | Can be rolled back | Can be rolled back (in transaction) | Can be rolled back (in transaction) |
| Auto-increment Reset | No | Yes | N/A (table removed) |
| Speed | Slower (row-by-row) | Faster (page deallocation) | Fast (metadata operation) |
| Triggers | Fires DELETE triggers | Does not fire triggers | Fires DROP triggers |
DELETE Command
-- Remove specific records based on conditions
DELETE FROM employees
WHERE department = 'Marketing'
AND hire_date < '2020-01-01';
-- Remove all records (but keep table structure)
DELETE FROM employees;
-- Delete with JOIN (remove employees from closed departments)
DELETE e
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id
WHERE d.status = 'CLOSED';
The DELETE command provides granular control over which records to remove and maintains full transaction logging. It's the safest option when you need to remove specific data while preserving the ability to rollback.
TRUNCATE Command
-- Remove all rows from table quickly
TRUNCATE TABLE employees;
-- Note: TRUNCATE cannot be used with WHERE clause
-- This would cause an error:
-- TRUNCATE TABLE employees WHERE department = 'IT';
TRUNCATE is optimized for removing all data from a table quickly. It deallocates data pages rather than deleting rows individually, making it much faster than DELETE for clearing entire tables.
DROP Command
-- Remove entire table structure and data
DROP TABLE employees;
-- Drop with existence check
DROP TABLE IF EXISTS temp_employees;
-- Drop multiple related objects
DROP TABLE employees CASCADE; -- PostgreSQL: drops dependent objects
DROP TABLE employees; -- SQL Server: requires manual dependency removal
-- Drop other database objects
DROP INDEX idx_employee_name ON employees;
DROP VIEW employee_summary;
DROP PROCEDURE calculate_bonus;
DROP completely removes database objects from the system. Once executed, both the data and structure are permanently removed (unless restored from backup).
Practical Examples and Best Practices
-- Scenario 1: Data cleanup (use DELETE)
DELETE FROM order_logs
WHERE created_date < DATEADD(month, -6, GETDATE());
-- Scenario 2: Reset table for new data load (use TRUNCATE)
TRUNCATE TABLE staging_data;
-- Then bulk insert new data
-- Scenario 3: Remove temporary table (use DROP)
DROP TABLE IF EXISTS #temp_calculations;
-- Best practice: Always use transactions for safety
BEGIN TRANSACTION;
DELETE FROM employees WHERE status = 'TERMINATED';
-- Verify the results
SELECT COUNT(*) FROM employees WHERE status = 'TERMINATED';
COMMIT; -- or ROLLBACK if something went wrong
This example demonstrates appropriate use cases for each command and shows how transactions provide safety when making destructive changes to data.
↑ Back to topQuery Performance
What are some common causes of poor query performance?
Poor query performance typically stems from several identifiable issues. Understanding these common causes helps developers proactively design efficient queries and database schemas.
1. Missing or inappropriate indexes:
-- Problem: No index on frequently queried column
SELECT * FROM orders WHERE customer_id = 12345; -- Table scan on millions of rows
-- Solution: Add appropriate index
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
2. Inefficient WHERE clauses:
Non-sargable conditions:
-- Problem: Function in WHERE clause prevents index usage
SELECT * FROM employees WHERE UPPER(last_name) = 'SMITH';
-- Solution: Use functional index or store normalized data
CREATE INDEX idx_employees_upper_name ON employees(UPPER(last_name));
-- Or better: Store normalized data
SELECT * FROM employees WHERE last_name = 'Smith';
Leading wildcards:
-- Problem: Cannot use index effectively
SELECT * FROM products WHERE name LIKE '%phone%';
-- Better: Trailing wildcard can use index
SELECT * FROM products WHERE name LIKE 'phone%';
3. Suboptimal JOIN operations:
-- Problem: Cartesian product due to missing JOIN condition
SELECT * FROM orders o, customers c; -- Missing WHERE clause
-- Solution: Proper JOIN syntax
SELECT * FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;
-- Problem: Wrong JOIN order (large table first)
SELECT * FROM huge_table h
JOIN small_lookup l ON h.lookup_id = l.id;
-- Better: Let optimizer choose or force small table first
SELECT * FROM small_lookup l
JOIN huge_table h ON l.id = h.lookup_id;
4. SELECT * abuse:
-- Problem: Retrieving unnecessary data
SELECT * FROM products WHERE category = 'Electronics';
-- Solution: Select only needed columns
SELECT id, name, price FROM products WHERE category = 'Electronics';
5. Lack of query limits:
-- Problem: Returning entire dataset
SELECT * FROM log_table ORDER BY created_date DESC;
-- Solution: Use appropriate limits
SELECT * FROM log_table
ORDER BY created_date DESC
LIMIT 100;
6. Outdated table statistics:
-- Problem: Query optimizer makes poor decisions due to stale statistics
-- Solution: Regular statistics updates
ANALYZE TABLE employees; -- MySQL
ANALYZE employees; -- PostgreSQL
UPDATE STATISTICS employees; -- SQL Server
7. Parameter sniffing (SQL Server specific):
-- Problem: Plan optimized for first parameter value
CREATE PROCEDURE GetOrdersByDate(@date DATE)
AS
SELECT * FROM orders WHERE order_date = @date;
-- Solution: Use OPTION (RECOMPILE) or local variables
CREATE PROCEDURE GetOrdersByDate(@date DATE)
AS
DECLARE @local_date DATE = @date;
SELECT * FROM orders WHERE order_date = @local_date;
8. Excessive data type conversions:
-- Problem: Implicit conversion prevents index usage
SELECT * FROM orders WHERE customer_id = '12345'; -- customer_id is INT
-- Solution: Use correct data types
SELECT * FROM orders WHERE customer_id = 12345;
Common performance killers summary:
| Issue | Impact | Solution |
|---|---|---|
| Missing indexes | Full table scans | Add appropriate indexes |
| Functions in WHERE | Index not usable | Functional indexes or data normalization |
| SELECT * | Unnecessary I/O | Select specific columns |
| No LIMIT clause | Memory exhaustion | Always use appropriate limits |
| Stale statistics | Poor execution plans | Regular ANALYZE/UPDATE STATISTICS |
| Wrong data types | Implicit conversions | Use matching data types |
| Complex subqueries | Multiple table scans | CTEs or JOINs |
| Lack of partitioning | Large table scans | Implement table partitioning |
Monitoring and detection:
// Node.js example: Query performance monitoring
const queryMonitor = {
async executeWithProfiling(query, params) {
const start = Date.now();
const result = await db.query(query, params);
const duration = Date.now() - start;
// Flag potential issues
if (duration > 5000) {
console.warn('Very slow query detected:', {
query: query.substring(0, 100),
duration,
rowCount: result.rowCount
});
}
return result;
}
};
This monitoring approach helps identify performance issues in production environments before they impact users significantly.
References:
↑ Back to topACID Properties
What are ACID properties in database transactions?
ACID is an acronym that represents four fundamental properties that guarantee reliable processing of database transactions. These properties ensure data integrity and consistency in database systems, especially in multi-user environments.
| Property | Description | Purpose |
|---|---|---|
| Atomicity | All operations in a transaction succeed or fail together | Prevents partial updates |
| Consistency | Database remains in a valid state before and after transaction | Maintains data integrity |
| Isolation | Concurrent transactions don't interfere with each other | Prevents data corruption |
| Durability | Committed changes persist even after system failures | Ensures data permanence |
Atomicity
Atomicity ensures that a transaction is treated as a single, indivisible unit. Either all operations within the transaction are completed successfully, or none of them are applied to the database.
-- Example: Bank transfer transaction
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A001';
UPDATE accounts SET balance = balance + 100 WHERE account_id = 'A002';
COMMIT;
-- If any operation fails, the entire transaction is rolled back
This code demonstrates a bank transfer where money is deducted from one account and added to another. Atomicity ensures that if either operation fails, both are rolled back, preventing inconsistent states where money could be lost or created.
Consistency
Consistency ensures that the database transitions from one valid state to another, maintaining all defined rules, constraints, and relationships.
Isolation
Isolation ensures that concurrent transactions don't interfere with each other, preventing issues like dirty reads, phantom reads, and non-repeatable reads.
Durability
Durability guarantees that once a transaction is committed, its effects are permanently stored and will survive system crashes, power failures, or other system errors.
References:
↑ Back to topWhat are the different isolation levels in SQL?
SQL defines four standard isolation levels that provide different degrees of isolation between concurrent transactions. Each level makes different trade-offs between data consistency and system performance.
SQL Standard Isolation Levels
| Level | Dirty Read | Non-repeatable Read | Phantom Read | Performance |
|---|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible | Highest |
| READ COMMITTED | Prevented | Possible | Possible | High |
| REPEATABLE READ | Prevented | Prevented | Possible | Medium |
| SERIALIZABLE | Prevented | Prevented | Prevented | Lowest |
1. READ UNCOMMITTED
The lowest isolation level where transactions can read uncommitted data from other transactions.
-- Setting isolation level
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN TRANSACTION;
-- Can read data that other transactions haven't committed yet
SELECT balance FROM accounts WHERE account_id = 'A001';
COMMIT;
This code allows reading uncommitted changes from other transactions, providing maximum concurrency but risking dirty reads. It's typically used for reporting where approximate data is acceptable.
2. READ COMMITTED
Transactions can only read committed data, but the same query might return different results if run multiple times within the same transaction.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN TRANSACTION;
SELECT COUNT(*) FROM products WHERE price > 100; -- Returns 50
-- Another transaction commits new products
SELECT COUNT(*) FROM products WHERE price > 100; -- Might return 52
COMMIT;
This example shows how READ COMMITTED prevents dirty reads but allows non-repeatable reads. The count might change between reads if other transactions commit new data.
3. REPEATABLE READ
Ensures that if a transaction reads a row, subsequent reads of the same row within the transaction will return the same data.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
SELECT balance FROM accounts WHERE account_id = 'A001'; -- $1000
-- Other transactions cannot modify this specific account
SELECT balance FROM accounts WHERE account_id = 'A001'; -- Still $1000
COMMIT;
This code guarantees that the balance for account A001 remains constant throughout the transaction, preventing non-repeatable reads but still allowing phantom reads for range queries.
4. SERIALIZABLE
The highest isolation level that completely isolates transactions from each other, as if they were executed serially.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT COUNT(*) FROM orders WHERE customer_id = 100; -- Returns 5
-- No other transaction can modify orders for customer 100
SELECT COUNT(*) FROM orders WHERE customer_id = 100; -- Still returns 5
COMMIT;
This example shows how SERIALIZABLE prevents all concurrency anomalies, including phantom reads, but may significantly impact performance due to increased locking.
Database-Specific Variations
Different database systems may implement additional isolation levels:
- PostgreSQL: Implements only READ COMMITTED, REPEATABLE READ, and SERIALIZABLE
- MySQL: Supports all four levels with InnoDB engine
- SQL Server: Includes additional levels like SNAPSHOT isolation
References:
↑ Back to topSecurity and Access Control
What is SQL injection and how do you prevent it?
SQL injection is a critical security vulnerability that occurs when an attacker inserts malicious SQL code into application queries through user input fields. This happens when user input is directly concatenated into SQL statements without proper validation or sanitization, allowing attackers to manipulate the database, access unauthorized data, or even delete entire tables.
How SQL Injection Works
Consider this vulnerable code example:
// VULNERABLE CODE - Never do this!
const userId = req.body.userId; // User input: "1 OR 1=1"
const query = `SELECT * FROM users WHERE id = ${userId}`;
// Resulting query: SELECT * FROM users WHERE id = 1 OR 1=1
// This returns ALL users instead of just one!
In this example, an attacker could input 1 OR 1=1 which makes the WHERE clause always true, potentially exposing all user records.
Prevention Methods
| Method | Description | Effectiveness |
|---|---|---|
| Prepared Statements | Pre-compiled SQL with placeholders | Highest |
| Input Validation | Whitelist/blacklist validation | High |
| Stored Procedures | Pre-defined database procedures | High |
| Escaping Special Characters | Escape SQL metacharacters | Medium |
| Least Privilege Principle | Limit database user permissions | High |
Best Practices Implementation
// SECURE CODE - Using prepared statements with Node.js/MySQL
const mysql = require('mysql2/promise');
async function getUserById(userId) {
const connection = await mysql.createConnection(dbConfig);
// Using prepared statement with placeholder (?)
const [rows] = await connection.execute(
'SELECT * FROM users WHERE id = ?',
[userId] // Parameters are safely escaped
);
await connection.end();
return rows;
}
This approach ensures that user input is treated as data, not executable code, completely preventing SQL injection attacks.
↑ Back to top