FREE PREVIEW

You're viewing a free preview

This is a sample of 15 questions from our full collection of 105 interview questions.

Unlock all 105 questions with detailed explanations and code examples

Get Full Access

SQL Interview Questions - Fundamentals

What is the difference between SQL and NoSQL databases?

Aspect SQL Databases NoSQL Databases
Structure Structured, table-based with rows and columns Flexible: document, key-value, column-family, or graph
Schema Fixed schema, predefined structure Dynamic/flexible schema
Query Language Standardized SQL Varies by database type
ACID Properties Strong ACID compliance Eventually consistent (BASE properties)
Scalability Vertical scaling (scale-up) Horizontal scaling (scale-out)
Relationships Complex relationships via JOINs Denormalized data, embedded documents
Use Cases Financial systems, CRM, ERP Big data, real-time analytics, content management

Examples:

  • SQL: MySQL, PostgreSQL, Oracle, SQL Server
  • NoSQL: MongoDB (document), Redis (key-value), Cassandra (column-family), Neo4j (graph)

References:

↑ Back to top

What is a primary key and why is it important?

A primary key is a column (or combination of columns) that uniquely identifies each record in a table. It serves as the main identifier for rows and ensures data integrity.

Key Characteristics:

  • Uniqueness: No two records can have the same primary key value
  • Non-null: Primary key values cannot be NULL
  • Immutable: Primary key values should not change once assigned
  • Minimal: Should contain the minimum number of columns necessary for uniqueness
-- Single column primary key
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    email VARCHAR(100)
);

-- Composite primary key (multiple columns)
CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT,
    PRIMARY KEY (order_id, product_id)
);

Why Primary Keys Are Important:

  1. Data Integrity: Prevents duplicate records
  2. Indexing: Automatically creates a clustered index for fast data retrieval
  3. Relationships: Serves as a reference point for foreign keys
  4. Replication: Essential for database replication and synchronization
  5. Performance: Optimizes query execution and JOIN operations
-- Foreign key relationship using primary key
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Reference: Primary Key Constraints

↑ Back to top

Basic Queries

What is the difference between WHERE and HAVING clauses?

The key difference between WHERE and HAVING lies in when they are applied during query execution and what they can filter.

Aspect WHERE HAVING
Applied Before grouping After grouping
Filters Individual rows Groups of rows
Can use aggregates No Yes
Performance Generally faster Can be slower
Execution order Earlier in query processing Later in query processing

Here's a practical example demonstrating both clauses:

-- Using WHERE clause (filters individual rows before grouping)
SELECT department, COUNT(*), AVG(salary)
FROM employees 
WHERE hire_date > '2020-01-01'  -- Filter individual employees
GROUP BY department 
HAVING COUNT(*) > 5;            -- Filter departments with more than 5 employees

-- This query:
-- 1. Filters employees hired after 2020-01-01 (WHERE)
-- 2. Groups remaining employees by department
-- 3. Calculates count and average salary for each department
-- 4. Only shows departments with more than 5 employees (HAVING)
-- Example showing what each clause can and cannot do
SELECT department, AVG(salary) as avg_salary
FROM employees 
WHERE salary > 50000           -- ✅ Valid: filters individual rows
GROUP BY department 
HAVING AVG(salary) > 80000;    -- ✅ Valid: filters groups using aggregate

-- WHERE AVG(salary) > 80000   -- ❌ Invalid: cannot use aggregates in WHERE
-- HAVING salary > 50000       -- ❌ Inefficient: should use WHERE instead

When to use each:

  • Use WHERE to filter individual records before expensive grouping operations
  • Use HAVING to filter groups based on aggregate calculations
  • Combine both for optimal performance and flexibility
↑ Back to top

Joins and Relationships

What are the different types of JOINs in SQL?

SQL provides several types of JOIN operations to combine data from multiple tables based on related columns. The main types of JOINs are:

JOIN Type Description Returns
INNER JOIN Returns only matching records Records that exist in both tables
LEFT JOIN (LEFT OUTER JOIN) Returns all records from left table All left table records + matching right table records
RIGHT JOIN (RIGHT OUTER JOIN) Returns all records from right table All right table records + matching left table records
FULL OUTER JOIN Returns all records from both tables All records from both tables, matched where possible
CROSS JOIN Cartesian product of both tables Every combination of rows from both tables
SELF JOIN Table joined with itself Records from the same table based on a relationship

Here's a visual representation of the most common JOINs:

-- Example tables for demonstration
CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    department_id INT
);

CREATE TABLE departments (
    id INT PRIMARY KEY,
    department_name VARCHAR(50)
);

This foundational understanding helps developers choose the appropriate JOIN type based on the specific data retrieval requirements.

References:

↑ Back to top

Aggregate Functions and Grouping

What are aggregate functions? Name the most common ones.

Aggregate functions are SQL functions that perform calculations on a set of values and return a single value. They operate on multiple rows of data to produce summary statistics and are essential for data analysis and reporting. These functions ignore NULL values (except for COUNT(*)).

Most Common Aggregate Functions:

Function Purpose Example Usage
COUNT() Counts the number of rows COUNT(*), COUNT(column_name)
SUM() Calculates the total sum SUM(salary)
AVG() Calculates the average value AVG(age)
MIN() Finds the minimum value MIN(price)
MAX() Finds the maximum value MAX(score)
GROUP_CONCAT() Concatenates values from multiple rows GROUP_CONCAT(name) (MySQL)
STRING_AGG() Concatenates string values STRING_AGG(name, ',') (PostgreSQL)

Example Implementation:

-- Basic aggregate functions example
SELECT 
    COUNT(*) as total_employees,
    SUM(salary) as total_payroll,
    AVG(salary) as average_salary,
    MIN(salary) as lowest_salary,
    MAX(salary) as highest_salary
FROM employees;

This query demonstrates how aggregate functions summarize data from the entire employees table, providing key statistics about employee salaries in a single result row.

References:

↑ Back to top

Schema Design Interview Questions - Professional Answers

What is database normalization and why is it important?

Database normalization is a systematic process of organizing data in a relational database to minimize redundancy and improve data integrity. It involves decomposing tables into smaller, related tables and defining relationships between them using foreign keys.

Why normalization is important:

Benefit Description
Data Integrity Eliminates inconsistent data by storing each piece of information in only one place
Storage Efficiency Reduces data redundancy, saving storage space
Update Anomalies Prevention Prevents inconsistencies when updating data across multiple locations
Insertion Anomalies Prevention Eliminates the need to enter redundant data when inserting new records
Deletion Anomalies Prevention Prevents loss of important data when deleting records
-- Example of unnormalized data (problematic)
CREATE TABLE customer_orders (
    order_id INT,
    customer_name VARCHAR(100),
    customer_email VARCHAR(100),
    customer_phone VARCHAR(20),
    product_name VARCHAR(100),
    product_price DECIMAL(10,2)
);

-- Normalized approach (better)
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    customer_email VARCHAR(100),
    customer_phone VARCHAR(20)
);

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    product_price DECIMAL(10,2)
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    product_id INT,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

The normalized approach eliminates redundancy by separating customer and product information into their own tables, creating cleaner relationships and preventing data inconsistencies.

References:

↑ Back to top

What is a foreign key and how does it enforce referential integrity?

A foreign key is a column or combination of columns in one table that refers to the primary key of another table. It establishes a link between two tables and enforces referential integrity by ensuring that the relationship between tables remains consistent.

How foreign keys enforce referential integrity:

-- Parent table (referenced table)
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    email VARCHAR(100)
);

-- Child table (referencing table)
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

This foreign key constraint ensures that every customer_id in the orders table must correspond to an existing customer_id in the customers table.

Referential integrity rules enforced:

Rule Description Example
Insert Rule Cannot insert a record with a foreign key value that doesn't exist in parent table Cannot create order for non-existent customer
Update Rule Cannot update foreign key to a value that doesn't exist in parent table Cannot change order's customer_id to invalid customer
Delete Rule Cannot delete a parent record if child records reference it Cannot delete customer who has orders (without handling children)
-- These operations will FAIL due to referential integrity:

-- 1. Insert order for non-existent customer
INSERT INTO orders (order_id, customer_id, order_date, total_amount)
VALUES (101, 999, '2024-01-15', 150.00); -- ERROR: customer_id 999 doesn't exist

-- 2. Update order to reference non-existent customer  
UPDATE orders SET customer_id = 888 WHERE order_id = 101; -- ERROR: customer_id 888 doesn't exist

-- 3. Delete customer who has orders
DELETE FROM customers WHERE customer_id = 1; -- ERROR: orders reference this customer

Handling referential integrity with cascading actions:

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
        ON DELETE CASCADE    -- Delete orders when customer is deleted
        ON UPDATE CASCADE    -- Update orders when customer_id changes
);

-- Alternative cascading options:
-- ON DELETE SET NULL     -- Set foreign key to NULL when parent is deleted
-- ON DELETE RESTRICT     -- Prevent deletion of parent if children exist  
-- ON UPDATE SET NULL     -- Set foreign key to NULL when parent key changes
-- ON UPDATE RESTRICT     -- Prevent update of parent key if children exist

The CASCADE option automatically handles child records when parent records are modified, maintaining referential integrity while allowing necessary operations.

Benefits of foreign keys:

  • Data Consistency: Prevents orphaned records and invalid references
  • Data Quality: Ensures relationships between tables remain valid
  • Documentation: Explicitly defines table relationships in schema
  • Query Optimization: Database can use foreign keys for join optimization

References:

↑ Back to top

What is the difference between a clustered and non-clustered index?

Indexes are database structures that improve query performance by creating shortcuts to data. The fundamental difference between clustered and non-clustered indexes lies in how they store and organize the actual table data.

Clustered Index

A clustered index determines the physical storage order of data in the table. The table data is stored in the same order as the clustered index key.

-- Creating a clustered index (usually on primary key)
CREATE TABLE employees (
    emp_id INT PRIMARY KEY,        -- Automatically creates clustered index
    last_name VARCHAR(50),
    first_name VARCHAR(50),
    department_id INT,
    salary DECIMAL(10,2)
);

-- Explicit clustered index creation (SQL Server syntax)
CREATE CLUSTERED INDEX IX_employees_emp_id ON employees(emp_id);

Non-Clustered Index

A non-clustered index is a separate structure that contains pointers to the actual data rows. The table data remains in its original physical order.

-- Creating non-clustered indexes for better query performance
CREATE INDEX IX_employees_lastname ON employees(last_name);
CREATE INDEX IX_employees_dept_salary ON employees(department_id, salary);

-- Index with included columns (covering index)
CREATE INDEX IX_employees_covering 
ON employees(department_id) 
INCLUDE (first_name, last_name, salary);

The covering index includes additional columns that can satisfy queries without accessing the base table, further improving performance.

Key Differences Comparison

Aspect Clustered Index Non-Clustered Index
Data Storage Data pages stored in index order Separate structure with pointers to data
Physical Order Determines table's physical data order Doesn't affect data storage order
Per Table Limit Only one per table Multiple allowed (typically 999+ depending on DBMS)
Storage Space No additional space (data IS the index) Additional storage required for index structure
Insert Performance Slower (may require page splits) Faster (just add pointer)
Range Queries Excellent (sequential reads) Good (but may require random access)
Key Lookups Direct data access Requires bookmark lookup to get full row

Performance Implications

-- Query benefiting from clustered index (range scan)
SELECT * FROM employees 
WHERE emp_id BETWEEN 1000 AND 2000
ORDER BY emp_id;
-- Very efficient: data is physically ordered by emp_id

-- Query benefiting from non-clustered index
SELECT emp_id, first_name, last_name 
FROM employees 
WHERE last_name = 'Smith';
-- Uses IX_employees_lastname index to quickly find matching rows

-- Query using covering index (no key lookup needed)
SELECT first_name, last_name, salary 
FROM employees 
WHERE department_id = 10;
-- IX_employees_covering provides all needed data without accessing base table

Choosing Between Index Types

Use Clustered Index for:

  • Primary key columns (default behavior)
  • Columns frequently used in range queries
  • Columns used in ORDER BY clauses
  • Columns with high selectivity and frequent access

Use Non-Clustered Index for:

  • Columns frequently used in WHERE clauses
  • Foreign key columns used in joins
  • Columns used in GROUP BY operations
  • Supporting multiple query patterns
-- Example: Optimal indexing strategy for an orders table
CREATE TABLE orders (
    order_id INT PRIMARY KEY,           -- Clustered index (automatic)
    customer_id INT,
    order_date DATE,
    status VARCHAR(20),
    total_amount DECIMAL(10,2)
);

-- Non-clustered indexes for common query patterns
CREATE INDEX IX_orders_customer ON orders(customer_id);
CREATE INDEX IX_orders_date ON orders(order_date);
CREATE INDEX IX_orders_status ON orders(status);

-- Covering index for reporting queries
CREATE INDEX IX_orders_reporting 
ON orders(order_date, status) 
INCLUDE (customer_id, total_amount);

This indexing strategy supports various query patterns efficiently while maintaining reasonable storage overhead and insert performance.

References:

↑ Back to top

Subqueries and CTEs

What is a subquery and what are the different types?

A subquery (also called a nested query or inner query) is a SQL query that is embedded within another SQL query. The subquery is executed first, and its result is used by the outer query to complete the operation.

Types of Subqueries

Type Description Usage Context
Scalar Subquery Returns a single value (one row, one column) Used in SELECT, WHERE, or HAVING clauses
Row Subquery Returns a single row with multiple columns Used with row constructors
Column Subquery Returns multiple rows with a single column Used with IN, ANY, ALL operators
Table Subquery Returns multiple rows and columns Used in FROM clause

Examples of Different Subquery Types

-- Scalar subquery: Find employees earning more than average salary
SELECT employee_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Column subquery: Find employees in specific departments
SELECT employee_name
FROM employees
WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'New York');

-- Table subquery: Use subquery as a table source
SELECT emp.employee_name, dept_stats.avg_salary
FROM employees emp
JOIN (
    SELECT department_id, AVG(salary) as avg_salary
    FROM employees
    GROUP BY department_id
) dept_stats ON emp.department_id = dept_stats.department_id;

These examples demonstrate how subqueries can be used to break down complex queries into manageable parts, making them more readable and maintainable.

↑ Back to top

Window Functions

What are window functions and how do they differ from aggregate functions?

Window functions are advanced SQL features that perform calculations across a set of rows related to the current row within a result set, without collapsing the rows into a single output row. They operate on a "window" of data and return a value for each row in the original result set.

Key differences from aggregate functions:

Aspect Window Functions Aggregate Functions
Row preservation Preserve all original rows Collapse rows into groups
Output One result per input row One result per group
GROUP BY requirement No GROUP BY needed Requires GROUP BY for multiple rows
Calculation scope Over a defined window of rows Over entire group or table
Syntax Uses OVER() clause Standalone function calls

Example demonstrating the difference:

-- Sample data: employee salaries by department
SELECT 
    employee_name,
    department,
    salary,
    -- Window function: shows individual rows with additional calculated column
    AVG(salary) OVER (PARTITION BY department) as dept_avg_salary,
    -- This would require GROUP BY and collapse rows:
    -- AVG(salary) as dept_avg  -- This won't work without GROUP BY
FROM employees;

This query shows each employee with their department's average salary, preserving all individual rows. An aggregate function would require grouping and would only show one row per department.

References:

↑ Back to top

Advanced Data Manipulation

What is the difference between DELETE, DROP, and TRUNCATE?

These three SQL commands serve different purposes for data and structure removal, each with distinct characteristics and use cases.

Detailed Comparison

Aspect DELETE TRUNCATE DROP
Purpose Remove specific rows Remove all rows Remove entire table/structure
Structure Impact Preserves table structure Preserves table structure Removes table completely
WHERE Clause Supported Not supported Not applicable
Transaction Log Logs each row deletion Minimal logging Logs structure removal
Rollback Can be rolled back Can be rolled back (in transaction) Can be rolled back (in transaction)
Auto-increment Reset No Yes N/A (table removed)
Speed Slower (row-by-row) Faster (page deallocation) Fast (metadata operation)
Triggers Fires DELETE triggers Does not fire triggers Fires DROP triggers

DELETE Command

-- Remove specific records based on conditions
DELETE FROM employees 
WHERE department = 'Marketing' 
  AND hire_date < '2020-01-01';

-- Remove all records (but keep table structure)
DELETE FROM employees;

-- Delete with JOIN (remove employees from closed departments)
DELETE e
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id
WHERE d.status = 'CLOSED';

The DELETE command provides granular control over which records to remove and maintains full transaction logging. It's the safest option when you need to remove specific data while preserving the ability to rollback.

TRUNCATE Command

-- Remove all rows from table quickly
TRUNCATE TABLE employees;

-- Note: TRUNCATE cannot be used with WHERE clause
-- This would cause an error:
-- TRUNCATE TABLE employees WHERE department = 'IT';

TRUNCATE is optimized for removing all data from a table quickly. It deallocates data pages rather than deleting rows individually, making it much faster than DELETE for clearing entire tables.

DROP Command

-- Remove entire table structure and data
DROP TABLE employees;

-- Drop with existence check
DROP TABLE IF EXISTS temp_employees;

-- Drop multiple related objects
DROP TABLE employees CASCADE;  -- PostgreSQL: drops dependent objects
DROP TABLE employees;          -- SQL Server: requires manual dependency removal

-- Drop other database objects
DROP INDEX idx_employee_name ON employees;
DROP VIEW employee_summary;
DROP PROCEDURE calculate_bonus;

DROP completely removes database objects from the system. Once executed, both the data and structure are permanently removed (unless restored from backup).

Practical Examples and Best Practices

-- Scenario 1: Data cleanup (use DELETE)
DELETE FROM order_logs 
WHERE created_date < DATEADD(month, -6, GETDATE());

-- Scenario 2: Reset table for new data load (use TRUNCATE)
TRUNCATE TABLE staging_data;
-- Then bulk insert new data

-- Scenario 3: Remove temporary table (use DROP)
DROP TABLE IF EXISTS #temp_calculations;

-- Best practice: Always use transactions for safety
BEGIN TRANSACTION;
    DELETE FROM employees WHERE status = 'TERMINATED';
    -- Verify the results
    SELECT COUNT(*) FROM employees WHERE status = 'TERMINATED';
COMMIT; -- or ROLLBACK if something went wrong

This example demonstrates appropriate use cases for each command and shows how transactions provide safety when making destructive changes to data.

↑ Back to top

Query Performance

What are some common causes of poor query performance?

Poor query performance typically stems from several identifiable issues. Understanding these common causes helps developers proactively design efficient queries and database schemas.

1. Missing or inappropriate indexes:

-- Problem: No index on frequently queried column
SELECT * FROM orders WHERE customer_id = 12345;  -- Table scan on millions of rows

-- Solution: Add appropriate index
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

2. Inefficient WHERE clauses:

Non-sargable conditions:

-- Problem: Function in WHERE clause prevents index usage
SELECT * FROM employees WHERE UPPER(last_name) = 'SMITH';

-- Solution: Use functional index or store normalized data
CREATE INDEX idx_employees_upper_name ON employees(UPPER(last_name));
-- Or better: Store normalized data
SELECT * FROM employees WHERE last_name = 'Smith';

Leading wildcards:

-- Problem: Cannot use index effectively
SELECT * FROM products WHERE name LIKE '%phone%';

-- Better: Trailing wildcard can use index
SELECT * FROM products WHERE name LIKE 'phone%';

3. Suboptimal JOIN operations:

-- Problem: Cartesian product due to missing JOIN condition
SELECT * FROM orders o, customers c;  -- Missing WHERE clause

-- Solution: Proper JOIN syntax
SELECT * FROM orders o 
INNER JOIN customers c ON o.customer_id = c.id;

-- Problem: Wrong JOIN order (large table first)
SELECT * FROM huge_table h
JOIN small_lookup l ON h.lookup_id = l.id;

-- Better: Let optimizer choose or force small table first
SELECT * FROM small_lookup l
JOIN huge_table h ON l.id = h.lookup_id;

4. SELECT * abuse:

-- Problem: Retrieving unnecessary data
SELECT * FROM products WHERE category = 'Electronics';

-- Solution: Select only needed columns
SELECT id, name, price FROM products WHERE category = 'Electronics';

5. Lack of query limits:

-- Problem: Returning entire dataset
SELECT * FROM log_table ORDER BY created_date DESC;

-- Solution: Use appropriate limits
SELECT * FROM log_table 
ORDER BY created_date DESC 
LIMIT 100;

6. Outdated table statistics:

-- Problem: Query optimizer makes poor decisions due to stale statistics
-- Solution: Regular statistics updates
ANALYZE TABLE employees;  -- MySQL
ANALYZE employees;        -- PostgreSQL
UPDATE STATISTICS employees; -- SQL Server

7. Parameter sniffing (SQL Server specific):

-- Problem: Plan optimized for first parameter value
CREATE PROCEDURE GetOrdersByDate(@date DATE)
AS
SELECT * FROM orders WHERE order_date = @date;

-- Solution: Use OPTION (RECOMPILE) or local variables
CREATE PROCEDURE GetOrdersByDate(@date DATE)
AS
DECLARE @local_date DATE = @date;
SELECT * FROM orders WHERE order_date = @local_date;

8. Excessive data type conversions:

-- Problem: Implicit conversion prevents index usage
SELECT * FROM orders WHERE customer_id = '12345';  -- customer_id is INT

-- Solution: Use correct data types
SELECT * FROM orders WHERE customer_id = 12345;

Common performance killers summary:

Issue Impact Solution
Missing indexes Full table scans Add appropriate indexes
Functions in WHERE Index not usable Functional indexes or data normalization
SELECT * Unnecessary I/O Select specific columns
No LIMIT clause Memory exhaustion Always use appropriate limits
Stale statistics Poor execution plans Regular ANALYZE/UPDATE STATISTICS
Wrong data types Implicit conversions Use matching data types
Complex subqueries Multiple table scans CTEs or JOINs
Lack of partitioning Large table scans Implement table partitioning

Monitoring and detection:

// Node.js example: Query performance monitoring
const queryMonitor = {
    async executeWithProfiling(query, params) {
        const start = Date.now();
        const result = await db.query(query, params);
        const duration = Date.now() - start;
        
        // Flag potential issues
        if (duration > 5000) {
            console.warn('Very slow query detected:', {
                query: query.substring(0, 100),
                duration,
                rowCount: result.rowCount
            });
        }
        
        return result;
    }
};

This monitoring approach helps identify performance issues in production environments before they impact users significantly.

References:

↑ Back to top

ACID Properties

What are ACID properties in database transactions?

ACID is an acronym that represents four fundamental properties that guarantee reliable processing of database transactions. These properties ensure data integrity and consistency in database systems, especially in multi-user environments.

Property Description Purpose
Atomicity All operations in a transaction succeed or fail together Prevents partial updates
Consistency Database remains in a valid state before and after transaction Maintains data integrity
Isolation Concurrent transactions don't interfere with each other Prevents data corruption
Durability Committed changes persist even after system failures Ensures data permanence

Atomicity

Atomicity ensures that a transaction is treated as a single, indivisible unit. Either all operations within the transaction are completed successfully, or none of them are applied to the database.

-- Example: Bank transfer transaction
BEGIN TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A001';
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 'A002';
COMMIT;
-- If any operation fails, the entire transaction is rolled back

This code demonstrates a bank transfer where money is deducted from one account and added to another. Atomicity ensures that if either operation fails, both are rolled back, preventing inconsistent states where money could be lost or created.

Consistency

Consistency ensures that the database transitions from one valid state to another, maintaining all defined rules, constraints, and relationships.

Isolation

Isolation ensures that concurrent transactions don't interfere with each other, preventing issues like dirty reads, phantom reads, and non-repeatable reads.

Durability

Durability guarantees that once a transaction is committed, its effects are permanently stored and will survive system crashes, power failures, or other system errors.

References:

↑ Back to top

What are the different isolation levels in SQL?

SQL defines four standard isolation levels that provide different degrees of isolation between concurrent transactions. Each level makes different trade-offs between data consistency and system performance.

SQL Standard Isolation Levels

Level Dirty Read Non-repeatable Read Phantom Read Performance
READ UNCOMMITTED Possible Possible Possible Highest
READ COMMITTED Prevented Possible Possible High
REPEATABLE READ Prevented Prevented Possible Medium
SERIALIZABLE Prevented Prevented Prevented Lowest

1. READ UNCOMMITTED

The lowest isolation level where transactions can read uncommitted data from other transactions.

-- Setting isolation level
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

BEGIN TRANSACTION;
    -- Can read data that other transactions haven't committed yet
    SELECT balance FROM accounts WHERE account_id = 'A001';
COMMIT;

This code allows reading uncommitted changes from other transactions, providing maximum concurrency but risking dirty reads. It's typically used for reporting where approximate data is acceptable.

2. READ COMMITTED

Transactions can only read committed data, but the same query might return different results if run multiple times within the same transaction.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

BEGIN TRANSACTION;
    SELECT COUNT(*) FROM products WHERE price > 100; -- Returns 50
    -- Another transaction commits new products
    SELECT COUNT(*) FROM products WHERE price > 100; -- Might return 52
COMMIT;

This example shows how READ COMMITTED prevents dirty reads but allows non-repeatable reads. The count might change between reads if other transactions commit new data.

3. REPEATABLE READ

Ensures that if a transaction reads a row, subsequent reads of the same row within the transaction will return the same data.

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;
    SELECT balance FROM accounts WHERE account_id = 'A001'; -- $1000
    -- Other transactions cannot modify this specific account
    SELECT balance FROM accounts WHERE account_id = 'A001'; -- Still $1000
COMMIT;

This code guarantees that the balance for account A001 remains constant throughout the transaction, preventing non-repeatable reads but still allowing phantom reads for range queries.

4. SERIALIZABLE

The highest isolation level that completely isolates transactions from each other, as if they were executed serially.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;
    SELECT COUNT(*) FROM orders WHERE customer_id = 100; -- Returns 5
    -- No other transaction can modify orders for customer 100
    SELECT COUNT(*) FROM orders WHERE customer_id = 100; -- Still returns 5
COMMIT;

This example shows how SERIALIZABLE prevents all concurrency anomalies, including phantom reads, but may significantly impact performance due to increased locking.

Database-Specific Variations

Different database systems may implement additional isolation levels:

  • PostgreSQL: Implements only READ COMMITTED, REPEATABLE READ, and SERIALIZABLE
  • MySQL: Supports all four levels with InnoDB engine
  • SQL Server: Includes additional levels like SNAPSHOT isolation

References:

↑ Back to top

Security and Access Control

What is SQL injection and how do you prevent it?

SQL injection is a critical security vulnerability that occurs when an attacker inserts malicious SQL code into application queries through user input fields. This happens when user input is directly concatenated into SQL statements without proper validation or sanitization, allowing attackers to manipulate the database, access unauthorized data, or even delete entire tables.

How SQL Injection Works

Consider this vulnerable code example:

// VULNERABLE CODE - Never do this!
const userId = req.body.userId; // User input: "1 OR 1=1"
const query = `SELECT * FROM users WHERE id = ${userId}`;
// Resulting query: SELECT * FROM users WHERE id = 1 OR 1=1
// This returns ALL users instead of just one!

In this example, an attacker could input 1 OR 1=1 which makes the WHERE clause always true, potentially exposing all user records.

Prevention Methods

Method Description Effectiveness
Prepared Statements Pre-compiled SQL with placeholders Highest
Input Validation Whitelist/blacklist validation High
Stored Procedures Pre-defined database procedures High
Escaping Special Characters Escape SQL metacharacters Medium
Least Privilege Principle Limit database user permissions High

Best Practices Implementation

// SECURE CODE - Using prepared statements with Node.js/MySQL
const mysql = require('mysql2/promise');

async function getUserById(userId) {
    const connection = await mysql.createConnection(dbConfig);
    
    // Using prepared statement with placeholder (?)
    const [rows] = await connection.execute(
        'SELECT * FROM users WHERE id = ?',
        [userId]  // Parameters are safely escaped
    );
    
    await connection.end();
    return rows;
}

This approach ensures that user input is treated as data, not executable code, completely preventing SQL injection attacks.

↑ Back to top

Want more questions?

You've seen 15 sample questions. Unlock all 105 En interview questions with detailed explanations, code examples, and expert insights.

105+ questions
Code examples
Expert explanations
Instant access
Unlock Full Access