What are aggregate functions in SQL?

Aggregate functions perform calculations on multiple rows and return a single value. Common functions: COUNT() counts rows, SUM() adds values, AVG() calculates the average, MAX()/MIN() find the highest/lowest values. They are often combined with GROUP BY to aggregate per group. COUNT(*) counts all rows, while COUNT(column) counts only non-NULL values; the other aggregates likewise skip NULLs.

What is a foreign key and referential integrity?

A foreign key is a column (or set of columns) referencing another table's primary key, establishing relationships between tables. Referential integrity ensures foreign key values exist in the referenced table: by default you can't insert invalid references or delete rows that are still referenced. CASCADE options automatically propagate changes: ON DELETE CASCADE removes child rows; ON UPDATE CASCADE updates foreign key values.

What is the difference between clustered and non-clustered indexes?

A clustered index determines the physical order of data in the table—only one per table, typically on the primary key. Data rows are stored in index order. Non-clustered indexes are separate structures with pointers to data rows—multiple allowed per table. Clustered is faster for range queries; non-clustered for specific lookups on non-primary columns.

What is a subquery and what types exist?

A subquery is a query nested inside another query. Types: scalar (returns a single value), row (returns one row), column (returns one column with multiple rows), table (returns multiple rows/columns). Non-correlated subqueries can execute once, independently of the outer query. Correlated subqueries reference the outer query and are logically evaluated once per outer row, which is often slower unless the optimizer rewrites them as a join. Subqueries appear in SELECT, FROM, WHERE, or HAVING clauses. CTEs often provide better readability.

What is the difference between DELETE, TRUNCATE, and DROP?

DELETE removes specific rows (can use WHERE), logs each deletion, fires triggers, and can be rolled back, but is slower for large tables. TRUNCATE removes all rows quickly with minimal logging, cannot use WHERE, and in most systems resets identity counters; whether it can be rolled back depends on the DBMS. DROP removes the entire table structure and its data. Use DELETE for selective removal, TRUNCATE to empty tables, DROP to remove tables entirely.

What are transaction isolation levels?

Isolation levels control how much of the work of other concurrent transactions a transaction can see. READ UNCOMMITTED: sees uncommitted changes (dirty reads). READ COMMITTED: sees only committed data (default in many databases). REPEATABLE READ: rows already read return the same values when re-read within the transaction. SERIALIZABLE: highest isolation, transactions appear sequential. Higher isolation = more consistency but less concurrency and performance.

How do you optimize SQL query performance?

Key optimizations: create appropriate indexes on frequently queried columns, analyze execution plans (EXPLAIN), avoid SELECT * (specify needed columns), use JOINs instead of subqueries when possible, avoid functions on indexed columns in WHERE, use LIMIT for large results, ensure up-to-date statistics, consider query caching and connection pooling. Profile before optimizing—measure actual bottlenecks.

SQL Interview Questions (Free Preview)

Q: What is the difference between SQL and NoSQL databases?

SQL databases are relational, using structured tables with predefined schemas, ACID transactions, and SQL query language (MySQL, PostgreSQL). NoSQL databases are non-relational, offering flexible schemas for unstructured data—types include document (MongoDB), key-value (Redis), column-family (Cassandra), and graph (Neo4j). SQL excels at complex queries and data integrity; NoSQL at scalability and flexibility.

Q: What is a primary key and why is it important?

A primary key is a column (or combination) that uniquely identifies each row in a table. It enforces entity integrity: values must be unique and cannot be NULL. Primary keys enable efficient row access, serve as reference points for foreign keys establishing relationships, and are backed by a unique index (a clustered index by default in SQL Server and MySQL InnoDB). Every table should have a primary key.

Q: What are the different types of JOINs in SQL?

INNER JOIN returns only matching rows from both tables. LEFT JOIN returns all rows from the left table plus matches from right (NULL if no match). RIGHT JOIN returns all from right plus matches from left. FULL OUTER JOIN returns all rows from both tables. CROSS JOIN produces Cartesian product of all combinations. Self-join joins a table to itself.

Q: What is the difference between WHERE and HAVING clauses?

WHERE filters rows before grouping—it works on individual records and cannot use aggregate functions. HAVING filters groups after GROUP BY—it works on aggregated results and can use aggregate functions. Example: WHERE salary > 50000 filters employees; HAVING COUNT(*) > 5 filters departments with more than 5 employees. Use WHERE when possible (more efficient).

Q: What is database normalization?

Normalization organizes data to reduce redundancy and improve integrity through normal forms. 1NF: atomic values, no repeating groups. 2NF: 1NF + no partial dependencies on composite keys. 3NF: 2NF + no transitive dependencies (non-key columns depend only on the key). Benefits: less redundancy, easier updates, better integrity. Trade-off: more joins needed for queries.

Q: What are window functions?

Window functions perform calculations across related rows without collapsing them into groups (unlike aggregates). They use OVER() clause with optional PARTITION BY and ORDER BY. Common functions: ROW_NUMBER() (sequential), RANK()/DENSE_RANK() (handle ties differently), LAG()/LEAD() (access previous/next rows), running totals with SUM() OVER(). Essential for analytics and reporting.

SQL Interview Questions - Fundamentals

What is the difference between SQL and NoSQL databases?

Aspect	SQL Databases	NoSQL Databases
Structure	Structured, table-based with rows and columns	Flexible: document, key-value, column-family, or graph
Schema	Fixed schema, predefined structure	Dynamic/flexible schema
Query Language	Standardized SQL	Varies by database type
ACID Properties	Strong ACID compliance	Often eventually consistent (BASE properties); some systems offer ACID transactions
Scalability	Typically vertical scaling (scale-up)	Typically horizontal scaling (scale-out)
Relationships	Complex relationships via JOINs	Denormalized data, embedded documents
Use Cases	Financial systems, CRM, ERP	Big data, real-time analytics, content management

Examples:

SQL: MySQL, PostgreSQL, Oracle, SQL Server
NoSQL: MongoDB (document), Redis (key-value), Cassandra (column-family), Neo4j (graph)

What is a primary key and why is it important?

A primary key is a column (or combination of columns) that uniquely identifies each record in a table. It serves as the main identifier for rows and ensures data integrity.

Key Characteristics:

Uniqueness: No two records can have the same primary key value
Non-null: Primary key values cannot be NULL
Immutable: Primary key values should not change once assigned
Minimal: Should contain the minimum number of columns necessary for uniqueness

-- Single column primary key
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    email VARCHAR(100)
);

-- Composite primary key (multiple columns)
CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT,
    PRIMARY KEY (order_id, product_id)
);

Why Primary Keys Are Important:

Data Integrity: Prevents duplicate records
Indexing: Backed by a unique index for fast data retrieval (a clustered index by default in SQL Server and MySQL InnoDB)
Relationships: Serves as a reference point for foreign keys
Replication: Essential for database replication and synchronization
Performance: Optimizes query execution and JOIN operations

-- Foreign key relationship using primary key
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
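
The uniqueness guarantee can be checked directly. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory database; the sample names and values are made up for illustration, and other database systems report the violation with their own error types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INT PRIMARY KEY, customer_name VARCHAR(100))"
)
conn.execute(
    "CREATE TABLE order_items ("
    " order_id INT, product_id INT, quantity INT,"
    " PRIMARY KEY (order_id, product_id))"
)

# Single-column key: a second row with the same customer_id is rejected.
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Grace')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Composite key: the same order_id is fine as long as product_id differs.
conn.execute("INSERT INTO order_items VALUES (10, 100, 2)")
conn.execute("INSERT INTO order_items VALUES (10, 101, 1)")
try:
    conn.execute("INSERT INTO order_items VALUES (10, 100, 5)")  # repeated pair
    composite_duplicate_rejected = False
except sqlite3.IntegrityError:
    composite_duplicate_rejected = True

item_count = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]
print(duplicate_rejected, composite_duplicate_rejected, item_count)
```

Only the two distinct (order_id, product_id) pairs remain in order_items; the rejected inserts leave the tables unchanged.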

Basic Queries

What is the difference between `WHERE` and `HAVING` clauses?

The key difference between WHERE and HAVING lies in when they are applied during query execution and what they can filter.

Aspect	WHERE	HAVING
Applied	Before grouping	After grouping
Filters	Individual rows	Groups of rows
Can use aggregates	No	Yes
Performance	Generally faster	Can be slower
Execution order	Earlier in query processing	Later in query processing

Here's a practical example demonstrating both clauses:

-- Using WHERE and HAVING together (WHERE filters rows before grouping, HAVING filters groups after)
SELECT department, COUNT(*), AVG(salary)
FROM employees 
WHERE hire_date > '2020-01-01'  -- Filter individual employees
GROUP BY department 
HAVING COUNT(*) > 5;            -- Filter departments with more than 5 employees

-- This query:
-- 1. Filters employees hired after 2020-01-01 (WHERE)
-- 2. Groups remaining employees by department
-- 3. Calculates count and average salary for each department
-- 4. Only shows departments with more than 5 employees (HAVING)

-- Example showing what each clause can and cannot do
SELECT department, AVG(salary) as avg_salary
FROM employees 
WHERE salary > 50000           -- ✅ Valid: filters individual rows
GROUP BY department 
HAVING AVG(salary) > 80000;    -- ✅ Valid: filters groups using aggregate

-- WHERE AVG(salary) > 80000   -- ❌ Invalid: cannot use aggregates in WHERE
-- HAVING salary > 50000       -- ❌ Invalid in standard SQL (salary is neither grouped nor aggregated): use WHERE instead

When to use each:

Use WHERE to filter individual records before expensive grouping operations
Use HAVING to filter groups based on aggregate calculations
Combine both for optimal performance and flexibility
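
To see both clauses act on real rows, here is a small runnable sketch. It assumes Python's built-in sqlite3 module with an in-memory database, and the five employees are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "Engineering", 120000),
        ("Bob", "Engineering", 95000),
        ("Cid", "Engineering", 40000),
        ("Dee", "Sales", 60000),
        ("Eli", "Sales", 55000),
    ],
)

# WHERE drops Cid before grouping; HAVING then drops the Sales group.
rows = conn.execute(
    """
    SELECT department, COUNT(*) AS n, AVG(salary) AS avg_salary
    FROM employees
    WHERE salary > 50000
    GROUP BY department
    HAVING AVG(salary) > 80000
    ORDER BY department
    """
).fetchall()

# An aggregate inside WHERE is rejected by the database.
try:
    conn.execute(
        "SELECT department FROM employees WHERE AVG(salary) > 80000 GROUP BY department"
    )
    aggregate_in_where_rejected = False
except sqlite3.Error:
    aggregate_in_where_rejected = True

print(rows, aggregate_in_where_rejected)
```

Engineering is counted with two employees rather than three, because the row-level filter runs before the group is formed.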

Joins and Relationships

What are the different types of `JOIN`s in SQL?

SQL provides several types of JOIN operations to combine data from multiple tables based on related columns. The main types of JOINs are:

JOIN Type	Description	Returns
`INNER JOIN`	Returns only matching records	Records that exist in both tables
`LEFT JOIN` (LEFT OUTER JOIN)	Returns all records from left table	All left table records + matching right table records
`RIGHT JOIN` (RIGHT OUTER JOIN)	Returns all records from right table	All right table records + matching left table records
`FULL OUTER JOIN`	Returns all records from both tables	All records from both tables, matched where possible
`CROSS JOIN`	Cartesian product of both tables	Every combination of rows from both tables
`SELF JOIN`	Table joined with itself using aliases (a technique, not a keyword)	Records from the same table based on a relationship

The following example tables are used to demonstrate the JOIN types:

-- Example tables for demonstration
CREATE TABLE employees (
    id INT PRIMARY KEY,
    name VARCHAR(50),
    department_id INT
);

CREATE TABLE departments (
    id INT PRIMARY KEY,
    department_name VARCHAR(50)
);

This foundational understanding helps developers choose the appropriate JOIN type based on the specific data retrieval requirements.
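
The difference between the JOIN types is easiest to see on a few rows. The sketch below assumes Python's built-in sqlite3 module and reuses the employees and departments tables with made-up rows; RIGHT and FULL OUTER JOIN are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (id INT PRIMARY KEY, department_name VARCHAR(50))")
conn.execute("CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(50), department_id INT)")
conn.executemany(
    "INSERT INTO departments VALUES (?, ?)",
    [(1, "Engineering"), (2, "Sales"), (3, "Legal")],
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ann", 1), (2, "Bob", 2), (3, "Cid", None)],  # Cid has no department
)

# INNER JOIN: only employees with a matching department.
inner_rows = conn.execute(
    "SELECT e.name, d.department_name FROM employees e "
    "INNER JOIN departments d ON e.department_id = d.id ORDER BY e.id"
).fetchall()

# LEFT JOIN: every employee, with NULL (None) where no department matches.
left_rows = conn.execute(
    "SELECT e.name, d.department_name FROM employees e "
    "LEFT JOIN departments d ON e.department_id = d.id ORDER BY e.id"
).fetchall()

# CROSS JOIN: every employee paired with every department.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM employees CROSS JOIN departments"
).fetchone()[0]

print(inner_rows)
print(left_rows)
print(cross_count)
```

Cid appears only in the LEFT JOIN result, and the Legal department, which has no employees, appears in neither.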

Aggregate Functions and Grouping

What are aggregate functions? Name the most common ones.

Aggregate functions are SQL functions that perform calculations on a set of values and return a single value. They operate on multiple rows of data to produce summary statistics and are essential for data analysis and reporting. These functions ignore NULL values (except for COUNT(*)).

Most Common Aggregate Functions:

Function	Purpose	Example Usage
`COUNT()`	Counts the number of rows	`COUNT(*)`, `COUNT(column_name)`
`SUM()`	Calculates the total sum	`SUM(salary)`
`AVG()`	Calculates the average value	`AVG(age)`
`MIN()`	Finds the minimum value	`MIN(price)`
`MAX()`	Finds the maximum value	`MAX(score)`
`GROUP_CONCAT()`	Concatenates values from multiple rows	`GROUP_CONCAT(name)` (MySQL)
`STRING_AGG()`	Concatenates string values	`STRING_AGG(name, ',')` (PostgreSQL, SQL Server)

Example Implementation:

-- Basic aggregate functions example
SELECT 
    COUNT(*) as total_employees,
    SUM(salary) as total_payroll,
    AVG(salary) as average_salary,
    MIN(salary) as lowest_salary,
    MAX(salary) as highest_salary
FROM employees;

This query demonstrates how aggregate functions summarize data from the entire employees table, providing key statistics about employee salaries in a single result row.
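
The NULL-handling rule is worth checking once by hand. This is a minimal sketch assuming Python's built-in sqlite3 module; the three salaries, one of them NULL, are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 100), ("Bob", 200), ("Cid", None)],  # Cid's salary is NULL
)

row = conn.execute(
    """
    SELECT COUNT(*),        -- counts every row, including Cid
           COUNT(salary),   -- counts only non-NULL salaries
           SUM(salary),
           AVG(salary),     -- 300 / 2, not 300 / 3
           MIN(salary),
           MAX(salary)
    FROM employees
    """
).fetchone()

print(row)
```

The average is computed over the two known salaries only, which is why COUNT(*) and COUNT(salary) disagree.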

Schema Design Interview Questions - Professional Answers

What is database normalization and why is it important?

Database normalization is a systematic process of organizing data in a relational database to minimize redundancy and improve data integrity. It involves decomposing tables into smaller, related tables and defining relationships between them using foreign keys.

Why normalization is important:

Benefit	Description
Data Integrity	Eliminates inconsistent data by storing each piece of information in only one place
Storage Efficiency	Reduces data redundancy, saving storage space
Update Anomalies Prevention	Prevents inconsistencies when updating data across multiple locations
Insertion Anomalies Prevention	Eliminates the need to enter redundant data when inserting new records
Deletion Anomalies Prevention	Prevents loss of important data when deleting records

-- Example of unnormalized data (problematic)
CREATE TABLE customer_orders (
    order_id INT,
    customer_name VARCHAR(100),
    customer_email VARCHAR(100),
    customer_phone VARCHAR(20),
    product_name VARCHAR(100),
    product_price DECIMAL(10,2)
);

-- Normalized approach (better)
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    customer_email VARCHAR(100),
    customer_phone VARCHAR(20)
);

CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100),
    product_price DECIMAL(10,2)
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    product_id INT,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

The normalized approach eliminates redundancy by separating customer and product information into their own tables, creating cleaner relationships and preventing data inconsistencies.
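
As an illustration of the update-anomaly point, the sketch below stores a customer's email once and changes it with a single UPDATE. It assumes Python's built-in sqlite3 module; the column set is trimmed and the rows and email addresses are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,"
    " customer_name TEXT, customer_email TEXT)"
)
conn.execute(
    "CREATE TABLE products (product_id INTEGER PRIMARY KEY,"
    " product_name TEXT, product_price REAL)"
)
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(customer_id),"
    " product_id INTEGER REFERENCES products(product_id))"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@old.example')")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [(1, "Pen", 1.5), (2, "Ink", 4.0)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(1, 1, 1), (2, 1, 2)])

# One row changes, no matter how many orders the customer has.
cur = conn.execute(
    "UPDATE customers SET customer_email = ? WHERE customer_id = ?",
    ("ada@new.example", 1),
)
rows_changed = cur.rowcount

# Every order sees the new email through the join.
report = conn.execute(
    """
    SELECT o.order_id, c.customer_email, p.product_name
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    JOIN products p ON o.product_id = p.product_id
    ORDER BY o.order_id
    """
).fetchall()

print(rows_changed, report)
```

In the unnormalized customer_orders table the same change would have to touch every order row for that customer, and a missed row would leave two conflicting emails.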

What is a foreign key and how does it enforce referential integrity?

A foreign key is a column or combination of columns in one table that refers to the primary key of another table. It establishes a link between two tables and enforces referential integrity by ensuring that the relationship between tables remains consistent.

How foreign keys enforce referential integrity:

-- Parent table (referenced table)
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    email VARCHAR(100)
);

-- Child table (referencing table)
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

This foreign key constraint ensures that every customer_id in the orders table must correspond to an existing customer_id in the customers table.

Referential integrity rules enforced:

Rule	Description	Example
Insert Rule	Cannot insert a record with a foreign key value that doesn't exist in parent table	Cannot create order for non-existent customer
Update Rule	Cannot update foreign key to a value that doesn't exist in parent table	Cannot change order's customer_id to invalid customer
Delete Rule	Cannot delete a parent record if child records reference it	Cannot delete customer who has orders (without handling children)

-- These operations will FAIL due to referential integrity:

-- 1. Insert order for non-existent customer
INSERT INTO orders (order_id, customer_id, order_date, total_amount)
VALUES (101, 999, '2024-01-15', 150.00); -- ERROR: customer_id 999 doesn't exist

-- 2. Update an existing order (here: order 100) to reference a non-existent customer
UPDATE orders SET customer_id = 888 WHERE order_id = 100; -- ERROR: customer_id 888 doesn't exist

-- 3. Delete customer who has orders
DELETE FROM customers WHERE customer_id = 1; -- ERROR: orders reference this customer

Handling referential integrity with cascading actions:

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
        ON DELETE CASCADE    -- Delete orders when customer is deleted
        ON UPDATE CASCADE    -- Update orders when customer_id changes
);

-- Alternative cascading options:
-- ON DELETE SET NULL     -- Set foreign key to NULL when parent is deleted
-- ON DELETE RESTRICT     -- Prevent deletion of parent if children exist  
-- ON UPDATE SET NULL     -- Set foreign key to NULL when parent key changes
-- ON UPDATE RESTRICT     -- Prevent update of parent key if children exist

The CASCADE option automatically handles child records when parent records are modified, maintaining referential integrity while allowing necessary operations.
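
The insert rule and ON DELETE CASCADE can be exercised end to end. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module, and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection. The rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT)")
conn.execute(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
            ON DELETE CASCADE
    )
    """
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (101, 1)")

# Insert rule: an order for a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (102, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Delete rule with CASCADE: removing the customer removes their orders too.
conn.execute("DELETE FROM customers WHERE customer_id = 1")
remaining_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

print(orphan_rejected, remaining_orders)
```

Without the ON DELETE CASCADE clause, the final DELETE would fail with the same integrity error instead of removing order 101.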

Benefits of foreign keys:

Data Consistency: Prevents orphaned records and invalid references
Data Quality: Ensures relationships between tables remain valid
Documentation: Explicitly defines table relationships in schema
Query Optimization: Database can use foreign keys for join optimization

What is the difference between a clustered and non-clustered index?

Indexes are database structures that improve query performance by creating shortcuts to data. The fundamental difference between clustered and non-clustered indexes lies in how they store and organize the actual table data.

Clustered Index

A clustered index determines the physical storage order of data in the table. The table data is stored in the same order as the clustered index key.

-- Creating a clustered index (usually on primary key)
CREATE TABLE employees (
    emp_id INT PRIMARY KEY,        -- Clustered index by default in SQL Server and MySQL InnoDB
    last_name VARCHAR(50),
    first_name VARCHAR(50),
    department_id INT,
    salary DECIMAL(10,2)
);

-- Explicit clustered index creation (SQL Server syntax)
CREATE CLUSTERED INDEX IX_employees_emp_id ON employees(emp_id);

Non-Clustered Index

A non-clustered index is a separate structure that contains pointers to the actual data rows. The table data remains in its original physical order.

-- Creating non-clustered indexes for better query performance
CREATE INDEX IX_employees_lastname ON employees(last_name);
CREATE INDEX IX_employees_dept_salary ON employees(department_id, salary);

-- Index with included columns (covering index; INCLUDE is supported by SQL Server and PostgreSQL 11+)
CREATE INDEX IX_employees_covering 
ON employees(department_id) 
INCLUDE (first_name, last_name, salary);

The covering index includes additional columns that can satisfy queries without accessing the base table, further improving performance.

Key Differences Comparison

Aspect	Clustered Index	Non-Clustered Index
Data Storage	Data pages stored in index order	Separate structure with pointers to data
Physical Order	Determines table's physical data order	Doesn't affect data storage order
Per Table Limit	Only one per table	Multiple allowed (limit depends on the DBMS, e.g., 999 in SQL Server)
Storage Space	Little extra space (the leaf level is the table data)	Additional storage required for index structure
Insert Performance	Can be slower for non-sequential keys (page splits)	Each additional index adds write overhead
Range Queries	Excellent (sequential reads)	Good (but may require random access)
Key Lookups	Direct data access	Requires bookmark lookup to get full row

Performance Implications

-- Query benefiting from clustered index (range scan)
SELECT * FROM employees 
WHERE emp_id BETWEEN 1000 AND 2000
ORDER BY emp_id;
-- Very efficient: data is physically ordered by emp_id

-- Query benefiting from non-clustered index
SELECT emp_id, first_name, last_name 
FROM employees 
WHERE last_name = 'Smith';
-- Uses IX_employees_lastname index to quickly find matching rows

-- Query using covering index (no key lookup needed)
SELECT first_name, last_name, salary 
FROM employees 
WHERE department_id = 10;
-- IX_employees_covering provides all needed data without accessing base table

Choosing Between Index Types

Use Clustered Index for:

Primary key columns (default behavior)
Columns frequently used in range queries
Columns used in ORDER BY clauses
Columns with high selectivity and frequent access

Use Non-Clustered Index for:

Columns frequently used in WHERE clauses
Foreign key columns used in joins
Columns used in GROUP BY operations
Supporting multiple query patterns

-- Example: Optimal indexing strategy for an orders table
CREATE TABLE orders (
    order_id INT PRIMARY KEY,           -- Clustered index (automatic)
    customer_id INT,
    order_date DATE,
    status VARCHAR(20),
    total_amount DECIMAL(10,2)
);

-- Non-clustered indexes for common query patterns
CREATE INDEX IX_orders_customer ON orders(customer_id);
CREATE INDEX IX_orders_date ON orders(order_date);
CREATE INDEX IX_orders_status ON orders(status);

-- Covering index for reporting queries
CREATE INDEX IX_orders_reporting 
ON orders(order_date, status) 
INCLUDE (customer_id, total_amount);

This indexing strategy supports various query patterns efficiently while maintaining reasonable storage overhead and insert performance.
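
Whether an index is actually used can be confirmed from the execution plan. The sketch below assumes Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN; the plan wording differs between SQLite versions and from other database systems, and the helper `plan` is a hypothetical convenience function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INT,"
    " order_date TEXT, status TEXT, total_amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, order_date, status, total_amount)"
    " VALUES (?, ?, ?, ?)",
    [(i % 100, "2024-01-01", "OPEN", 10.0) for i in range(1000)],
)


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT * FROM orders WHERE customer_id = 42"

plan_before = plan(query)  # no suitable index yet: full table scan
conn.execute("CREATE INDEX IX_orders_customer ON orders(customer_id)")
plan_after = plan(query)   # the new index is used for the lookup

print(plan_before)
print(plan_after)
```

The first plan reports a scan of the whole table, while the second names IX_orders_customer. The same before-and-after check applies to any index added for a specific query pattern.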

Subqueries and CTEs

What is a subquery and what are the different types?

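A subquery (also called a nested query or inner query) is a SQL query that is embedded within another SQL query. Conceptually, a non-correlated subquery is evaluated once and its result is used by the outer query, while a correlated subquery refers to the outer query and is evaluated for each outer row; the optimizer may choose a different but equivalent plan.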

Types of Subqueries

Type	Description	Usage Context
Scalar Subquery	Returns a single value (one row, one column)	Used in SELECT, WHERE, or HAVING clauses
Row Subquery	Returns a single row with multiple columns	Used with row constructors
Column Subquery	Returns multiple rows with a single column	Used with IN, ANY, ALL operators
Table Subquery	Returns multiple rows and columns	Used in FROM clause

Examples of Different Subquery Types

-- Scalar subquery: Find employees earning more than average salary
SELECT employee_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Column subquery: Find employees in specific departments
SELECT employee_name
FROM employees
WHERE department_id IN (SELECT department_id FROM departments WHERE location = 'New York');

-- Table subquery: Use subquery as a table source
SELECT emp.employee_name, dept_stats.avg_salary
FROM employees emp
JOIN (
    SELECT department_id, AVG(salary) as avg_salary
    FROM employees
    GROUP BY department_id
) dept_stats ON emp.department_id = dept_stats.department_id;

These examples demonstrate how subqueries can be used to break down complex queries into manageable parts, making them more readable and maintainable.
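
A correlated subquery, which refers to the row currently being processed by the outer query, can be sketched as follows. The example assumes Python's built-in sqlite3 module and four invented employees; it also runs an equivalent derived-table join to show that both forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_name TEXT, department_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", 1, 100), ("Bob", 1, 60), ("Cid", 2, 50), ("Dee", 2, 70)],
)

# Correlated: the inner query uses e.department_id from the outer row.
correlated = [
    row[0]
    for row in conn.execute(
        """
        SELECT employee_name
        FROM employees e
        WHERE salary > (SELECT AVG(salary)
                        FROM employees
                        WHERE department_id = e.department_id)
        ORDER BY employee_name
        """
    )
]

# Equivalent rewrite: compute each department average once, then join.
joined = [
    row[0]
    for row in conn.execute(
        """
        SELECT e.employee_name
        FROM employees e
        JOIN (SELECT department_id, AVG(salary) AS avg_salary
              FROM employees
              GROUP BY department_id) s
          ON e.department_id = s.department_id
        WHERE e.salary > s.avg_salary
        ORDER BY e.employee_name
        """
    )
]

print(correlated, joined)
```

Both queries return the employees who earn more than their own department's average; which form performs better depends on the optimizer and the data.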

Window Functions

What are window functions and how do they differ from aggregate functions?

Window functions are advanced SQL features that perform calculations across a set of rows related to the current row within a result set, without collapsing the rows into a single output row. They operate on a "window" of data and return a value for each row in the original result set.

Key differences from aggregate functions:

Aspect	Window Functions	Aggregate Functions
Row preservation	Preserve all original rows	Collapse rows into groups
Output	One result per input row	One result per group
GROUP BY requirement	No GROUP BY needed	Needs GROUP BY to return one row per group (otherwise one row for the whole set)
Calculation scope	Over a defined window of rows	Over entire group or table
Syntax	Uses OVER() clause	Standalone function calls

Example demonstrating the difference:

-- Sample data: employee salaries by department
SELECT 
    employee_name,
    department,
    salary,
    -- Window function: keeps individual rows and adds a calculated column
    AVG(salary) OVER (PARTITION BY department) as dept_avg_salary
    -- A plain aggregate here would need GROUP BY and would collapse the rows:
    -- AVG(salary) as dept_avg
FROM employees;

This query shows each employee with their department's average salary, preserving all individual rows. An aggregate function would require grouping and would only show one row per department.
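
The row-preserving behavior can be compared side by side with GROUP BY. This is a minimal sketch assuming Python's built-in sqlite3 module linked against SQLite 3.25 or newer (the first version with window functions); the three employees are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "Engineering", 100), ("Bob", "Engineering", 60), ("Cid", "Sales", 50)],
)

# Window functions: one output row per employee.
window_rows = conn.execute(
    """
    SELECT employee_name, department, salary,
           AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary,
           RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
    FROM employees
    ORDER BY department, salary DESC
    """
).fetchall()

# Aggregate with GROUP BY: one output row per department.
grouped_rows = conn.execute(
    "SELECT department, AVG(salary) FROM employees GROUP BY department ORDER BY department"
).fetchall()

for row in window_rows:
    print(row)
print(grouped_rows)
```

The windowed query returns three rows and repeats the department average on each, while the grouped query returns two rows and loses the individual employees.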

Advanced Data Manipulation

What is the difference between `DELETE`, `DROP`, and `TRUNCATE`?

These three SQL commands serve different purposes for data and structure removal, each with distinct characteristics and use cases.

Detailed Comparison

Aspect	DELETE	TRUNCATE	DROP
Purpose	Remove specific rows	Remove all rows	Remove entire table/structure
Structure Impact	Preserves table structure	Preserves table structure	Removes table completely
WHERE Clause	Supported	Not supported	Not applicable
Transaction Log	Logs each row deletion	Minimal logging	Logs structure removal
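Rollback	Can be rolled back	DBMS-dependent (yes inside a transaction in SQL Server and PostgreSQL; no in MySQL and Oracle, which commit implicitly)	DBMS-dependent (same split as TRUNCATE)
Auto-increment Reset	No	Yes in SQL Server and MySQL; PostgreSQL needs RESTART IDENTITY	N/A (table removed)
Speed	Slower (row-by-row)	Faster (page deallocation)	Fast (metadata operation)
Triggers	Fires DELETE triggers	Does not fire DELETE triggers	Fires DDL/event triggers where supported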

DELETE Command

-- Remove specific records based on conditions
DELETE FROM employees 
WHERE department = 'Marketing' 
  AND hire_date < '2020-01-01';

-- Remove all records (but keep table structure)
DELETE FROM employees;

-- Delete with JOIN (SQL Server/MySQL syntax; removes employees from closed departments)
DELETE e
FROM employees e
INNER JOIN departments d ON e.department_id = d.department_id
WHERE d.status = 'CLOSED';

The DELETE command provides granular control over which records to remove and maintains full transaction logging. It's the safest option when you need to remove specific data while preserving the ability to roll back.

TRUNCATE Command

-- Remove all rows from table quickly
TRUNCATE TABLE employees;

-- Note: TRUNCATE cannot be used with WHERE clause
-- This would cause an error:
-- TRUNCATE TABLE employees WHERE department = 'IT';

TRUNCATE is optimized for removing all data from a table quickly. It deallocates data pages rather than deleting rows individually, making it much faster than DELETE for clearing entire tables.

DROP Command

-- Remove entire table structure and data
DROP TABLE employees;

-- Drop with existence check
DROP TABLE IF EXISTS temp_employees;

-- Drop a table that other objects depend on (behavior differs by DBMS)
DROP TABLE employees CASCADE;  -- PostgreSQL: drops dependent objects
DROP TABLE employees;          -- SQL Server: requires manual dependency removal

-- Drop other database objects
DROP INDEX idx_employee_name ON employees;
DROP VIEW employee_summary;
DROP PROCEDURE calculate_bonus;

DROP completely removes database objects from the system. Once the statement is committed, both the data and structure are permanently removed (unless restored from backup).

Practical Examples and Best Practices

-- Scenario 1: Data cleanup (use DELETE; DATEADD/GETDATE are SQL Server functions)
DELETE FROM order_logs 
WHERE created_date < DATEADD(month, -6, GETDATE());

-- Scenario 2: Reset table for new data load (use TRUNCATE)
TRUNCATE TABLE staging_data;
-- Then bulk insert new data

-- Scenario 3: Remove temporary table (use DROP; #temp tables are SQL Server syntax)
DROP TABLE IF EXISTS #temp_calculations;

-- Best practice: wrap destructive changes in a transaction
BEGIN TRANSACTION;
    DELETE FROM employees WHERE status = 'TERMINATED';
    -- Verify the result: this count should now be 0
    SELECT COUNT(*) FROM employees WHERE status = 'TERMINATED';
COMMIT; -- or ROLLBACK if something went wrong

This example demonstrates appropriate use cases for each command and shows how transactions provide safety when making destructive changes to data.
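
The verify-then-commit pattern can be sketched from application code. The example assumes Python's built-in sqlite3 module (SQLite has no TRUNCATE, so only DELETE is shown); `expected_max` is a hypothetical safety threshold, and the rows are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO employees (status) VALUES (?)",
    [("ACTIVE",), ("TERMINATED",), ("TERMINATED",)],
)
conn.commit()

expected_max = 1  # hypothetical limit: we expect to remove at most one row

# The DELETE implicitly opens a transaction; nothing is permanent yet.
cur = conn.execute("DELETE FROM employees WHERE status = 'TERMINATED'")
deleted = cur.rowcount
count_inside = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# Verify before making the change permanent.
if deleted <= expected_max:
    conn.commit()
    rolled_back = False
else:
    conn.rollback()  # more rows affected than expected: undo the DELETE
    rolled_back = True

count_after = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(deleted, count_inside, rolled_back, count_after)
```

Here the DELETE removes two rows, exceeds the threshold, and is rolled back, so all three rows are present again afterwards.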

Query Performance

What are some common causes of poor query performance?

Poor query performance typically stems from several identifiable issues. Understanding these common causes helps developers proactively design efficient queries and database schemas.

1. Missing or inappropriate indexes:

-- Problem: No index on frequently queried column
SELECT * FROM orders WHERE customer_id = 12345;  -- Table scan on millions of rows

-- Solution: Add appropriate index
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

2. Inefficient WHERE clauses:

Non-sargable conditions:

-- Problem: Function in WHERE clause prevents index usage
SELECT * FROM employees WHERE UPPER(last_name) = 'SMITH';

-- Solution: Use an expression (functional) index where the DBMS supports it,
-- e.g., PostgreSQL or Oracle
CREATE INDEX idx_employees_upper_name ON employees(UPPER(last_name));
-- Or better: store the data normalized and compare the column directly
SELECT * FROM employees WHERE last_name = 'Smith';
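
The effect of a non-sargable condition shows up in the query plan. The sketch below assumes Python's built-in sqlite3 module, SQLite's EXPLAIN QUERY PLAN (whose wording varies by version), and SQLite's support for indexes on expressions; the helper `plan` and the index name idx_employees_last_name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT)"
)
conn.execute("CREATE INDEX idx_employees_last_name ON employees(last_name)")


def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


function_query = "SELECT * FROM employees WHERE UPPER(last_name) = 'SMITH'"
plain_query = "SELECT * FROM employees WHERE last_name = 'Smith'"

# Wrapping the column in a function hides it from the ordinary index.
plan_function_before = plan(function_query)
# Comparing the bare column lets the index on last_name be used.
plan_plain = plan(plain_query)

# An index on the same expression makes the function call searchable again.
conn.execute("CREATE INDEX idx_employees_upper_name ON employees(UPPER(last_name))")
plan_function_after = plan(function_query)

print(plan_function_before)
print(plan_plain)
print(plan_function_after)
```

The first plan is a scan, the second is an index search on last_name, and the third is an index search on the expression index.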

Leading wildcards:

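-- Problem: Leading wildcard cannot use a B-tree index effectively
SELECT * FROM products WHERE name LIKE '%phone%';

-- Better where a prefix match is acceptable: a trailing wildcard can use the index
-- (note that this matches only names that start with 'phone')
SELECT * FROM products WHERE name LIKE 'phone%';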

3. Suboptimal JOIN operations:

-- Problem: Cartesian product due to missing JOIN condition
SELECT * FROM orders o, customers c;  -- No join condition

-- Solution: Proper JOIN syntax
SELECT * FROM orders o 
INNER JOIN customers c ON o.customer_id = c.id;

-- Possible problem: poor join order (large table processed first)
SELECT * FROM huge_table h
JOIN small_lookup l ON h.lookup_id = l.id;

-- For inner joins, cost-based optimizers usually choose the join order themselves,
-- so this rewrite is logically identical; check the execution plan before forcing an order
SELECT * FROM small_lookup l
JOIN huge_table h ON l.id = h.lookup_id;

4. SELECT * abuse:

-- Problem: Retrieving unnecessary data
SELECT * FROM products WHERE category = 'Electronics';

-- Solution: Select only needed columns
SELECT id, name, price FROM products WHERE category = 'Electronics';

5. Lack of query limits:

-- Problem: Returning entire dataset
SELECT * FROM log_table ORDER BY created_date DESC;

-- Solution: Use appropriate limits (LIMIT in MySQL/PostgreSQL; SQL Server uses TOP or OFFSET ... FETCH)
SELECT * FROM log_table 
ORDER BY created_date DESC 
LIMIT 100;

6. Outdated table statistics:

-- Problem: Query optimizer makes poor decisions due to stale statistics
-- Solution: Regular statistics updates
ANALYZE TABLE employees;  -- MySQL
ANALYZE employees;        -- PostgreSQL
UPDATE STATISTICS employees; -- SQL Server

7. Parameter sniffing (SQL Server specific):

-- Problem: Plan optimized for first parameter value
CREATE PROCEDURE GetOrdersByDate(@date DATE)
AS
SELECT * FROM orders WHERE order_date = @date;
GO

-- Solution: Copy the parameter into a local variable (shown here),
-- or add OPTION (RECOMPILE) to the query
ALTER PROCEDURE GetOrdersByDate(@date DATE)
AS
DECLARE @local_date DATE = @date;
SELECT * FROM orders WHERE order_date = @local_date;

8. Excessive data type conversions:

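-- Problem: Mismatched types force an implicit conversion; if the DBMS converts
-- the column rather than the literal, the index cannot be used
SELECT * FROM orders WHERE customer_id = '12345';  -- customer_id is INT

-- Solution: Use a literal or parameter of the column's type
SELECT * FROM orders WHERE customer_id = 12345;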

Common performance killers summary:

Issue	Impact	Solution
Missing indexes	Full table scans	Add appropriate indexes
Functions in WHERE	Index not usable	Functional indexes or data normalization
SELECT *	Unnecessary I/O	Select specific columns
No LIMIT clause	Memory exhaustion	Limit or paginate large result sets
Stale statistics	Poor execution plans	Regular ANALYZE/UPDATE STATISTICS
Wrong data types	Implicit conversions	Use matching data types
Complex subqueries	Multiple table scans	CTEs or JOINs
Lack of partitioning	Large table scans	Implement table partitioning

Monitoring and detection:

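// Node.js example: Query performance monitoring
// Assumes `db` is an already-configured client with an async query(sql, params)
// method whose result exposes rowCount (for example, a node-postgres Pool)
const queryMonitor = {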
    async executeWithProfiling(query, params) {
        const start = Date.now();
        const result = await db.query(query, params);
        const duration = Date.now() - start;
        
        // Flag potential issues
        if (duration > 5000) {
            console.warn('Very slow query detected:', {
                query: query.substring(0, 100),
                duration,
                rowCount: result.rowCount
            });
        }
        
        return result;
    }
};

This monitoring approach helps identify performance issues in production environments before they impact users significantly.

ACID Properties

What are ACID properties in database transactions?

ACID is an acronym that represents four fundamental properties that guarantee reliable processing of database transactions. These properties ensure data integrity and consistency in database systems, especially in multi-user environments.

Property	Description	Purpose
Atomicity	All operations in a transaction succeed or fail together	Prevents partial updates
Consistency	Database remains in a valid state before and after transaction	Maintains data integrity
Isolation	Concurrent transactions don't interfere with each other	Prevents data corruption
Durability	Committed changes persist even after system failures	Ensures data permanence

Atomicity

Atomicity ensures that a transaction is treated as a single, indivisible unit. Either all operations within the transaction are completed successfully, or none of them are applied to the database.

-- Example: Bank transfer transaction
BEGIN TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A001';
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 'A002';
COMMIT;
-- If either UPDATE fails, issue ROLLBACK instead of COMMIT so that neither change is applied

This code demonstrates a bank transfer where money is deducted from one account and added to another. Atomicity ensures that if either operation fails, both are rolled back, preventing inconsistent states where money could be lost or created.
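In application code the rollback usually has to be requested explicitly when a statement fails. The following is a minimal sketch of that pattern, not a definitive implementation. It assumes a hypothetical `client` object with an async `query(sql, params)` method; the `transfer` name, the `?` placeholder style, and the `BEGIN` keyword are illustrative and vary by driver and database.

```javascript
// Hypothetical helper: `client` is any object with an async query(sql, params) method.
// Placeholder syntax (?) and BEGIN/COMMIT keywords differ between drivers and engines.
async function transfer(client, fromId, toId, amount) {
    await client.query('BEGIN');
    try {
        await client.query(
            'UPDATE accounts SET balance = balance - ? WHERE account_id = ?',
            [amount, fromId]
        );
        await client.query(
            'UPDATE accounts SET balance = balance + ? WHERE account_id = ?',
            [amount, toId]
        );
        await client.query('COMMIT');
    } catch (err) {
        // Undo any statement that already ran, then surface the original error
        await client.query('ROLLBACK');
        throw err;
    }
}
```

The caller either sees both updates committed or receives the original error after a rollback; there is no path that commits only one of the two updates.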

Consistency

Consistency ensures that the database transitions from one valid state to another, maintaining all defined rules, constraints, and relationships.
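To make the idea of "one valid state to another" concrete, here is a toy in-memory sketch. In a real database the rules would be declared as constraints (for example a CHECK on the balance or a foreign key); the `applyTransfer` function and the shape of the `accounts` object are assumptions made only for illustration.

```javascript
// Toy model: `accounts` is a plain object mapping account_id -> balance.
// The checks stand in for database constraints such as CHECK (balance >= 0).
function applyTransfer(accounts, fromId, toId, amount) {
    if (!Object.hasOwn(accounts, fromId) || !Object.hasOwn(accounts, toId)) {
        throw new Error('unknown account'); // analogous to a foreign key violation
    }
    // Work on a copy so a rejected transfer leaves the original state untouched
    const next = { ...accounts };
    next[fromId] -= amount;
    next[toId] += amount;

    if (Object.values(next).some((balance) => balance < 0)) {
        throw new Error('constraint violated: balance must not be negative');
    }
    return next; // the new valid state
}
```

A transfer that would overdraw an account is rejected as a whole, so the state visible afterwards is either the old valid state or the new valid one.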

Isolation

Isolation ensures that concurrent transactions don't interfere with each other, preventing issues like dirty reads, phantom reads, and non-repeatable reads.

Durability

Durability guarantees that once a transaction is committed, its effects are permanently stored and will survive system crashes, power failures, or other system errors.
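Database engines typically provide this guarantee by writing a log record to stable storage before acknowledging the commit. The sketch below illustrates that idea with a plain append-only file; it is a toy model, not how any particular engine lays out its log. The function names and the record format are assumptions.

```javascript
const fs = require('node:fs');

// Append one record and force it to disk before returning ("commit acknowledged").
function commitDurably(logPath, record) {
    const fd = fs.openSync(logPath, 'a');
    try {
        fs.writeSync(fd, JSON.stringify(record) + '\n');
        fs.fsyncSync(fd); // ask the OS to flush the data to the storage device
    } finally {
        fs.closeSync(fd);
    }
}

// After a restart, committed records are recovered by replaying the log.
function recoverCommitted(logPath) {
    if (!fs.existsSync(logPath)) return [];
    return fs
        .readFileSync(logPath, 'utf8')
        .split('\n')
        .filter((line) => line.length > 0)
        .map((line) => JSON.parse(line));
}
```

The key design point is the ordering: the flush happens before the function returns, so anything reported as committed can be read back after a crash.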

What are the different isolation levels in SQL?

SQL defines four standard isolation levels that provide different degrees of isolation between concurrent transactions. Each level makes different trade-offs between data consistency and system performance.

SQL Standard Isolation Levels

Level	Dirty Read	Non-repeatable Read	Phantom Read	Performance
READ UNCOMMITTED	Possible	Possible	Possible	Highest
READ COMMITTED	Prevented	Possible	Possible	High
REPEATABLE READ	Prevented	Prevented	Possible	Medium
SERIALIZABLE	Prevented	Prevented	Prevented	Lowest

1. READ UNCOMMITTED

The lowest isolation level where transactions can read uncommitted data from other transactions.

-- Setting isolation level
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

BEGIN TRANSACTION;
    -- Can read data that other transactions haven't committed yet
    SELECT balance FROM accounts WHERE account_id = 'A001';
COMMIT;

This code allows reading uncommitted changes from other transactions, providing maximum concurrency but risking dirty reads. It's typically used for reporting where approximate data is acceptable.

2. READ COMMITTED

Transactions can only read committed data, but the same query might return different results if run multiple times within the same transaction.

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

BEGIN TRANSACTION;
    SELECT COUNT(*) FROM products WHERE price > 100; -- Returns 50
    -- Another transaction commits new products
    SELECT COUNT(*) FROM products WHERE price > 100; -- Might return 52
COMMIT;

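This example shows how READ COMMITTED prevents dirty reads but still lets committed changes from other transactions become visible between statements. Here the count changes because another transaction committed new rows (a phantom read); a committed update to a row already read would be visible in the same way (a non-repeatable read).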

3. REPEATABLE READ

Ensures that if a transaction reads a row, subsequent reads of the same row within the transaction will return the same data.

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;
    SELECT balance FROM accounts WHERE account_id = 'A001'; -- $1000
    -- Concurrent changes to this row are either blocked or hidden from this transaction, depending on the engine
    SELECT balance FROM accounts WHERE account_id = 'A001'; -- Still $1000
COMMIT;

This code guarantees that both reads of account A001 return the same balance within the transaction, preventing non-repeatable reads. The SQL standard still permits phantom reads for range queries at this level, although some engines prevent them as well.
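The difference between READ COMMITTED and REPEATABLE READ can be illustrated with a toy multi-version store. This is a simplified sketch of the snapshot approach used by some engines, not a description of any specific implementation; lock-based engines reach the same guarantee by blocking writers instead. The `VersionedStore` class and its methods are hypothetical.

```javascript
// Toy multi-version store: every committed write appends a new version.
class VersionedStore {
    constructor() {
        this.commitCounter = 0;
        this.versions = new Map(); // key -> [{ commitId, value }, ...]
    }

    // Commit a single write immediately (a one-statement transaction)
    commitWrite(key, value) {
        this.commitCounter += 1;
        const history = this.versions.get(key) || [];
        history.push({ commitId: this.commitCounter, value });
        this.versions.set(key, history);
    }

    // Latest value committed at or before the given commit id
    readAsOf(key, commitId) {
        const history = this.versions.get(key) || [];
        let result;
        for (const version of history) {
            if (version.commitId <= commitId) result = version.value;
        }
        return result;
    }

    begin(isolationLevel) {
        const store = this;
        const snapshotId = this.commitCounter; // fixed when the transaction starts
        return {
            read(key) {
                // REPEATABLE READ: every read uses the snapshot taken at begin().
                // READ COMMITTED: every read sees the latest committed version.
                const asOf = isolationLevel === 'REPEATABLE READ'
                    ? snapshotId
                    : store.commitCounter;
                return store.readAsOf(key, asOf);
            },
        };
    }
}
```

If two transactions start while the balance is 1000 and another write then commits 900, the READ COMMITTED transaction reads 900 on its next read while the REPEATABLE READ transaction continues to read 1000.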

4. SERIALIZABLE

The highest isolation level that completely isolates transactions from each other, as if they were executed serially.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;
    SELECT COUNT(*) FROM orders WHERE customer_id = 100; -- Returns 5
    -- Concurrent changes to orders for customer 100 are blocked, or one of the conflicting transactions is aborted
    SELECT COUNT(*) FROM orders WHERE customer_id = 100; -- Still returns 5
COMMIT;

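This example shows how SERIALIZABLE prevents all concurrency anomalies, including phantom reads, but may significantly impact performance due to increased locking or, in engines that detect conflicts instead of blocking, transactions that are aborted and must be retried.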
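Some engines abort a conflicting SERIALIZABLE transaction rather than block it; PostgreSQL, for example, reports this with SQLSTATE 40001 (serialization_failure), and the application is expected to run the transaction again. The sketch below shows one way to wrap that retry. It assumes a hypothetical `runTransaction` callback that performs the whole transaction, and that the driver exposes the SQLSTATE as `err.code`; other drivers may expose it under a different property.

```javascript
const SERIALIZATION_FAILURE = '40001'; // SQLSTATE for serialization_failure

// Runs `runTransaction` and retries it when the database reports a serialization failure.
async function withSerializableRetry(runTransaction, maxAttempts = 3) {
    for (let attempt = 1; ; attempt += 1) {
        try {
            return await runTransaction();
        } catch (err) {
            const retryable = Boolean(err) && err.code === SERIALIZATION_FAILURE;
            if (!retryable || attempt >= maxAttempts) throw err;
            // Otherwise loop and run the whole transaction again from the start
        }
    }
}
```

The whole transaction is retried, not just the failed statement, because the earlier reads may no longer be valid once the conflicting transaction has committed.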

Database-Specific Variations

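Database systems differ in which isolation levels they implement and which one they use by default:

PostgreSQL: Accepts all four level names but implements only three distinct levels; READ UNCOMMITTED behaves like READ COMMITTED, which is also the default
MySQL: Supports all four levels with the InnoDB engine; the default is REPEATABLE READ
SQL Server: Supports all four levels plus SNAPSHOT isolation; the default is READ COMMITTED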

Security and Access Control

What is SQL injection and how do you prevent it?

SQL injection is a critical security vulnerability that occurs when an attacker inserts malicious SQL code into application queries through user input fields. This happens when user input is directly concatenated into SQL statements without proper validation or sanitization, allowing attackers to manipulate the database, access unauthorized data, or even delete entire tables.

How SQL Injection Works

Consider this vulnerable code example:

// VULNERABLE CODE - Never do this!
const userId = req.body.userId; // User input: "1 OR 1=1"
const query = `SELECT * FROM users WHERE id = ${userId}`;
// Resulting query: SELECT * FROM users WHERE id = 1 OR 1=1
// This returns ALL users instead of just one!

In this example, an attacker could input 1 OR 1=1 which makes the WHERE clause always true, potentially exposing all user records.

Prevention Methods

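Method	Description	Effectiveness
Prepared Statements	Parameterized SQL with placeholders; input is bound as data, never parsed as SQL	Highest
Input Validation	Allow-list validation of type, format, and range (deny-lists are easy to bypass)	High as a secondary defense
Stored Procedures	Pre-defined database procedures that take parameters and avoid dynamic SQL	High
Escaping Special Characters	Escape SQL metacharacters	Medium
Least Privilege Principle	Limit database user permissions	High (limits the damage of a successful attack)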
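Input validation from the table above can be sketched as follows. Placeholders can carry values but not identifiers such as column names or sort direction, so those are usually checked against an allow-list. The names `parseUserId`, `buildOrderBy`, and `SORTABLE_COLUMNS`, as well as the column names, are hypothetical and chosen only for illustration.

```javascript
// Hypothetical allow-list of columns a caller may sort by
const SORTABLE_COLUMNS = new Set(['id', 'name', 'created_at']);

// Accept only strings of 1 to 10 digits and return them as a number
function parseUserId(input) {
    const text = String(input).trim();
    if (!/^\d{1,10}$/.test(text)) {
        throw new Error('userId must contain digits only');
    }
    return Number(text);
}

// Build an ORDER BY clause from values that are checked against fixed lists
function buildOrderBy(column, direction = 'ASC') {
    if (!SORTABLE_COLUMNS.has(column)) {
        throw new Error('unsupported sort column');
    }
    const dir = String(direction).toUpperCase() === 'DESC' ? 'DESC' : 'ASC';
    return `ORDER BY ${column} ${dir}`;
}
```

With these checks, an input such as 1 OR 1=1 is rejected before any SQL is built. Validation of this kind complements parameterized queries rather than replacing them.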

Best Practices Implementation

// SECURE CODE - Using prepared statements with Node.js/MySQL
const mysql = require('mysql2/promise');

// dbConfig holds the connection settings (host, user, password, database) and is defined elsewhere
async function getUserById(userId) {
    const connection = await mysql.createConnection(dbConfig);
    try {
        // Using prepared statement with placeholder (?)
        const [rows] = await connection.execute(
            'SELECT * FROM users WHERE id = ?',
            [userId]  // Sent to the server separately from the SQL text
        );
        return rows;
    } finally {
        // Close the connection even if the query throws
        await connection.end();
    }
}

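This approach ensures that user input is treated as data, not executable code, which prevents injection through the bound values. Identifiers such as table or column names cannot be bound as parameters, so any that come from user input still need allow-list validation.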
