✍️ 필사 모드: The Complete Guide to Database Fundamentals — SQL, Normalization, Indexes, Transactions, NoSQL
English- 1. What Is a Database
- 2. SQL Basics
- 3. Normalization
- 4. Indexes
- 5. Transactions
- 6. Query Optimization
- 7. Advanced PostgreSQL
- 8. NoSQL Types and Selection Criteria
- 9. Redis in Practice
- 10. Data Modeling
- 11. Migrations
- Conclusion
1. What Is a Database
A database is a system that stores and manages structured data. In modern software, data storage and retrieval are core capabilities, and choosing the right database and designing it properly determines application performance and scalability.
RDBMS vs NoSQL
Relational databases (RDBMS) and NoSQL have fundamentally different philosophies.
| Aspect | RDBMS | NoSQL |
|---|---|---|
| Data Model | Tables (rows and columns) | Document, Key-Value, Column, Graph |
| Schema | Fixed schema | Flexible schema |
| Scaling | Vertical (Scale-Up) | Horizontal (Scale-Out) |
| Transactions | ACID guaranteed | BASE (some ACID support) |
| Query Language | SQL | DB-specific APIs |
| Best For | Complex relationships, consistency | High volume, flexible schemas |
Major Database Comparison
| DB | Type | Key Features | Primary Use Cases |
|---|---|---|---|
| PostgreSQL | RDBMS | JSONB, extensibility, rich types | General purpose |
| MySQL | RDBMS | High compatibility, InnoDB engine | Web services |
| MongoDB | Document | Flexible schema, aggregation pipeline | Content management |
| Redis | Key-Value | In-memory, ultra-fast | Cache, sessions |
| Cassandra | Wide Column | High write throughput | IoT, time series |
| Neo4j | Graph | Optimized relationship traversal | Social networks |
2. SQL Basics
SQL (Structured Query Language) is the standard language for working with relational databases.
SELECT and Basic Syntax
-- Basic query
SELECT name, email, created_at
FROM users
WHERE status = 'active'
ORDER BY created_at DESC
LIMIT 10;
-- Using aliases
SELECT
u.name AS user_name,
COUNT(o.id) AS order_count,
SUM(o.total_amount) AS total_spent
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.name;
Types of JOINs
JOINs are the core operation for combining data from multiple tables.
-- INNER JOIN: only matching rows from both sides
SELECT u.name, o.order_date
FROM users u
INNER JOIN orders o ON u.id = o.user_id;
-- LEFT JOIN: all rows from left table + matching right
SELECT u.name, o.order_date
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
-- RIGHT JOIN: all rows from right table + matching left
SELECT u.name, o.order_date
FROM users u
RIGHT JOIN orders o ON u.id = o.user_id;
-- FULL OUTER JOIN: all rows from both sides
SELECT u.name, o.order_date
FROM users u
FULL OUTER JOIN orders o ON u.id = o.user_id;
JOIN Selection Guide:
- INNER JOIN -- when you only need data that exists in both tables
- LEFT JOIN -- when you need to preserve all rows from the base table
- RIGHT JOIN -- rarely used in practice since LEFT JOIN can substitute
- FULL OUTER JOIN -- when you need to merge all data from both sides
GROUP BY and HAVING
-- Departments where average salary exceeds 50,000
SELECT
department,
COUNT(*) AS employee_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) >= 50000
ORDER BY avg_salary DESC;
WHERE filters individual rows, while HAVING filters grouped results.
Subqueries
-- Employees with salary above average
SELECT name, salary
FROM employees
WHERE salary > (
SELECT AVG(salary) FROM employees
);
-- Subquery with EXISTS
SELECT u.name
FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.user_id = u.id
AND o.created_at >= '2026-01-01'
);
-- Subquery with IN
SELECT name, department
FROM employees
WHERE department IN (
SELECT department
FROM employees
GROUP BY department
HAVING COUNT(*) > 10
);
3. Normalization
Normalization is the process of decomposing tables to reduce data redundancy and improve integrity.
First Normal Form (1NF)
Every column value must be atomic. A single cell cannot contain multiple values.
-- 1NF violation: multiple values in phone_numbers
-- name: 'John Doe', phone_numbers: '555-1234, 555-5678'
-- 1NF compliant: separate table
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL
);
CREATE TABLE phone_numbers (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
phone VARCHAR(20) NOT NULL
);
Second Normal Form (2NF)
Satisfies 1NF and has no partial dependencies. Columns that depend on only part of a composite key must be separated.
-- 2NF violation: student_name depends only on student_id
-- (student_id, course_id) -> grade (OK)
-- student_id -> student_name (partial dependency!)
-- 2NF compliant
CREATE TABLE students (
student_id SERIAL PRIMARY KEY,
student_name VARCHAR(100) NOT NULL
);
CREATE TABLE enrollments (
student_id INTEGER REFERENCES students(student_id),
course_id INTEGER REFERENCES courses(course_id),
grade CHAR(2),
PRIMARY KEY (student_id, course_id)
);
Third Normal Form (3NF)
Satisfies 2NF and has no transitive dependencies.
-- 3NF violation: department_name is transitively dependent via department_id
-- employee_id -> department_id -> department_name
-- 3NF compliant
CREATE TABLE departments (
department_id SERIAL PRIMARY KEY,
department_name VARCHAR(100) NOT NULL
);
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
department_id INTEGER REFERENCES departments(department_id)
);
BCNF (Boyce-Codd Normal Form)
Satisfies 3NF and every determinant is a candidate key. It applies stricter conditions than 3NF.
When to Denormalize
Denormalization intentionally allows redundancy to improve read performance.
Consider denormalization when:
- Reads vastly outnumber writes
- Complex JOINs occur repeatedly
- Real-time aggregation is needed
- Caching alone cannot meet performance requirements
-- Denormalization example: storing user name in orders table
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
user_name VARCHAR(100), -- Denormalized: eliminates JOIN on read
total_amount DECIMAL(10, 2),
created_at TIMESTAMP DEFAULT NOW()
);
4. Indexes
An index is a data structure that accelerates data retrieval. It works like a book index.
B-Tree Index
The most versatile index type. Effective for range queries, sorting, and equality comparisons.
-- Single column index
CREATE INDEX idx_users_email ON users(email);
-- Query examples with index usage
SELECT * FROM users WHERE email = 'test@example.com'; -- Uses index
SELECT * FROM users WHERE email LIKE 'test%'; -- Uses index
SELECT * FROM users WHERE email LIKE '%test'; -- No index (leading wildcard)
Hash Index
Only usable for equality comparisons (=). Cannot perform range queries but can be faster than B-Tree for equality lookups.
CREATE INDEX idx_users_email_hash ON users USING HASH (email);
GIN Index (Generalized Inverted Index)
Suitable for arrays, JSONB, and full-text search.
-- GIN index on JSONB column
CREATE INDEX idx_products_tags ON products USING GIN (tags);
-- GIN index for full-text search
CREATE INDEX idx_articles_search ON articles USING GIN (
to_tsvector('english', title || ' ' || content)
);
GiST Index (Generalized Search Tree)
Used for spatial data, range types, and full-text search.
-- Spatial data index
CREATE INDEX idx_locations_coords ON locations USING GIST (coordinates);
Composite Index
An index combining multiple columns. Column order matters significantly.
-- Composite index: matches from leading column
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Queries where this index is effective:
SELECT * FROM orders WHERE user_id = 1; -- Used
SELECT * FROM orders WHERE user_id = 1 AND created_at > '2026-01-01'; -- Used
SELECT * FROM orders WHERE created_at > '2026-01-01'; -- Not used (missing leading column)
Covering Index
Includes all columns needed by a query in the index, returning results without accessing the table.
-- Covering index
CREATE INDEX idx_orders_covering ON orders(user_id, created_at, total_amount);
-- Can be processed using index only (Index Only Scan)
SELECT user_id, created_at, total_amount
FROM orders
WHERE user_id = 1 AND created_at > '2026-01-01';
Index Design Principles
- Index columns with high cardinality (number of unique values)
- Prioritize columns frequently used in WHERE, JOIN, and ORDER BY
- Consider the trade-off between write performance and storage space
- In composite indexes, place higher selectivity columns first
5. Transactions
A transaction is a single logical unit of work that changes the state of a database.
ACID Properties
- Atomicity -- all operations in a transaction either succeed entirely or fail entirely
- Consistency -- the database remains in a consistent state before and after a transaction
- Isolation -- concurrent transactions do not interfere with each other
- Durability -- results of a completed transaction are permanently preserved
-- Bank transfer transaction example
BEGIN;
UPDATE accounts SET balance = balance - 1000
WHERE account_id = 'A001';
UPDATE accounts SET balance = balance + 1000
WHERE account_id = 'B001';
-- Verify balance
SELECT balance FROM accounts WHERE account_id = 'A001';
COMMIT; -- or ROLLBACK if there is an issue;
Isolation Levels
Higher isolation levels increase data consistency but decrease concurrency.
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Performance |
|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Highest |
| Read Committed | Prevented | Possible | Possible | High |
| Repeatable Read | Prevented | Prevented | Possible | Moderate |
| Serializable | Prevented | Prevented | Prevented | Lowest |
-- Setting isolation level
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
SELECT balance FROM accounts WHERE account_id = 'A001';
-- Changes from other transactions are not visible during this transaction
COMMIT;
Problems at each isolation level:
- Dirty Read -- reading data that has not been committed
- Non-Repeatable Read -- getting different results when running the same query twice
- Phantom Read -- new rows appearing when querying with the same conditions
Deadlock
A state where two transactions wait forever for each other to release their locks.
-- Deadlock scenario
-- Transaction 1: locks A then B
-- Transaction 2: locks B then A
-- Solutions:
-- 1. Unify lock ordering (always lock the lower ID first)
-- 2. Set timeouts
-- 3. Minimize transaction scope
SET lock_timeout = '5s';
6. Query Optimization
Slow queries degrade overall service performance. Systematic optimization is essential.
EXPLAIN ANALYZE
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at >= '2026-01-01'
GROUP BY u.name
ORDER BY order_count DESC
LIMIT 10;
Key items to check in the execution plan:
- Seq Scan -- full table scan. If it occurs on large tables, an index is needed
- Index Scan -- query using an index. Efficient
- Index Only Scan -- returns data from index alone. Most efficient
- Nested Loop -- suitable for small data JOINs
- Hash Join -- suitable for medium to large data JOINs
- Merge Join -- suitable for large sorted data JOINs
- Actual Time -- real execution time
- Rows -- check estimated vs actual row count discrepancy
The N+1 Problem
The N+1 problem is an inefficient pattern where 1 query triggers N additional queries.
-- N+1 problem occurring
-- 1 query: SELECT * FROM users LIMIT 100;
-- 100 queries: SELECT * FROM orders WHERE user_id = ?; (for each user)
-- Solution: retrieve in one query with JOIN
SELECT u.*, o.*
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.id IN (SELECT id FROM users LIMIT 100);
-- Or use a subquery
SELECT u.*, (
SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id
) AS order_count
FROM users u
LIMIT 100;
Query Optimization Checklist
- Specify only needed columns instead of SELECT *
- Functions in WHERE clauses can invalidate indexes
- Avoid leading wildcards in LIKE searches
- Process large INSERTs in batches
- Create appropriate indexes
-- Bad example: index invalidated
SELECT * FROM users WHERE YEAR(created_at) = 2026;
-- Good example: index can be utilized
SELECT * FROM users
WHERE created_at >= '2026-01-01'
AND created_at < '2027-01-01';
7. Advanced PostgreSQL
PostgreSQL is the most powerful open-source RDBMS in terms of extensibility and standards compliance.
JSONB
-- Creating and using JSONB columns
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200),
attributes JSONB
);
-- Inserting data
INSERT INTO products (name, attributes)
VALUES ('Laptop', '{"brand": "Samsung", "ram": 16, "storage": "512GB", "tags": ["electronics", "portable"]}');
-- JSONB queries
SELECT name, attributes->>'brand' AS brand
FROM products
WHERE attributes->>'ram' = '16';
-- JSONB containment operator
SELECT * FROM products
WHERE attributes @> '{"brand": "Samsung"}';
-- JSONB array query
SELECT * FROM products
WHERE attributes->'tags' ? 'electronics';
CTE (Common Table Expression)
-- Recursive CTE: traversing an org chart
WITH RECURSIVE org_tree AS (
-- Base case: top-level manager
SELECT id, name, manager_id, 1 AS depth
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive case: traverse subordinates
SELECT e.id, e.name, e.manager_id, t.depth + 1
FROM employees e
INNER JOIN org_tree t ON e.manager_id = t.id
)
SELECT * FROM org_tree ORDER BY depth, name;
Window Functions
-- Salary ranking by department
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_in_dept,
RANK() OVER (ORDER BY salary DESC) AS overall_rank,
AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary,
salary - LAG(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS diff_from_prev
FROM employees;
-- Running total
SELECT
order_date,
amount,
SUM(amount) OVER (ORDER BY order_date ROWS UNBOUNDED PRECEDING) AS running_total
FROM daily_sales;
Partitioning
-- Range partitioning
CREATE TABLE events (
id BIGSERIAL,
event_type VARCHAR(50),
payload JSONB,
created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);
-- Creating monthly partitions
CREATE TABLE events_2026_01 PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
CREATE TABLE events_2026_03 PARTITION OF events
FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
-- Indexes are automatically created per partition
CREATE INDEX idx_events_type ON events(event_type);
8. NoSQL Types and Selection Criteria
Document Store -- MongoDB
A document-oriented database that supports flexible JSON-like schemas.
// MongoDB query examples
db.users.insertOne({
name: "John Doe",
email: "john@example.com",
address: {
city: "New York",
state: "NY"
},
tags: ["premium", "active"]
});
// Aggregation pipeline
db.orders.aggregate([
{ $match: { status: "completed" } },
{ $group: {
_id: "$user_id",
totalSpent: { $sum: "$amount" },
orderCount: { $sum: 1 }
}},
{ $sort: { totalSpent: -1 } },
{ $limit: 10 }
]);
Best for: services with frequently changing schemas, content management systems, catalogs
Key-Value Store -- Redis
An in-memory data store providing ultra-fast reads and writes. (Redis details are covered in section 9.)
Wide Column Store -- Cassandra
Provides high write throughput in large-scale distributed environments.
-- Cassandra CQL
CREATE TABLE sensor_data (
sensor_id UUID,
event_time TIMESTAMP,
temperature DOUBLE,
humidity DOUBLE,
PRIMARY KEY (sensor_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
Best for: IoT data, time series data, large-scale log storage
Graph Database -- Neo4j
Stores and traverses data based on nodes and relationships (edges).
// Neo4j Cypher query
// Finding friends of friends (2-degree relationship)
MATCH (me:Person)-[:FRIEND]->(friend)-[:FRIEND]->(fof)
WHERE me.name = 'John Doe'
AND NOT (me)-[:FRIEND]->(fof)
AND fof <> me
RETURN DISTINCT fof.name AS recommendation
LIMIT 10;
Best for: social networks, recommendation systems, fraud detection, knowledge graphs
NoSQL Selection Guide
| Requirement | Recommended DB |
|---|---|
| Flexible schema + complex queries | MongoDB |
| Ultra-fast cache/sessions | Redis |
| Massive writes + high availability | Cassandra |
| Relationship traversal + graph analysis | Neo4j |
| Full-text search | Elasticsearch |
| Time series data | TimescaleDB, InfluxDB |
9. Redis in Practice
Redis is an in-memory data structure store that supports a variety of data structures.
Caching
# Set cache (TTL 3600 seconds)
SET user:1001:profile '{"name":"John Doe","email":"john@example.com"}' EX 3600
# Read cache
GET user:1001:profile
# Invalidate cache
DEL user:1001:profile
Caching strategies:
- Cache-Aside (Lazy Loading) -- check cache on request; if missing, query DB and store in cache
- Write-Through -- write to both DB and cache simultaneously
- Write-Behind -- write to cache first, then asynchronously persist to DB
Session Management
# Store session (30-minute expiry)
HSET session:abc123 user_id 1001 role admin login_time "2026-04-12T10:00:00"
EXPIRE session:abc123 1800
# Retrieve session
HGETALL session:abc123
Leaderboards
# Add scores
ZADD game:leaderboard 1500 "player:001"
ZADD game:leaderboard 2300 "player:002"
ZADD game:leaderboard 1800 "player:003"
# Get top 10 (highest score first)
ZREVRANGE game:leaderboard 0 9 WITHSCORES
# Get a specific player rank
ZREVRANK game:leaderboard "player:001"
Pub/Sub
# Subscribe to channel
SUBSCRIBE notifications:user:1001
# Publish message
PUBLISH notifications:user:1001 '{"type":"order","message":"Your order has been completed"}'
Distributed Locks
# Acquire lock (NX: only if not exists, EX: expiry time)
SET lock:order:process "worker-1" NX EX 30
# Release lock (Lua script for atomic operation)
# Only release the lock if the value matches (to ensure we own it)
EVAL "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end" 1 lock:order:process "worker-1"
10. Data Modeling
ER Diagrams
Entity-Relationship diagrams are a visual representation of database design.
Key components:
- Entity -- a real-world object that becomes a table
- Attribute -- a characteristic of an entity (column)
- Relationship -- an association between entities
Relationship Types
One-to-One (1:1)
-- User and profile: 1:1
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL
);
CREATE TABLE user_profiles (
user_id INTEGER PRIMARY KEY REFERENCES users(id),
bio TEXT,
avatar_url VARCHAR(500)
);
One-to-Many (1:N)
-- User and posts: 1:N
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
title VARCHAR(200) NOT NULL,
content TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
Many-to-Many (M:N)
-- Posts and tags: M:N (junction table required)
CREATE TABLE tags (
id SERIAL PRIMARY KEY,
name VARCHAR(50) UNIQUE NOT NULL
);
CREATE TABLE post_tags (
post_id INTEGER REFERENCES posts(id),
tag_id INTEGER REFERENCES tags(id),
PRIMARY KEY (post_id, tag_id)
);
Modeling Checklist
- Analyze business requirements first
- Identify entities and relationships
- Apply normalization (minimum 3NF)
- Review denormalization based on performance requirements
- Establish an index strategy
- Forecast data growth volume
11. Migrations
Why Migrations Are Necessary
Database schemas evolve alongside code. Migration tools version-control schema changes and ensure the entire team maintains the same schema.
Flyway (Java/JVM Ecosystem)
-- V1__create_users_table.sql
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL UNIQUE,
name VARCHAR(100) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- V2__add_status_to_users.sql
ALTER TABLE users ADD COLUMN status VARCHAR(20) DEFAULT 'active';
CREATE INDEX idx_users_status ON users(status);
-- V3__create_orders_table.sql
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT REFERENCES users(id),
total_amount DECIMAL(10, 2) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
Flyway rules:
- File names follow the V-number-description.sql format
- Never modify a migration that has already been applied
- Add new changes as new files
Alembic (Python/SQLAlchemy Ecosystem)
# alembic/versions/001_create_users.py
from alembic import op
import sqlalchemy as sa
def upgrade():
op.create_table(
'users',
sa.Column('id', sa.Integer(), primary_key=True),
sa.Column('email', sa.String(255), nullable=False, unique=True),
sa.Column('name', sa.String(100), nullable=False),
sa.Column('created_at', sa.DateTime(), server_default=sa.func.now()),
)
def downgrade():
op.drop_table('users')
Zero-Downtime Migrations
Methods for changing schemas without service interruption in production environments.
Adding a Column (Safe)
-- Add without NOT NULL (no impact on existing rows)
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
Dropping a Column (Staged Approach)
Step 1 -- remove all code usage of the column and deploy.
Step 2 -- after sufficient time has passed, drop the column.
ALTER TABLE users DROP COLUMN phone;
Renaming a Table (Safe Method)
Step 1 -- create the new table.
Step 2 -- modify code to write to both tables.
Step 3 -- migrate the data.
Step 4 -- switch reads to the new table only.
Step 5 -- drop the old table.
Large-Scale Index Creation
-- CONCURRENTLY option: create index without table lock
CREATE INDEX CONCURRENTLY idx_orders_user ON orders(user_id);
Migration Tool Comparison
| Tool | Ecosystem | Features |
|---|---|---|
| Flyway | Java/JVM | SQL-based, simple and intuitive |
| Liquibase | Java/JVM | XML/YAML/JSON/SQL support |
| Alembic | Python | SQLAlchemy integration, auto-detection |
| Prisma Migrate | Node.js | Prisma ORM integration |
| golang-migrate | Go | Lightweight, CLI-based |
| Knex.js | Node.js | JavaScript migrations |
Conclusion
A database is the heart of software. You need a solid foundation in SQL basics, an understanding of normalization and index internals, and hands-on mastery of ACID transaction properties. Add query optimization skills and informed NoSQL selection criteria, and you can design a robust data layer for services of any scale.
Key takeaways:
- Deeply understand SQL -- JOINs, subqueries, and Window Functions are essential
- Follow normalization principles, but consider denormalization for performance
- Indexes are the key to read performance, but they come with write costs
- Transaction isolation levels are a trade-off between consistency and concurrency
- Measure and improve query performance with EXPLAIN ANALYZE
- Understand the strengths and weaknesses of RDBMS and NoSQL, and choose accordingly
- Always version-control migrations and establish zero-downtime strategies
현재 단락 (1/513)
A database is a system that stores and manages structured data. In modern software, data storage and...