Introduction
Linear Algebra is the cornerstone of modern mathematics and engineering. From data science and machine learning to computer graphics and quantum mechanics, it appears in virtually every technical discipline. This guide takes you from the very basics of vectors and matrices all the way to Singular Value Decomposition (SVD), building intuition alongside rigorous definitions.
**Key References:**
- Gilbert Strang, MIT OCW 18.06 Linear Algebra
- 3Blue1Brown: Essence of Linear Algebra (YouTube)
- NumPy official documentation (numpy.org)
- scikit-learn PCA documentation
1. Vectors
Definition and Notation
A vector is a mathematical object possessing both **magnitude** (size) and **direction**. An $n$-dimensional vector $\mathbf{v}$ is written as a column:
$$\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} \in \mathbb{R}^n$$
Geometrically, a vector in $\mathbb{R}^2$ or $\mathbb{R}^3$ is an arrow pointing from the origin to a specific point.
Vector Operations
**Addition and Scalar Multiplication:**
$$\mathbf{u} + \mathbf{v} = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \end{pmatrix}, \quad c\mathbf{v} = \begin{pmatrix} cv_1 \\ cv_2 \end{pmatrix}$$
**Dot Product (Inner Product):**
$$\mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_i v_i = \|\mathbf{u}\| \|\mathbf{v}\| \cos\theta$$
The dot product is zero precisely when the two vectors are **orthogonal** (perpendicular).
**Cross Product** — defined only in 3D:
$$\mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix}$$
The result is a vector perpendicular to both $\mathbf{u}$ and $\mathbf{v}$, with magnitude $\|\mathbf{u}\|\|\mathbf{v}\|\sin\theta$.
Norms
- **L1 norm (Manhattan):** $\|\mathbf{v}\|_1 = \sum_{i} |v_i|$
- **L2 norm (Euclidean):** $\|\mathbf{v}\|_2 = \sqrt{\sum_{i} v_i^2}$
- **Lp norm:** $\|\mathbf{v}\|_p = \left(\sum_{i} |v_i|^p\right)^{1/p}$
- **Infinity norm:** $\|\mathbf{v}\|_\infty = \max_i |v_i|$
**Unit vector (normalization):**
$$\hat{\mathbf{v}} = \frac{\mathbf{v}}{\|\mathbf{v}\|}$$
Define vectors
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
Addition
print("Sum:", a + b) # [5 7 9]
Scalar multiplication
print("Scalar mult:", 3 * a) # [3 6 9]
Dot product
dot = np.dot(a, b)
print("Dot product:", dot) # 32
Cross product (3D only)
cross = np.cross(a, b)
print("Cross product:", cross) # [-3 6 -3]
Norms
print("L2 norm:", np.linalg.norm(a)) # 3.7416...
print("L1 norm:", np.linalg.norm(a, ord=1)) # 6.0
print("Inf norm:", np.linalg.norm(a, ord=np.inf)) # 3.0
Unit vector
unit = a / np.linalg.norm(a)
print("Unit vector:", unit)
print("Magnitude:", np.linalg.norm(unit)) # 1.0
Angle between vectors
cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
angle_rad = np.arccos(np.clip(cos_theta, -1, 1))
print("Angle (degrees):", np.degrees(angle_rad))
Orthogonality check
v1 = np.array([1, 0])
v2 = np.array([0, 1])
print("v1 · v2 =", np.dot(v1, v2)) # 0, orthogonal!
2. Matrices
Definition and Types
An $m \times n$ matrix is a rectangular array of numbers with $m$ rows and $n$ columns:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$
**Important special matrices:**
| Type | Definition | Property |
| ----------------- | ----------------------------------------------------------- | ------------------------ |
| Identity matrix | $I_{ij} = 1$ if $i=j$, else 0 | $AI = IA = A$ |
| Diagonal matrix | Off-diagonal entries are zero | — |
| Symmetric matrix | $A = A^T$ | Real eigenvalues |
| Orthogonal matrix | $A^T A = I$ | $\det(A) = \pm 1$ |
| Positive definite | $\mathbf{x}^T A \mathbf{x} > 0$ for all $\mathbf{x} \neq 0$ | All eigenvalues positive |
Matrix Operations
**Matrix Multiplication:**
$(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$
Matrix multiplication is **not commutative** in general: $AB \neq BA$.
**Transpose:** $(A^T)_{ij} = A_{ji}$
Key property: $(AB)^T = B^T A^T$
**Inverse:** $A^{-1}$ satisfies $AA^{-1} = A^{-1}A = I$. Exists only when $\det(A) \neq 0$.
Key properties: $(AB)^{-1} = B^{-1}A^{-1}$, $(A^T)^{-1} = (A^{-1})^T$
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
Matrix addition
print("A + B:\n", A + B)
Matrix multiplication
C = A @ B # preferred syntax in Python 3
print("A @ B:\n", C)
Element-wise multiplication (Hadamard product)
print("A * B (Hadamard):\n", A * B)
Transpose
print("A^T:\n", A.T)
Inverse
A_inv = np.linalg.inv(A)
print("A^(-1):\n", A_inv)
print("A @ A^(-1):\n", np.round(A @ A_inv)) # Identity
Special matrices
print("3x3 Identity:\n", np.eye(3))
print("Diagonal:\n", np.diag([1, 2, 3]))
Properties
print("(AB)^T == B^T A^T:", np.allclose((A@B).T, B.T @ A.T))
print("(AB)^(-1) == B^(-1) A^(-1):",
np.allclose(np.linalg.inv(A@B), np.linalg.inv(B) @ np.linalg.inv(A)))
Symmetric matrix check
S = A + A.T
print("S is symmetric:", np.allclose(S, S.T))
Frobenius norm of a matrix
print("Frobenius norm:", np.linalg.norm(A, 'fro'))
3. Determinants
Definition and Computation
The **determinant** of a square matrix is a scalar that encodes crucial information about the matrix.
**2x2 determinant:**
$$\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc$$
**3x3 determinant by cofactor expansion:**
$$\det(A) = a_{11} C_{11} + a_{12} C_{12} + a_{13} C_{13}$$
where $C_{ij} = (-1)^{i+j} M_{ij}$ is the cofactor and $M_{ij}$ is the minor.
Geometric Interpretation
The absolute value of the determinant equals the **volume of the parallelepiped** formed by the row vectors. A determinant of zero means the rows are linearly dependent — the matrix "flattens" space.
Key Properties
1. $\det(I) = 1$
2. $\det(AB) = \det(A) \det(B)$
3. $\det(A^T) = \det(A)$
4. $\det(A^{-1}) = 1 / \det(A)$
5. Swapping two rows negates the determinant
6. $\det(A) = 0 \iff A$ is singular (non-invertible)
Cramer's Rule
For $A\mathbf{x} = \mathbf{b}$ with $\det(A) \neq 0$:
$$x_i = \frac{\det(A_i)}{\det(A)}$$
where $A_i$ is $A$ with column $i$ replaced by $\mathbf{b}$.
2x2 determinant
A2 = np.array([[3, 1], [2, 4]])
print("det([[3,1],[2,4]]):", np.linalg.det(A2)) # 10
3x3 determinant
A3 = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 10]])
print("det(A3):", np.linalg.det(A3)) # -3
Properties verification
B = np.array([[2, 0], [0, 3]])
print("det(A2) * det(B) =", np.linalg.det(A2) * np.linalg.det(B))
print("det(A2 @ B) =", np.linalg.det(A2 @ B))
Singular matrix (det = 0)
singular = np.array([[1, 2], [2, 4]]) # row 2 = 2 * row 1
print("Singular matrix det:", np.linalg.det(singular)) # ~0
Cramer's Rule
def cramers_rule(A, b):
det_A = np.linalg.det(A)
if abs(det_A) < 1e-10:
raise ValueError("Matrix is singular")
n = len(b)
x = np.zeros(n)
for i in range(n):
Ai = A.copy().astype(float)
Ai[:, i] = b
x[i] = np.linalg.det(Ai) / det_A
return x
A = np.array([[2, 1], [5, 3]], dtype=float)
b = np.array([1, 2])
x_cramer = cramers_rule(A, b)
x_solve = np.linalg.solve(A, b)
print("\nCramer's rule:", x_cramer)
print("np.linalg.solve:", x_solve)
print("Results match:", np.allclose(x_cramer, x_solve))
4. Systems of Linear Equations and Gaussian Elimination
Linear Systems
A system of $m$ equations in $n$ unknowns:
$$a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1$$
$$\vdots$$
$$a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m$$
In matrix form: $A\mathbf{x} = \mathbf{b}$
A system has either: **no solution** (inconsistent), **exactly one solution** (consistent, $\det(A) \neq 0$), or **infinitely many solutions** (underdetermined).
Augmented Matrix and Row Operations
The **augmented matrix** $[A|\mathbf{b}]$ combines $A$ and $\mathbf{b}$. Three elementary row operations preserve solution sets:
1. Row swap: $R_i \leftrightarrow R_j$
2. Row scaling: $R_i \leftarrow c \cdot R_i$, $c \neq 0$
3. Row addition: $R_i \leftarrow R_i + c \cdot R_j$
**Gaussian elimination** uses these operations to produce Row Echelon Form (REF). **Gauss-Jordan elimination** produces Reduced Row Echelon Form (RREF).
LU Decomposition
$A = LU$ where $L$ is lower triangular and $U$ is upper triangular. Useful for solving $A\mathbf{x} = \mathbf{b}$ efficiently for multiple right-hand sides.
from scipy import linalg
def gauss_jordan_solve(A, b):
"""Solve Ax = b using Gauss-Jordan elimination."""
n = len(b)
aug = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
for col in range(n):
Partial pivoting
max_row = np.argmax(np.abs(aug[col:, col])) + col
aug[[col, max_row]] = aug[[max_row, col]]
Normalize pivot row
aug[col] = aug[col] / aug[col, col]
Eliminate column
for row in range(n):
if row != col:
aug[row] -= aug[row, col] * aug[col]
return aug[:, -1]
Example: 3x3 system
A = np.array([[2, 1, -1],
[-3, -1, 2],
[-2, 1, 2]], dtype=float)
b = np.array([8, -11, -3], dtype=float)
x = gauss_jordan_solve(A.copy(), b.copy())
print("Solution:", x) # [2. 3. -1.]
print("Verification A@x:", A @ x)
LU Decomposition
P, L, U = linalg.lu(A)
print("\nL:\n", np.round(L, 3))
print("U:\n", np.round(U, 3))
Solve using LU
Ly = Pb, then Ux = y
Pb = P.T @ b
y = linalg.solve_triangular(L, Pb, lower=True)
x_lu = linalg.solve_triangular(U, y)
print("LU solution:", x_lu)
Checking solution existence
A_rank = np.linalg.matrix_rank(A)
print("\nRank of A:", A_rank)
A_aug = np.hstack([A, b.reshape(-1, 1)])
aug_rank = np.linalg.matrix_rank(A_aug)
print("Rank of augmented matrix:", aug_rank)
Consistent iff rank(A) == rank([A|b])
5. Vector Spaces
Definition and Axioms
A **vector space** $V$ over $\mathbb{R}$ is a set closed under addition and scalar multiplication satisfying 8 axioms:
1. Closure under addition: $\mathbf{u} + \mathbf{v} \in V$
2. Commutativity: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$
3. Associativity: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$
4. Zero vector: $\exists\, \mathbf{0}$ such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$
5. Additive inverse: $\exists\, {-}\mathbf{v}$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$
6. Closure under scalar multiplication: $c\mathbf{v} \in V$
7. Distributivity 1: $c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}$
8. Distributivity 2: $(c + d)\mathbf{v} = c\mathbf{v} + d\mathbf{v}$
Subspaces
A subset $W \subseteq V$ is a **subspace** if it contains the zero vector, and is closed under addition and scalar multiplication.
Linear Independence
Vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ are **linearly independent** if:
$$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = \mathbf{0} \implies c_1 = c_2 = \cdots = c_k = 0$$
Otherwise, they are **linearly dependent** (at least one is a linear combination of the others).
Basis and Dimension
A **basis** for $V$ is a linearly independent set that spans $V$. The **dimension** of $V$ equals the number of vectors in any basis.
Standard basis of $\mathbb{R}^n$: $\mathbf{e}_1 = (1,0,\ldots,0)$, ..., $\mathbf{e}_n = (0,\ldots,0,1)$
Four Fundamental Subspaces
For $A \in \mathbb{R}^{m \times n}$:
| Subspace | Lives in | Dimension |
| -------------------- | -------------- | --------- |
| Row space (rowsp) | $\mathbb{R}^n$ | $r$ |
| Column space (colsp) | $\mathbb{R}^m$ | $r$ |
| Null space (ker) | $\mathbb{R}^n$ | $n - r$ |
| Left null space | $\mathbb{R}^m$ | $m - r$ |
where $r = \text{rank}(A)$. **Rank-Nullity Theorem:** $r + (n-r) = n$.
from scipy.linalg import null_space
A = np.array([[1, 2, 3, 4],
[2, 4, 6, 8], # 2 * row 1
[1, 0, 1, 2]])
print("Shape:", A.shape)
print("Rank:", np.linalg.matrix_rank(A)) # 2
Null space basis
ns = null_space(A)
print("Null space dimension:", ns.shape[1]) # 4 - rank = 2
print("Null space basis:\n", ns)
Verify: A @ ns should be ~0
print("A @ null_space:\n", np.round(A @ ns, 10))
Column space basis via QR
Q, R = np.linalg.qr(A.T) # QR of A^T for column space
rank = np.linalg.matrix_rank(A)
col_basis = Q[:, :rank]
print("Column space basis shape:", col_basis.shape)
Linear independence check
v1 = np.array([1, 2, 3])
v2 = np.array([0, 1, 2])
v3 = np.array([1, 3, 5]) # v1 + v2
M = np.column_stack([v1, v2, v3])
print("\nRank of [v1, v2, v3]:", np.linalg.matrix_rank(M)) # 2, linearly dependent!
v3_indep = np.array([1, 0, 1])
M2 = np.column_stack([v1, v2, v3_indep])
print("Rank of [v1, v2, v3_indep]:", np.linalg.matrix_rank(M2)) # 3, linearly independent
6. Linear Transformations
Definition
A function $T: V \to W$ is a **linear transformation** if for all $\mathbf{u}, \mathbf{v} \in V$ and $c \in \mathbb{R}$:
1. $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ (additivity)
2. $T(c\mathbf{v}) = cT(\mathbf{v})$ (homogeneity)
Equivalently: $T(c\mathbf{u} + d\mathbf{v}) = cT(\mathbf{u}) + dT(\mathbf{v})$ (superposition)
Matrix Representation
Every linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ is represented by some $m \times n$ matrix $A$: $T(\mathbf{x}) = A\mathbf{x}$.
**Kernel:** $\ker(T) = \{\mathbf{v} \in V : T(\mathbf{v}) = \mathbf{0}\}$
**Image:** $\text{Im}(T) = \{T(\mathbf{v}) : \mathbf{v} \in V\}$
**Rank-Nullity:** $\dim(\ker T) + \dim(\text{Im} T) = \dim V$
Geometric Transformations in 2D
**Rotation by angle $\theta$:**
$$R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$
**Reflection about x-axis:**
$$F_x = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$
**Uniform scaling by factor s:**
$$S(s) = \begin{pmatrix} s & 0 \\ 0 & s \end{pmatrix}$$
**Horizontal shear:**
$$H = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$$
Rotation
def rotation_2d(theta_deg):
theta = np.radians(theta_deg)
return np.array([[np.cos(theta), -np.sin(theta)],
[np.sin(theta), np.cos(theta)]])
R90 = rotation_2d(90)
v = np.array([1.0, 0.0])
print("Rotate [1,0] by 90 deg:", R90 @ v) # [0, 1]
Composition: rotate then reflect
Fx = np.array([[1, 0], [0, -1]])
T_composed = Fx @ R90
print("Rotate then reflect:", T_composed @ v) # [0, -1]
Orthogonal matrix properties
print("R90^T @ R90 = I:", np.allclose(R90.T @ R90, np.eye(2)))
print("det(R90) =", np.linalg.det(R90)) # 1.0
Shear transformation
shear = np.array([[1, 2], [0, 1]]) # horizontal shear by 2
pts = np.array([[0, 1, 1, 0],
[0, 0, 1, 1]], dtype=float)
transformed = shear @ pts
print("Original corners:\n", pts)
print("Sheared corners:\n", transformed)
print("Area preserved? det(shear) =", np.linalg.det(shear)) # 1
Projection onto a line
Project onto the line y = mx (unit direction vector u)
u = np.array([1, 1]) / np.sqrt(2) # y = x direction
P_proj = np.outer(u, u) # projection matrix
print("\nProjection matrix:\n", P_proj)
w = np.array([3, 1])
print("Projection of [3,1] onto y=x:", P_proj @ w) # [2, 2]
7. Inner Product Spaces
Inner Product Definition
An **inner product** $\langle \cdot, \cdot \rangle: V \times V \to \mathbb{R}$ satisfies:
1. **Positive definiteness:** $\langle \mathbf{v}, \mathbf{v} \rangle \geq 0$, with equality iff $\mathbf{v} = \mathbf{0}$
2. **Symmetry:** $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$
3. **Bilinearity:** $\langle a\mathbf{u} + b\mathbf{v}, \mathbf{w} \rangle = a\langle \mathbf{u}, \mathbf{w} \rangle + b\langle \mathbf{v}, \mathbf{w} \rangle$
The **induced norm** is $\|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle}$.
Cauchy-Schwarz Inequality
$$|\langle \mathbf{u}, \mathbf{v} \rangle| \leq \|\mathbf{u}\| \cdot \|\mathbf{v}\|$$
Equality holds iff $\mathbf{u}$ and $\mathbf{v}$ are parallel. This implies the **triangle inequality**: $\|\mathbf{u} + \mathbf{v}\| \leq \|\mathbf{u}\| + \|\mathbf{v}\|$.
Gram-Schmidt Process
Given linearly independent vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$, produces an orthonormal basis:
$$\mathbf{u}_1 = \mathbf{v}_1$$
$$\mathbf{u}_j = \mathbf{v}_j - \sum_{i=1}^{j-1} \frac{\langle \mathbf{v}_j, \mathbf{u}_i \rangle}{\langle \mathbf{u}_i, \mathbf{u}_i \rangle} \mathbf{u}_i$$
Then normalize: $\mathbf{e}_j = \mathbf{u}_j / \|\mathbf{u}_j\|$.
This is the basis of **QR decomposition**: $A = QR$ where $Q$ has orthonormal columns.
def gram_schmidt(vectors):
"""Classical Gram-Schmidt orthonormalization."""
basis = []
for v in vectors:
w = np.array(v, dtype=float)
for b in basis:
w = w - np.dot(w, b) * b
norm = np.linalg.norm(w)
if norm > 1e-10:
basis.append(w / norm)
return np.array(basis)
Apply Gram-Schmidt
v1 = np.array([1, 1, 0])
v2 = np.array([1, 0, 1])
v3 = np.array([0, 1, 1])
Q = gram_schmidt([v1, v2, v3])
print("Orthonormal basis:")
for i, q in enumerate(Q):
print(f" e{i+1} = {q}")
Verify orthonormality: Q @ Q^T should equal I
print("\nQ @ Q^T:\n", np.round(Q @ Q.T))
QR decomposition (equivalent to Gram-Schmidt)
A = np.column_stack([v1, v2, v3])
Q_qr, R = np.linalg.qr(A)
print("\nQR decomposition Q:\n", Q_qr)
print("R (upper triangular):\n", np.round(R, 4))
Projection
def project(v, u):
"""Project v onto u."""
return (np.dot(v, u) / np.dot(u, u)) * u
Orthogonal complement
x = np.array([3, 2, 1])
q1 = Q[0]
x_parallel = project(x, q1)
x_perp = x - x_parallel
print("\nOrthogonal decomposition:")
print("x:", x)
print("x_parallel to e1:", x_parallel)
print("x_perpendicular:", x_perp)
print("Perpendicular check:", np.dot(x_perp, q1)) # ~0
8. Eigenvalues and Eigenvectors
Definition
For a square matrix $A$, a nonzero vector $\mathbf{v}$ and scalar $\lambda$ satisfying:
$$A\mathbf{v} = \lambda\mathbf{v}$$
are called an **eigenvector** and **eigenvalue** of $A$, respectively.
An eigenvector is only scaled (not rotated) by $A$. The eigenvalue $\lambda$ gives the scale factor.
Characteristic Equation
Rearranging: $(A - \lambda I)\mathbf{v} = \mathbf{0}$. For nontrivial solutions, the matrix $(A - \lambda I)$ must be singular:
$$\det(A - \lambda I) = 0$$
This is the **characteristic polynomial** of $A$. Its roots are the eigenvalues.
Eigendecomposition (Diagonalization)
If $A$ has $n$ linearly independent eigenvectors $\mathbf{p}_1, \ldots, \mathbf{p}_n$:
$$A = P \Lambda P^{-1}$$
where $P = [\mathbf{p}_1 \cdots \mathbf{p}_n]$ and $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)$.
**Application:** $A^k = P \Lambda^k P^{-1}$ — easy to compute powers!
Spectral Theorem for Symmetric Matrices
If $A = A^T$ (real symmetric):
1. All eigenvalues are **real**
2. Eigenvectors for distinct eigenvalues are **orthogonal**
3. $A$ is orthogonally diagonalizable: $A = Q \Lambda Q^T$ where $Q^T = Q^{-1}$
from scipy.linalg import eigh
General eigendecomposition
A = np.array([[4, 1],
[2, 3]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues) # [5., 2.]
print("Eigenvectors (columns):\n", eigenvectors)
Verify A @ v = lambda * v
for i in range(len(eigenvalues)):
lam = eigenvalues[i]
v = eigenvectors[:, i]
print(f"\nFor lambda = {lam:.2f}:")
print(" A @ v =", A @ v)
print(" lam * v =", lam * v)
print(" Match:", np.allclose(A @ v, lam * v))
Diagonalization: A = P @ Lambda @ P_inv
P = eigenvectors
Lambda = np.diag(eigenvalues)
P_inv = np.linalg.inv(P)
print("\nA = P @ Lambda @ P_inv:", np.allclose(P @ Lambda @ P_inv, A))
Matrix power via diagonalization: A^10
k = 10
A_k = P @ np.diag(eigenvalues**k) @ P_inv
print("A^10 (diag):", np.round(A_k.real))
print("A^10 (direct):", np.linalg.matrix_power(A, k))
Symmetric matrix: use eigh (more stable, guaranteed real)
B = np.array([[4, 2, 0],
[2, 3, 1],
[0, 1, 5]])
evals, evecs = eigh(B)
print("\nSymmetric matrix eigenvalues:", evals)
print("Orthogonal eigenvectors (Q^T @ Q = I):\n",
np.round(evecs.T @ evecs))
Spectral decomposition: A = Q @ Lambda @ Q^T
A_reconstructed = evecs @ np.diag(evals) @ evecs.T
print("Reconstruction error:", np.linalg.norm(B - A_reconstructed))
Trace = sum of eigenvalues, Det = product of eigenvalues
print("\ntrace(A) =", np.trace(A), "== sum of eigenvalues:", np.sum(eigenvalues).real)
print("det(A) =", np.linalg.det(A), "== product of eigenvalues:", np.prod(eigenvalues).real)
9. Singular Value Decomposition (SVD)
Definition
Every $m \times n$ real matrix $A$ can be decomposed as:
$$A = U \Sigma V^T$$
- $U \in \mathbb{R}^{m \times m}$: **left singular vectors** (orthonormal columns, eigenvectors of $AA^T$)
- $\Sigma \in \mathbb{R}^{m \times n}$: diagonal matrix with **singular values** $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$
- $V \in \mathbb{R}^{n \times n}$: **right singular vectors** (orthonormal columns, eigenvectors of $A^TA$)
The singular values satisfy $\sigma_i = \sqrt{\lambda_i(A^T A)}$.
Geometric Interpretation
$A = U \Sigma V^T$ represents three stages:
1. $V^T$: rotate/reflect input space
2. $\Sigma$: scale along coordinate axes (possibly different dimensions)
3. $U$: rotate/reflect output space
Low-Rank Approximation
The best rank-$k$ approximation (minimizing Frobenius norm) is:
$$A_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^T$$
This is the foundation of **image compression** and **collaborative filtering**.
Connection to PCA
For a mean-centered data matrix $X$: SVD gives $X = U\Sigma V^T$. The right singular vectors $V$ are the **principal components**, and the singular values relate to **explained variance**.
from sklearn.decomposition import PCA
from sklearn.datasets import load_digits
Basic SVD
A = np.array([[3, 2, 2],
[2, 3, -2]], dtype=float)
U, S, Vt = np.linalg.svd(A)
print("U:\n", np.round(U, 4))
print("Singular values:", np.round(S, 4))
print("Vt:\n", np.round(Vt, 4))
Reconstruction
m, n = A.shape
Sigma = np.zeros((m, n))
Sigma[:min(m,n), :min(m,n)] = np.diag(S)
A_rec = U @ Sigma @ Vt
print("Reconstruction error:", np.linalg.norm(A - A_rec))
Low-rank approximation
def low_rank_approx(A, k):
U, S, Vt = np.linalg.svd(A, full_matrices=False)
return U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]
Demonstrate on random matrix
np.random.seed(42)
M = np.random.randn(50, 50)
for k in [1, 5, 10, 25]:
M_k = low_rank_approx(M, k)
rel_err = np.linalg.norm(M - M_k, 'fro') / np.linalg.norm(M, 'fro')
storage_ratio = k * (50 + 50 + 1) / (50 * 50)
print(f"k={k:2d}: relative error={rel_err:.4f}, storage={storage_ratio:.3f}")
SVD-based PCA on real data
digits = load_digits()
X = digits.data # (1797, 64)
X_centered = X - X.mean(axis=0)
U_d, S_d, Vt_d = np.linalg.svd(X_centered, full_matrices=False)
explained_variance = S_d**2 / (X.shape[0] - 1)
explained_ratio = explained_variance / explained_variance.sum()
print("\nDigits dataset SVD-PCA:")
print("Explained variance ratio (top 5):", np.round(explained_ratio[:5], 4))
print("Cumulative (top 10):", round(explained_ratio[:10].sum(), 4))
Compare with sklearn PCA
pca = PCA(n_components=10)
X_pca = pca.fit_transform(X_centered)
print("sklearn PCA explained ratio:", np.round(pca.explained_variance_ratio_, 4))
Condition number (ratio of largest to smallest singular value)
print("\nCondition number of A:", S[0] / S[-1])
print("Well-conditioned? (< 100):", (S[0] / S[-1]) < 100)
10. Matrix Factorizations
Overview and Comparison
| Factorization | Form | Conditions | Applications |
| ------------------ | --------------------- | --------------------------- | -------------------------------- |
| LU | $A = LU$ | Square matrix | Solving $A\mathbf{x}=\mathbf{b}$ |
| QR | $A = QR$ | Any matrix | Least squares, eigenvalues |
| Cholesky | $A = LL^T$ | Symmetric positive definite | Optimization, statistics |
| SVD | $A = U\Sigma V^T$ | Any matrix | PCA, compression, pseudo-inverse |
| Eigendecomposition | $A = P\Lambda P^{-1}$ | Diagonalizable | Dynamics, spectral analysis |
Least Squares via QR
For overdetermined $A\mathbf{x} \approx \mathbf{b}$ (more equations than unknowns), minimize $\|A\mathbf{x} - \mathbf{b}\|^2$. Using QR: $\hat{\mathbf{x}} = R^{-1}Q^T\mathbf{b}$.
Pseudo-Inverse via SVD
For any matrix $A = U\Sigma V^T$, the **Moore-Penrose pseudo-inverse** is:
$$A^+ = V \Sigma^+ U^T$$
where $\Sigma^+$ inverts nonzero diagonal entries.
from scipy import linalg
np.random.seed(42)
QR decomposition and least squares
Overdetermined system: 5 equations, 2 unknowns
A = np.random.randn(5, 2)
b_true = np.array([1, 2])
b = A @ b_true + 0.1 * np.random.randn(5)
QR least squares
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)
print("QR least squares:", x_qr)
print("True values:", b_true)
numpy lstsq
x_lstsq, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
print("lstsq:", x_lstsq)
Cholesky decomposition
B = np.array([[6, 3, 2],
[3, 5, 1],
[2, 1, 4]], dtype=float)
L = np.linalg.cholesky(B)
print("\nCholesky L:\n", np.round(L, 4))
print("L @ L^T = B:", np.allclose(L @ L.T, B))
Cholesky solve
b_sys = np.array([1, 2, 3], dtype=float)
y = linalg.solve_triangular(L, b_sys, lower=True)
x_chol = linalg.solve_triangular(L.T, y)
print("Cholesky solution:", x_chol)
print("Verify:", np.allclose(B @ x_chol, b_sys))
Pseudo-inverse via SVD
A_rect = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
A_pinv = np.linalg.pinv(A_rect)
print("\nPseudo-inverse shape:", A_pinv.shape) # (2, 3)
print("A_pinv @ A = I_2x2:\n", np.round(A_pinv @ A_rect))
SVD decomposition comparison
U, S, Vt = np.linalg.svd(A_rect, full_matrices=False)
A_pinv_manual = Vt.T @ np.diag(1/S) @ U.T
print("Manual pseudo-inverse matches:", np.allclose(A_pinv, A_pinv_manual))
11. Linear Algebra in AI/ML
Neural Networks and Matrix Multiplication
The forward pass of a neural network is a sequence of matrix multiplications:
$$\mathbf{h}^{(l)} = f\!\left(W^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}\right)$$
For a batch of $n$ samples with $d$ features: $X \in \mathbb{R}^{n \times d}$, the entire batch is processed as one matrix multiplication.
**Backpropagation** computes gradients using the chain rule, which involves matrix transpose operations:
$$\frac{\partial L}{\partial W^{(l)}} = \left(\frac{\partial L}{\partial \mathbf{h}^{(l)}}\right) \left(\mathbf{h}^{(l-1)}\right)^T$$
Attention Mechanism in Transformers
The scaled dot-product attention is a core operation in Transformer models:
$$\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right) V$$
Each query vector computes dot products with all key vectors, producing attention weights for a weighted sum of value vectors — pure linear algebra!
Dimensionality Reduction
PCA finds directions of maximum variance by eigendecomposing the covariance matrix $C = \frac{1}{n}X^TX$. The eigenvectors with largest eigenvalues are the **principal components**.
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
1. Neural Network Forward Pass
np.random.seed(42)
batch_size, d_in, d_hidden, d_out = 32, 784, 256, 10
X_batch = np.random.randn(batch_size, d_in)
W1 = np.random.randn(d_in, d_hidden) * np.sqrt(2.0 / d_in) # He init
b1 = np.zeros(d_hidden)
W2 = np.random.randn(d_hidden, d_out) * np.sqrt(2.0 / d_hidden)
b2 = np.zeros(d_out)
def relu(x): return np.maximum(0, x)
def softmax(x):
exp_x = np.exp(x - x.max(axis=1, keepdims=True))
return exp_x / exp_x.sum(axis=1, keepdims=True)
H1 = relu(X_batch @ W1 + b1)
logits = H1 @ W2 + b2
probs = softmax(logits)
print("Forward pass shapes:", X_batch.shape, "->", H1.shape, "->", probs.shape)
print("Probabilities sum to 1:", np.allclose(probs.sum(axis=1), 1))
2. Transformer Attention
def scaled_dot_product_attention(Q, K, V, mask=None):
d_k = Q.shape[-1]
scores = Q @ K.T / np.sqrt(d_k)
if mask is not None:
scores = scores + mask * -1e9
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
return weights @ V, weights
seq_len, d_model = 8, 64
Q = np.random.randn(seq_len, d_model)
K = np.random.randn(seq_len, d_model)
V = np.random.randn(seq_len, d_model)
out, attn_weights = scaled_dot_product_attention(Q, K, V)
print("\nAttention output:", out.shape)
print("Attention weights sum:", attn_weights.sum(axis=-1)) # all 1s
3. PCA for Dimensionality Reduction
iris = load_iris()
X_iris = iris.data
y_iris = iris.target
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_iris)
print("\nIris PCA:")
print("Original shape:", X_iris.shape, "-> Reduced:", X_pca.shape)
print("Explained variance:", np.round(pca.explained_variance_ratio_, 4))
print("Total variance explained:", round(pca.explained_variance_ratio_.sum(), 4))
print("Principal components (loadings):\n", np.round(pca.components_, 4))
4. Word Embedding Analogy
embeddings = {
"king": np.array([0.9, 0.1, 0.8, 0.2, 0.3]),
"queen": np.array([0.8, 0.2, 0.7, 0.3, 0.9]),
"man": np.array([0.8, 0.1, 0.2, 0.1, 0.1]),
"woman": np.array([0.7, 0.2, 0.1, 0.2, 0.8]),
}
def cosine_sim(a, b):
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
king - man + woman ≈ queen
analogy_vec = embeddings["king"] - embeddings["man"] + embeddings["woman"]
print("\nWord analogy (king - man + woman):")
for word, emb in embeddings.items():
print(f" sim to {word}: {cosine_sim(analogy_vec, emb):.4f}")
12. Quiz
**Answer**: The dot product $\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\|\mathbf{v}\|\cos\theta$ gives the product of the magnitudes multiplied by the cosine of the angle between them. It measures "how much one vector goes in the direction of another."
**Key applications**: Testing orthogonality (dot product = 0), computing projections, measuring similarity in machine learning (cosine similarity).
**Answer**: A square matrix has no inverse (is **singular**) when its determinant equals zero. Equivalently, the matrix has linearly dependent rows/columns, its rank is less than $n$, or the system $A\mathbf{x} = \mathbf{b}$ does not have a unique solution.
**Example**: Any matrix with two identical rows, or rows that sum to zero, is singular.
**Answer**: Eigendecomposition ($A = P\Lambda P^{-1}$) applies only to **square** matrices and requires linearly independent eigenvectors. SVD ($A = U\Sigma V^T$) works for **any** matrix (including rectangular) and always exists. For symmetric positive definite matrices, both coincide.
**When to use SVD**: When the matrix is rectangular, nearly singular (SVD is more numerically stable), or when you need the best low-rank approximation.
**Answer**: If $\text{rank}(A) < n$ (rank deficient), the system $A\mathbf{x} = \mathbf{b}$ either has **no solution** (when $\mathbf{b}$ is not in the column space of $A$) or **infinitely many solutions** (when the null space is nontrivial). A unique solution requires full column rank.
**Practical implication**: In machine learning, underdetermined systems correspond to models with more parameters than training samples, leading to overfitting.
**Answer**: QR decomposition ($A = QR$) is the matrix formulation of the Gram-Schmidt process. The columns of $Q$ are the orthonormal vectors produced by Gram-Schmidt applied to the columns of $A$. $R$ is upper triangular and captures the change-of-basis coefficients.
**Applications**: Solving least squares problems numerically, computing eigenvalues (QR algorithm), and orthogonalizing data features.
Summary and Learning Roadmap
The core concepts of linear algebra are deeply interconnected:
Vectors -> Matrices -> Determinants -> Linear Systems
| |
Vector Spaces -> Linear Transforms -> Inner Products
|
Eigenvalues/Eigenvectors -> SVD -> AI/ML Applications
**Recommended next steps:**
1. **Gilbert Strang's MIT OCW 18.06**: The gold standard for linear algebra pedagogy — thorough, with excellent problem sets
2. **3Blue1Brown - Essence of Linear Algebra**: Build geometric intuition before or alongside the formal treatment
3. **NumPy linalg module**: Practice every concept with code — `numpy.linalg` covers everything in this guide
4. **scikit-learn PCA documentation**: See how linear algebra powers real ML pipelines
5. **MIT 18.065 (Matrix Methods in Data Analysis)**: Advanced applications in data science and deep learning
The secret to mastering linear algebra is developing **geometric intuition** — always visualize what a matrix operation does to space. Combine that intuition with computational practice in NumPy, and you will have a powerful tool for any technical field.
현재 단락 (1/524)
Linear Algebra is the cornerstone of modern mathematics and engineering. From data science and machi...