Engineering Mathematics: Numerical Methods Complete Guide - Root Finding to ODE Solvers

Author: Youngju Kim (@fjvbn20031)

Engineering Math Mastery Part 4: Numerical Methods
Numerical analysis is required when engineering problems cannot be solved analytically, or when numerical simulation of complex systems is needed. Analyzing the nonlinear characteristics of transistors, transient responses in complex circuit networks, and finite element analysis all rely on numerical methods. This article covers core numerical techniques from first principles to full Python implementation.
1. Error Analysis
1.1 Types of Errors
In numerical computation, a "perfect solution" does not exist. Understanding the types of errors is the foundation of reliable numerical computation.
Round-off Error: Occurs because computers represent real numbers with a finite number of digits.
IEEE 754 double precision: approximately 15-16 significant digits
Truncation Error: Occurs when an infinite series is approximated with a finite number of terms.
Absolute Error: $E_a = |x_{\text{true}} - x_{\text{approx}}|$
Relative Error: $E_r = \dfrac{|x_{\text{true}} - x_{\text{approx}}|}{|x_{\text{true}}|}$
Percentage error: $\varepsilon = E_r \times 100\%$
Significant Figures: If $E_r \le 0.5 \times 10^{-n}$, this guarantees at least $n$ significant digits.
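These definitions are easy to verify directly. A minimal sketch (the truncated Taylor series for $e$ is an illustrative example, not from the original):

```python
import math

# Truncation error: approximate e = exp(1) with a partial Taylor sum
true_val = math.e
approx = sum(1 / math.factorial(k) for k in range(8))  # first 8 terms

abs_err = abs(true_val - approx)
rel_err = abs_err / abs(true_val)
print(f"Absolute error: {abs_err:.2e}")
print(f"Relative error: {rel_err:.2e}")

# Round-off error: 0.1 + 0.2 is not exactly 0.3 in binary floating point
print(0.1 + 0.2 == 0.3)          # False
print(abs((0.1 + 0.2) - 0.3))    # on the order of 1e-17
```

Both error types appear here at once: the partial sum carries truncation error, while the `0.1 + 0.2` comparison fails purely from round-off.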
1.2 Numerical Stability
Errors must not be amplified during computation.
Well-conditioned: Input errors are not significantly amplified in the output
Ill-conditioned: Small changes in input cause large changes in output
Condition Number: $\kappa(A) = \|A\|\,\|A^{-1}\| \ge 1$
If $\kappa(A)$ is large in a linear system $A\mathbf{x} = \mathbf{b}$, the numerical solution is unstable: relative input errors can be amplified by up to a factor of $\kappa(A)$ in the output.
```python
import numpy as np

# Hilbert matrix (notoriously ill-conditioned)
n = 6
H = np.array([[1/(i+j+1) for j in range(n)] for i in range(n)])
kappa = np.linalg.cond(H)
print(f"Hilbert matrix ({n}x{n}) condition number: {kappa:.2e}")

# Simple matrix
A = np.array([[2, 1], [1, 3]], dtype=float)
print(f"Simple matrix condition number: {np.linalg.cond(A):.2f}")

# Error propagation demonstration
b_true = np.array([3, 4], dtype=float)
x_true = np.linalg.solve(A, b_true)

# Add a small error
b_perturbed = b_true + np.array([0.01, 0.0])
x_perturbed = np.linalg.solve(A, b_perturbed)
print(f"\nInput error: {np.linalg.norm(b_perturbed - b_true):.4f}")
print(f"Output error: {np.linalg.norm(x_perturbed - x_true):.4f}")
```
1.3 Big-O Notation and Order of Convergence
Order of Convergence: with error $e_k = x_k - r$, a method has order $p$ if $\lim_{k \to \infty} |e_{k+1}| / |e_k|^p = C$ for some constant $C > 0$.
- $p = 1$: First-order (linear) convergence
- $p = 2$: Second-order (quadratic) convergence (much faster)
- $p \approx 1.618$: Superlinear convergence (secant method)
Big-O Notation: $E(h) = O(h^p)$ means the error decreases proportionally to $h^p$ as the step size $h \to 0$.
Example: Forward difference error is $O(h)$; central difference error is $O(h^2)$.
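The order $p$ can be estimated from three successive errors via $p \approx \log(e_{k+1}/e_k) / \log(e_k/e_{k-1})$. A minimal sketch using Newton's iteration for $\sqrt{2}$ (the example function is illustrative):

```python
import math

# Estimate the observed order of convergence from successive errors
def newton_sqrt2_errors(x0, n):
    """Run n Newton steps for f(x) = x^2 - 2 and record absolute errors."""
    root = math.sqrt(2)
    errs = []
    x = x0
    for _ in range(n):
        x = x - (x**2 - 2) / (2 * x)  # Newton step
        errs.append(abs(x - root))
    return errs

e = newton_sqrt2_errors(2.0, 4)
# p ~ log(e2/e1) / log(e1/e0)
p = math.log(e[2] / e[1]) / math.log(e[1] / e[0])
print(f"Estimated order: {p:.2f}")  # close to 2 (quadratic)
```

The estimate lands near 2, matching the quadratic convergence derived for Newton-Raphson below.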
2. Solving Nonlinear Equations
2.1 Bisection Method
Principle: Bolzano's theorem (intermediate value theorem) — if $f(a)f(b) < 0$ for continuous $f$, there exists a root in $(a, b)$.
Algorithm:
- Verify $f(a)f(b) < 0$
- Compute midpoint $c = (a + b)/2$
- If $f(c) = 0$, done; if $f(a)f(c) < 0$, set $b = c$; otherwise set $a = c$
- If $(b - a)/2 < \varepsilon$, done
Convergence: The interval halves with each iteration — first-order convergence. Error after $n$ iterations: $|c_n - r| \le \dfrac{b - a}{2^{n+1}}$
Required iterations: $n \ge \log_2 \dfrac{b - a}{\varepsilon} - 1$
```python
def bisection(f, a, b, tol=1e-10, max_iter=100):
    """Find root of f(x) = 0 using the bisection method"""
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    history = []
    for i in range(max_iter):
        c = (a + b) / 2
        history.append({'iter': i+1, 'a': a, 'b': b, 'c': c,
                        'f(c)': f(c), 'error': abs(b-a)/2})
        if abs(f(c)) < tol or (b - a) / 2 < tol:
            print(f"Converged in {i+1} iterations")
            return c, history
        if f(a) * f(c) < 0:
            b = c
        else:
            a = c
    return (a + b) / 2, history

# Example: x^3 - 2x - 5 = 0
def equation(x):
    return x**3 - 2*x - 5

root, history = bisection(equation, 1, 3)
print(f"Root: {root:.10f}")
print(f"Verification: f({root:.6f}) = {equation(root):.2e}")

# Print convergence history
print("\nIter | Approximation | Error")
for h in history[:8]:
    print(f" {h['iter']:2d} | {h['c']:.8f} | {h['error']:.2e}")
```
2.2 Newton-Raphson Method
Principle: First-order Taylor approximation at $x_n$: $f(x) \approx f(x_n) + f'(x_n)(x - x_n)$. Setting this to zero gives the iteration $x_{n+1} = x_n - \dfrac{f(x_n)}{f'(x_n)}$
Geometric interpretation: The tangent line at $(x_n, f(x_n))$ intersects the x-axis at $x_{n+1}$
Convergence: Ideally quadratic — the number of accurate digits doubles with each iteration!
Limitations and caveats:
- Diverges if $f'(x_n) \approx 0$ (tangent line parallel to x-axis)
- Convergence is not guaranteed (initial point selection matters)
- Degrades to linear convergence at multiple roots
```python
def newton_raphson(f, df, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson method"""
    x = x0
    history = [{'iter': 0, 'x': x, 'f(x)': f(x), 'error': float('inf')}]
    for i in range(max_iter):
        fx = f(x)
        dfx = df(x)
        if abs(dfx) < 1e-15:
            print("Warning: derivative near zero, risk of divergence")
            break
        x_new = x - fx / dfx
        error = abs(x_new - x)
        history.append({'iter': i+1, 'x': x_new,
                        'f(x)': f(x_new), 'error': error})
        if error < tol:
            print(f"Converged in {i+1} iterations")
            return x_new, history
        x = x_new
    return x, history

# f(x) = x^3 - 2x - 5, f'(x) = 3x^2 - 2
def f(x):
    return x**3 - 2*x - 5

def df(x):
    return 3*x**2 - 2

root, history = newton_raphson(f, df, x0=2.0)
print(f"Newton-Raphson root: {root:.10f}")
print("\nIter | Approximation | Error")
for h in history:
    err_str = f"{h['error']:.2e}" if h['error'] != float('inf') else " -"
    print(f" {h['iter']:2d} | {h['x']:.10f} | {err_str}")
```
Convergence speed comparison:
```python
import numpy as np
import matplotlib.pyplot as plt

# Bisection vs Newton-Raphson convergence comparison
true_root = 2.0945514815423265  # actual root of x^3 - 2x - 5 = 0
_, hist_bisect = bisection(f, 1, 3, tol=1e-12, max_iter=50)
_, hist_newton = newton_raphson(f, df, x0=2.0, tol=1e-12, max_iter=20)
errors_b = [abs(h['c'] - true_root) for h in hist_bisect]
errors_n = [abs(h['x'] - true_root) for h in hist_newton if h['error'] != float('inf')]

plt.figure(figsize=(10, 5))
plt.semilogy(range(1, len(errors_b)+1), errors_b, 'b-o', label='Bisection (1st order)')
plt.semilogy(range(1, len(errors_n)+1), errors_n, 'r-s', label='Newton-Raphson (2nd order)')
plt.xlabel('Iterations')
plt.ylabel('Absolute error (log scale)')
plt.title('Bisection vs Newton-Raphson Convergence')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('convergence_comparison.png', dpi=150)
plt.show()
```
2.3 Secant Method
When $f'(x)$ is unknown, approximate it with a finite difference: $f'(x_n) \approx \dfrac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}}$, giving the iteration $x_{n+1} = x_n - f(x_n)\,\dfrac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}$
Order of convergence: $p = \dfrac{1 + \sqrt{5}}{2} \approx 1.618$ (golden ratio, superlinear)
```python
def secant_method(f, x0, x1, tol=1e-10, max_iter=50):
    """Secant method"""
    for i in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if abs(f1 - f0) < 1e-15:
            break
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            print(f"Converged in {i+1} iterations")
            return x2
        x0, x1 = x1, x2
    return x1

root_s = secant_method(f, 1.0, 3.0)
print(f"Secant method root: {root_s:.10f}")
```
2.4 Using scipy.optimize
```python
from scipy.optimize import fsolve, brentq
import numpy as np

# Solve a single equation
def equation(x):
    return x**3 - 2*x - 5

# Brent's method (combination of bisection + secant + inverse quadratic interpolation)
root = brentq(equation, 1, 3, xtol=1e-12)
print(f"brentq root: {root:.12f}")

# fsolve (Newton-type iteration from MINPACK)
root_fs = fsolve(equation, 2.0)[0]
print(f"fsolve root: {root_fs:.12f}")

# System of nonlinear equations
def system(vars):
    x, y = vars
    eq1 = x**2 + y**2 - 4  # circle
    eq2 = x*y - 1          # hyperbola
    return [eq1, eq2]

solutions = []
for x0, y0 in [(1, 1), (-1, 1), (1, -1), (-1, -1)]:
    sol = fsolve(system, [x0, y0], full_output=True)
    if sol[2] == 1:  # converged successfully
        solutions.append(sol[0])

print("\nSystem of equations solutions:")
for sol in solutions:
    print(f" (x, y) = ({sol[0]:.6f}, {sol[1]:.6f})")

# Circuit equation: diode characteristics (nonlinear)
def diode_equation(V, V_s=5, R=1000, I_s=1e-12, n=1, V_T=0.02585):
    """KVL: V_s = V + R * I_s * (exp(V/(n*V_T)) - 1)"""
    I = I_s * (np.exp(V / (n * V_T)) - 1)
    return V_s - V - R * I

V_diode = brentq(diode_equation, 0, 1)
I_diode = 1e-12 * (np.exp(V_diode / 0.02585) - 1)
print(f"\nDiode operating point: V = {V_diode:.4f} V, I = {I_diode*1000:.4f} mA")
```
3. Numerical Differentiation and Integration
3.1 Finite Differences
Derivatives are numerically approximated from Taylor series.
Forward Difference: $f'(x) \approx \dfrac{f(x + h) - f(x)}{h}$
First-order accuracy (error proportional to $h$).
Backward Difference: $f'(x) \approx \dfrac{f(x) - f(x - h)}{h}$
Central Difference: Expand $f(x + h)$ and $f(x - h)$ as Taylor series. Subtracting: $f'(x) = \dfrac{f(x + h) - f(x - h)}{2h} + O(h^2)$
Second-order accuracy — much more accurate than forward difference for the same $h$.
Second derivative: $f''(x) \approx \dfrac{f(x + h) - 2f(x) + f(x - h)}{h^2}$
This is used in finite difference solutions of the heat equation and wave equation.
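The second-derivative formula can be checked the same way the first-derivative formulas are checked below; a minimal sketch (test function $\sin x$ is illustrative):

```python
import numpy as np

# Central-difference second derivative: f''(x) ~ (f(x+h) - 2f(x) + f(x-h)) / h^2
def second_derivative(f, x, h=1e-4):
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

x0 = np.pi / 4
approx = second_derivative(np.sin, x0)
exact = -np.sin(x0)  # d^2/dx^2 sin(x) = -sin(x)
print(f"Approximation: {approx:.8f}")
print(f"Exact:         {exact:.8f}")
print(f"Error:         {abs(approx - exact):.2e}")
```

Note that $h$ cannot be made arbitrarily small here: the $h^2$ in the denominator amplifies round-off, so a moderate step like `1e-4` works better than `1e-8`.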
```python
import numpy as np
import matplotlib.pyplot as plt

def numerical_derivative(f, x, h=1e-5, method='central'):
    """Numerical differentiation"""
    if method == 'forward':
        return (f(x + h) - f(x)) / h
    elif method == 'backward':
        return (f(x) - f(x - h)) / h
    elif method == 'central':
        return (f(x + h) - f(x - h)) / (2 * h)
    raise ValueError(f"Unknown method: {method}")

def f(x):
    return np.sin(x)

def df_true(x):
    return np.cos(x)

x0 = np.pi / 4
true_deriv = df_true(x0)

# Error analysis for different step sizes
h_values = np.logspace(-16, 0, 100)
errors_fwd = []
errors_cen = []
for h in h_values:
    e_fwd = abs(numerical_derivative(f, x0, h, 'forward') - true_deriv)
    e_cen = abs(numerical_derivative(f, x0, h, 'central') - true_deriv)
    errors_fwd.append(e_fwd if e_fwd > 0 else 1e-17)
    errors_cen.append(e_cen if e_cen > 0 else 1e-17)

plt.figure(figsize=(10, 6))
plt.loglog(h_values, errors_fwd, 'b-', linewidth=2, label='Forward difference O(h)')
plt.loglog(h_values, errors_cen, 'r-', linewidth=2, label='Central difference O(h^2)')
plt.loglog(h_values, h_values, 'b--', alpha=0.5, label='O(h) reference')
plt.loglog(h_values, h_values**2, 'r--', alpha=0.5, label='O(h^2) reference')
plt.xlabel('h (step size)')
plt.ylabel('Absolute error')
plt.title('Numerical Differentiation Error Analysis (error vs h)')
plt.legend()
plt.grid(True, which='both', alpha=0.3)
plt.tight_layout()
plt.savefig('numerical_derivative_error.png', dpi=150)
plt.show()
```
3.2 Numerical Integration
Trapezoidal Rule: $\displaystyle\int_a^b f(x)\,dx \approx h\left[\frac{f(x_0)}{2} + \sum_{i=1}^{n-1} f(x_i) + \frac{f(x_n)}{2}\right]$
where $h = (b - a)/n$. Error: $O(h^2)$
Simpson's 1/3 Rule:
Approximate every two subintervals with a quadratic polynomial: $\displaystyle\int_a^b f(x)\,dx \approx \frac{h}{3}\Big[f(x_0) + 4\sum_{i\,\text{odd}} f(x_i) + 2\sum_{i\,\text{even},\,0<i<n} f(x_i) + f(x_n)\Big]$
$n$ must be even. Error: $O(h^4)$ — much more accurate than the trapezoidal rule!
Simpson's 3/8 Rule: Cubic polynomial, error $O(h^4)$, $n$ must be a multiple of 3.
Gaussian Quadrature: $\displaystyle\int_{-1}^{1} f(x)\,dx \approx \sum_{i=1}^{n} w_i f(x_i)$
Maximizes accuracy by selecting optimal integration points $x_i$ and weights $w_i$.
An $n$-point Gauss-Legendre quadrature rule integrates polynomials up to degree $2n - 1$ exactly.
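NumPy ships the Gauss-Legendre nodes and weights directly, so the degree-$2n-1$ exactness claim can be verified in a few lines (the degree-5 test integrand is illustrative):

```python
import numpy as np

# 3-point Gauss-Legendre rule on [-1, 1]: exact for polynomials up to degree 5
nodes, weights = np.polynomial.legendre.leggauss(3)

def gauss_integrate(f, a, b):
    """Map the reference nodes from [-1, 1] to [a, b] and sum."""
    x = 0.5 * (b - a) * nodes + 0.5 * (b + a)
    return 0.5 * (b - a) * np.sum(weights * f(x))

# x^5 on [0, 1]: exact integral = 1/6, reproduced with only 3 evaluations
result = gauss_integrate(lambda x: x**5, 0, 1)
print(f"3-point Gauss: {result:.12f} (exact: {1/6:.12f})")
```

Three function evaluations reproduce the exact value to machine precision, whereas the trapezoidal rule with three points would be far off.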
```python
import numpy as np
from scipy import integrate

def trapezoidal(f, a, b, n):
    """Trapezoidal rule"""
    x = np.linspace(a, b, n+1)
    h = (b - a) / n
    y = f(x)
    return h * (y[0]/2 + np.sum(y[1:-1]) + y[-1]/2)

def simpsons(f, a, b, n):
    """Simpson's 1/3 rule (n must be even)"""
    if n % 2 != 0:
        n += 1
    x = np.linspace(a, b, n+1)
    h = (b - a) / n
    y = f(x)
    return h/3 * (y[0] + 4*np.sum(y[1::2]) + 2*np.sum(y[2:-1:2]) + y[-1])

# Example: integral of sin(x) from 0 to pi = 2
f_test = np.sin
a, b = 0, np.pi
true_val = 2.0

print("Numerical Integration Accuracy Comparison")
print(f"Exact value: {true_val}")
print("-" * 60)
print(f"{'Method':<20} {'n':>6} {'Approx':>15} {'Rel. Error':>12}")
print("-" * 60)
for n in [4, 8, 16, 32]:
    trap = trapezoidal(f_test, a, b, n)
    simp = simpsons(f_test, a, b, n)
    print(f"{'Trapezoidal':<20} {n:>6} {trap:>15.10f} {abs(trap-true_val)/true_val:>12.2e}")
    print(f"{'Simpson':<20} {n:>6} {simp:>15.10f} {abs(simp-true_val)/true_val:>12.2e}")

# scipy.integrate.quad - adaptive quadrature
result, error = integrate.quad(f_test, a, b)
print(f"{'scipy.quad':<20} {'adapt.':>6} {result:>15.10f} (error bound: {error:.1e})")

# Difficult integral: int_0^inf exp(-x^2) dx = sqrt(pi)/2
def gaussian(x):
    return np.exp(-x**2)

result_gauss, _ = integrate.quad(gaussian, 0, np.inf)
print(f"\nGaussian integral: {result_gauss:.10f}")
print(f"Exact value (sqrt(pi)/2): {np.sqrt(np.pi)/2:.10f}")

# Double integral: int_0^1 int_0^1 x*y dx dy = 0.25
def f2d(y, x):
    return x * y

result_2d, _ = integrate.dblquad(f2d, 0, 1, 0, 1)
print(f"\nDouble integral: {result_2d:.10f} (exact: 0.25)")
```
4. Numerical ODE Solvers
4.1 Euler's Method
Idea: Predict the next point using the current slope: $y_{n+1} = y_n + h\,f(x_n, y_n)$
Local truncation error: $O(h^2)$, global error: $O(h)$ (first-order method)
Geometric interpretation: Move in the direction of the tangent at each point
```python
import numpy as np

def euler_method(f, x0, y0, h, x_end):
    """Euler's method"""
    x_vals = [x0]
    y_vals = [y0]
    x = x0
    y = y0
    while x < x_end - 1e-10:
        h_actual = min(h, x_end - x)
        y = y + h_actual * f(x, y)
        x = x + h_actual
        x_vals.append(x)
        y_vals.append(y)
    return np.array(x_vals), np.array(y_vals)
```
4.2 Improved Euler / Heun's Method
Idea: Predict then correct. Predictor (Euler step): $\tilde{y}_{n+1} = y_n + h\,f(x_n, y_n)$; Corrector (trapezoidal average of the two slopes): $y_{n+1} = y_n + \dfrac{h}{2}\big[f(x_n, y_n) + f(x_{n+1}, \tilde{y}_{n+1})\big]$
Global error: $O(h^2)$ — much more accurate than Euler's method.
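A minimal Heun implementation in the same style as `euler_method` above (the test problem $y' = -2y$ is illustrative):

```python
import numpy as np

def heun_method(f, x0, y0, h, x_end):
    """Improved Euler (Heun): predict with Euler, correct with averaged slope."""
    x_vals, y_vals = [x0], [y0]
    x, y = x0, y0
    while x < x_end - 1e-10:
        h_actual = min(h, x_end - x)
        k1 = f(x, y)                      # slope at the start
        y_pred = y + h_actual * k1        # Euler predictor
        k2 = f(x + h_actual, y_pred)      # slope at the predicted endpoint
        y = y + h_actual * (k1 + k2) / 2  # trapezoidal corrector
        x = x + h_actual
        x_vals.append(x)
        y_vals.append(y)
    return np.array(x_vals), np.array(y_vals)

# Test on y' = -2y, y(0) = 1  (exact solution: e^{-2x})
xs, ys = heun_method(lambda x, y: -2 * y, 0.0, 1.0, 0.01, 1.0)
print(f"y(1) Heun: {ys[-1]:.6f}, exact: {np.exp(-2):.6f}")
```

With $h = 0.01$ the endpoint error is on the order of $10^{-5}$, consistent with the $O(h^2)$ global error.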
4.3 Fourth-Order Runge-Kutta (RK4)
The most widely used ODE numerical solver in engineering.
Intuition: Weighted average of four slopes within the interval.
- $k_1 = f(x_n, y_n)$: slope at the start
- $k_2 = f(x_n + h/2,\ y_n + hk_1/2)$, $k_3 = f(x_n + h/2,\ y_n + hk_2/2)$: slopes at the midpoint (two estimates, using $k_1$ and $k_2$)
- $k_4 = f(x_n + h,\ y_n + hk_3)$: slope at the end
Update: $y_{n+1} = y_n + \dfrac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4)$
Global error: $O(h^4)$ — error proportional to $h^4$
Comparison: halving $h$
- Euler: error halved
- RK4: error reduced by factor of 1/16
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp

def rk4_method(f, x0, y0, h, x_end):
    """Fourth-order Runge-Kutta method"""
    x_vals = [x0]
    y_vals = [y0]
    x = x0
    y = y0
    while x < x_end - 1e-10:
        h_actual = min(h, x_end - x)
        k1 = h_actual * f(x, y)
        k2 = h_actual * f(x + h_actual/2, y + k1/2)
        k3 = h_actual * f(x + h_actual/2, y + k2/2)
        k4 = h_actual * f(x + h_actual, y + k3)
        y = y + (k1 + 2*k2 + 2*k3 + k4) / 6
        x = x + h_actual
        x_vals.append(x)
        y_vals.append(y)
    return np.array(x_vals), np.array(y_vals)

# Complete RLC circuit simulation - comparing methods
R, L, C = 50, 0.2, 100e-6
Vs = 100

def rlc_ode(t, state):
    """Second-order charge equation as a coupled first-order system"""
    q, dq = state
    d2q = (Vs - R * dq - q/C) / L
    return [dq, d2q]

# Initial conditions
y0 = [0.0, 0.0]
t_end = 0.02
h_values = [0.001, 0.0005, 0.0001]

# scipy solve_ivp (reference solution)
t_ref = np.linspace(0, t_end, 5000)
sol_ref = solve_ivp(rlc_ode, [0, t_end], y0,
                    t_eval=t_ref, method='RK45',
                    rtol=1e-10, atol=1e-12)
v_ref = sol_ref.y[0] / C  # capacitor voltage

# Calculate with RK4
def rlc_2nd_order(t, y):
    """Same system, returning a plain list for the manual RK4 loop"""
    q, dq = y[0], y[1]
    return [dq, (Vs - R*dq - q/C) / L]

fig, axes = plt.subplots(2, 1, figsize=(12, 10))
axes[0].plot(sol_ref.t * 1000, v_ref, 'k-', linewidth=2.5, label='Reference (RK45 high precision)')
for h in h_values:
    n_steps = int(t_end / h)
    t_rk4 = np.linspace(0, t_end, n_steps + 1)
    # Manual RK4
    y = np.array(y0, dtype=float)
    y_hist = [y.copy()]
    for i in range(n_steps):
        t_i = t_rk4[i]
        k1 = h * np.array(rlc_2nd_order(t_i, y))
        k2 = h * np.array(rlc_2nd_order(t_i + h/2, y + k1/2))
        k3 = h * np.array(rlc_2nd_order(t_i + h/2, y + k2/2))
        k4 = h * np.array(rlc_2nd_order(t_i + h, y + k3))
        y = y + (k1 + 2*k2 + 2*k3 + k4) / 6
        y_hist.append(y.copy())
    y_arr = np.array(y_hist)
    v_rk4 = y_arr[:, 0] / C
    axes[0].plot(t_rk4 * 1000, v_rk4, '--',
                 label=f'RK4 (h={h*1000:.1f}ms)', alpha=0.8)

axes[0].axhline(y=Vs, color='gray', linestyle=':', alpha=0.7, label='Steady state')
axes[0].set_xlabel('Time (ms)')
axes[0].set_ylabel('Capacitor Voltage (V)')
axes[0].set_title('RLC Series Circuit - RK4 Numerical Solution')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Current plot
i_ref = sol_ref.y[1]
axes[1].plot(sol_ref.t * 1000, i_ref * 1000, 'k-', linewidth=2, label='Current i(t)')
axes[1].set_xlabel('Time (ms)')
axes[1].set_ylabel('Current (mA)')
axes[1].set_title('RLC Circuit Current')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('rlc_numerical.png', dpi=150)
plt.show()

# Check damping characteristics
omega_0 = 1 / np.sqrt(L * C)
alpha = R / (2 * L)
omega_d = np.sqrt(max(omega_0**2 - alpha**2, 0))
print(f"Resonant frequency: {omega_0/(2*np.pi):.1f} Hz")
print(f"Damping coefficient: {alpha:.1f} s^-1")
if omega_0 > alpha:
    print(f"Underdamped, damped oscillation frequency: {omega_d/(2*np.pi):.1f} Hz")
elif omega_0 == alpha:
    print("Critically damped")
else:
    print("Overdamped")
```
4.4 Stiff ODEs and Adaptive Step Size
Stiff ODE: Coexistence of modes with very different time constants.
Example: A system with both fast RC transient response and slow temperature change
A small step size $h$ must be used to keep the fast modes stable, but in practice the slow component is what matters.
```python
from scipy.integrate import solve_ivp
import numpy as np

# Stiff equation example: Robertson chemical reactions
def robertson(t, y):
    """
    dy0/dt = -0.04*y0 + 1e4*y1*y2
    dy1/dt = 0.04*y0 - 1e4*y1*y2 - 3e7*y1^2
    dy2/dt = 3e7*y1^2
    """
    dy0 = -0.04*y[0] + 1e4*y[1]*y[2]
    dy1 = 0.04*y[0] - 1e4*y[1]*y[2] - 3e7*y[1]**2
    dy2 = 3e7*y[1]**2
    return [dy0, dy1, dy2]

y0 = [1, 0, 0]
t_span = (0, 1e11)
t_eval = np.logspace(-6, 11, 1000)

# Non-stiff method (RK45): prohibitively slow here
# Stiff methods (Radau, BDF): efficient
sol = solve_ivp(robertson, t_span, y0,
                method='Radau',  # implicit stiff solver
                t_eval=t_eval,
                rtol=1e-6, atol=1e-9)
print(f"Function evaluations: {sol.nfev}")
print(f"Jacobian evaluations: {sol.njev}")
```
5. Numerical Solution of Linear Systems
5.1 Gaussian Elimination
Forward elimination: Create an upper triangular matrix using row operations
Back substitution: Solve the upper triangular system
Operation count: $O(n^3)$ — roughly $\frac{2}{3}n^3$ floating-point operations for forward elimination.
Partial Pivoting: Use the element with the largest absolute value in each column as the pivot to improve numerical stability.
```python
import numpy as np

def gaussian_elimination_pivoting(A, b):
    """Gaussian elimination with partial pivoting"""
    n = len(b)
    A = np.array(A, dtype=float)
    b = np.array(b, dtype=float)
    # Forward elimination
    for i in range(n):
        # Partial pivoting: find the largest element in column i
        max_row = i + np.argmax(abs(A[i:, i]))
        A[[i, max_row]] = A[[max_row, i]]
        b[[i, max_row]] = b[[max_row, i]]
        if abs(A[i, i]) < 1e-15:
            raise ValueError("Singular matrix!")
        for j in range(i+1, n):
            factor = A[j, i] / A[i, i]
            A[j, i:] -= factor * A[i, i:]
            b[j] -= factor * b[i]
    # Back substitution
    x = np.zeros(n)
    for i in range(n-1, -1, -1):
        x[i] = (b[i] - np.dot(A[i, i+1:], x[i+1:])) / A[i, i]
    return x

# Example: 3-loop circuit network (KVL)
A = np.array([
    [10, -3, -1],
    [-3, 12, -2],
    [-1, -2,  8]
], dtype=float)
b = np.array([100, 50, -30], dtype=float)

x_gauss = gaussian_elimination_pivoting(A, b)
x_numpy = np.linalg.solve(A, b)
print("Circuit loop currents (Gaussian elimination):")
for i, v in enumerate(x_gauss):
    print(f" I{i+1} = {v:.6f} A")
print(f"\nError vs numpy result: {np.max(abs(x_gauss - x_numpy)):.2e}")
```
5.2 LU Decomposition
$A = LU$ with $L$: lower triangular matrix, $U$: upper triangular matrix (in practice $PA = LU$, with a permutation matrix $P$ from pivoting).
Advantage: Efficiently solve multiple right-hand side vectors for the same $A$.
Solve $L\mathbf{y} = \mathbf{b}$ (forward substitution), then $U\mathbf{x} = \mathbf{y}$ (back substitution).
```python
from scipy.linalg import lu, lu_factor, lu_solve
import numpy as np

A = np.array([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], dtype=float)
b = np.array([8, -11, -3], dtype=float)

# LU decomposition
P, L, U = lu(A)
print("L matrix:")
print(np.round(L, 4))
print("U matrix:")
print(np.round(U, 4))

# Solve using LU decomposition
lu_and_piv = lu_factor(A)
x = lu_solve(lu_and_piv, b)
print(f"\nSolution: {x}")

# Efficiently solve for multiple right-hand side vectors
for b_new in [b, b + 1, b * 2]:
    x_new = lu_solve(lu_and_piv, b_new)  # reuse the factorization
    print(f"b = {b_new} -> x = {x_new}")
```
5.3 Iterative Methods
More efficient than direct methods for large sparse matrices.
Jacobi Method: $x_i^{(k+1)} = \dfrac{1}{a_{ii}}\Big(b_i - \sum_{j \neq i} a_{ij}\,x_j^{(k)}\Big)$
Updates all variables simultaneously from the previous iterate.
Gauss-Seidel Method: $x_i^{(k+1)} = \dfrac{1}{a_{ii}}\Big(b_i - \sum_{j < i} a_{ij}\,x_j^{(k+1)} - \sum_{j > i} a_{ij}\,x_j^{(k)}\Big)$
Uses updated values immediately — typically converges faster than Jacobi.
Convergence condition: Guaranteed for strictly diagonally dominant matrices ($|a_{ii}| > \sum_{j \neq i} |a_{ij}|$).
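For comparison with the Gauss-Seidel implementation below, here is a minimal Jacobi sketch in matrix-splitting form (the 3x3 test system is illustrative):

```python
import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Jacobi iteration: every component updated from the previous iterate only."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    D = np.diag(A)              # diagonal entries a_ii
    R = A - np.diagflat(D)      # off-diagonal part
    for k in range(max_iter):
        x_new = (b - R @ x) / D  # simultaneous update of all components
        if np.linalg.norm(x_new - x) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter

# Diagonally dominant system -> convergence is guaranteed
A = np.array([[4., -1., 0.], [-1., 4., -1.], [0., -1., 4.]])
b = np.array([15., 10., 10.])
x_j, iters = jacobi(A, b)
print(f"Jacobi solution: {x_j} in {iters} iterations")
```

Because each sweep uses only the old iterate, Jacobi is trivially parallelizable, at the cost of needing more iterations than Gauss-Seidel on the same system.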
```python
import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Gauss-Seidel iterative method"""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.copy()
    for iteration in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            sum1 = np.dot(A[i, :i], x[:i])          # already-updated values
            sum2 = np.dot(A[i, i+1:], x_old[i+1:])  # previous-iteration values
            x[i] = (b[i] - sum1 - sum2) / A[i, i]
        residual = np.linalg.norm(x - x_old)
        if residual < tol:
            print(f"Converged in {iteration+1} iterations, residual: {residual:.2e}")
            return x
    print(f"Stopped at max iterations {max_iter}, residual: {residual:.2e}")
    return x

# Diagonally dominant matrix example (nodal voltage analysis)
A = np.array([
    [ 4, -1, -1,  0],
    [-1,  4,  0, -1],
    [-1,  0,  4, -1],
    [ 0, -1, -1,  4]
], dtype=float)
b = np.array([5, 10, 15, 20], dtype=float)

x_gs = gauss_seidel(A, b)
x_exact = np.linalg.solve(A, b)
print(f"Gauss-Seidel: {x_gs}")
print(f"Exact solution: {x_exact}")
print(f"Max error: {np.max(abs(x_gs - x_exact)):.2e}")
```
5.4 Using numpy.linalg
```python
import numpy as np

A = np.array([[3, -0.1, -0.2],
              [0.1, 7, -0.3],
              [0.3, -0.2, 10]], dtype=float)
b = np.array([7.85, -19.3, 71.4], dtype=float)

# Direct solve
x = np.linalg.solve(A, b)
print(f"Solution: {x}")

# Eigenvalues
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"\nEigenvalues: {eigenvalues}")

# Matrix inverse (not recommended for solving systems - larger numerical error)
A_inv = np.linalg.inv(A)
print(f"Inverse verification (max of A*A_inv - I): {np.max(abs(A @ A_inv - np.eye(3))):.2e}")

# Condition number and rank
print(f"Condition number: {np.linalg.cond(A):.2f}")
print(f"Rank: {np.linalg.matrix_rank(A)}")

# Least squares solution (overdetermined system)
A_over = np.vstack([A, [1, 1, 1]])
b_over = np.append(b, 10)
x_ls, residuals, rank, sv = np.linalg.lstsq(A_over, b_over, rcond=None)
print(f"\nLeast squares solution: {x_ls}")
```
6. Interpolation and Approximation
6.1 Lagrange Interpolation
Degree-$n$ polynomial passing through $n + 1$ points $(x_i, y_i)$: $P_n(x) = \displaystyle\sum_{i=0}^{n} y_i\,L_i(x)$
Lagrange basis polynomials: $L_i(x) = \displaystyle\prod_{j=0,\,j \neq i}^{n} \frac{x - x_j}{x_i - x_j}$
Runge's Phenomenon: High-degree interpolation with equally spaced points causes oscillations near the boundaries.
Solution: Use Chebyshev nodes $x_i = \cos\!\left(\dfrac{(2i + 1)\pi}{2(n + 1)}\right)$, which cluster near the interval endpoints.
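Runge's phenomenon and the Chebyshev fix are easy to demonstrate on the classic test function $1/(1 + 25x^2)$ (a standard illustrative example, not from the measurement data below):

```python
import numpy as np

# Runge's function on [-1, 1], degree-10 polynomial interpolation
runge = lambda x: 1 / (1 + 25 * x**2)
n = 11  # number of nodes -> degree 10

x_eq = np.linspace(-1, 1, n)                             # equally spaced nodes
x_ch = np.cos((2 * np.arange(n) + 1) * np.pi / (2 * n))  # Chebyshev nodes

# Interpolate through each node set and measure the worst-case error
x_fine = np.linspace(-1, 1, 1000)
p_eq = np.polyval(np.polyfit(x_eq, runge(x_eq), n - 1), x_fine)
p_ch = np.polyval(np.polyfit(x_ch, runge(x_ch), n - 1), x_fine)
err_eq = np.max(np.abs(p_eq - runge(x_fine)))
err_ch = np.max(np.abs(p_ch - runge(x_fine)))
print(f"Max error, equally spaced: {err_eq:.4f}")
print(f"Max error, Chebyshev:      {err_ch:.4f}")
```

The equally spaced interpolant oscillates wildly near the endpoints while the Chebyshev interpolant stays close to the function everywhere.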
6.2 Spline Interpolation
Uses low-degree polynomials on each subinterval. Cubic splines are most common.
Function value, first derivative, and second derivative are all continuous across adjacent intervals — producing a smooth curve.
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import lagrange, CubicSpline, interp1d

# Measurement data (temperature vs resistance)
x_data = np.array([0, 20, 40, 60, 80, 100], dtype=float)        # Celsius
y_data = np.array([100, 108, 116, 125, 134, 143], dtype=float)  # Ohm
x_fine = np.linspace(0, 100, 300)

# Lagrange interpolation
poly = lagrange(x_data, y_data)
y_lagrange = poly(x_fine)

# Cubic spline
cs = CubicSpline(x_data, y_data)
y_spline = cs(x_fine)

# Linear interpolation
f_linear = interp1d(x_data, y_data)
y_linear = f_linear(x_fine)

plt.figure(figsize=(10, 6))
plt.scatter(x_data, y_data, s=100, zorder=5, label='Measured data', color='black')
plt.plot(x_fine, y_linear, 'g--', linewidth=1.5, label='Linear interpolation')
plt.plot(x_fine, y_lagrange, 'r-', linewidth=2, label='Lagrange interpolation')
plt.plot(x_fine, y_spline, 'b-', linewidth=2, label='Cubic spline')
plt.xlabel('Temperature (C)')
plt.ylabel('Resistance (Ohm)')
plt.title('Temperature-Resistance Interpolation Comparison')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('interpolation.png', dpi=150)
plt.show()

# Predict resistance at 50 degrees
print("Resistance prediction at 50 degrees:")
print(f" Linear: {f_linear(50):.2f} Ohm")
print(f" Spline: {cs(50):.2f} Ohm")
```
6.3 Least Squares Method
Best approximation of $n$ data points with a degree-$m$ polynomial ($m < n - 1$): minimize the sum of squared residuals $S = \displaystyle\sum_{i=1}^{n}\Big(y_i - \sum_{k=0}^{m} c_k x_i^k\Big)^2$
Normal Equations: $A^T A\,\mathbf{c} = A^T \mathbf{y}$
where $A$ is the Vandermonde matrix with entries $A_{ik} = x_i^k$.
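The normal equations can be solved directly and checked against `np.polyfit`; a minimal sketch with illustrative line-fit data:

```python
import numpy as np

# Fit a line y = c0 + c1*x by solving the normal equations A^T A c = A^T y
x = np.array([0., 1., 2., 3., 4.])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

A = np.vander(x, 2, increasing=True)    # Vandermonde matrix, columns [1, x]
c = np.linalg.solve(A.T @ A, A.T @ y)   # normal equations

c_polyfit = np.polyfit(x, y, 1)[::-1]   # polyfit returns highest degree first
print(f"Normal equations: c0 = {c[0]:.4f}, c1 = {c[1]:.4f}")
print(f"np.polyfit:       c0 = {c_polyfit[0]:.4f}, c1 = {c_polyfit[1]:.4f}")
```

Forming $A^T A$ squares the condition number, so for ill-conditioned or high-degree fits `np.linalg.lstsq` (QR/SVD based) is preferred over solving the normal equations explicitly.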
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Noisy exponential decay data
np.random.seed(42)
t_data = np.linspace(0, 5, 30)
y_true = 3 * np.exp(-0.7 * t_data)
y_noisy = y_true + 0.2 * np.random.randn(len(t_data))

# Polynomial least squares fitting
degrees = [1, 2, 3, 5]
t_fine = np.linspace(0, 5, 200)

plt.figure(figsize=(12, 8))
plt.scatter(t_data, y_noisy, s=50, color='black', zorder=5, label='Measured data')
plt.plot(t_fine, 3*np.exp(-0.7*t_fine), 'k-', linewidth=2, label='True signal')
for deg in degrees:
    coeffs = np.polyfit(t_data, y_noisy, deg)
    y_fit = np.polyval(coeffs, t_fine)
    plt.plot(t_fine, y_fit, '--', linewidth=1.5, label=f'Degree {deg} polynomial')
plt.ylim([-1, 4])
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.title('Least Squares Polynomial Fitting')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('polynomial_fit.png', dpi=150)
plt.show()

# Nonlinear least squares: exponential function fitting
def exp_model(t, A, k):
    return A * np.exp(-k * t)

popt, pcov = curve_fit(exp_model, t_data, y_noisy, p0=[3, 0.5])
A_fit, k_fit = popt
A_std, k_std = np.sqrt(np.diag(pcov))
print("Exponential fitting results:")
print(f" A = {A_fit:.4f} +/- {A_std:.4f} (true: 3)")
print(f" k = {k_fit:.4f} +/- {k_std:.4f} (true: 0.7)")

# Residuals and R-squared
y_pred = exp_model(t_data, *popt)
ss_res = np.sum((y_noisy - y_pred)**2)
ss_tot = np.sum((y_noisy - np.mean(y_noisy))**2)
r_squared = 1 - ss_res / ss_tot
print(f" R^2 = {r_squared:.6f}")
```
7. Numerical Eigenvalue Problems
7.1 Power Method
Finds the largest-magnitude eigenvalue and corresponding eigenvector of matrix $A$.
Algorithm:
- Choose initial vector $q^{(0)}$ with $\|q^{(0)}\| = 1$
- Compute $z^{(k)} = A q^{(k-1)}$ and normalize: $q^{(k)} = z^{(k)} / \|z^{(k)}\|$
- $\lambda^{(k)} = (q^{(k)})^T A\, q^{(k)}$ (Rayleigh quotient)
- Repeat until convergence
Convergence rate: $O\big(|\lambda_2 / \lambda_1|^k\big)$ — governed by the ratio of the two largest-magnitude eigenvalues.
```python
import numpy as np

def power_method(A, tol=1e-10, max_iter=1000):
    """Power method (largest-magnitude eigenvalue)"""
    n = A.shape[0]
    q = np.ones(n) / np.sqrt(n)  # initial vector
    eigenvalue = 0.0
    for k in range(max_iter):
        z = A @ q
        eigenvalue = np.dot(q, z)  # Rayleigh quotient
        q_new = z / np.linalg.norm(z)
        if np.linalg.norm(q_new - q) < tol or np.linalg.norm(q_new + q) < tol:
            print(f"Converged in {k+1} iterations")
            return eigenvalue, q_new
        q = q_new
    return eigenvalue, q

# Natural frequency analysis of a vibration system
# Stiffness matrix of a two-mass mass-spring chain
k = 1000  # N/m
m = 1     # kg
K = np.array([[2*k, -k], [-k, k]], dtype=float)
M = np.array([[m, 0], [0, m]], dtype=float)

# Generalized eigenvalue problem: K*v = lambda*M*v
# Transform: M^(-1/2) * K * M^(-1/2) * u = lambda * u
M_inv_sqrt = np.diag(1/np.sqrt(np.diag(M)))
A_sym = M_inv_sqrt @ K @ M_inv_sqrt

lam_max, v_max = power_method(A_sym)
omega_max = np.sqrt(lam_max)

# All eigenvalues (numpy)
eigenvalues_all = np.linalg.eigvalsh(A_sym)
omega_all = np.sqrt(eigenvalues_all)
print("Vibration system natural frequencies:")
for i, omega in enumerate(omega_all):
    print(f" Mode {i+1}: {omega/(2*np.pi):.3f} Hz")
print(f"\nPower method maximum natural frequency: {omega_max/(2*np.pi):.3f} Hz")
```
7.2 QR Algorithm
Finds all eigenvalues simultaneously. The core algorithm of modern numerical linear algebra.
Idea: Iterative QR decomposition; the matrix converges to upper triangular form, with the diagonal elements becoming eigenvalues.
```python
import numpy as np

def qr_algorithm(A, max_iter=1000, tol=1e-10):
    """Basic (unshifted) QR algorithm"""
    A_k = A.copy().astype(float)
    for k in range(max_iter):
        Q, R = np.linalg.qr(A_k)
        A_k_new = R @ Q
        A_k = A_k_new
        # Check magnitudes of subdiagonal elements
        off_diag_norm = np.max(np.abs(np.tril(A_k, -1)))
        if off_diag_norm < tol:
            print(f"QR converged in {k+1} iterations")
            break
    return np.diag(A_k)

# Symmetric matrix (real eigenvalues)
A = np.array([
    [4, 1, 0],
    [1, 3, 1],
    [0, 1, 2]
], dtype=float)
eig_qr = qr_algorithm(A)
eig_numpy = np.linalg.eigvalsh(A)
print("QR algorithm eigenvalues:", np.sort(eig_qr))
print("numpy eigenvalues: ", np.sort(eig_numpy))
```
7.3 Complete Eigenvalue Analysis: Structural Resonance Analysis
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import linalg

def vibration_analysis(K, M, n_modes=None):
    """
    Generalized eigenvalue problem: K*v = omega^2 * M * v
    K: stiffness matrix, M: mass matrix
    """
    # Solve the generalized eigenvalue problem with scipy linalg.eigh
    eigenvalues, eigenvectors = linalg.eigh(K, M)
    omega = np.sqrt(np.abs(eigenvalues))  # angular frequency (rad/s)
    freq = omega / (2 * np.pi)            # frequency (Hz)
    if n_modes is None:
        n_modes = len(eigenvalues)
    print("Natural Frequency Analysis Results:")
    print("-" * 40)
    for i in range(n_modes):
        print(f"Mode {i+1}: {freq[i]:.3f} Hz ({omega[i]:.2f} rad/s)")
    return omega[:n_modes], eigenvectors[:, :n_modes]

# 3-DOF vibration system
k1, k2, k3 = 1000, 1500, 800  # N/m
m1, m2, m3 = 1.0, 1.5, 0.8    # kg
K = np.array([
    [k1+k2, -k2,    0],
    [-k2,   k2+k3, -k3],
    [0,     -k3,    k3]
], dtype=float)
M = np.diag([m1, m2, m3])

omegas, modes = vibration_analysis(K, M)

# Visualize mode shapes
positions = np.arange(1, 4)
fig, axes = plt.subplots(1, 3, figsize=(14, 5))
for i, ax in enumerate(axes):
    mode_shape = modes[:, i] / np.max(abs(modes[:, i]))
    ax.plot([0] + list(positions), [0] + list(mode_shape),
            'bo-', linewidth=2, markersize=10)
    ax.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
    ax.set_title(f'Mode {i+1}\n{omegas[i]/(2*np.pi):.2f} Hz')
    ax.set_xlabel('DOF')
    ax.set_ylabel('Normalized mode shape')
    ax.grid(True, alpha=0.3)
    ax.set_xticks([0, 1, 2, 3])
    ax.set_xticklabels(['Fixed', 'Mass 1', 'Mass 2', 'Mass 3'])
plt.suptitle('3-DOF Vibration System Mode Shapes')
plt.tight_layout()
plt.savefig('vibration_modes.png', dpi=150)
plt.show()
```
8. Comprehensive Example: MOSFET Bias Circuit Numerical Analysis
```python
import numpy as np
from scipy.optimize import fsolve
import matplotlib.pyplot as plt

# MOSFET simplified model parameters
Vth = 1.5    # threshold voltage (V)
k = 2e-3     # transconductance parameter (A/V^2)
VDD = 15     # supply voltage (V)
RD = 3000    # drain resistance (Ohm)
RS = 1000    # source resistance (Ohm)
R1 = 100e3   # voltage divider resistor 1 (Ohm)
R2 = 47e3    # voltage divider resistor 2 (Ohm)

def mosfet_bias_equations(vars):
    """MOSFET bias nonlinear equation system"""
    VGS, ID = vars
    # Gate voltage from voltage divider
    VG = VDD * R2 / (R1 + R2)
    # Drain current (saturation region)
    if VGS > Vth:
        ID_calc = k / 2 * (VGS - Vth)**2
    else:
        ID_calc = 0
    # KVL: VGS = VG - VS = VG - ID*RS
    VGS_calc = VG - ID * RS
    return [VGS - VGS_calc, ID - ID_calc]

VG = VDD * R2 / (R1 + R2)
initial_guess = [VG - 1, 1e-3]
solution = fsolve(mosfet_bias_equations, initial_guess, full_output=True)
VGS_dc, ID_dc = solution[0]
VDS_dc = VDD - ID_dc * (RD + RS)
print("MOSFET DC bias operating point:")
print(f" VGS = {VGS_dc:.3f} V")
print(f" ID = {ID_dc*1000:.3f} mA")
print(f" VDS = {VDS_dc:.3f} V")
print(f" Saturation check: VDS > VGS - Vth = {VGS_dc - Vth:.3f} V -> {'OK' if VDS_dc > VGS_dc - Vth else 'Linear region'}")

# Transfer characteristic curve
VGS_range = np.linspace(0, 5, 200)
ID_range = np.where(VGS_range > Vth,
                    k/2 * (VGS_range - Vth)**2,
                    0)

plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(VGS_range, ID_range * 1000, 'b-', linewidth=2)
plt.axvline(x=VGS_dc, color='r', linestyle='--', label=f'Q-point VGS={VGS_dc:.2f}V')
plt.axhline(y=ID_dc*1000, color='g', linestyle='--', label=f'Q-point ID={ID_dc*1000:.2f}mA')
plt.xlabel('VGS (V)')
plt.ylabel('ID (mA)')
plt.title('MOSFET Transfer Characteristics')
plt.legend()
plt.grid(True, alpha=0.3)

plt.subplot(1, 2, 2)
VDS_range = np.linspace(0, VDD, 200)
for VGS_val in [2.0, 2.5, 3.0, 3.5]:
    if VGS_val <= Vth:
        continue
    ID_sat = k/2 * (VGS_val - Vth)**2
    ID_curve = np.where(
        VDS_range < VGS_val - Vth,
        k * ((VGS_val - Vth)*VDS_range - VDS_range**2/2),
        ID_sat
    )
    plt.plot(VDS_range, ID_curve * 1000,
             label=f'VGS={VGS_val}V')

# Load line
ID_load = (VDD - VDS_range) / (RD + RS)
plt.plot(VDS_range, ID_load * 1000, 'k--', linewidth=2, label='Load line')
plt.axvline(x=VDS_dc, color='r', linestyle=':', alpha=0.7)
plt.xlabel('VDS (V)')
plt.ylabel('ID (mA)')
plt.title('MOSFET Output Characteristics and Load Line')
plt.legend(fontsize=8)
plt.ylim([0, None])
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('mosfet_analysis.png', dpi=150)
plt.show()
```
Summary and Next Steps
Topics covered in Part 4:
- Error Analysis: Round-off/truncation error, absolute/relative error, numerical stability, condition number
- Nonlinear equations: Bisection (1st order), Newton-Raphson (2nd order), Secant (order ≈ 1.618), scipy.optimize
- Numerical differentiation: Forward/backward/central differences, error analysis, optimal step size
- Numerical integration: Trapezoidal, Simpson's, Gaussian quadrature, scipy.integrate
- ODE numerical solvers: Euler's method, Heun's method, full RK4 implementation, stiff ODEs
- Linear systems: Gaussian elimination, LU decomposition, iterative methods, numpy.linalg
- Interpolation: Lagrange, splines, Runge's phenomenon, least squares, nonlinear curve fitting
- Eigenvalue problems: Power method, QR algorithm, vibration analysis
Through the four-part engineering mathematics series covering ODEs, Fourier analysis, PDEs, complex analysis, Z-transform, and numerical methods, we have thoroughly mastered the core mathematical tools of electrical, electronic, and mechanical engineering.
9. Numerical Methods in AI/ML: Automatic Differentiation
9.1 Numerical vs Automatic Differentiation
Numerical differentiation approximates derivatives with finite differences; automatic differentiation (autograd) computes exact derivatives by mechanically applying the chain rule through a computation graph.
import numpy as np
def numerical_grad(f, x, h=1e-5):
    """Central difference gradient (vector input)."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        x_plus = x.copy().astype(float); x_plus[i] += h
        x_minus = x.copy().astype(float); x_minus[i] -= h
        grad[i] = (f(x_plus) - f(x_minus)) / (2 * h)
    return grad

# f(x, y) = x^2 + 3*x*y + y^2, true gradient: [2x+3y, 3x+2y]
def f_quad(x):
    return x[0]**2 + 3*x[0]*x[1] + x[1]**2
x0 = np.array([1.0, 2.0])
grad_num = numerical_grad(f_quad, x0)
grad_true = np.array([2*x0[0] + 3*x0[1], 3*x0[0] + 2*x0[1]])
print("Gradient at (1, 2):")
print(f" Numerical: {grad_num}")
print(f" Analytical: {grad_true}")
print(f" Max error: {np.max(abs(grad_num - grad_true)):.2e}")
Key differences:
| Property | Numerical diff. | Automatic diff. (AD) |
|---|---|---|
| Accuracy | $O(h^2)$ at best, limited by floating-point round-off | Machine precision |
| Cost (gradient) | $2n$ function evaluations for $n$ parameters | 1 backward pass (reverse mode) |
| Step-size tuning | Required | Not needed |
| Use in ML | Finite-difference gradient checks | Backpropagation in PyTorch / JAX |
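Reverse-mode AD is what deep learning frameworks implement, but the core idea is easiest to see in forward mode with dual numbers. The toy class below (an illustrative sketch, not from the article) differentiates the same quadratic $f(x, y) = x^2 + 3xy + y^2$ exactly, with no step size at all:

```python
class Dual:
    """Number a + b*eps with eps**2 == 0; the 'dot' part carries the derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + a'eps)(b + b'eps) = ab + (a'b + ab')eps
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def f(x, y):
    # f(x, y) = x^2 + 3xy + y^2; exact gradient is [2x+3y, 3x+2y]
    return x * x + 3 * x * y + y * y

# Partial w.r.t. x: seed x with derivative 1, y with 0 (and vice versa)
df_dx = f(Dual(1.0, 1.0), Dual(2.0, 0.0)).dot   # exact: 2*1 + 3*2 = 8
df_dy = f(Dual(1.0, 0.0), Dual(2.0, 1.0)).dot   # exact: 3*1 + 2*2 = 7
print(df_dx, df_dy)  # 8.0 7.0
```

Forward mode costs one pass per input variable, which is why reverse mode (one pass for all parameters) wins for neural networks.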
9.2 Gradient Descent Variants
Numerical optimization drives ML model training. The mathematics behind each optimizer directly determines convergence behavior.
import numpy as np
import matplotlib.pyplot as plt
# Rosenbrock function: f(x,y) = (1-x)^2 + 100*(y-x^2)^2
def rosenbrock(w):
    x, y = w[0], w[1]
    return (1 - x)**2 + 100*(y - x**2)**2

def rosenbrock_grad(w):
    x, y = w[0], w[1]
    gx = -2*(1 - x) - 400*x*(y - x**2)
    gy = 200*(y - x**2)
    return np.array([gx, gy])
w0 = np.array([-1.5, 1.5])
lr = 0.001
n_iter = 3000
# Standard SGD
w_sgd = w0.copy()
history_sgd = [w_sgd.copy()]
for _ in range(n_iter):
    w_sgd -= lr * rosenbrock_grad(w_sgd)
    history_sgd.append(w_sgd.copy())
# Momentum SGD
w_mom = w0.copy()
v = np.zeros(2)
beta = 0.9
history_mom = [w_mom.copy()]
for _ in range(n_iter):
    v = beta * v - lr * rosenbrock_grad(w_mom)
    w_mom += v
    history_mom.append(w_mom.copy())
# Adam
w_adam = w0.copy()
m_adam, v_adam = np.zeros(2), np.zeros(2)
beta1, beta2, eps = 0.9, 0.999, 1e-8
history_adam = [w_adam.copy()]
for t in range(1, n_iter + 1):
    g = rosenbrock_grad(w_adam)
    m_adam = beta1 * m_adam + (1 - beta1) * g
    v_adam = beta2 * v_adam + (1 - beta2) * g**2
    m_hat = m_adam / (1 - beta1**t)
    v_hat = v_adam / (1 - beta2**t)
    w_adam -= lr * m_hat / (np.sqrt(v_hat) + eps)
    history_adam.append(w_adam.copy())
loss_sgd = [rosenbrock(w) for w in history_sgd]
loss_mom = [rosenbrock(w) for w in history_mom]
loss_adam = [rosenbrock(w) for w in history_adam]
plt.figure(figsize=(10, 5))
plt.semilogy(loss_sgd, label='SGD')
plt.semilogy(loss_mom, label='Momentum')
plt.semilogy(loss_adam, label='Adam')
plt.xlabel('Iteration')
plt.ylabel('Loss (log scale)')
plt.title('Optimizer Convergence on Rosenbrock Function')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('optimizer_comparison.png', dpi=150)
plt.show()
print(f"Final loss — SGD: {loss_sgd[-1]:.4f}, Momentum: {loss_mom[-1]:.4f}, Adam: {loss_adam[-1]:.6f}")
Quiz
Q1. When does Newton-Raphson fail to converge?
Answer: Newton-Raphson fails in several situations:
- Derivative is zero: If $f'(x_n) = 0$ at any iterate, the update $x_{n+1} = x_n - f(x_n)/f'(x_n)$ is undefined (division by zero).
- Poor initial guess: If $x_0$ is far from the root, the iteration may cycle or diverge.
- Multiple roots: Near a repeated root $r$ where $f'(r) = 0$, convergence degrades from quadratic to linear.
- Flat regions / local extrema: Near a local minimum of $f$, the tangent line is nearly horizontal and intersects the x-axis far away.
Explanation: The quadratic convergence proof requires $f'(r) \neq 0$ and a starting point sufficiently close to $r$. In practice, bracket a root with bisection first, then switch to Newton-Raphson for fast final convergence.
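As a hypothetical illustration of both the cycling failure and the bisection safeguard, consider $f(x) = x^3 - 2x + 2$: Newton's iteration from $x_0 = 0$ oscillates between 0 and 1 forever, while a few bisection steps followed by Newton find the real root near $-1.769$:

```python
# Classic Newton-Raphson cycling example (illustrative, not from the article)
f  = lambda x: x**3 - 2*x + 2
df = lambda x: 3*x**2 - 2

x = 0.0
iterates = [x]
for _ in range(6):
    x = x - f(x) / df(x)          # from 0: x -> 1 -> 0 -> 1 -> ...
    iterates.append(x)
print(iterates)                    # alternates 0.0, 1.0, 0.0, 1.0, ...

# Safeguard: bracket the real root with bisection, then polish with Newton.
a, b = -2.0, -1.0                  # f(-2) = -2 < 0 < f(-1) = 3, so a root is inside
for _ in range(10):                # bisection: interval shrinks by 2^10
    m = 0.5 * (a + b)
    a, b = (m, b) if f(m) < 0 else (a, m)
x = 0.5 * (a + b)
for _ in range(5):                 # Newton now converges quadratically
    x = x - f(x) / df(x)
print(f"root ~ {x:.10f}, f(root) = {f(x):.2e}")
```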
Q2. Why is LU decomposition preferred over Gaussian elimination for multiple right-hand-side systems?
Answer: LU decomposition separates the expensive factorization from the cheap solve.
Explanation: Gaussian elimination costs $O(n^3)$ per system. With $k$ different right-hand side vectors and the same matrix $A$, the total cost is $O(kn^3)$.
LU factorization is performed once in $O(n^3)$; then each new right-hand side requires only two triangular solves ($Ly = b$, then $Ux = y$) at $O(n^2)$ each. Total: $O(n^3 + kn^2)$, dramatically cheaper when $k$ is large (e.g., finite element systems with multiple load cases).
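A small sketch of this cost argument using SciPy's `lu_factor`/`lu_solve` (the matrix here is a random diagonally dominant example, chosen only for illustration):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n, k = 200, 50
A = rng.standard_normal((n, n)) + n * np.eye(n)   # diagonally dominant, well-conditioned
B = rng.standard_normal((n, k))                   # k right-hand sides

lu, piv = lu_factor(A)                            # O(n^3), done exactly once
# Each subsequent solve is only two triangular substitutions, O(n^2):
X = np.column_stack([lu_solve((lu, piv), B[:, j]) for j in range(k)])

print(np.allclose(A @ X, B))                      # True
```

`lu_solve` also accepts a matrix right-hand side directly; the loop is written out here to emphasize that each extra solve reuses the same factors.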
Q3. What is the difference between Lagrange and spline interpolation for noisy data?
Answer: Spline interpolation is strongly preferred for noisy data.
Explanation: Lagrange interpolation fits a single degree-$(n-1)$ polynomial through all $n$ data points exactly. For noisy data this forces the polynomial to pass through corrupted measurements, producing Runge's phenomenon oscillations that grow severe at high degrees.
Cubic splines use degree-3 polynomials on each sub-interval with smoothness conditions (matching first and second derivatives at knots). This avoids oscillations and produces a physically sensible smooth curve. For truly noisy data, least-squares fitting (not interpolation) is best: fit a smooth model that minimizes residuals rather than passing through each point exactly.
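The difference can be sketched numerically on noisy samples of Runge's function $1/(1 + 25x^2)$ (an illustrative setup, not from the article):

```python
import numpy as np
from numpy.polynomial import Polynomial
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 11)
y = 1 / (1 + 25 * x**2) + 0.01 * rng.standard_normal(x.size)  # noisy samples

poly = Polynomial.fit(x, y, deg=len(x) - 1)   # degree-10 polynomial through all points
spl  = CubicSpline(x, y)                      # piecewise cubic with smooth knots

xs = np.linspace(-1, 1, 400)
truth = 1 / (1 + 25 * xs**2)
err_poly = np.max(np.abs(poly(xs) - truth))   # large: Runge oscillations at the edges
err_spl  = np.max(np.abs(spl(xs) - truth))    # small: no oscillation
print(f"max error - polynomial: {err_poly:.3f}, spline: {err_spl:.3f}")
```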
Q4. Why does Runge-Kutta 4 have 4th-order accuracy?
Answer: RK4 evaluates $f$ at four strategically chosen points and combines them with weights that match the Taylor expansion through the $h^4$ term.
Explanation: Euler's method truncates the Taylor series after the first-order term, giving $O(h)$ global error. RK4 computes four slopes:
- $k_1$: slope at the start of the step
- $k_2$, $k_3$: slopes at the midpoint
- $k_4$: slope at the end of the step
The weighted combination $\frac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4)$ exactly matches the Taylor expansion of $y(t+h)$ through order $h^4$. The local error is $O(h^5)$ and the global error is $O(h^4)$. Halving the step size reduces global error by a factor of $2^4 = 16$.
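The order can be checked empirically: halving $h$ should shrink the global error by roughly a factor of 16. A minimal sketch on $y' = y$, $y(0) = 1$, whose exact solution is $e^t$:

```python
import math

def rk4_solve(f, y0, t_end, h):
    """Integrate y' = f(t, y) from t = 0 to t_end with fixed step h."""
    n = round(t_end / h)
    t, y = 0.0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h/2, y + h/2 * k1)
        k3 = f(t + h/2, y + h/2 * k2)
        k4 = f(t + h,   y + h * k3)
        y += h/6 * (k1 + 2*k2 + 2*k3 + k4)
        t += h
    return y

f = lambda t, y: y
exact = math.exp(1.0)
err_h  = abs(rk4_solve(f, 1.0, 1.0, 0.10) - exact)
err_h2 = abs(rk4_solve(f, 1.0, 1.0, 0.05) - exact)
print(f"error ratio when halving h: {err_h / err_h2:.1f}")  # close to 2^4 = 16
```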
Q5. How does automatic differentiation (autograd) differ from numerical differentiation?
Answer: Automatic differentiation computes exact derivatives (to machine precision) by applying the chain rule through a computation graph; numerical differentiation approximates derivatives with finite differences and suffers from a truncation/round-off trade-off.
Explanation:
- Numerical differentiation: $f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}$, with $O(h^2)$ truncation error. The optimal $h$ (around $10^{-5}$ for central differences in double precision) still loses several of the 15-16 available digits to round-off; accuracy is fundamentally limited.
- Reverse-mode AD (backpropagation): Decomposes the function into elementary operations and applies the chain rule exactly at each step. Computes gradients with respect to all parameters in a single backward pass at roughly the same cost as one forward pass — essential for neural networks with millions of parameters.
- Key practical difference: AD gives machine-precision gradients with no step-size tuning, which is why PyTorch, JAX, and TensorFlow are built on it.
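The truncation/round-off trade-off can be seen directly by sweeping $h$ for $\frac{d}{dx}\sin x$ at $x = 1$ (an illustrative sketch): the error first shrinks like $h^2$, then grows again as round-off $\sim \varepsilon/h$ takes over.

```python
import numpy as np

x = 1.0
hs = np.logspace(-1, -12, 12)            # h = 1e-1 down to 1e-12
errs = np.array([abs((np.sin(x + h) - np.sin(x - h)) / (2 * h) - np.cos(x))
                 for h in hs])
best = hs[np.argmin(errs)]

for h, e in zip(hs, errs):
    print(f"h = {h:.0e}   error = {e:.2e}")
print(f"best h ~ {best:.0e}")            # minimum near h = 1e-5, as predicted
```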
References
- Burden, R. & Faires, J. "Numerical Analysis", 10th Edition, Cengage
- Heath, M. "Scientific Computing: An Introductory Survey", 2nd Edition
- Press, W. et al. "Numerical Recipes: The Art of Scientific Computing", 3rd Edition
- Kreyszig, E. "Advanced Engineering Mathematics", 10th Edition, Wiley
- NumPy Linear Algebra Documentation
- SciPy Optimization Documentation
- SciPy Integration Documentation