- Authors

- Name
- Youngju Kim
- @fjvbn20031
- 1. Introduction to Digital Systems
- 2. Boolean Algebra
- 3. Logic Gates
- 4. Karnaugh Maps
- 5. Combinational Logic Circuits
- 6. Sequential Logic Circuits
- 7. Registers and Counters
- 8. Finite State Machines (FSM)
- 9. Verilog HDL Fundamentals
- 10. FPGA Fundamentals
- 11. Basic CPU Design
- 12. Connecting to AI Hardware
- 13. Learning Roadmap
- Quiz
- References
1. Introduction to Digital Systems
Analog vs Digital
Analog signals carry continuous values while digital signals use only discrete states — 0 and 1. Modern electronics overwhelmingly favors digital systems for several key reasons:
- Noise immunity: Only two levels to distinguish, making the system highly resistant to electrical noise.
- Perfect reproduction: Copying digital data introduces no signal degradation.
- Logical processing: Boolean algebra provides a rigorous mathematical foundation.
- Storage: Flip-flops and memory cells store state reliably and indefinitely.
Number System Conversions
Digital systems regularly use binary (base-2), octal (base-8), and hexadecimal (base-16) representations.
Decimal to Binary — repeated division by 2:
45 ÷ 2 = 22 remainder 1
22 ÷ 2 = 11 remainder 0
11 ÷ 2 = 5 remainder 1
5 ÷ 2 = 2 remainder 1
2 ÷ 2 = 1 remainder 0
1 ÷ 2 = 0 remainder 1
Result: 101101 (binary)
| Decimal | Binary | Octal | Hex |
|---|---|---|---|
| 0 | 0000 | 0 | 0 |
| 5 | 0101 | 5 | 5 |
| 10 | 1010 | 12 | A |
| 15 | 1111 | 17 | F |
| 16 | 10000 | 20 | 10 |
Signed Number Representation
Two's complement is the universal standard for representing negative integers in hardware.
+5 → 0000 0101
Computing -5:
One's complement: 1111 1010
Add 1: 1111 1011 ← -5 in two's complement
Range for an 8-bit two's complement number: -128 to +127.
BCD and Gray Code
- BCD (Binary Coded Decimal): Each decimal digit is encoded as 4 bits. Example: 29 = 0010 1001.
- Gray code: Adjacent codewords differ by exactly one bit, minimizing transition errors in mechanical encoders and analog-to-digital converters.
2. Boolean Algebra
Fundamental Operations
Three primitive operations form the foundation of Boolean algebra:
- AND (logical product): A · B — output is 1 only when both inputs are 1.
- OR (logical sum): A + B — output is 1 when at least one input is 1.
- NOT (complement): A' — inverts the input.
Boolean Laws
Commutative: A + B = B + A, A · B = B · A
Associative: A + (B+C) = (A+B) + C
Distributive: A · (B+C) = AB + AC
Identity: A + 0 = A, A · 1 = A
Complement: A + A' = 1, A · A' = 0
Idempotent: A + A = A, A · A = A
De Morgan's Theorems — essential for NAND/NOR-based implementations:
(A · B)' = A' + B'
(A + B)' = A' · B'
Canonical Forms
- SOP (Sum of Products): Sum of minterms. Example: F = AB' + A'B + AB
- POS (Product of Sums): Product of maxterms. Example: F = (A+B)(A'+C)
Any Boolean function can be expressed in either canonical form, making these forms the starting point for logic minimization.
3. Logic Gates
Truth Tables for All Basic Gates
AND Gate: OR Gate: NOT Gate:
A B A·B A B A+B A A'
0 0 0 0 0 0 0 1
0 1 0 0 1 1 1 0
1 0 0 1 0 1
1 1 1 1 1 1
NAND Gate: NOR Gate:
A B (A·B)' A B (A+B)'
0 0 1 0 0 1
0 1 1 0 1 0
1 0 1 1 0 0
1 1 0 1 1 0
XOR Gate: XNOR Gate:
A B A⊕B A B (A⊕B)'
0 0 0 0 0 1
0 1 1 0 1 0
1 0 1 1 0 0
1 1 0 1 1 1
Universal Gates (NAND and NOR)
NAND alone can implement any Boolean function:
NOT A = A NAND A
A AND B = (A NAND B) NAND (A NAND B)
A OR B = (A NAND A) NAND (B NAND B)
This universality is commercially significant — an entire chip family can be fabricated from a single cell type, simplifying the manufacturing process. NOR gates are equally universal.
CMOS Implementation Overview
Modern digital ICs use CMOS (Complementary Metal-Oxide-Semiconductor) technology:
- NMOS transistor: Turns ON when gate is HIGH (pull-down network).
- PMOS transistor: Turns ON when gate is LOW (pull-up network).
- CMOS inverter: series PMOS + NMOS between VDD and GND.
- Static power consumption is nearly zero, which explains CMOS dominance in low-power design.
4. Karnaugh Maps
The Karnaugh map (K-map) provides a visual, systematic method for minimizing Boolean expressions by identifying groups of adjacent 1s.
4-Variable K-map Example
CD
AB 00 01 11 10
00 [ 1 0 1 1 ]
01 [ 1 1 1 0 ]
11 [ 0 1 1 0 ]
10 [ 0 0 1 1 ]
Grouping Rules
- Group sizes must be powers of 2: 1, 2, 4, 8, or 16.
- Groups must be rectangular (including squares).
- The map wraps around on all edges (toroidal topology).
- Use the fewest, largest possible groups.
- Don't-care conditions (X) may be included in groups to enlarge them.
Prime Implicants
A prime implicant is a group that cannot be further expanded. An essential prime implicant covers at least one minterm not covered by any other prime implicant — it must appear in the minimal SOP expression.
5. Combinational Logic Circuits
A combinational circuit's output depends solely on current inputs — there is no internal state or memory.
Half Adder and Full Adder
Half Adder:
Sum S = A ⊕ B
Carry C = A · B
Full Adder:
Sum S = A ⊕ B ⊕ Cin
Carry Cout = AB + Cin(A ⊕ B)
Cascading full adders creates a Ripple Carry Adder — simple but slow because carry propagates serially. The Carry Lookahead Adder (CLA) computes carries in parallel for high-speed arithmetic.
Multiplexer (MUX)
A 2-to-n-input MUX selects one of 2^n data inputs using n select lines.
4-to-1 MUX (select lines S1, S0):
S1=0, S0=0 → Y = I0
S1=0, S0=1 → Y = I1
S1=1, S0=0 → Y = I2
S1=1, S0=1 → Y = I3
A 2^n-to-1 MUX can implement any n-variable Boolean function directly.
Encoders and Decoders
- Encoder: Converts 2^n input lines to an n-bit binary code (e.g., priority encoder handles multiple simultaneous inputs).
- Decoder: Converts an n-bit code to up to 2^n output lines (e.g., 3-to-8 decoder asserts one of eight outputs).
A decoder combined with OR gates directly implements any sum-of-minterms (SOP) expression.
6. Sequential Logic Circuits
Sequential circuits depend on both current inputs and internal state (memory). They are the building blocks of registers, counters, and processors.
Latches
The SR latch is the simplest storage element, built from cross-coupled NAND or NOR gates.
SR Latch truth table:
S R Q(next)
0 0 Q (no change)
1 0 1 (Set)
0 1 0 (Reset)
1 1 Undefined (forbidden)
The D latch eliminates the forbidden state by tying S = D and R = D'. When Enable is HIGH, Q tracks D; when Enable is LOW, Q retains its value.
Flip-Flops
Flip-flops are edge-triggered — state changes only on the rising or falling clock edge, not during the entire clock period. This strict timing discipline is essential for synchronous design.
D flip-flop: Q(next) = D
JK flip-flop:
J=0, K=0 → Q (hold)
J=1, K=0 → Q = 1 (Set)
J=0, K=1 → Q = 0 (Reset)
J=1, K=1 → Q' (Toggle)
T flip-flop:
T=0 → Q (hold)
T=1 → Q' (Toggle)
The JK flip-flop has no undefined input combination. The T flip-flop is the natural choice for counter design.
7. Registers and Counters
Shift Registers
Shift registers chain multiple flip-flops so data moves (shifts) through them with each clock cycle.
- SISO (Serial In, Serial Out): Data enters and leaves serially.
- SIPO (Serial In, Parallel Out): Serial data is loaded, then all bits are read simultaneously.
- PISO (Parallel In, Serial Out): Parallel data is loaded, then transmitted bit by bit.
- PIPO (Parallel In, Parallel Out): Complete parallel load and parallel read.
Applications: serial-to-parallel conversion, CRC generation, pseudo-random number generators (LFSR), and fast multiply/divide by powers of 2.
Counters
3-bit Asynchronous (Ripple) Counter:
Three T flip-flops chained in series.
0 → 1 → 2 → 3 → 4 → 5 → 6 → 7 → 0 ...
Drawback: ripple delay accumulates with each stage.
3-bit Synchronous Counter:
All flip-flops share a common clock.
All outputs change simultaneously → no ripple delay.
A Modulo-N counter cycles through 0 to N-1. Example: a BCD counter (modulo-10) cycles 0 through 9 and is widely used in digital clocks.
8. Finite State Machines (FSM)
An FSM is the abstract model underlying all sequential logic systems. It consists of a finite set of states, transition conditions, and outputs.
Moore vs Mealy Machines
- Moore machine: Output depends only on the current state. Output labels appear inside state bubbles.
- Mealy machine: Output depends on current state AND inputs. Output labels appear on transitions.
Mealy machines generally require fewer states than equivalent Moore machines.
Vending Machine FSM Example
States: S0 (0 cents), S1 (10 cents), S2 (20 cents), S3 (dispense)
Inputs: D (dime = 10c), Q (quarter = 25c)
Transitions:
S0 --D--> S1
S0 --Q--> S2 (with 5c change output)
S1 --D--> S2
S1 --Q--> S3 (dispense, 5c change)
S2 --D--> S3 (dispense)
S2 --Q--> S3 (dispense, 15c change)
FSM Design Procedure
- Analyze the problem and define all states.
- Draw the state diagram.
- Construct the state table (next-state and output).
- Assign binary codes to states (state encoding).
- Derive next-state and output Boolean equations.
- Minimize with Karnaugh maps.
- Implement with flip-flops and combinational gates.
9. Verilog HDL Fundamentals
Verilog HDL (Hardware Description Language) allows designers to describe digital circuits textually, enabling simulation and automated synthesis to physical gates.
Module Structure
module module_name (
input wire clk,
input wire reset,
input wire [3:0] data_in,
output reg [3:0] data_out
);
wire [3:0] temp;
// Combinational logic — assign
assign temp = data_in & 4'b1111;
// Sequential logic — always
always @(posedge clk or posedge reset) begin
if (reset)
data_out <= 4'b0000;
else
data_out <= temp;
end
endmodule
D Flip-Flop
module d_ff (
input wire clk,
input wire reset,
input wire d,
output reg q
);
always @(posedge clk or posedge reset) begin
if (reset)
q <= 1'b0;
else
q <= d;
end
endmodule
4-bit Synchronous Counter
module counter_4bit (
input wire clk,
input wire reset,
output reg [3:0] count
);
always @(posedge clk or posedge reset) begin
if (reset)
count <= 4'b0000;
else
count <= count + 1'b1;
end
endmodule
Testbench
module tb_counter;
reg clk, reset;
wire [3:0] count;
counter_4bit uut (
.clk(clk),
.reset(reset),
.count(count)
);
// 10 ns clock period
initial clk = 0;
always #5 clk = ~clk;
initial begin
reset = 1;
#20 reset = 0;
#200 $finish;
end
initial begin
$monitor("t=%0t reset=%b count=%b", $time, reset, count);
end
endmodule
Verilog Data Types Summary
| Type | Description | Typical Use |
|---|---|---|
| wire | Combinational net, holds driven value | assign statements, port connections |
| reg | Can store a value between updates | Inside always blocks |
| integer | 32-bit signed integer | Loop variables, simulation |
| parameter | Named constant | Parameterizing module widths |
10. FPGA Fundamentals
FPGA vs ASIC vs Microprocessor
| Attribute | FPGA | ASIC | MCU / CPU |
|---|---|---|---|
| Reconfigurable | Yes | No | Via software |
| Peak performance | Medium–High | Highest | General purpose |
| NRE cost | None | Very high | Low |
| Unit cost (high volume) | High | Very low | Low |
| Parallelism | Excellent | Excellent | Limited |
| Best for | Prototyping, moderate volume | Mass production | Control tasks |
FPGA Internal Architecture
- CLB (Configurable Logic Block): Contains LUTs, flip-flops, and MUXes.
- LUT (Look-Up Table): An n-input LUT implements any n-variable Boolean function by storing the truth table directly in SRAM.
- IOB (I/O Block): Interface between the FPGA fabric and external pins; supports multiple voltage standards and I/O protocols.
- Block RAM (BRAM): Dedicated on-chip memory blocks for storing data arrays.
- DSP Blocks: Hardened fast multiply-accumulate units optimized for signal processing.
- PLL / Clock Management: On-chip phase-locked loops for frequency synthesis and clock distribution.
FPGA Design Flow
1. Design Entry — write HDL (Verilog/VHDL) or use a schematic editor
↓
2. Synthesis — convert HDL to a technology-independent gate-level netlist
↓
3. Implementation
├── Technology Mapping — map netlist to FPGA primitives (LUTs, FFs)
├── Placement (Place) — assign primitives to physical FPGA resources
└── Routing (Route) — connect placed elements with on-chip wires
↓
4. Timing Analysis — verify all setup/hold constraints are met
↓
5. Bitstream Generation — produce the programming file
↓
6. Device Programming — download bitstream to FPGA via JTAG
Leading FPGA Vendors
- AMD-Xilinx: Spartan (cost-optimized), Artix, Kintex, Virtex (high-performance), Zynq (ARM + FPGA SoC).
- Intel (Altera): Cyclone (low cost), Arria (mid-range), Stratix (high-end).
- Lattice Semiconductor: iCE40 (ultra-low power, open-source toolchain available).
11. Basic CPU Design
ALU (Arithmetic Logic Unit)
The ALU is the computational heart of any processor.
ALU Inputs:
A, B — n-bit operands
OpCode — selects the operation
ALU Outputs:
Result — n-bit result
Zero — asserted when Result = 0
Carry — asserted on unsigned overflow
Overflow — asserted on signed overflow
Negative — asserted when Result < 0
Example operations:
000 → A + B (addition)
001 → A - B (subtraction)
010 → A AND B
011 → A OR B
100 → A XOR B
101 → NOT A
110 → A << 1 (logical left shift)
111 → A >> 1 (logical right shift)
Register File
A register file provides fast, on-chip storage for operands and results. A typical 32-register file has two read ports and one write port.
module register_file (
input wire clk,
input wire we,
input wire [4:0] rs1, rs2, rd,
input wire [31:0] write_data,
output wire [31:0] read_data1,
output wire [31:0] read_data2
);
reg [31:0] regs [0:31];
assign read_data1 = regs[rs1];
assign read_data2 = regs[rs2];
always @(posedge clk) begin
if (we && rd != 0)
regs[rd] <= write_data;
end
endmodule
Simple CPU Datapath
IF — Instruction Fetch: read instruction from memory at PC
ID — Instruction Decode: read registers, generate control signals
EX — Execute: ALU performs computation
MEM — Memory Access: load/store data memory
WB — Write Back: write result to register file
Pipelining
A 5-stage pipeline overlaps execution of multiple instructions to increase throughput:
Cycle: 1 2 3 4 5 6 7 8 9
Instr 1: IF ID EX ME WB
Instr 2: IF ID EX ME WB
Instr 3: IF ID EX ME WB
Instr 4: IF ID EX ME WB
Instr 5: IF ID EX ME WB
Pipeline hazards disrupt ideal throughput:
- Structural: Two instructions need the same hardware resource simultaneously.
- Data (RAW/WAR/WAW): An instruction reads a value not yet written by a previous instruction. Resolved with forwarding, stalls, or compiler scheduling.
- Control: Branch outcome unknown until late in the pipeline. Resolved with branch prediction, delayed branching, or speculative execution.
12. Connecting to AI Hardware
GPU Digital Logic Architecture
A modern GPU contains thousands of small ALU cores (CUDA cores, SIMD lanes) operating in parallel. This massive parallelism maps perfectly onto the matrix multiplications that dominate deep learning workloads.
- NVIDIA Tensor Cores: Perform mixed-precision (FP16 × FP16 accumulate to FP32) matrix operations in a single clock cycle.
- SIMT execution model: Thousands of threads execute the same instruction on different data simultaneously.
TPU and NPU
- TPU (Tensor Processing Unit): Google's custom ASIC built around a systolic array of MAC (Multiply-Accumulate) units. Designed exclusively for neural network inference and training.
- NPU (Neural Processing Unit): Edge-device AI accelerators. Examples: Apple Neural Engine, Qualcomm Hexagon DSP, Samsung Exynos NPU.
FPGA-Based AI Acceleration
FPGAs occupy a useful niche in AI acceleration:
- Low latency: Suitable for real-time inference (autonomous driving, industrial vision).
- Flexibility: Reconfigure the bitstream when the model changes without replacing hardware.
- Custom precision: Implement 4-bit or 8-bit arithmetic to balance accuracy and resource usage.
- Real-world example: Microsoft Project Brainwave deploys FPGAs in Azure data centers for DNN inference.
Neuromorphic Computing
Neuromorphic chips model the brain's neuron–synapse architecture directly in silicon.
- Intel Loihi 2: Implements Spiking Neural Networks (SNNs) with on-chip learning.
- IBM TrueNorth: 1 million neurons and 256 million synapses on a single chip with milliwatt power consumption.
- Event-driven, asynchronous computation enables ultra-low power operation — a compelling advantage for always-on edge AI.
13. Learning Roadmap
[Foundations]
Boolean Algebra → Logic Gates → Karnaugh Maps
↓
[Combinational Circuits]
Adders, MUX, Decoder, Encoder
↓
[Sequential Circuits]
Latches → Flip-Flops → Registers → Counters
↓
[System-Level Design]
FSM → Datapath → Control Unit
↓
[HDL and FPGA]
Verilog → Simulation → FPGA Implementation
↓
[Processor Architecture]
ALU → Pipelining → Memory Hierarchy
↓
[AI Hardware]
GPU Architecture → TPU/NPU → FPGA Acceleration
Quiz
Q1. Why is NAND called a universal gate?
Answer: Because NAND alone can implement NOT, AND, and OR, making it possible to build any Boolean function from NAND gates only.
Explanation:
- NOT A = A NAND A
- A AND B = (A NAND B) NAND (A NAND B)
- A OR B = (A NAND A) NAND (B NAND B)
Since NOT, AND, and OR form a functionally complete set, any logic circuit can be realized with NAND gates exclusively. NOR gates share the same property.
Q2. What is the most negative value representable in 8-bit two's complement, and why is the range asymmetric?
Answer: -128 (binary: 10000000). The range -128 to +127 is asymmetric because zero is treated as positive, consuming one code from the positive half.
Explanation: With 8 bits there are 256 possible patterns. One pattern (00000000) represents zero. The two's complement of 10000000 would require 9 bits to represent as +128, so 10000000 maps exclusively to -128. This asymmetry is an inherent property of two's complement representation.
Q3. What is the key difference between a D latch and a D flip-flop?
Answer: A D latch is level-triggered — it passes D to Q continuously while Enable is HIGH. A D flip-flop is edge-triggered — it samples D only at the clock edge and holds the value until the next edge.
Explanation: Level-triggered behavior makes latches difficult to use in synchronous designs because the output can change multiple times while the enable is asserted. Edge-triggered flip-flops provide strict single-update-per-cycle timing, which is the foundation of clocked synchronous design methodology.
Q4. What is a Don't-care condition in a Karnaugh map and when is it used?
Answer: A Don't-care (X) is an input combination that either cannot occur or whose output value is irrelevant to the system. The designer may assign it 0 or 1 to maximize grouping and minimize the final expression.
Explanation: For example, in a circuit that processes BCD digits (0–9), input combinations 1010 through 1111 (10–15) never appear during normal operation. Marking these six minterms as Don't-cares allows larger groups in the K-map, reducing the number of gates required.
Q5. What is a data hazard in a pipelined CPU, and what are the main techniques to handle it?
Answer: A data hazard occurs when an instruction needs a value that has not yet been written back by an earlier instruction still in the pipeline (a Read-After-Write dependency).
Resolution techniques:
- Forwarding (Bypassing): Wire the ALU output directly to the next instruction's ALU input, skipping the register file write-back. Resolves most RAW hazards without stalling.
- Stalling (Pipeline bubble): Insert no-operation cycles until the dependent value is available. Simple but reduces throughput.
- Compiler scheduling: Reorder independent instructions to fill the slots between a producer and its consumer, avoiding stalls without hardware forwarding.
References
- Morris M. Mano & Michael D. Ciletti, Digital Design: With an Introduction to the Verilog HDL, VHDL, and SystemVerilog, Pearson
- Thomas L. Floyd, Digital Fundamentals, Pearson
- Frank Vahid, Digital Design, Wiley
- AMD-Xilinx FPGA Design Resources
- Intel FPGA (Altera) Documentation
- HDLBits — Verilog Practice Problems
- nandland — FPGA/VHDL/Verilog Tutorials