Language Learning Quiz

Based on: LLM 추론 최적화 완벽 가이드: vLLM, TensorRT-LLM, Speculative Decoding

Tensor Parallelism

텐서 병렬 처리

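A technique that splits a single tensor operation across multiple GPUs and processes the pieces in parallel; it is essential for inference with large models that do not fit in a single GPU's memory.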
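
A minimal sketch of the idea, assuming a column-wise split of one weight matrix. The "GPUs" here are simulated with plain Python lists, and the helper names (`matmul`, `split_columns`, `tensor_parallel_matmul`) are hypothetical and not taken from vLLM or TensorRT-LLM. Each shard holds only part of the weights, computes its slice of the output, and the slices are concatenated at the end.

```python
def matmul(x, w):
    """Multiply x (m x k) by w (k x n); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)]
            for row in x]


def split_columns(w, num_shards):
    """Split w column-wise into equal shards, one per simulated GPU."""
    n = len(w[0])
    assert n % num_shards == 0, "columns must divide evenly across shards"
    width = n // num_shards
    return [[row[i * width:(i + 1) * width] for row in w]
            for i in range(num_shards)]


def tensor_parallel_matmul(x, w, num_shards):
    """Compute x @ w by giving each simulated GPU one column shard of w."""
    shards = split_columns(w, num_shards)
    # On real hardware each of these products would run on its own GPU,
    # so no single device needs to hold all of w.
    partials = [matmul(x, shard) for shard in shards]
    # Gather step: concatenate the column slices back into full output rows.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]

# The sharded result matches the single-device result.
assert tensor_parallel_matmul(x, w, 2) == matmul(x, w)
```

In this sketch the input `x` is replicated on every shard and only the weights are divided, which is why the per-device memory for `w` drops roughly in proportion to the number of shards.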
