Text Generation Inference (TGI) Review

임로켓 2025. 1. 29. 09:36

TGI's introduction page opens by stating that it implements a number of optimizations and features.

Of these, I will review and summarize the following items.

Tensor Parallelism for faster inference on multiple GPUs
Token streaming using Server-Sent Events (SSE)
Continuous batching of incoming requests for increased total throughput
Optimized transformers code for inference using Flash Attention and Paged Attention on the most popular architectures
Quantization with bitsandbytes and GPT-Q
Stop sequences

I will then walk through the server and client code and review each of these features.
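
To make two of the listed items concrete (token streaming over SSE and stop sequences), here is a minimal sketch in Python. It is not TGI's code. The function names are hypothetical, and the payload shape (a JSON object with a `token.text` field on each `data:` line) is an assumption used only for illustration. The sketch reads SSE lines, extracts each token's text, and stops accumulating once the text ends with a stop sequence.

```python
import json
from typing import Iterable, Iterator, List


def iter_sse_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield token text from SSE lines of the form 'data:{json}'.

    Assumption: each event carries a JSON object with token.text.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = json.loads(line[len("data:"):])
        yield payload["token"]["text"]


def generate_until_stop(tokens: Iterable[str], stop_sequences: List[str]) -> str:
    """Accumulate tokens; stop once the text ends with any stop sequence.

    In this sketch the stop sequence is kept in the returned text.
    """
    text = ""
    for tok in tokens:
        text += tok
        if any(text.endswith(s) for s in stop_sequences):
            break
    return text


# Illustrative stream, hand-written for this example (not captured from a server).
SAMPLE_STREAM = [
    'data:{"token": {"text": "Hello"}}',
    "",
    'data:{"token": {"text": " world"}}',
    "",
    'data:{"token": {"text": "\\n"}}',
    "",
    'data:{"token": {"text": "ignored"}}',
]

result = generate_until_stop(iter_sse_tokens(SAMPLE_STREAM), stop_sequences=["\n"])
print(repr(result))
```

Because the tokens are consumed lazily, the token after the stop sequence is never read. With a real streaming connection, that corresponds to the client closing the stream early.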

Thinking, Writing, and.