SGLang is an open-source framework for efficient serving of large language and multimodal models, ensuring low-latency and high-throughput performance.
SGLang is a high-performance serving framework designed for large language models and multimodal models. It focuses on delivering low-latency and high-throughput inference across various setups, from single GPUs to large distributed clusters.
Key features include:
SGLang is recognized as the industry standard for LLM inference engines, trusted by leading enterprises and institutions.