Top/inference

The most inspiring discoveries in inference

github.com

whichllm: Optimize Local LLM Performance

whichllm helps you find the best local LLM for your hardware, optimizing AI inference with real-time benchmarks.

ai, apple-silicon, benchmarks, cli, command-line-tool

github.com

High-Performance Framework for Language Models

SGLang is an open-source framework for efficient serving of large language and multimodal models, built for low-latency, high-throughput performance.

attention, blackwell, cuda, deepseek, diffusion

github.com

High-Throughput LLM Inference Engine - vLLM

vLLM is an engine for LLM inference and serving, designed for high throughput and efficient memory management.

amd, blackwell, cuda, deepseek, deepseek-v3

github.com

Oumi: Open Source LLM/VLM Training Platform

Oumi is an open-source platform for training and deploying LLMs and VLMs, providing tools for evaluation and data synthesis.

dpo, evaluation, fine-tuning, gpt, gpt-oss

github.com

High-Performance Inference Engine for LLMs

xLLM is an efficient inference engine for large language models, optimized for AI accelerators, enabling cost-effective enterprise deployment.

cpp, deepseek, glm, inference, inference-engine
