The most inspiring discoveries in LLM inference

github.com

Lemonade is a local AI server that allows users to run optimized LLMs on their own hardware, ensuring privacy and cost-effectiveness.

ai, amd, cpp, genai, gpu

github.com

Shimmy is a Rust-based inference server providing local, OpenAI-compatible endpoints for machine learning models.

api-server, command-line-tool, developer-tools, gguf, gpt

github.com

xLLM is an efficient inference engine for large language models, optimized for AI accelerators, enabling cost-effective enterprise deployment.

cpp, deepseek, glm, inference, inference-engine
