Lemonade helps users discover and run local AI apps by serving optimized LLMs directly from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk - lemonade-sdk/lemonade

Lucebox optimization hub: hand-tuned LLM inference, built for specific consumer hardware. - Luce-Org/lucebox-hub

AI-enabled pair programmer for Claude, GPT, the o-series, Grok, DeepSeek, Gemini, and 300+ models - tailcallhq/forgecode
SGLang is a high-performance serving framework for large language models and multimodal models. - sgl-project/sglang

A high-throughput and memory-efficient inference and serving engine for LLMs - vllm-project/vllm

Unified web UI for training and running open models like Qwen, DeepSeek, and Gemma locally. - unslothai/unsloth

A high-performance inference engine for LLMs, optimized for diverse AI accelerators. - jd-opensource/xllm