
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our Discord: https://discord.gg/5xXzkMu8Zk - lemonade-sdk/lemonade

Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama - but purpose-built and deeply optimized for AMD NPUs. - FastFlowLM/FastFlowLM

Lucebox optimization hub: hand-tuned LLM inference, built for specific consumer hardware. - Luce-Org/lucebox-hub

MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI. - alibaba/MNN

A high-performance inference engine for LLMs, optimized for diverse AI accelerators. - jd-opensource/xllm

LLM inference in C/C++. - ggml-org/llama.cpp