
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk - lemonade-sdk/lemonade

Run LLMs on AMD Ryzen™ AI NPUs in minutes. Just like Ollama - but purpose-built and deeply optimized for AMD NPUs. - FastFlowLM/FastFlowLM
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever. - Michael-A-Kuykendall/shimmy
SGLang is a high-performance serving framework for large language models and multimodal models. - sgl-project/sglang
A high-throughput and memory-efficient inference and serving engine for LLMs - vllm-project/vllm
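Several of the servers above (shimmy, SGLang, vLLM, and others) advertise an OpenAI-compatible HTTP API, so the same client code works against any of them. The sketch below builds a minimal chat-completions request using only the Python standard library; the port, route, and model name are placeholders, not taken from any one project's docs - check each server's README for its actual defaults.

```python
import json
import urllib.request

# Hypothetical local endpoint: OpenAI-compatible servers conventionally
# expose a /v1/chat/completions route. Port 8000 and the model name
# "qwen3" are assumptions for illustration only.
BASE_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }


def send_chat_request(payload: dict) -> dict:
    """POST the payload to the local server and return the parsed JSON reply."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (requires one of the servers above running locally):
# reply = send_chat_request(build_chat_request("qwen3", "Say hello."))
# print(reply["choices"][0]["message"]["content"])
```

Because the wire format is shared, swapping vLLM for SGLang or shimmy should only mean changing `BASE_URL` and the model name.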

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM! - oumi-ai/oumi

Unified web UI for training and running open models like Qwen, DeepSeek, and Gemma locally. - unslothai/unsloth

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, and many more model architectures. - mudler/LocalAI

LangChain4j is an open-source Java library that simplifies the integration of LLMs into Java applications through a unified API, providing access to popular LLMs and vector databases. - langchain4j/langchain4j