flux/gpt

IBM just released Granite 4.1, a family of open source language models built specifically for enterprise use. Three sizes, Apache 2.0 licensed and trained on 15 trillion tokens with a level of pipeline obsession that’s worth understanding. But there’s one result in the benchmarks I keep coming back to. The 8B model. Dense architecture, no MoE tricks, no extended reasoning chains. It matches or beats Granite 4.0-H-Small across basically every benchmark they ran. That older model has 32B parameters with 9B active. This one has 8 billion. Full stop. That result is either very impressive or it means the old model was underbuilt. Probably both. Here’s how they built it, what the numbers actually say, and whether any of it matters for your use case.

Find, benchmark and install in CLI 200+ FREE coding LLM models across 20+ providers in real time - vava-nessa/free-coding-models

Bridge local AI coding agents (Claude Code, Cursor, Gemini CLI, Codex) to messaging platforms (Feishu/Lark, DingTalk, Slack, Telegram, Discord, LINE, WeChat Work). Chat with your AI dev assistant f...
LLPhant - A comprehensive PHP Generative AI Framework using OpenAI GPT 4. Inspired by Langchain - LLPhant/LLPhant
⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever. - Michael-A-Kuykendall/shimmy

Translate full-length books and documents with Ollama, OpenAI (comptatible), Gemini, Mistral, Poe or OpenRouter. Preserves formatting. Resumes where you left off. No file size limits. - hydropix/Tr...
High-performance AI gateway written in Go - unified OpenAI-compatible API for OpenAI, Anthropic, Gemini, Groq, xAI & Ollama. LiteLLM alternative with observability, guardrails & streaming. ...
🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓 - Helicone/helicone

AI enabled pair programmer for Claude, GPT, O Series, Grok, Deepseek, Gemini and 300+ models - tailcallhq/forgecode
A high-throughput and memory-efficient inference and serving engine for LLMs - vllm-project/vllm