
whichllm helps you find the best local LLM for your hardware, optimizing AI inference with real-time benchmarks.
whichllm is a command-line tool that identifies which local large language models (LLMs) will run efficiently on your machine. It automatically detects your GPU, CPU, and RAM specifications, then ranks models from HuggingFace using real performance benchmarks rather than parameter counts alone.
Key features:

- Automatic detection of GPU, CPU, and RAM specifications
- Rankings of HuggingFace models based on real performance benchmarks, not just parameter counts
- Recommendations tailored to your specific hardware configuration
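
To make the hardware-detection step concrete, here is a minimal sketch of the kind of probe such a tool might perform. It assumes `psutil` is installed and that `nvidia-smi` is on the PATH for NVIDIA GPUs; the function and field names are illustrative, not whichllm's actual API.

```python
# Illustrative hardware probe: CPU cores, total RAM, and NVIDIA VRAM.
# Not whichllm's real implementation; names here are hypothetical.

import shutil
import subprocess

import psutil


def detect_hardware() -> dict:
    """Collect CPU core count, total RAM, and (if present) NVIDIA GPU VRAM."""
    specs = {
        "cpu_cores": psutil.cpu_count(logical=False),
        "ram_gb": round(psutil.virtual_memory().total / 1024**3, 1),
        "gpu": None,
        "vram_gb": None,
    }
    # Query nvidia-smi for GPU name and memory; skip silently if unavailable.
    if shutil.which("nvidia-smi"):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=False,
        )
        if out.returncode == 0 and out.stdout.strip():
            name, mem_mib = out.stdout.strip().splitlines()[0].rsplit(",", 1)
            specs["gpu"] = name.strip()
            specs["vram_gb"] = round(int(mem_mib) / 1024, 1)
    return specs


if __name__ == "__main__":
    print(detect_hardware())
```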
This makes whichllm particularly useful for developers and researchers who want to optimize local AI inference by choosing the model best suited to their hardware.
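
The ranking step can be sketched the same way: filter candidates to those that fit in the detected VRAM, then order by measured throughput. The model list, VRAM requirements, and tokens/sec figures below are placeholder values, and the function is a hypothetical illustration rather than whichllm's actual algorithm.

```python
# Illustrative benchmark-driven ranking. The (model, min_vram_gb,
# tokens_per_sec) entries are placeholder figures, not real measurements.

CANDIDATES = [
    # (HuggingFace model id, VRAM needed in GB, measured tokens/sec)
    ("meta-llama/Llama-3.1-8B-Instruct", 6.0, 48.0),
    ("mistralai/Mistral-7B-Instruct-v0.3", 5.5, 52.0),
    ("Qwen/Qwen2.5-14B-Instruct", 10.0, 30.0),
]


def rank_models(vram_gb: float) -> list:
    """Keep models that fit in available VRAM, fastest benchmark first."""
    fits = [m for m in CANDIDATES if m[1] <= vram_gb]
    return sorted(fits, key=lambda m: m[2], reverse=True)


# Example: on an 8 GB GPU, only the two 7B-class models fit,
# and the one with the higher benchmarked throughput ranks first.
print(rank_models(vram_gb=8.0))
```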