bookmrks.io - Discovery, refined.
Tags: api-server, command-line-tool, developer-tools, gguf, gpt, huggingface, huggingface-models, huggingface-transformers, inference-server, llama, llamacpp, llm, llm-inference, local-ai, lora, machine-learning, ollama-api, open-source-coding-agent, openai-compatible, rust, rust-crate, transformers
Source: github.com

Python-free Rust Inference Server for OpenAI API

Shimmy is a Rust-based inference server providing local, OpenAI-compatible endpoints for machine learning models.

Tech Stack
GitHub, Cloudflare, Cloudflare Workers, Nginx, Fly.io, Railway, Render, Node.js, JavaScript, Cargo, Rust, Express, Jest, Docker, Bash, Python, Codecov, GitHub Actions, C, Objective-C, Ruby
Summary

Shimmy is a Python-free Rust inference server that exposes OpenAI-compatible endpoints for GGUF models. It lets users run AI models entirely on their own hardware, keeping data private and eliminating the need for external API calls.

Key features:

  • Single Binary - Download and run without any compilation or dependencies.
  • Automatic Model Discovery - Finds models from local directories and caches without manual configuration.
  • Advanced MoE Support - Efficiently runs large mixture-of-experts models on consumer hardware with intelligent CPU/GPU processing.
  • Zero Configuration - Automatically allocates ports and detects model adapters.
  • Compatibility - Works seamlessly with existing OpenAI SDKs and tools.

Shimmy is ideal for developers looking for a reliable and efficient way to run machine learning models locally while maintaining control over their data and environment.
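Because Shimmy speaks the OpenAI wire format, any client that can POST a standard chat-completion payload can talk to it. Below is a minimal sketch in Python; the URL, port, and model name are illustrative assumptions, not values taken from the project's documentation.

```python
import json

# Hypothetical local endpoint — Shimmy auto-allocates its port, so the
# actual address depends on your setup; this URL is an assumption.
SHIMMY_URL = "http://localhost:11435/v1/chat/completions"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build a standard OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

# "llama-3.2-1b" is a placeholder; use a model Shimmy discovered locally.
payload = build_chat_request("llama-3.2-1b", "Hello, Shimmy!")
body = json.dumps(payload)

# To send: POST `body` to SHIMMY_URL with Content-Type: application/json,
# or simply point an existing OpenAI SDK's base_url at the local server.
print(json.loads(body)["messages"][0]["role"])  # prints: user
```

The same payload works unchanged against the official OpenAI API, which is what makes drop-in SDK compatibility possible.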
