The newest discoveries in llama

K8sGPT is a tool that uses AI to diagnose Kubernetes issues, supporting cluster management and troubleshooting.

TensorZero is an open-source LLMOps platform unifying API access, observability, evaluation, optimization, and experimentation for large language models.

A curated list of services offering free access to LLM APIs for developers and researchers.

Lemonade is a local AI server that lets users run optimized LLMs on their own hardware, favoring privacy and cost-effectiveness.

FastFlowLM runs large language models efficiently on AMD Ryzen AI NPUs, without requiring a GPU.

Shimmy is a Rust-based inference server providing local, OpenAI-compatible endpoints for machine learning models.

SGLang is an open-source framework for efficient serving of large language and multimodal models, designed for low latency and high throughput.

vLLM is an efficient engine for LLM inference and serving, designed for high throughput and efficient memory management.

Oumi is an open-source platform for training and deploying LLMs and VLMs, providing tools for evaluation and data synthesis.

Unsloth is a web UI for training and running AI models locally, with a focus on efficiency and performance.