The most inspiring discoveries in evaluation

MLflow is an open-source platform for managing AI applications, enabling teams to optimize and monitor production-quality models.

Helicone is an open-source LLM observability platform that enables AI engineers to monitor and evaluate models efficiently.

Oumi is an open-source platform for training and deploying LLMs and VLMs, providing tools for evaluation and data synthesis.

Langfuse is an open-source platform for LLM observability and management, enabling teams to develop and debug AI applications efficiently.

Agenta is an open-source platform for building reliable LLM applications with integrated management, evaluation, and observability tools.

WeKnora is an LLM-powered framework for intelligent knowledge management and semantic retrieval, enhancing document understanding and Q&A capabilities.

Promptfoo is a CLI tool for evaluating and securing LLM applications through automated testing and red teaming.