
The most inspiring discoveries in evaluation

github.com

MLflow: Open Source AI Engineering Platform

MLflow is an open source platform for managing AI applications, enabling teams to optimize and monitor production-quality models.

agentops, agents, ai, ai-agents, ai-governance


github.com

Open Source LLM Observability Platform - Helicone

Helicone is an open-source LLM observability platform that enables AI engineers to monitor and evaluate models efficiently.

agent-monitoring, analytics, evaluation, gpt, langchain


github.com

Oumi: Open Source LLM/VLM Training Platform

Oumi is an open-source platform for training and deploying LLMs and VLMs, providing tools for evaluation and data synthesis.

dpo, evaluation, fine-tuning, gpt, gpt-oss


github.com

Open Source LLM Engineering Platform - Langfuse

Langfuse is an open source platform for LLM observability and management, enabling teams to develop and debug AI applications efficiently.

analytics, autogen, evaluation, langchain, large-language-models


github.com

Agenta: Open-Source LLMOps Platform for Developers

Agenta is an open-source platform for building reliable LLM applications with integrated management, evaluation, and observability tools.

agents, evaluation, llm, llm-as-a-judge, llm-evaluation


github.com

WeKnora: LLM Framework for Document Understanding

WeKnora is an LLM-powered framework for intelligent knowledge management and semantic retrieval, enhancing document understanding and Q&A capabilities.

agent, agentic, ai, chatbot, chatbots


github.com

Promptfoo: CLI for LLM Evaluation and Security Testing

Promptfoo is a CLI tool for evaluating and securing LLM applications through automated testing and red teaming.

ci, ci-cd, cicd, claude, evaluation
