Code/evaluation-framework

Community-curated code

github.com

Open Source LLM Evaluation Framework - DeepEval

DeepEval is an open-source framework for evaluating large language models, offering customizable metrics and seamless integration with popular AI frameworks.

evaluation-framework, evaluation-metrics, llm, llm-evaluation, llm-evaluation-framework


github.com

Framework for Few-Shot Evaluation of Language Models

A framework for evaluating language models with a focus on few-shot tasks, supporting various model backends and benchmarks.

evaluation-framework, language-model, llm, open-source-coding-agent, python


github.com

Promptfoo: CLI for LLM Evaluation and Security Testing

Promptfoo is a CLI tool for evaluating and securing LLM applications through automated testing and red teaming.

ci, ci-cd, cicd, claude, evaluation
