flux/evaluation

RELATED TAGS

llm +4, llm-evaluation +3, llmops +3, prompt-engineering +3, typescript +3, llm-observability +2, observability +2, open-source-coding-agent +2, openai +2, prompt-management +2

github.com

GitHub - langfuse/langfuse: 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

langchain, llm, open-source-coding-agent, open-source, playground

1 day

github.com

GitHub - Agenta-AI/agenta: The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.

llm, open-source-coding-agent, evaluation, agents, observability

1 day

github.com

GitHub - Tencent/WeKnora: LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.

llm, rag, agent, golang, multi-tenant

1 day

github.com

GitHub - promptfoo/promptfoo: Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

gpt, claude, rag, testing, ci

1 day