bookmrks.io - Discovery, refined.
Tags
  • agent-evals
  • agent-skills
  • agentskills
  • ai-agents
  • cli
  • javascript
  • jsonl
  • llm-evals
  • llm-evaluation
  • open-source-coding-agent
  • openai-compatible
  • typescript
  • yaml
github.com

AI Skill Evaluation Framework for Agents

Evaluate AI agent skills with this TypeScript-based tool that provides objective performance assessments.

flux
Tech Stack
GitHub, TypeScript, npm, Node.js, Dependabot, GitHub Actions, CSS, JavaScript
Summary

agent-skills-eval is a test runner for evaluating AI agent skills based on the agentskills.io standard. It compares outputs generated with and without the skill in context, which shows how much the skill changes the result.
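
As a rough illustration of the with/without comparison, the sketch below builds two OpenAI-style chat message arrays for the same prompt: one with the skill text placed in a system message, and one baseline with no skill. The names `ChatMessage`, `PairedMessages`, and `buildPairedMessages` are hypothetical and are not taken from agent-skills-eval, and placing the skill in a system message is an assumption made for this example only.

```typescript
// Hypothetical sketch: this is NOT the agent-skills-eval API.
// The message shape follows the OpenAI chat format (role + content).
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface PairedMessages {
  withSkill: ChatMessage[];
  withoutSkill: ChatMessage[];
}

// Build the same user prompt twice: once with the skill text in a
// system message (an assumption of this sketch), once with no skill.
function buildPairedMessages(skillText: string, prompt: string): PairedMessages {
  const user: ChatMessage = { role: "user", content: prompt };
  return {
    withSkill: [{ role: "system", content: skillText }, user],
    withoutSkill: [user],
  };
}
```

Both arrays could then be sent to the same model, so that any difference in the outputs is attributable to the skill text rather than to the prompt.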

Key features:

  • Dual Evaluation - Runs evaluations with and without the skill loaded, allowing for a direct comparison of performance.
  • Judge Model Grading - Uses a judge model to grade both outputs, so the two runs are assessed by the same grader.
  • Static HTML Reports - Generates comprehensive reports that can be published anywhere, summarizing evaluation results.
  • TypeScript SDK and CLI - Offers a command-line interface for easy integration into CI pipelines and a full SDK for custom implementations.
  • OpenAI-Compatible - Works with AI models that support the OpenAI chat API.
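
To make the dual-evaluation and judge-grading ideas concrete, here is a minimal sketch of how per-case judge scores might be rolled up into a with/without comparison. It assumes a judge has already scored each output on a 0 to 1 scale. The types `CaseScore` and `Summary` and the function `summarize` are hypothetical names for illustration and do not reflect the tool's actual SDK or report format.

```typescript
// Hypothetical sketch: this is NOT the agent-skills-eval API.
// Assumes a judge model has already scored each output from 0 to 1.
interface CaseScore {
  caseId: string;
  withSkill: number; // judge score for the output produced with the skill
  withoutSkill: number; // judge score for the baseline output
}

interface Summary {
  meanWithSkill: number;
  meanWithoutSkill: number;
  delta: number; // positive means the skill scored higher on average
  improved: number; // number of cases where the skill scored higher
}

// Arithmetic mean; returns 0 for an empty list to avoid dividing by zero.
function mean(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Aggregate per-case scores into a single with/without comparison.
function summarize(scores: CaseScore[]): Summary {
  const meanWithSkill = mean(scores.map((s) => s.withSkill));
  const meanWithoutSkill = mean(scores.map((s) => s.withoutSkill));
  return {
    meanWithSkill,
    meanWithoutSkill,
    delta: meanWithSkill - meanWithoutSkill,
    improved: scores.filter((s) => s.withSkill > s.withoutSkill).length,
  };
}
```

A CI job could, for instance, fail the build when `delta` is not positive, though the thresholds and exit behavior of the real CLI are not described here.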

This framework is particularly useful for developers and researchers looking to validate the performance of their AI skills in a structured manner.
