Source
github.com

Framework for Few-Shot Evaluation of Language Models

A framework for evaluating language models with a focus on few-shot tasks, supporting various model backends and benchmarks.

Tech Stack
GitHub, GitHub Actions, Python, Bash, C++
Summary

lm-evaluation-harness is a framework designed for the few-shot evaluation of language models. It provides a unified interface to test generative models across a variety of evaluation tasks, ensuring reproducibility and comparability in research.

Key features:

  • Over 60 standard benchmarks - Includes hundreds of subtasks and variants for comprehensive evaluation.
  • Support for multiple model backends - Compatible with models from Hugging Face, GPT-NeoX, and Megatron-DeepSpeed.
  • Flexible tokenization - Offers a tokenization-agnostic interface for ease of use.
  • Fast inference - Supports fast, memory-efficient inference through vLLM.
  • Custom prompts and metrics - Allows users to define their own evaluation criteria.

The framework is widely used in academic research and by organizations such as NVIDIA and Cohere, making it a vital tool for evaluating the performance of language models.
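
To illustrate the unified interface described above, here is a minimal sketch of running a single benchmark through the library's Python entry point. The checkpoint name, task selection, and few-shot setting are illustrative placeholders, and exact argument names may differ between releases.

import lm_eval

# Evaluate a Hugging Face model on HellaSwag with zero-shot prompting.
# "hf" selects the Hugging Face transformers backend; the pretrained
# checkpoint, task list, and batch size are placeholder choices.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/gpt-neo-125m",
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)

# The returned dictionary maps each task name to its computed metrics
# (for HellaSwag, accuracy and length-normalized accuracy).
print(results["results"]["hellaswag"])

The same evaluation can also be launched from the command line via the lm_eval CLI, which wraps this function and accepts equivalent --model, --model_args, and --tasks options.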

Tags
  • evaluation-framework
  • language-model
  • llm
  • open-source-coding-agent
  • python
  • transformer