Source: github.com

High-Performance Inference Engine for LLMs

xLLM is an efficient inference engine for large language models, optimized for AI accelerators, enabling cost-effective enterprise deployment.

Summary

xLLM is a high-performance inference engine designed for large language models (LLMs) and optimized for diverse AI accelerators. The framework enables efficient enterprise-grade deployment, improving throughput while reducing operational costs.

Key features include:

  • Service-Engine Decoupled Architecture - Achieves high efficiency through elastic scheduling and dynamic prefill-decode (PD) disaggregation.
  • Multi-Stream Parallel Computing - Utilizes graph fusion optimization and speculative inference for improved throughput.
  • Global KV Cache Management - Implements intelligent offloading and prefetching strategies.
  • Dynamic Load Balancing - Ensures efficient distribution of resources among multiple experts.
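
To make the KV cache feature above concrete, here is a minimal sketch of what a global KV-cache manager with offloading and prefetching might look like. The class, tiering policy (LRU eviction from accelerator memory to host memory), and all names are hypothetical illustrations, not xLLM's actual API.

```python
from collections import OrderedDict

class KVCacheManager:
    """Illustrative global KV-cache manager: hot blocks stay in
    accelerator memory (HBM), cold blocks are offloaded to host memory
    and can be prefetched back. Policy and names are hypothetical."""

    def __init__(self, hbm_capacity: int):
        self.hbm_capacity = hbm_capacity  # max blocks resident in HBM
        self.hbm = OrderedDict()          # block_id -> KV data (hot tier, LRU order)
        self.host = {}                    # block_id -> KV data (offloaded tier)

    def put(self, block_id, kv):
        # Insert a KV block; offload least-recently-used blocks when full.
        self.hbm[block_id] = kv
        self.hbm.move_to_end(block_id)
        while len(self.hbm) > self.hbm_capacity:
            victim, data = self.hbm.popitem(last=False)
            self.host[victim] = data      # offload rather than discard

    def get(self, block_id):
        # Serve from HBM if resident; otherwise fall back to the host tier.
        if block_id in self.hbm:
            self.hbm.move_to_end(block_id)
            return self.hbm[block_id]
        return self.host.get(block_id)

    def prefetch(self, block_ids):
        # Bring blocks the scheduler expects to need back into HBM early.
        for bid in block_ids:
            if bid in self.host:
                self.put(bid, self.host.pop(bid))
```

In a real engine the "intelligent" part lies in predicting which blocks to prefetch (e.g. from the request scheduler's queue) and overlapping transfers with compute; the sketch only shows the two-tier bookkeeping.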

xLLM supports the deployment of mainstream models such as DeepSeek-V3.1 and Qwen2/3, facilitating applications in intelligent customer service, risk control, and supply chain optimization.
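
The dynamic load balancing mentioned above can be illustrated with a simple greedy least-loaded policy: each unit of work goes to whichever expert currently carries the lightest load. This is a generic sketch of the idea, not xLLM's actual scheduler; the function name and cost model are assumptions.

```python
import heapq

def balance_tokens(num_experts: int, token_costs: list) -> dict:
    """Greedy least-loaded assignment of token batches to experts.
    Illustrative only: a min-heap tracks each expert's running load,
    and each batch is sent to the currently least-loaded expert."""
    heap = [(0, e) for e in range(num_experts)]  # (current load, expert id)
    heapq.heapify(heap)
    assignment = {}
    for token_id, cost in enumerate(token_costs):
        load, expert = heapq.heappop(heap)       # least-loaded expert
        assignment[token_id] = expert
        heapq.heappush(heap, (load + cost, expert))
    return assignment
```

Production MoE schedulers additionally rebalance expert replicas as traffic skews, but the heap-based greedy assignment captures the core mechanism.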
