
Efficient LLM Inference with llama.cpp in C/C++

llama.cpp enables high-performance LLM inference in C/C++, supporting various hardware and model types.

Summary

llama.cpp is an open-source project designed for LLM inference using C/C++. It aims to provide high-performance inference capabilities with minimal setup, making it suitable for a variety of hardware configurations, both locally and in the cloud.
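The minimal-setup claim can be illustrated with the project's standard CMake workflow. A sketch, assuming a local GGUF model file (the model path below is a placeholder, not something shipped with the repository):

```shell
# Clone and build llama.cpp with CMake (Release configuration).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Run inference with the llama-cli tool; /path/to/model.gguf is a
# placeholder -- any model converted to GGUF format should work.
./build/bin/llama-cli -m /path/to/model.gguf \
    -p "Explain quantization in one sentence." -n 64
```

The `-n` flag caps the number of generated tokens; hardware-specific backends (Metal, CUDA, etc.) are selected via CMake options at configure time.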

Key features:

  • Plain C/C++ implementation - no dependencies required.
  • Optimized for Apple Silicon - utilizes ARM NEON, Accelerate, and Metal frameworks.
  • Support for multiple architectures - includes AVX, AVX2, AVX512 for x86 and RVV for RISC-V.
  • Flexible quantization options - supports 1.5-bit through 8-bit integer quantization for faster inference and reduced memory use.
  • Custom CUDA kernels - enables efficient execution on NVIDIA GPUs.
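As a back-of-envelope illustration of why the quantization range matters (the arithmetic below is illustrative, not from the project): a model's weights at a given bit width occupy roughly parameters × bits / 8 bytes, so for a hypothetical 7B-parameter model:

```shell
# Rough weight-only memory footprint of a 7B-parameter model at
# several quantization widths (ignores KV cache and activations).
for bits in 16 8 4 2; do
  awk -v b="$bits" 'BEGIN {
    printf "%2d-bit: %6.2f GiB\n", b, 7e9 * b / 8 / 1024 / 1024 / 1024
  }'
done
# 16-bit:  13.04 GiB
#  8-bit:   6.52 GiB
#  4-bit:   3.26 GiB
#  2-bit:   1.63 GiB
```

Dropping from 16-bit floats to 4-bit integers cuts weight memory roughly fourfold, which is what makes larger models runnable on consumer hardware.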

The project serves as a platform for developing new features for the ggml library and supports a wide range of models from various sources, including Hugging Face.
