


bookmrks.io - Discovery, refined.
Source: github.com

Efficient LLM Inference with llama.cpp in C/C++

llama.cpp enables high-performance LLM inference in C/C++, supporting various hardware and model types.

Tech Stack

GitHub, Node.js, C, Objective-C, Hugging Face, Selenium, Lucide Icons, Playwright, MCP, Zod, SvelteKit, ESLint, Prettier, Storybook, Svelte, Tailwind CSS, TypeScript, Vite, Vitest, npm, JavaScript, Bash, CSS, SCSS, Poetry, Docker, Python, GitHub Actions, C++, Swift, Kotlin, MATLAB, GLSL
Summary

llama.cpp is an open-source project for LLM inference in C/C++. It aims to deliver high-performance inference with minimal setup across a wide range of hardware, both locally and in the cloud.

Key features:

  • Plain C/C++ implementation - no dependencies required.
  • Optimized for Apple Silicon - utilizes ARM NEON, Accelerate, and Metal frameworks.
  • Support for multiple architectures - includes AVX, AVX2, AVX512 for x86 and RVV for RISC-V.
  • Flexible quantization options - supports 1.5-bit to 8-bit integer quantization for improved performance.
  • Custom CUDA kernels - enables efficient execution on NVIDIA GPUs.

The project serves as a platform for developing new features for the ggml library and supports a wide range of models from various sources, including Hugging Face.

Tags: cpp, ggml, llm, open-source-coding-agent