Source: github.com

Lucebox: Optimized LLM Inference for RTX 3090

Lucebox is a hub for optimized LLM inference tailored for specific consumer hardware, enhancing AI performance and efficiency.

Tech Stack

  • C++
  • Python
  • C
  • Objective-C
Summary

Lucebox is an optimization hub for LLM inference, focused on hand-tuned performance for specific consumer hardware. The repository collects tailored kernels, speculative decoding, and quantization techniques that improve the efficiency of large language models (LLMs) on particular chips.
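The quantization techniques mentioned above trade a small amount of precision for smaller weights and faster memory-bound inference. As a rough illustration only (this is not code from the Lucebox repository), here is a minimal symmetric int8 weight quantizer:

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: scale floats into [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [x * scale for x in q]

w = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
# Reconstruction error is bounded by half a quantization step (scale / 2).
print(max(abs(a - b) for a, b in zip(w, w_hat)))
```

Real inference stacks (e.g. the GGUF formats used by llama.cpp) quantize per block with stored scales rather than per tensor, but the precision/size trade-off is the same.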

Key features:

  • Megakernel Qwen3.5 - A highly efficient kernel for hybrid DeltaNet/Attention LLMs, achieving 1.87 tokens per joule on an RTX 3090.
  • DFlash DDtree Qwen3.5 - The first GGUF port of DFlash speculative decoding, demonstrating significant speed improvements over traditional methods.
  • Custom CUDA kernels - Optimized for tree-aware state rollback, enhancing performance on supported GPUs.
  • Benchmarking tools - Includes scripts for evaluating model performance across various tasks.
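Speculative decoding, used by the DFlash port above, speeds up generation by letting a cheap draft model propose several tokens that the expensive target model then verifies in a single pass. The toy sketch below shows the accept-longest-prefix idea in its greedy form; it is not the actual DFlash/DDtree code, and the function names and signatures are illustrative:

```python
def speculative_step(target_next, draft_next, ctx, k=4):
    """One greedy speculative-decoding step (toy sketch).

    The draft model proposes k tokens; the target model checks them and
    keeps the longest agreeing prefix, then appends one token of its own,
    so the output always matches pure target-model decoding.
    """
    # Draft phase: cheaply propose k tokens autoregressively.
    proposal, c = [], list(ctx)
    for _ in range(k):
        t = draft_next(c)
        proposal.append(t)
        c.append(t)

    # Verify phase: accept the longest prefix the target agrees with.
    accepted, c = [], list(ctx)
    for t in proposal:
        if target_next(c) == t:
            accepted.append(t)
            c.append(t)
        else:
            break

    # The target always contributes one correction/extension token.
    accepted.append(target_next(c))
    return accepted
```

With a perfect draft model every proposal is accepted, so one verification pass yields k + 1 tokens instead of 1; a poor draft degrades gracefully to 1 token per pass. Tree-based variants such as DDtree verify several candidate branches at once, which is why the repository's kernels need tree-aware state rollback.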

The project aims to democratize access to powerful AI capabilities by enabling efficient local AI deployment without vendor lock-in.

Tags

  • apple-silicon
  • cpp
  • cuda
  • kernel
  • llama-cpp
  • llm
  • local-ai
  • m5
  • m5-max
  • megakernel
  • nvidia-cuda
  • open-source-coding-agent
  • python
  • qwen
  • rtx3090