
Orthrus is a dual-architecture framework for memory-efficient parallel token generation in LLMs. It combines the generation fidelity of autoregressive Large Language Models with the high-speed parallel decoding of diffusion models, producing lossless output at significantly higher throughput.
Key features include:

- **Lossless output**: generated text is identical to standard autoregressive decoding.
- **Parallel token generation** for significant speedups over sequential, token-by-token decoding.
- **Dual architecture** pairing the fidelity of an autoregressive model with the speed of a diffusion-style parallel generator.
- **Memory-efficient design** for practical deployment.
In evaluations, Orthrus outperforms existing speculative decoding methods while preserving exact output fidelity, making it a practical tool for researchers and developers working on efficient LLM inference.
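To make the "lossless" guarantee concrete: methods in this family (including the speculative decoding baselines mentioned above) typically generate several candidate tokens in parallel, then verify them against the base autoregressive model, keeping only the prefix the base model itself would have produced. Orthrus's exact verification scheme is not described here, so the sketch below is a generic greedy draft-and-verify loop with a toy model; `target_next` and the modulo-based "model" are illustrative stand-ins, not Orthrus APIs.

```python
from typing import Callable, List


def verify_draft(target_next: Callable[[List[int]], int],
                 context: List[int],
                 draft: List[int]) -> List[int]:
    """Accept the longest prefix of `draft` that the target model would
    itself have generated, then emit the target's own token at the first
    mismatch. The result matches pure autoregressive decoding exactly,
    which is what makes this style of parallel generation lossless."""
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(context + accepted)
        if tok == expected:
            accepted.append(tok)        # draft agrees with target: keep it
        else:
            accepted.append(expected)   # mismatch: fall back to target's token
            break
    else:
        # Entire draft accepted: the target grants one extra "bonus" token.
        accepted.append(target_next(context + accepted))
    return accepted


# Toy deterministic "model": next token = (sum of context) % 7.
target = lambda ctx: sum(ctx) % 7

print(verify_draft(target, [1, 2, 3], [6, 5, 0]))  # → [6, 5, 3]
```

Here the draft's first two guesses match the target (6, then 5), the third does not, so verification stops after emitting the target's correct token 3. In the best case all drafted tokens are accepted in a single verification pass, which is the source of the speedup; in the worst case the output degrades gracefully to ordinary one-token-at-a-time decoding, never to a different output.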