
FastFlowLM (FLM) is a specialized runtime for executing large language models efficiently on AMD Ryzen™ AI NPUs. It runs models without any GPU, delivering fast inference while being over 10× more power-efficient. The runtime supports context lengths of up to 256k tokens and is lightweight, with an installation size of only 17 MB.
Key features:
- Runs entirely on the AMD Ryzen™ AI NPU, with no GPU required
- Over 10× more power-efficient
- Context lengths of up to 256k tokens
- Lightweight installation of only 17 MB
FastFlowLM is particularly useful for developers who want to run language models locally and efficiently on Ryzen AI hardware.
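As a rough sketch of what local use might look like: runtimes of this kind commonly expose an OpenAI-style chat-completions HTTP endpoint. The host, port, endpoint path, and model name below are placeholders, not confirmed FastFlowLM defaults; check the FastFlowLM documentation for the actual server address and model identifiers.

```python
import json
import urllib.request

# Placeholder endpoint: assumes a local OpenAI-compatible server is running.
# Verify the actual host/port in the FastFlowLM documentation.
BASE_URL = "http://localhost:11434/v1/chat/completions"


def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def extract_reply(response: dict) -> str:
    """Pull the assistant's text out of an OpenAI-style response."""
    return response["choices"][0]["message"]["content"]


def ask(model: str, prompt: str) -> str:
    """Send one prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_request(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_reply(json.load(resp))
```

With a server running locally, a call like `ask("llama3.2:1b", "Hello!")` (model name hypothetical) would return the model's reply as a string; the request never leaves the machine.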