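[lite_llama: a lightweight inference framework optimized for large language models, offering up to a 3.4x speedup, with support for the latest models and streaming output] 'The llama model inference lite framework by triton.' GitHub: github.com/harleyszhang/lite_llama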