6 links
tagged with all of: inference + open-source
Links
Bitnet.cpp is a framework for efficient inference of 1-bit large language models (LLMs), delivering significant speed and energy-efficiency gains on both ARM and x86 CPUs. It lets large models run locally at generation speeds comparable to human reading speed, and aims to inspire further development of 1-bit LLMs. Future plans include GPU support and extensions to other low-bit models.
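The speed and energy gains come from ternary ("1.58-bit") weights. As a rough illustration of the underlying idea only, not of bitnet.cpp's actual kernels, here is a minimal sketch of the absmean ternary quantization described in the BitNet b1.58 paper:

```python
# Minimal sketch of absmean ternary ("1.58-bit") weight quantization from the
# BitNet b1.58 paper; an illustration of the idea, not bitnet.cpp's kernels.
import numpy as np

def absmean_ternary_quantize(w: np.ndarray, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, +1} with a per-tensor scale."""
    scale = np.abs(w).mean() + eps             # absmean scaling factor
    w_q = np.clip(np.round(w / scale), -1, 1)  # ternary weights
    return w_q.astype(np.int8), scale

def dequantize(w_q: np.ndarray, scale: float) -> np.ndarray:
    return w_q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
w_q, scale = absmean_ternary_quantize(w)
print(w_q)                                          # entries are -1, 0, or +1
print(np.abs(w - dequantize(w_q, scale)).mean())    # mean quantization error
```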
R-4B is a multimodal large language model that enhances general-purpose auto-thinking by dynamically switching between thinking and non-thinking modes based on task complexity. It employs a two-stage training approach to improve response efficiency and reduce computational cost, achieving state-of-the-art performance among comparable models. The model is open-source and lets users control its thinking behavior directly.
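For a sense of what that mode switching could look like from user code, here is a hypothetical sketch in the style of the Hugging Face transformers chat API; the repo id and the `thinking_mode` template argument are assumptions, so check the model card for the actual interface:

```python
# Hypothetical sketch of toggling thinking behavior through a chat-template
# kwarg, transformers-style. The repo id and the `thinking_mode` argument are
# assumptions, not R-4B's documented API.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "R-4B"  # placeholder; see the model card for the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

messages = [{"role": "user", "content": "Is 9.11 larger than 9.9?"}]

# Extra kwargs passed to apply_chat_template are forwarded to the template
# itself, which is how several open models expose mode switches.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    thinking_mode="auto",  # assumed switch: "auto", "thinking", "non-thinking"
)
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```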
InferenceMAX™ is an open-source automated benchmarking tool that continuously evaluates the performance of popular inference frameworks and models to ensure benchmarks remain relevant amidst rapid software improvements. The platform, supported by major industry players, provides real-time insights into inference performance and is seeking engineers to expand its capabilities.
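As a rough sketch of the kind of measurement such a benchmark takes (request latency and output-token throughput against an OpenAI-compatible serving endpoint), assuming a server is already running locally; this is a generic illustration, not InferenceMAX's harness:

```python
# Generic illustration of inference benchmarking: time requests against an
# OpenAI-compatible endpoint and report latency and output-token throughput.
# The endpoint URL and model name are assumptions.
import time
import requests

URL = "http://localhost:8000/v1/completions"   # assumed local serving endpoint
PAYLOAD = {"model": "my-model", "prompt": "Explain KV caching.", "max_tokens": 128}

latencies, tokens = [], 0
for _ in range(10):
    start = time.perf_counter()
    resp = requests.post(URL, json=PAYLOAD, timeout=120).json()
    latencies.append(time.perf_counter() - start)
    tokens += resp["usage"]["completion_tokens"]

print(f"mean latency: {sum(latencies) / len(latencies):.2f}s")
print(f"throughput:   {tokens / sum(latencies):.1f} output tok/s")
```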
Tokasaurus is a newly released LLM inference engine designed for high-throughput workloads, outperforming existing engines like vLLM and SGLang by more than 3x in benchmarks. It features optimizations for both small and large models, including dynamic prefix identification and multiple parallelism techniques that improve efficiency and reduce CPU overhead. The engine supports several model families and is available as an open-source project on GitHub and PyPI.
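To illustrate what dynamic prefix identification means at its simplest, here is a conceptual sketch that finds the longest token prefix shared by a batch of requests, so work on that prefix could be done once and reused; Tokasaurus's actual scheduler and kernels are far more involved:

```python
# Conceptual sketch of shared-prefix identification across a batch of requests.
# Illustration of the idea only, not Tokasaurus's implementation.
from typing import List

def shared_prefix_len(sequences: List[List[int]]) -> int:
    """Length of the longest token prefix common to every sequence."""
    if not sequences:
        return 0
    shortest = min(len(s) for s in sequences)
    for i in range(shortest):
        first = sequences[0][i]
        if any(seq[i] != first for seq in sequences[1:]):
            return i
    return shortest

batch = [
    [1, 5, 9, 2, 7],      # e.g. shared system-prompt tokens + request suffix
    [1, 5, 9, 2, 8, 3],
    [1, 5, 9, 4],
]
n = shared_prefix_len(batch)
print(f"shared prefix: {batch[0][:n]}, suffixes: {[s[n:] for s in batch]}")
```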
ANEMLL is an open-source project that facilitates porting Large Language Models (LLMs) to the Apple Neural Engine (ANE), with features like model evaluation, optimized conversion tools, and on-device inference. The project includes support for various model architectures, a reference implementation in Swift, and automated testing scripts for seamless integration into applications. Its goal is to deliver privacy and efficiency on edge devices by keeping model execution local.
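For context, the generic Core ML route to the Neural Engine looks roughly like the sketch below (trace a PyTorch module, convert it with coremltools, request `CPU_AND_NE` compute units); ANEMLL's own conversion pipeline is considerably more involved and LLM-specific:

```python
# Minimal sketch of targeting the Apple Neural Engine via Core ML: trace a
# small PyTorch module and convert it with coremltools. Not ANEMLL's pipeline.
import torch
import coremltools as ct

class TinyMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 64)
        )

    def forward(self, x):
        return self.net(x)

example = torch.randn(1, 64)
traced = torch.jit.trace(TinyMLP().eval(), example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="x", shape=example.shape)],
    compute_units=ct.ComputeUnit.CPU_AND_NE,   # prefer the Neural Engine
    minimum_deployment_target=ct.target.iOS16,
)
mlmodel.save("tiny_mlp.mlpackage")
```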
SmolVLA is a compact and open-source Vision-Language-Action model designed for robotics, capable of running on consumer hardware and trained on community-shared datasets. It significantly outperforms larger models in both simulation and real-world tasks, while offering faster response times through asynchronous inference. The model's lightweight architecture and efficient training methods aim to democratize access to advanced robotics capabilities.
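A conceptual sketch of asynchronous inference, with stand-in `policy` and `robot` objects rather than the actual SmolVLA/LeRobot API: the robot executes the current chunk of actions while the next chunk is already being predicted in a background thread.

```python
# Conceptual sketch of asynchronous inference for a VLA policy: execute the
# current action chunk while predicting the next one in the background.
# `policy`, `robot`, and `robot_observe` are stand-ins, not the LeRobot API.
import time
from concurrent.futures import ThreadPoolExecutor

def robot_observe(robot):
    return {"image": None, "state": None}          # placeholder observation

def predict_chunk(policy, observation):
    """Stand-in for a (slow) forward pass that returns a chunk of actions."""
    time.sleep(0.3)                                # simulated inference latency
    return [f"action_{i}" for i in range(5)]

def run_async(policy, robot, steps=3):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(predict_chunk, policy, robot_observe(robot))
        for _ in range(steps):
            chunk = future.result()                # wait for the current chunk
            future = pool.submit(predict_chunk, policy,
                                 robot_observe(robot))  # start predicting next
            for action in chunk:                   # execute while predicting
                print("executing", action)
                time.sleep(0.05)

run_async(policy=None, robot=None)
```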