18 links
tagged with all of: llm + open-source
Click any tag below to further narrow down your results
Links
LLM-SRBench is a new benchmark aimed at enhancing scientific equation discovery using large language models, featuring comprehensive evaluation methods and open-source implementation. It includes a structured setup guide for running and contributing new search methods, as well as the necessary configurations for various datasets. The benchmark has been recognized for its significance, being selected for oral presentation at ICML 2025.
Bitnet.cpp is a framework designed for efficient inference of 1-bit large language models (LLMs), offering significant speed and energy consumption improvements on both ARM and x86 CPUs. The software enables the execution of large models locally, achieving speeds comparable to human reading, and aims to inspire further development in 1-bit LLMs. Future plans include GPU support and extensions for other low-bit models.
Memento is a memory-based continual-learning framework designed for LLM agents, enabling them to learn from experiences without updating model weights through a planner-executor architecture and case-based reasoning. It features a comprehensive tool ecosystem for web search, document processing, and more, demonstrating strong benchmark performance across various datasets. Open-source code for parametric and non-parametric case-based reasoning inference has been released, along with tools for local deployment and collaboration.
LLMc is a novel compression engine that utilizes large language models (LLMs) to achieve superior data compression by leveraging rank-based encoding. It surpasses traditional methods such as ZIP and LZMA, demonstrating enhanced efficiency in processing and decompression. The project is open-source and aims to encourage contributions from the research community.
Index is an advanced open-source browser agent that simplifies complex web tasks by transforming any website into an accessible API. It supports multiple reasoning models, structured output for data extraction, and offers both a command-line interface and serverless API for seamless integration into projects. Users can also trace agent actions and utilize a personal browser for enhanced functionality.
Oneiromancer is a reverse engineering assistant that leverages a fine-tuned LLM to analyze code snippets, providing high-level descriptions, recommended function names, and variable renaming suggestions. It supports cross-platform integration with popular IDEs and allows for easy installation via crates.io or building from source. The tool aims to enhance code analysis efficiency and improve developers' understanding of their code's functionality.
Kimi-Dev-72B is an advanced open-source coding language model designed for software engineering tasks, achieving a state-of-the-art performance of 60.4% on the SWE-bench Verified benchmark. It leverages large-scale reinforcement learning to autonomously patch real repositories and ensures high-quality solutions by only rewarding successful test suite completions. Developers and researchers are encouraged to explore and contribute to its capabilities, available for download on Hugging Face and GitHub.
BrowserBee is an open-source Chrome extension that enables users to control their browser using natural language, leveraging LLMs for instruction parsing and Playwright for automation. The project has been halted due to the current limitations of LLM technology in effectively interacting with web pages, despite a growing competition in AI browser tools. Users are advised to proceed with caution as the development ceases and future improvements in web page representation and LLM capabilities are anticipated.
ANEMLL is an open-source project designed to facilitate the porting of Large Language Models (LLMs) to Apple Neural Engine (ANE) with features like model evaluation, optimized conversion tools, and on-device inference capabilities. The project includes support for various model architectures, a reference implementation in Swift, and automated testing scripts for seamless integration into applications. Its goal is to ensure privacy and efficiency for edge devices by enabling local model execution.
Tokasaurus is a newly released LLM inference engine designed for high-throughput workloads, outperforming existing engines like vLLM and SGLang by more than 3x in benchmarks. It features optimizations for both small and large models, including dynamic prefix identification and various parallelism techniques to enhance efficiency and reduce CPU overhead. The engine supports various model families and is available as an open-source project on GitHub and PyPI.
Sakana AI introduces Multi-LLM AB-MCTS, a novel approach that enables multiple large language models to collaborate on tasks, outperforming individual models by 30%. This technique leverages the strengths of diverse AI models, enhancing problem-solving capabilities and is now available as an open-source framework called TreeQuest.
The first beta release of OM1, an open-source and modular operating system for robots, has been announced, featuring integrations with multiple LLM providers, advanced autonomy capabilities, and simulator support. Key enhancements include support for various robots, speech-to-text and text-to-speech functionalities, and improvements in navigation and interaction with hardware components. Developers can leverage this release to prototype and deploy robotics applications across different platforms.
Octo is a zero-telemetry coding assistant that supports various OpenAI-compatible and Anthropic-compatible LLM APIs, allowing users to switch models mid-conversation. It features built-in Docker support, customizable configuration, and can work seamlessly with local LLMs. Octo prioritizes user privacy and provides functionalities to manage coding tasks effectively while maintaining a user-friendly interface.
Tunix is a new open-source, JAX-native library designed to simplify the post-training process for large language models (LLMs). It offers a comprehensive toolkit for model alignment, including various algorithms for supervised fine-tuning, preference tuning, reinforcement learning, and knowledge distillation, all optimized for performance on TPUs. The library enhances the developer experience with a white-box design and seamless integration into the JAX ecosystem.
RepoAudit is a multi-agent bug detection tool designed for various programming languages, offering features such as compilation-free analysis and support for multiple bug types. It utilizes LLMSCAN for code parsing and implements two agents for scanning, helping identify and fix a significant number of bugs in open-source projects. The tool is easy to use with a simple command-line interface for project scanning.
Sidekick is an open-source CLI-based AI tool designed as an alternative to existing AI coding assistants, allowing users to choose from various LLM providers without vendor lock-in. It features a flexible system for managing AI models, project-specific guidance, and a user-friendly command line interface, currently in beta and actively developed. Users can easily install it and configure their preferences for enhanced productivity.
PromptMe is an educational project that highlights security vulnerabilities in large language model (LLM) applications, featuring 10 hands-on challenges based on the OWASP LLM Top 10. Aimed at AI security professionals, it provides a platform to explore risks and mitigation strategies, using Python and the Ollama framework. Users can set up the application to learn about vulnerabilities through CTF-style challenges, with solutions available for beginners.
Lemonade is a tool designed to help users efficiently run local large language models (LLMs) by configuring advanced inference engines for their hardware, including NPUs and GPUs. It supports both GGUF and ONNX models, offers a user-friendly interface for model management, and is utilized by various organizations, from startups to large companies like AMD. The platform also provides an API and CLI for Python application integration, alongside extensive hardware support and community collaboration opportunities.