8 links
tagged with all of: open-source + language-models
Links
The article describes an implementation of DeepSeek R1-Zero-style training for large language models (LLMs) that runs on one or more GPUs, with a focus on simplicity and efficiency. It highlights the nanoAhaMoment project, which supports full-parameter tuning, multi-GPU training, and a full evaluation suite while keeping performance competitive and complexity low. The repository offers interactive Jupyter notebooks and training scripts, along with installation instructions and dependency management.
The article discusses Switzerland's development of an open-source AI model named Apertus, designed to facilitate research in large language models (LLMs). The initiative aims to promote transparency and collaboration in AI advancements, allowing researchers to access and contribute to the model's evolution.
OLMo 2 is a family of fully open language models designed for accessibility and reproducibility in AI research. The largest model, OLMo 2 32B, surpasses GPT-3.5-Turbo and GPT-4o mini on various academic benchmarks, while the smaller models (1B, 7B, and 13B) are competitive with other open-weight models. Ai2 emphasizes the importance of open training data and code for advancing collective scientific research.
NanoChat lets users build their own customizable, hackable large language models (LLMs), giving developers and hobbyists an accessible platform for experimenting with AI technology. The initiative aims to democratize LLMs by enabling personalized setups that fit individual needs without requiring extensive resources. Because it is open source, NanoChat encourages innovation and exploration in the AI space.
DeepSeek has launched its Terminus model, an update to the V3.1 family that improves agentic tool use and reduces language-mixing errors. The new version performs better on tasks that require tool interaction and remains open source under the MIT License, challenging proprietary models in the AI landscape.
LRAGE is an open-source toolkit designed for evaluating Large Language Models in a Retrieval-Augmented Generation context, specifically for legal applications. It integrates various tools and datasets to streamline the evaluation process, allowing researchers to effectively assess model performance with minimal engineering effort. Key features include a modular architecture for retrievers and rerankers, a user-friendly GUI, and support for LLM-as-a-Judge evaluations.
Qwen3 has been launched as the latest generation of the Qwen large language models, featuring two primary models of different parameter counts and enhanced capabilities in coding, reasoning, and multilingual support. The models introduce a hybrid thinking approach that lets users choose between detailed reasoning and quick responses, which is intended to improve both user experience and performance across tasks. The models are also available on platforms such as Hugging Face and Kaggle to support research and development.
EleutherAI has released the Common Pile v0.1, an 8 TB dataset of openly licensed and public domain text for training large language models, a significant advance over its predecessor, the Pile. The initiative emphasizes transparency and openness in AI research and aims to give researchers a shared corpus that supports collaboration and accountability in the field. Future collaborations with cultural heritage institutions are planned to improve the quality and accessibility of public domain works.