24 links
tagged with all of: machine-learning + language-models
Links
StableToken is introduced as a noise-robust semantic speech tokenizer that addresses the fragility of existing tokenizers when faced with irrelevant acoustic perturbations. By leveraging a multi-branch architecture and a consensus-driven bit-wise voting mechanism, StableToken significantly enhances token stability and improves the performance of SpeechLLMs across various tasks, reducing Unit Edit Distance under noisy conditions.
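The consensus mechanism can be pictured as per-bit majority voting across parallel tokenizer branches. The sketch below is only an illustration of that idea, not the paper's implementation; it assumes each branch emits a binary code per speech frame.

```python
import numpy as np

def bitwise_majority_vote(branch_codes: np.ndarray) -> np.ndarray:
    """Combine binary codes from parallel branches by per-bit majority vote.

    branch_codes: shape (num_branches, num_frames, num_bits), values in {0, 1}.
    Returns shape (num_frames, num_bits): each bit is the value most branches agree on.
    """
    votes = branch_codes.sum(axis=0)                     # per-bit vote counts
    return (votes * 2 > branch_codes.shape[0]).astype(np.int8)

# Example: 3 branches, 2 frames, 4-bit codes; the middle branch is perturbed by noise.
codes = np.array([
    [[1, 0, 1, 1], [0, 1, 0, 0]],
    [[1, 1, 1, 0], [0, 0, 0, 1]],   # noisy branch
    [[1, 0, 1, 1], [0, 1, 0, 0]],
])
print(bitwise_majority_vote(codes))  # majority voting recovers the clean code
```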
Set Block Decoding (SBD) accelerates inference in autoregressive language models by integrating next-token prediction with masked-token prediction, allowing multiple tokens to be sampled in parallel. Fine-tuning existing models such as Llama-3.1 and Qwen-3 with SBD reduces the number of forward passes needed for generation by 3-5x while maintaining performance comparable to standard next-token training, without compromising accuracy.
A new active learning method developed by Google significantly reduces the amount of training data required for fine-tuning large language models (LLMs) while enhancing alignment with human expert evaluations. This scalable curation process allows for the identification of the most informative examples and achieves up to a 10,000x reduction in training data, enabling more effective responses to the evolving challenges of ad safety content classification.
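Google's curation pipeline isn't detailed in the summary; as a rough illustration, one common way to pick "the most informative examples" is margin-based uncertainty sampling, sketched below (the function name and the two-class example are made up).

```python
import numpy as np

def select_most_informative(probs: np.ndarray, budget: int) -> np.ndarray:
    """Pick the `budget` examples the current classifier is least sure about.

    probs: (num_examples, num_classes) predicted class probabilities.
    Margin sampling: a smaller gap between the top two classes means a more informative example.
    """
    sorted_probs = np.sort(probs, axis=1)
    margins = sorted_probs[:, -1] - sorted_probs[:, -2]
    return np.argsort(margins)[:budget]  # indices to route to human experts

# Example: 5 unlabeled ads scored by a safety classifier; send 2 for expert labeling.
scores = np.array([[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.51, 0.49], [0.99, 0.01]])
print(select_most_informative(scores, budget=2))  # -> indices 3 and 1 (most ambiguous)
```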
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that allows large language models to be updated with fewer parameters, making post-training faster and more resource-efficient. Recent experiments show that LoRA can achieve performance comparable to full fine-tuning (FullFT) under certain conditions, particularly with small-to-medium-sized datasets, but may struggle with larger datasets and high batch sizes. Key findings suggest a "low-regret regime" where LoRA's efficiency aligns with FullFT, paving the way for its broader application in various scenarios.
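As a reminder of why LoRA is parameter-efficient, here is a minimal sketch of a low-rank adapter wrapped around a frozen linear layer (a generic illustration, not the exact setup used in the experiments).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + scale * (B A) x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # full weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12,288 trainable parameters vs. ~590k in the full layer
```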
Recent advancements in large language models (LLMs) have prompted discussions about their reasoning capabilities. This study introduces a representation engineering approach that leverages model activations to create control vectors, enhancing reasoning performance on various tasks without additional training. The results indicate that modulating model activations can effectively improve LLMs' reasoning abilities.
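The paper's specific recipe isn't reproduced here, but a typical representation-engineering setup derives a control vector from contrastive activations and adds it to hidden states at inference via a forward hook; the layer index and scale below are placeholders.

```python
import torch

def build_control_vector(pos_acts: torch.Tensor, neg_acts: torch.Tensor) -> torch.Tensor:
    """Control vector = mean activation on target-behavior prompts minus mean on contrast prompts."""
    return pos_acts.mean(dim=0) - neg_acts.mean(dim=0)

def add_steering_hook(layer: torch.nn.Module, vector: torch.Tensor, scale: float = 4.0):
    """Register a forward hook that shifts the layer's hidden states along the control vector."""
    def hook(_module, _inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * vector.to(device=hidden.device, dtype=hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# Usage (illustrative): collect activations at one transformer block for two prompt sets,
# build the vector, then hook that block before generation:
#   handle = add_steering_hook(model.model.layers[15], control_vector)
#   model.generate(**inputs); handle.remove()
```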
Apple has unveiled updates to its on-device and server foundation language models, enhancing generative AI capabilities while prioritizing user privacy. The new models, optimized for Apple silicon, support multiple languages and improved efficiency, incorporating advanced architectures and diverse training data, including image-text pairs, to power intelligent features across its platforms.
Large language models (LLMs) typically cannot adapt their weights dynamically to new tasks or knowledge. The Self-Adapting LLMs (SEAL) framework addresses this limitation by allowing models to generate their own finetuning data and directives for self-adaptation through a reinforcement learning approach, resulting in persistent weight updates and improved performance in knowledge incorporation and few-shot generalization tasks.
The survey explores the integration of Large Language Models (LLMs) in time series analytics, addressing the cross-modality gap between text and time series data. It categorizes existing methodologies, reviews key strategies for alignment and fusion, and evaluates their effectiveness through experiments on multimodal datasets. The study also outlines future research directions for enhancing LLM-based time series modeling.
The article serves as an introduction to vLLM, a framework designed for serving large language models efficiently. It discusses the benefits of using vLLM, including reduced latency and improved resource management, making it well suited for production environments. Key features and implementation steps are also highlighted to help users adopt the technology.
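For reference, offline inference with vLLM takes only a few lines; the model name below is a placeholder.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")      # loads weights and builds the serving engine
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```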
The study investigates the impact of instruction tuning on the confidence calibration of large language models (LLMs), revealing significant degradation in calibration post-tuning. It introduces label smoothing as a promising solution to mitigate overconfidence during supervised fine-tuning, while also addressing challenges related to memory consumption in the computation of cross-entropy loss.
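Label smoothing here is the standard trick of spreading a small amount of probability mass off the gold token so the model is penalized for near one-hot predictions; a generic sketch, not necessarily the paper's exact variant:

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits: torch.Tensor, targets: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Cross-entropy with label smoothing: the target puts 1 - eps on the gold token
    and spreads eps uniformly over the vocabulary, discouraging overconfidence."""
    return F.cross_entropy(logits, targets, label_smoothing=eps)

# Equivalent explicit form, useful to see where the uniform mass goes:
def smoothed_cross_entropy_manual(logits, targets, eps=0.1):
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)   # gold-token term
    uniform = -log_probs.mean(dim=-1)                                 # uniform-distribution term
    return ((1 - eps) * nll + eps * uniform).mean()
```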
Pinterest has improved its search relevance by implementing a large language model (LLM)-based pipeline that enhances how search queries align with Pins. The system utilizes knowledge distillation to scale a student relevance model from a teacher model, integrating enriched text features and conducting extensive offline and online experiments to validate its effectiveness. Results indicate significant improvements in search feed relevance and fulfillment rates across diverse languages and regions.
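Pinterest's exact training objective isn't given in the summary; the usual soft-label distillation loss that a student relevance model would minimize against teacher scores looks roughly like this:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Soft-label distillation: the student matches the teacher's softened relevance
    distribution via KL divergence (scaled by T^2, the usual correction)."""
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature ** 2
```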
Privacy-preserving synthetic data can enhance the performance of both small and large language models (LLMs) in mobile applications like Gboard, improving user typing experiences while minimizing privacy risks. By utilizing federated learning and differential privacy, Google researchers have developed methods to synthesize data that mimics user interactions without accessing sensitive information, resulting in significant accuracy improvements and efficient model training. Ongoing advancements aim to further refine these techniques and integrate them into mobile environments.
Large Language Models (LLMs) can significantly enhance data annotation but often produce incorrect labels due to uncertainty. This work proposes a candidate annotation paradigm that encourages LLMs to provide multiple possible labels, utilizing a teacher-student framework called CanDist to distill these annotations into unique labels for downstream tasks. Experiments demonstrate the effectiveness of this method across various text classification challenges.
Large language models (LLMs) have revolutionized programming by enabling non-technical users to write code, yet questions remain about their understanding of code concepts, particularly nullability. This article explores how LLMs infer nullability through internal representations and offers insights into their reasoning processes when generating code, highlighting both their strengths and limitations in handling nullable types.
TreeRL is a novel reinforcement learning framework that integrates on-policy tree search to enhance the training of language models. By incorporating intermediate supervision and optimizing search efficiency, TreeRL addresses issues common in traditional reinforcement learning methods, such as distribution mismatch and reward hacking. Experimental results show that TreeRL outperforms existing methods in math and code reasoning tasks, showcasing the effectiveness of tree search in this domain.
DuPO introduces a dual learning-based preference optimization framework designed to generate annotation-free feedback, overcoming limitations of existing methods such as RLVR and traditional dual learning. By decomposing a task's input into known and unknown components and reconstructing the unknown part, DuPO enhances various tasks, achieving significant improvements in translation quality and mathematical reasoning accuracy. This framework positions itself as a scalable and general approach for optimizing large language models (LLMs) without the need for costly labels.
ReLearn is a novel pipeline for unlearning in large language models that enhances targeted forgetting while maintaining high-quality output. It addresses limitations of existing methods by introducing a comprehensive evaluation framework that includes new metrics for knowledge preservation and generation quality. Experiments demonstrate that ReLearn effectively mitigates the negative effects of reverse optimization on coherent text generation.
T5Gemma introduces a new collection of encoder-decoder large language models (LLMs) developed by adapting pretrained decoder-only models. This approach enhances performance across various tasks, demonstrating significant improvements in quality and inference efficiency compared to traditional models. The release includes multiple sizes and configurations, offering opportunities for further research and application development.
The article discusses the limitations of tokenization in large language models (LLMs) and argues for a shift towards more general methods that leverage compute and data, in line with The Bitter Lesson principle. It explores potential alternatives, such as Byte Latent Transformers, and examines the implications of moving beyond traditional tokenization approaches, emphasizing the need for improved modeling of natural language.
The article discusses Stripe's advancements in payment technology, particularly focusing on the transition from traditional machine learning (ML) to large language models (LLMs) like GPT. It emphasizes how Stripe is setting new standards in the payments industry by leveraging these advanced AI technologies to improve user experience and transaction efficiency.
The article explores advanced techniques in topic modeling using large language models (LLMs), highlighting their effectiveness in extracting meaningful topics from textual data. It discusses methodologies and tools that leverage LLMs for more accurate topic identification, and uses practical examples to show how these techniques can improve data analysis across a range of fields.
Fine-tuned small language models (SLMs) can outperform larger models while being significantly more cost-effective, achieving results at 5 to 30 times lower cost. This efficiency is attributed to programmatic data curation techniques that enhance the training of these smaller models.
VaultGemma is a new 1B-parameter language model developed by Google Research that incorporates differential privacy from the ground up, addressing the inherent trade-offs between privacy, compute, and utility. The model is designed to minimize memorization of training data while providing robust performance, and its training was guided by newly established scaling laws for differentially private language models. Released alongside its weights, VaultGemma aims to foster the development of safe and private AI technologies.
StochasTok is a novel stochastic tokenization method that enhances large language models' (LLMs) understanding of subword structures by randomly splitting tokens during training. This approach significantly improves performance on various subword-level tasks, such as character counting and substring identification, without the high computational costs associated with previous methods. Additionally, StochasTok can be easily integrated into existing pretrained models, yielding considerable improvements with minimal changes.
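The core idea of stochastic tokenization can be sketched as occasionally replacing a token with an equivalent pair of shorter tokens during preprocessing; the helper functions below (`id_to_text`, `text_to_ids`) are assumed for illustration and are not part of the released code.

```python
import random

def stochastically_split(token_ids, id_to_text, text_to_ids, p=0.1, seed=None):
    """With probability p, replace a token with an equivalent pair of shorter tokens,
    so the model sees many segmentations of the same text during training.
    `text_to_ids` must return ids whose concatenated text equals the input string."""
    rng = random.Random(seed)
    out = []
    for tid in token_ids:
        text = id_to_text(tid)
        if len(text) > 1 and rng.random() < p:
            cut = rng.randint(1, len(text) - 1)           # random split point inside the token
            out.extend(text_to_ids(text[:cut]) + text_to_ids(text[cut:]))
        else:
            out.append(tid)
    return out
```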