18 links tagged with transformers
Links
Pingkit is a toolkit designed for training reproducible, capacity-aware models using transformer activations. It offers features for extracting embeddings, training neural architectures, and creating custom probes tailored to specific research needs. The toolkit is integrated with Hugging Face models and provides various utilities for data processing and model training.
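A minimal sketch of the underlying step (pulling per-layer activations with plain Hugging Face code to use as embeddings or probe inputs); this is not Pingkit's own API, and the model name is just an example:

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-uncased"  # example model; Pingkit targets HF models generally
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, output_hidden_states=True)

inputs = tok(["an example sentence"], return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# hidden_states: tuple of (n_layers + 1) tensors, each [batch, seq, hidden]
layer_8 = out.hidden_states[8]
embedding = layer_8.mean(dim=1)   # mean-pool one layer into a sentence vector
print(embedding.shape)            # torch.Size([1, 768])
```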
OLMo 2 1B is the smallest model in the OLMo 2 family, featuring a transformer-style architecture and trained on 4 trillion tokens. Multiple variants and fine-tuning options are available, and the model is designed for language modeling applications. The model and its associated resources are available on GitHub under an Apache 2.0 license.
Text-to-LoRA (T2L) is a hypernetwork that enables the instant adaptation of large language models to specific tasks using only natural language descriptions, eliminating the need for extensive fine-tuning and dataset curation. Trained on various pre-existing LoRA adapters, T2L can generate task-specific adapters in a single forward pass, demonstrating performance comparable to traditional methods while significantly reducing computational requirements and allowing zero-shot generalization to new tasks.
Modern techniques have emerged since the original "Attention Is All You Need" paper to optimize transformer architectures, focusing on reducing memory usage and computational costs during inference. Key advancements include Group Query Attention, Multi-head Latent Attention, and various architectural innovations that enhance performance without significantly compromising quality. These methods aim to improve the efficiency of large models in practical applications.
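A minimal sketch of the first technique listed, Group Query Attention: several query heads share each key/value head, shrinking the KV cache by the group factor. Shapes and projections here are illustrative, not any particular model's configuration:

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_heads=8, n_kv_heads=2):
    # n_heads query heads share n_kv_heads K/V heads, cutting the KV cache
    # by a factor of n_heads // n_kv_heads
    B, T, D = x.shape
    hd = D // n_heads
    q = (x @ wq).view(B, T, n_heads, hd).transpose(1, 2)       # [B, Hq, T, hd]
    k = (x @ wk).view(B, T, n_kv_heads, hd).transpose(1, 2)    # [B, Hkv, T, hd]
    v = (x @ wv).view(B, T, n_kv_heads, hd).transpose(1, 2)
    # each group of query heads attends to the same K/V head
    k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)      # [B, Hq, T, hd]
    v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(B, T, D)

B, T, D = 1, 16, 64
x = torch.randn(B, T, D)
wq = torch.randn(D, D) / D**0.5
wk = torch.randn(D, D // 4) / D**0.5   # n_kv_heads * head_dim = 2 * 8 = 16
wv = torch.randn(D, D // 4) / D**0.5
print(grouped_query_attention(x, wq, wk, wv).shape)  # torch.Size([1, 16, 64])
```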
OmDet-Turbo is a real-time open-vocabulary object detection model that integrates components from RT-DETR and features an Efficient Fusion Head for enhanced performance. It achieves impressive results with up to 100.2 FPS and 53.4 AP on COCO zero-shot, making it suitable for industrial applications that require rapid and accurate detection capabilities. The model's unique architecture allows for efficient text embedding caching, improving the decoding process for object detection tasks.
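A hedged usage sketch with the transformers zero-shot detection classes; the checkpoint id and the post-processing call follow the model documentation, but argument names may differ across library versions:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

ckpt = "omlab/omdet-turbo-swin-tiny-hf"   # assumed checkpoint id
processor = AutoProcessor.from_pretrained(ckpt)
model = AutoModelForZeroShotObjectDetection.from_pretrained(ckpt)

image = Image.open("street.jpg")          # placeholder local image
classes = ["car", "person", "traffic light"]
inputs = processor(images=image, text=classes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# post-processing API may vary slightly between transformers versions
results = processor.post_process_grounded_object_detection(
    outputs, text_labels=classes, target_sizes=[image.size[::-1]], threshold=0.3
)
print(results[0])
```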
Researchers discovered that language models fail on long conversations once the initial tokens are evicted from the cache; these tokens act as "attention sinks" that stabilize the attention distribution. Their solution, StreamingLLM, retains the sink tokens permanently, allowing models to process sequences of over 4 million tokens effectively. The approach has since been adopted in major frameworks such as Hugging Face Transformers and in OpenAI's recent models.
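A toy cache-eviction policy in the spirit of attention sinks (the real StreamingLLM implementation also re-indexes positions inside the rolling cache; names here are illustrative):

```python
from collections import deque

class SinkKVCache:
    """Always keep the first n_sink positions plus a sliding window of the
    most recent `window` positions; evict everything in between."""

    def __init__(self, n_sink=4, window=1024):
        self.n_sink = n_sink
        self.sink = []                    # K/V for the first n_sink tokens
        self.recent = deque(maxlen=window)

    def append(self, key, value):
        if len(self.sink) < self.n_sink:
            self.sink.append((key, value))
        else:
            self.recent.append((key, value))  # deque drops the oldest entry

    def kv(self):
        # keys/values the model attends over at this step
        return self.sink + list(self.recent)

cache = SinkKVCache(n_sink=4, window=8)
for t in range(100):
    cache.append(f"k{t}", f"v{t}")
print([k for k, _ in cache.kv()])
# ['k0', 'k1', 'k2', 'k3', 'k92', ..., 'k99']  -> sinks + sliding window
```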
Power Attention is an open-source implementation designed to optimize the core operation of symmetric power transformers, enabling efficient training and inference on long-context sequences. It serves as a drop-in replacement for various forms of attention and significantly improves metrics like loss-per-FLOP compared to traditional and linear attention models. An adjustable power hyperparameter sets the balance between weight and state FLOPs, improving scalability and learning efficiency.
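A small numerical sketch of the core identity: since (q·k)^p equals an inner product of degree-p tensor-power feature maps, the same scores can be computed either quadratically or with a running linear-attention state. This is a conceptual illustration only, not the library's fused kernel, and it ignores chunking, normalization details, and the symmetric-power compression of the state:

```python
import torch

def power_attention_quadratic(q, k, v, p=2):
    # causal attention with scores (q_i . k_j)^p, normalized over the prefix
    T = q.shape[0]
    scores = (q @ k.T) ** p
    mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
    scores = scores.masked_fill(~mask, 0.0)
    return scores @ v / scores.sum(-1, keepdim=True)

def phi(x, p=2):
    # degree-p tensor-power feature map: <phi(q), phi(k)> == (q . k)^p
    out = x
    for _ in range(p - 1):
        out = torch.einsum("i,j->ij", out, x).reshape(-1)
    return out

def power_attention_linear(q, k, v, p=2):
    # same computation carried as a running state, linear-attention style
    d_phi = phi(q[0], p).shape[0]
    S = torch.zeros(d_phi, v.shape[-1])   # state: sum of phi(k_j) v_j^T
    z = torch.zeros(d_phi)                # normalizer: sum of phi(k_j)
    outs = []
    for t in range(q.shape[0]):
        fk = phi(k[t], p)
        S = S + torch.outer(fk, v[t])
        z = z + fk
        fq = phi(q[t], p)
        outs.append(fq @ S / (fq @ z))
    return torch.stack(outs)

torch.manual_seed(0)
q, k, v = torch.randn(3, 5, 4).unbind(0)
print(torch.allclose(power_attention_quadratic(q, k, v),
                     power_attention_linear(q, k, v), atol=1e-5))  # True
```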
OpenAI's GPT-OSS models introduce several efficiency upgrades for transformers, including MXFP4 quantization and specialized kernels that enhance performance during model loading and execution. The updates allow for faster inference and fine-tuning while maintaining compatibility across major models in the transformers library. Additionally, community-contributed kernels are integrated to streamline usage and performance optimization.
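A hedged loading sketch; the checkpoint id comes from the release announcement, and recent transformers versions are described as using the MXFP4 kernels when available and otherwise dequantizing to a standard dtype:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"   # release checkpoint; 120b variant also exists
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Explain MXFP4 in one sentence."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
print(tok.decode(model.generate(inputs, max_new_tokens=64)[0]))
```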
Test-Time Training (TTT) layers enhance pre-trained Transformers' ability to generate one-minute videos from text narratives, yielding improved coherence and aesthetics compared to existing methods. Despite notable artifacts and limitations in the current implementation, TTT-MLP shows significant advancements in temporal consistency and motion smoothness, particularly when tested on a dataset of Tom and Jerry cartoons. Future work aims to extend this approach to longer videos and more complex storytelling.
NN-Former introduces a novel approach to neural architecture representation that combines the strengths of graph neural networks and transformers while addressing their limitations. It emphasizes the importance of sibling nodes in the architecture topology and proposes new mechanisms for accuracy and latency prediction, achieving improved performance at learning directed acyclic graph (DAG) topology.
SGLang has integrated Hugging Face transformers as a backend, enhancing inference performance for models while maintaining the flexibility of the transformers library. This integration allows for high-throughput, low-latency tasks and supports models not natively compatible with SGLang, streamlining deployment and usage. Key features include automatic fallback to transformers and optimized performance through mechanisms like RadixAttention.
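A hedged sketch of the integration; the argument for forcing the Transformers backend (shown here as impl="transformers") and the model id are assumptions based on the announcement and may differ by version:

```python
import sglang as sgl

# any Hugging Face causal LM, including ones without native SGLang support
llm = sgl.Engine(
    model_path="meta-llama/Llama-3.2-1B-Instruct",
    impl="transformers",   # assumed flag name: fall back to transformers modeling code
)
print(llm.generate(["The capital of France is"], {"max_new_tokens": 16}))
```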
ReVisiT is a decoding-time algorithm for large vision-language models (LVLMs) that enhances visual grounding by using internal vision tokens as references. It aligns text generation with visual semantics without altering the underlying model, though it requires version-specific implementations for different Transformers releases. The repository offers setup instructions, evaluation scripts, and integration guidance for users looking to incorporate ReVisiT into their own environments.
The study introduces a theoretical framework for understanding in-context learning (ICL) in large language models (LLMs) by utilizing hierarchical concept modeling and optimization theory. It demonstrates how nonlinear residual transformers can effectively perform factual-recall tasks through vector arithmetic, proving strong generalization and robustness against concept recombination and distribution shifts. Empirical simulations support these theoretical findings, showcasing the advantages of transformers over traditional static embeddings.
The DeepSeek-R1-GGUF repository on Hugging Face hosts large GGUF model files for text generation built on the DeepSeek architecture. It includes multiple quantized versions of the model, all under an MIT license, and is maintained as a community-driven project by Unsloth AI.
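A hedged download sketch using huggingface_hub; the quantization pattern is an example and should be checked against the repo's actual file list:

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",
    local_dir="DeepSeek-R1-GGUF",
    allow_patterns=["*Q4_K_M*"],   # grab only one quantization variant
)
```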
This study investigates how a one-layer transformer learns to recognize regular languages, focusing on tasks such as 'even pairs' and 'parity check'. Through theoretical analysis of training dynamics under gradient descent, it reveals two distinct phases in the learning process, demonstrating how the attention and linear layers interact to achieve effective separation of data sequences. Experimental results confirm the theoretical findings.
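A small sketch of one common formulation of the two tasks (the paper's exact alphabet and labeling conventions may differ):

```python
def parity_check(bits):
    # label 1 iff the number of 1s is even
    return int(sum(bits) % 2 == 0)

def even_pairs(bits):
    # label 1 iff the number of adjacent positions whose symbols differ is even;
    # for binary strings this is equivalent to first symbol == last symbol
    flips = sum(a != b for a, b in zip(bits, bits[1:]))
    return int(flips % 2 == 0)

print(parity_check([1, 0, 1, 1]))   # 0 (three ones)
print(even_pairs([1, 0, 0, 1]))     # 1 (two flips; first == last)
```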
The article discusses the subreddit r/Transformemes, which is dedicated to memes related to the Transformers franchise. It highlights various aspects of the Transformers universe, including character designs, humorous transformations, and iconic battles, while also providing related topics and community engagement.
This article investigates why transformer models struggle with multi-digit multiplication despite their advanced capabilities. Through reverse-engineering, the authors identify that while the model can encode necessary long-range dependencies, it converges to a local optimum that lacks these dependencies, suggesting that introducing an auxiliary loss can help the model learn this task effectively.
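A quick illustration of the long-range dependency in question: a change in the lowest digit of one operand can propagate through a chain of carries and flip the highest digit of the product.

```python
# carry propagation: editing the last digit changes the first digit of the result
a, b = 55556, 9
print(a * b)        # 500004  -> leading digit 5
print((a - 1) * b)  # 499995  -> leading digit 4: the carry chain reached the top
```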
Llion Jones, CTO of Sakana AI and co-author of the influential transformer paper, expressed concerns at the TED AI conference about the stagnation in AI research due to an overwhelming focus on transformer architecture. He argues that the pressure for quick returns and competitive research has stifled creativity, preventing the exploration of potentially groundbreaking innovations in the field. Jones is now seeking to shift away from transformers in pursuit of the next big advancement in AI.