8 links tagged with all of: transformers + machine-learning
Links
Pingkit is a toolkit for training reproducible, capacity-aware models on transformer activations. It provides tools for extracting embeddings, training neural architectures, and building custom probes, integrates with Hugging Face models, and includes utilities for data processing and model training.
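Pingkit's own API is not shown here, so as a rough sketch of the workflow it covers — pulling hidden states out of a Hugging Face model and fitting a lightweight probe on them — here is plain transformers/PyTorch code; the "gpt2" checkpoint and the two-class probe are placeholder assumptions, not Pingkit calls.

```python
# Sketch of the activation-extraction + probe workflow (not Pingkit's API).
# The "gpt2" model and the 2-class probe are illustrative placeholders.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token  # GPT-2 has no pad token by default
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

texts = ["an example sentence", "another example"]
batch = tok(texts, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**batch)

# Mean-pool the last hidden layer into one embedding per text.
mask = batch["attention_mask"].unsqueeze(-1)
emb = (out.hidden_states[-1] * mask).sum(1) / mask.sum(1)

# A linear probe trained on top of the frozen embeddings.
probe = torch.nn.Linear(emb.size(-1), 2)
labels = torch.tensor([0, 1])
loss = torch.nn.functional.cross_entropy(probe(emb), labels)
loss.backward()  # one illustrative step; a real run would loop with an optimizer
```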
Text-to-LoRA (T2L) is a hypernetwork that enables the instant adaptation of large language models to specific tasks using only natural language descriptions, eliminating the need for extensive fine-tuning and dataset curation. Trained on various pre-existing LoRA adapters, T2L can generate task-specific adapters in a single forward pass, demonstrating performance comparable to traditional methods while significantly reducing computational requirements and allowing zero-shot generalization to new tasks.
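The core idea is easy to sketch: a hypernetwork maps an embedded task description to the A/B matrices of a LoRA adapter in a single forward pass. The sizes, the omitted text encoder, and the single adapted weight below are illustrative assumptions rather than the paper's architecture.

```python
# Minimal sketch of the Text-to-LoRA idea: a hypernetwork maps a task-description
# embedding to the A/B matrices of a LoRA adapter in one forward pass.
# All sizes, the stand-in task embedding, and the single target weight are
# assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

d_model, rank, d_task = 768, 8, 384  # hidden size, LoRA rank, description-embedding size

class LoRAHyperNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(d_task, 512), nn.ReLU())
        self.to_A = nn.Linear(512, rank * d_model)   # predicts LoRA down-projection
        self.to_B = nn.Linear(512, d_model * rank)   # predicts LoRA up-projection

    def forward(self, task_emb):
        h = self.trunk(task_emb)
        A = self.to_A(h).view(rank, d_model)
        B = self.to_B(h).view(d_model, rank)
        return A, B

hyper = LoRAHyperNet()
task_emb = torch.randn(d_task)   # stand-in for an encoded task description
A, B = hyper(task_emb)

# Apply the generated adapter to a frozen base weight: W' = W + B @ A
W = torch.randn(d_model, d_model)
W_adapted = W + B @ A
```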
OpenAI's GPT-OSS models introduce several efficiency upgrades for transformers, including MXFP4 quantization and specialized kernels that enhance performance during model loading and execution. The updates allow for faster inference and fine-tuning while maintaining compatibility across major models in the transformers library. Additionally, community-contributed kernels are integrated to streamline usage and performance optimization.
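A minimal loading example, assuming a transformers release recent enough to ship gpt-oss support; the quantization path is selected from the checkpoint's own config, so nothing beyond the usual from_pretrained call is needed here.

```python
# Loading a GPT-OSS checkpoint with a recent transformers release (gpt-oss support assumed).
# The MXFP4-quantized weights and specialized kernels are picked up from the checkpoint's
# quantization config when compatible hardware/kernels are available; otherwise the model
# falls back to a higher-precision dtype.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-20b"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize MXFP4 quantization in one sentence."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=64)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```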
Power Attention is an open-source implementation designed to optimize the core operation of symmetric power transformers, enabling efficient training and inference on long-context sequences. It serves as a drop-in replacement for various forms of attention and significantly improves metrics like loss-per-FLOP compared to traditional and linear attention models. An adjustable power (degree) hyperparameter lets users balance weight FLOPs against state FLOPs, improving scalability and learning efficiency.
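As a point of reference, the attention form being optimized can be written naively: scores are the p-th power of q·k rather than a softmax, causally masked and normalized. This quadratic-time sketch is only for clarity; the repository's kernels exploit the equivalent linear/state-space form, and the normalization details here are assumptions.

```python
# Naive reference sketch of symmetric power attention: weights are the p-th power of
# q·k (p even), causally masked and row-normalized, instead of softmax. This is O(T^2)
# for clarity only; shapes and normalization details are illustrative assumptions.
import torch

def power_attention(q, k, v, p=2, eps=1e-6):
    # q, k, v: (batch, heads, time, head_dim)
    scores = torch.einsum("bhtd,bhsd->bhts", q, k) ** p   # (q·k)^p kernel
    t = q.shape[-2]
    causal = torch.tril(torch.ones(t, t, dtype=torch.bool, device=q.device))
    scores = scores.masked_fill(~causal, 0.0)
    weights = scores / (scores.sum(dim=-1, keepdim=True) + eps)
    return weights @ v

q = k = v = torch.randn(1, 4, 16, 32)
print(power_attention(q, k, v, p=2).shape)  # torch.Size([1, 4, 16, 32])
```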
NN-Former introduces an approach to neural architecture representation that combines the strengths of graph neural networks and transformers while addressing their respective limitations. It emphasizes the importance of sibling nodes in the architecture topology and proposes new mechanisms for predicting accuracy and latency, achieving improved performance by better modeling directed acyclic graph (DAG) topology.
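The sibling relation can be illustrated directly from a DAG adjacency matrix, under the reading that siblings are nodes sharing a parent or a child; the helper below is an assumption-level sketch of how such a mask might feed an attention bias, not the paper's exact formulation.

```python
# Illustrative helper: derive a "sibling" mask from a DAG adjacency matrix, reading
# siblings as nodes that share a parent or a child. An assumption-level sketch, not
# the paper's formulation.
import numpy as np

def sibling_mask(adj: np.ndarray) -> np.ndarray:
    # adj[i, j] = 1 if there is an edge i -> j
    share_parent = (adj.T @ adj) > 0   # two nodes both receive an edge from some node
    share_child = (adj @ adj.T) > 0    # two nodes both send an edge to some node
    sib = share_parent | share_child
    np.fill_diagonal(sib, False)
    return sib

# Tiny DAG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3  (nodes 1 and 2 share a parent and a child)
adj = np.zeros((4, 4), dtype=int)
adj[0, 1] = adj[0, 2] = adj[1, 3] = adj[2, 3] = 1
print(sibling_mask(adj).astype(int))
```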
Test-Time Training (TTT) layers enhance pre-trained Transformers' ability to generate one-minute videos from text narratives, yielding improved coherence and aesthetics compared to existing methods. Despite notable artifacts and limitations in the current implementation, TTT-MLP shows significant advancements in temporal consistency and motion smoothness, particularly when tested on a dataset of Tom and Jerry cartoons. Future work aims to extend this approach to longer videos and more complex storytelling.
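A stripped-down sketch of the TTT mechanism: the layer's hidden state is the weight matrix of a small inner model, updated by a self-supervised gradient step at every token. The linear inner model and toy reconstruction objective below are simplifications of the paper's TTT-MLP.

```python
# Simplified sketch of a test-time-training (TTT) layer: the "hidden state" is the
# weight matrix W of a small inner model, updated by one gradient step of a
# self-supervised reconstruction loss per token. The linear inner model and the toy
# corruption/reconstruction objective are simplifications, not the paper's TTT-MLP.
import torch

def ttt_linear(x, lr=0.1):
    # x: (time, dim) token features
    t, d = x.shape
    W = torch.zeros(d, d)          # inner-model weights, reset per sequence
    outputs = []
    for i in range(t):
        xi = x[i]
        corrupted = xi * 0.5       # toy "view" of the token to reconstruct from
        # Inner-loop update: one gradient step on ||W @ corrupted - xi||^2.
        grad = 2 * torch.outer(W @ corrupted - xi, corrupted)
        W = W - lr * grad
        outputs.append(W @ xi)     # layer output uses the freshly updated weights
    return torch.stack(outputs)

x = torch.randn(16, 32)
print(ttt_linear(x).shape)  # torch.Size([16, 32])
```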
The study introduces a theoretical framework for understanding in-context learning (ICL) in large language models (LLMs), using hierarchical concept modeling and optimization theory. It demonstrates how nonlinear residual transformers can perform factual-recall tasks through vector arithmetic, and proves that they generalize and remain robust under concept recombination and distribution shift. Empirical simulations support the theoretical findings and showcase the advantages of transformers over static embeddings.
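A toy illustration of the vector-arithmetic view of factual recall (subject embedding plus a relation vector, resolved by nearest neighbor); the synthetic embeddings below only illustrate the mechanism the paper analyzes, not a trained transformer.

```python
# Toy illustration of factual recall as vector arithmetic: answer ≈ subject embedding
# plus a relation vector, resolved by nearest neighbor over a small vocabulary.
# Embeddings are synthetic; this does not reproduce the paper's analysis.
import numpy as np

rng = np.random.default_rng(0)
dim = 64
vocab = {w: rng.normal(size=dim) for w in ["France", "Paris", "Japan", "Tokyo", "red", "blue"]}

# A relation vector for "capital of", taken from one example pair.
rel_capital = vocab["Paris"] - vocab["France"]

def recall(subject: str, relation: np.ndarray) -> str:
    query = vocab[subject] + relation
    return max(vocab, key=lambda w: np.dot(vocab[w], query) / np.linalg.norm(vocab[w]))

print(recall("France", rel_capital))  # "Paris" for this toy embedding table
```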
This study investigates how a one-layer transformer learns to recognize regular languages, focusing on tasks such as 'even pairs' and 'parity check'. Through theoretical analysis of training dynamics under gradient descent, it reveals two distinct phases in the learning process, demonstrating how the attention and linear layers interact to achieve effective separation of data sequences. Experimental results confirm the theoretical findings.
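For concreteness, toy label generators for the two tasks, under common formulations ('even pairs': an even number of adjacent ab/ba pairs, equivalently matching first and last symbols; 'parity check': an even count of b's); the paper's exact conventions may differ.

```python
# Toy data generators for the two regular-language tasks, under common formulations:
# "even pairs" is positive when the string has an even number of adjacent "ab"/"ba"
# pairs (equivalently, first symbol == last symbol); "parity check" is positive when
# the count of b's is even. The paper's exact conventions may differ slightly.
import random

def even_pairs_label(s: str) -> int:
    pairs = sum(1 for x, y in zip(s, s[1:]) if x != y)
    return int(pairs % 2 == 0)   # equivalent to s[0] == s[-1]

def parity_label(s: str) -> int:
    return int(s.count("b") % 2 == 0)

random.seed(0)
data = ["".join(random.choice("ab") for _ in range(8)) for _ in range(4)]
for s in data:
    print(s, even_pairs_label(s), parity_label(s))
```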