This article explains tensor parallelism (TP) in transformer models, focusing on how it distributes large matrix multiplications across multiple GPUs. It details how TP is applied in both the multi-head attention and feed-forward network components, and covers its constraints and practical usage with the Hugging Face library.
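The core idea behind sharding a matrix multiplication can be illustrated with a minimal NumPy sketch: for a two-layer feed-forward block, the first weight matrix is split column-wise and the second row-wise, so each device computes an independent partial result that is then summed (the all-reduce step). This is an illustrative simulation with array shards, not the Hugging Face API; all variable names are hypothetical.

```python
import numpy as np

# Minimal sketch of tensor parallelism for a 2-layer FFN, simulating
# 2 "devices" with NumPy array shards (identity activation for clarity).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))        # batch of activations
W1 = rng.standard_normal((8, 16))      # first FFN weight
W2 = rng.standard_normal((16, 8))      # second FFN weight

# Reference: unsharded forward pass
ref = (x @ W1) @ W2

# Tensor-parallel version: W1 split by columns, W2 split by rows,
# so the intermediate activation never needs to be gathered.
W1_shards = np.split(W1, 2, axis=1)    # each device holds half the columns
W2_shards = np.split(W2, 2, axis=0)    # each device holds half the rows

# Each device computes its partial output independently...
partials = [(x @ w1) @ w2 for w1, w2 in zip(W1_shards, W2_shards)]
# ...then an all-reduce (here: a plain sum) combines the partials.
out = sum(partials)

print(np.allclose(ref, out))  # True: sharded result matches the full matmul
```

The column-then-row split is what makes TP cheap in communication: only one all-reduce is needed per FFN block, because each device's partial product already has the full output shape.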