8 links tagged with all of: performance + gpu
Links
The article explores how GPUs work, focusing on the compute and memory hierarchy, performance regimes, and optimization strategies. It highlights the imbalance between compute throughput and memory bandwidth, using the NVIDIA A100 GPU as a case study, and discusses techniques such as kernel fusion and tiling to improve performance. It also explains how arithmetic intensity determines whether an operation is memory-bound or compute-bound.
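For a concrete feel for that last point, here is a back-of-the-envelope sketch using the A100's published specs (roughly 312 TFLOP/s FP16 tensor-core peak and 1.555 TB/s HBM bandwidth on the 40 GB model); the workload sizes are made up for illustration:

```python
# Roofline check: an op is compute-bound when its arithmetic intensity
# (FLOPs per byte of HBM traffic) exceeds the hardware's ridge point.
PEAK_FLOPS = 312e12           # A100 FP16 tensor-core peak, FLOP/s
PEAK_BW = 1.555e12            # A100 40 GB HBM bandwidth, bytes/s
RIDGE = PEAK_FLOPS / PEAK_BW  # ~201 FLOP/byte

def regime(flops: float, bytes_moved: float) -> str:
    """Classify an op by its arithmetic intensity."""
    ai = flops / bytes_moved
    return f"AI = {ai:,.1f} FLOP/B -> {'compute' if ai > RIDGE else 'memory'}-bound"

n = 4096
# Elementwise FP16 add: 1 FLOP per element, 2 reads + 1 write of 2 bytes each.
print("add:   ", regime(n * n, 3 * 2 * n * n))   # ~0.2 FLOP/B, memory-bound
# FP16 matmul (n x n x n): 2n^3 FLOPs, three n x n matrices moved once.
print("matmul:", regime(2 * n**3, 3 * 2 * n * n))  # ~1365 FLOP/B, compute-bound
```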
Sirius is a GPU-native SQL engine that integrates with existing databases such as DuckDB via the Substrait query format, achieving roughly a 10x speedup over CPU query engines on TPC-H workloads. It is designed for interactive analytics, runs on a range of AWS EC2 instance types, and ships with detailed setup instructions for installation and performance testing. Sirius is under active development, with additional features and support for more database systems planned.
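Sirius's own API isn't shown here, but the Substrait handoff it relies on can be sketched with DuckDB's substrait extension. The table and query below are invented, and the extension's install source varies by DuckDB version, so treat this as a sketch of the plan-exchange step rather than a Sirius recipe:

```python
import duckdb

con = duckdb.connect()
# On recent DuckDB versions the extension lives in the community repository;
# on older ones a plain "INSTALL substrait" may be needed instead.
con.execute("INSTALL substrait FROM community")
con.execute("LOAD substrait")

con.execute("CREATE TABLE lineitem AS SELECT 1 AS l_orderkey, 17.0 AS l_quantity")

# get_substrait() serializes DuckDB's optimized plan as a Substrait protobuf
# blob, the format a Substrait consumer like Sirius would ingest and execute.
plan = con.execute(
    "CALL get_substrait('SELECT l_orderkey, SUM(l_quantity) "
    "FROM lineitem GROUP BY l_orderkey')"
).fetchone()[0]
print(f"Substrait plan: {len(plan)} bytes")
```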
Python data science workflows can be significantly accelerated using GPU-compatible libraries like cuDF, cuML, and cuGraph with minimal code changes. The article highlights seven drop-in replacements for popular Python libraries, demonstrating how to leverage GPU acceleration to enhance performance on large datasets without altering existing code.
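The flavor of the drop-in approach, using cudf.pandas as the example (the DataFrame here is synthetic; a CUDA GPU and the cudf package are assumed):

```python
# Zero-code-change GPU acceleration for an existing pandas script:
# cudf.pandas intercepts the pandas import and falls back to CPU pandas
# for anything cuDF does not support.
import cudf.pandas
cudf.pandas.install()  # must run before pandas is imported

import pandas as pd

df = pd.DataFrame({
    "key": ["a", "b", "a", "c"] * 1_000_000,
    "val": range(4_000_000),
})
print(df.groupby("key")["val"].mean())  # executes on the GPU when possible
```

In notebooks the same switch is a one-liner, `%load_ext cudf.pandas`, loaded before importing pandas.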
Tile Language (tile-lang) is a domain-specific language designed to simplify the creation of high-performance GPU/CPU kernels with a Pythonic syntax, built on the TVM infrastructure. Recent updates include support for Apple Metal, Huawei Ascend chips, and various performance enhancements for AMD and NVIDIA GPUs. The language allows developers to efficiently implement complex AI operations while focusing on productivity and optimization.
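For a taste of the syntax, below is a tiled FP16 matmul reconstructed from memory of the project's README-style example; the decorator and intrinsic names (tilelang.jit, T.Kernel, T.Pipelined, T.gemm, and so on) are assumptions to verify against the repository, since the API evolves:

```python
import tilelang
import tilelang.language as T

@tilelang.jit(out_idx=[-1])  # treat the last tensor argument as the output
def matmul(M, N, K, block_M=128, block_N=128, block_K=32, dtype="float16"):
    @T.prim_func
    def gemm(
        A: T.Tensor((M, K), dtype),
        B: T.Tensor((K, N), dtype),
        C: T.Tensor((M, N), dtype),
    ):
        # One thread block per (block_M x block_N) output tile.
        with T.Kernel(T.ceildiv(N, block_N), T.ceildiv(M, block_M),
                      threads=128) as (bx, by):
            A_s = T.alloc_shared((block_M, block_K), dtype)
            B_s = T.alloc_shared((block_K, block_N), dtype)
            C_f = T.alloc_fragment((block_M, block_N), "float")
            T.clear(C_f)
            # Software-pipelined loop over the reduction (K) dimension.
            for k in T.Pipelined(T.ceildiv(K, block_K), num_stages=3):
                T.copy(A[by * block_M, k * block_K], A_s)
                T.copy(B[k * block_K, bx * block_N], B_s)
                T.gemm(A_s, B_s, C_f)
            T.copy(C_f, C[by * block_M, bx * block_N])

    return gemm
```

In the README's usage pattern, calling matmul(1024, 1024, 1024) returns a compiled kernel invocable on CUDA torch tensors.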
Polars, a DataFrame library designed for performance, has introduced GPU execution that can deliver up to a 70% speedup over its CPU engine. The gain is particularly valuable for heavy data processing tasks, making Polars an even stronger tool for data engineers and analysts optimizing their workflows.
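A minimal sketch of the engine switch; the Parquet file and column names are stand-ins, and the GPU engine requires the polars[gpu] extras plus a CUDA GPU:

```python
import polars as pl

# Build one lazy query, then choose the execution engine at collect time.
lf = (
    pl.scan_parquet("trades.parquet")      # hypothetical input file
    .group_by("ticker")
    .agg(pl.col("price").mean())
)

cpu_result = lf.collect()                  # default CPU engine
gpu_result = lf.collect(engine="gpu")      # GPU engine; falls back to CPU
                                           # for unsupported operations
```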
GPUs are critical for high-performance computing, particularly for neural network inference workloads, but achieving high GPU utilization is harder than it sounds. This guide defines three distinct metrics of GPU utilization (allocation utilization, kernel utilization, and model FLOP/s utilization) and discusses strategies for improving each. Modal's platform targets allocation and kernel utilization, helping users improve both performance and cost-effectiveness.
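To illustrate the third metric, here is a worked model FLOP/s utilization (MFU) calculation; the model size and throughput are assumed numbers, not figures from the guide:

```python
# MFU = achieved model FLOP/s divided by the hardware's peak FLOP/s.
PARAMS = 7e9            # decoder-only model size (assumed)
TOKENS_PER_S = 1_000    # measured generation throughput (assumed)
PEAK_FLOPS = 312e12     # A100 FP16 tensor-core peak

# A forward pass costs roughly 2 FLOPs per parameter per token.
achieved = 2 * PARAMS * TOKENS_PER_S
mfu = achieved / PEAK_FLOPS
print(f"MFU = {mfu:.1%}")  # ~4.5%; low MFU is typical for small-batch decoding
```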
The article walks through five common performance bottlenecks in pandas workflows and a fix for each, including faster parsing engines, optimized joins, and GPU acceleration via cudf.pandas for large speedups. It also notes that free GPU access on Google Colab makes these gains available without code modifications.
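Two of the cheaper fixes look roughly like this; the file names and the order_id column are placeholders:

```python
import pandas as pd

# Fix 1: swap pandas' default C parser for the multithreaded pyarrow
# engine (requires the pyarrow package).
orders = pd.read_csv("orders.csv", engine="pyarrow", dtype_backend="pyarrow")
items = pd.read_csv("items.csv", engine="pyarrow", dtype_backend="pyarrow")

# Fix 2: join on a sorted index rather than on unsorted columns,
# which is typically faster for repeated merges.
orders = orders.set_index("order_id").sort_index()
items = items.set_index("order_id").sort_index()
joined = orders.join(items, how="inner")
```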
Many pandas workflows slow to a crawl on large datasets, a common frustration for data analysts. With NVIDIA's GPU-accelerated cuDF library, common tasks such as analyzing stock prices, processing text-heavy job postings, and building interactive dashboards can be sped up dramatically, in some cases by 20x. Advances like Unified Virtual Memory also allow processing of datasets larger than the GPU's memory, simplifying the workflow further.
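A hedged sketch of opting into managed (unified) memory via RMM before loading data with cuDF; the file name is hypothetical, and the exact configuration that cuDF applies by default varies across versions:

```python
import rmm
import cudf

# CUDA managed memory lets allocations exceed physical GPU memory;
# pages migrate on demand between host and device.
rmm.reinitialize(managed_memory=True, pool_allocator=True)

gdf = cudf.read_parquet("jobs.parquet")  # assumed file, larger than VRAM
print(gdf["title"].str.contains("engineer").sum())
```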