Quit Emailing Yourself

# performance → gpu

8 links tagged with all of: performance + gpu


Links

Basic facts about GPUs

The article explains how GPUs work, focusing on the key performance factors: the compute and memory hierarchy, performance regimes, and optimization strategies. Using the NVIDIA A100 as a case study, it highlights the imbalance between computational speed and memory bandwidth and discusses techniques such as operator fusion and tiling that reduce memory traffic. It also shows how arithmetic intensity determines whether an operation is memory-bound or compute-bound.
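To make the memory-bound versus compute-bound distinction concrete, here is a minimal sketch of the arithmetic-intensity calculation. The peak figures are assumed A100 (40 GB) datasheet-style values, and the cost models (`vector_add_cost`, `matmul_cost`) are idealized illustrations that assume each operand crosses the memory bus exactly once; none of these names come from the linked article.

```python
# Assumed peak figures for an A100 40 GB; replace with your own hardware's numbers.
PEAK_FLOPS = 19.5e12       # FP32 FLOP/s (assumption)
PEAK_BANDWIDTH = 1.555e12  # bytes/s of HBM bandwidth (assumption)


def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs performed per byte moved to or from global memory."""
    return flops / bytes_moved


def ridge_point() -> float:
    """Intensity at which compute time and memory time are equal."""
    return PEAK_FLOPS / PEAK_BANDWIDTH


def classify(flops: float, bytes_moved: float) -> str:
    """Label a kernel by which resource limits it under this simple model."""
    if arithmetic_intensity(flops, bytes_moved) >= ridge_point():
        return "compute-bound"
    return "memory-bound"


def attainable_flops(flops: float, bytes_moved: float) -> float:
    """Roofline estimate: the lower of the compute roof and the bandwidth roof."""
    ai = arithmetic_intensity(flops, bytes_moved)
    return min(PEAK_FLOPS, ai * PEAK_BANDWIDTH)


def vector_add_cost(n: int, dtype_bytes: int = 4) -> tuple[float, float]:
    """c = a + b: n additions; two reads and one write per element."""
    return float(n), float(3 * n * dtype_bytes)


def matmul_cost(n: int, dtype_bytes: int = 4) -> tuple[float, float]:
    """n x n matmul: 2*n^3 FLOPs; ideal traffic reads A and B and writes C once."""
    return float(2 * n**3), float(3 * n * n * dtype_bytes)


if __name__ == "__main__":
    print(f"ridge point: {ridge_point():.2f} FLOPs/byte")
    for name, cost in [("vector add", vector_add_cost(1 << 20)),
                       ("matmul 4096", matmul_cost(4096))]:
        print(name, f"{arithmetic_intensity(*cost):.3f} FLOPs/byte", classify(*cost))
```

Under these assumptions, element-wise operations sit far below the ridge point, which is why fusing several of them into one pass over the data helps, while a large matrix multiply can exceed it only if tiling keeps its real memory traffic close to the ideal figure used here.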

Saved by tldr-importer · Last saved October 29, 2025 · 7 min read

gpu ✓ performance ✓ + optimization + arithmetic-intensity + memory-bandwidth

GitHub - sirius-db/sirius

Sirius is a GPU-native SQL engine that integrates with existing databases like DuckDB using the Substrait query format, achieving approximately 10x speedup over CPU query engines for TPC-H workloads. It is designed for interactive analytics and supports various AWS EC2 instances, with detailed setup instructions for installation and performance testing. Sirius is currently in active development, with plans for additional features and support for more database systems.

Saved by tldr-importer · Last saved October 29, 2025 · 5 min read

gpu ✓ + sql + duckdb performance ✓ + analytics

7 Drop-In Replacements to Instantly Speed Up Your Python Data Science Workflows | NVIDIA Technical Blog

Python data science workflows can be significantly accelerated with GPU-compatible libraries such as cuDF, cuML, and cuGraph. The article highlights seven drop-in replacements for popular Python libraries and demonstrates how to apply GPU acceleration to large datasets with little or no change to existing code.

Saved by tldr-importer · Last saved October 29, 2025 · 5 min read

+ python gpu ✓ + data-science performance ✓ + acceleration

GitHub - tile-ai/tilelang: Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

Tile Language (tile-lang) is a domain-specific language designed to simplify the creation of high-performance GPU/CPU kernels with a Pythonic syntax, built on the TVM infrastructure. Recent updates include support for Apple Metal, Huawei Ascend chips, and various performance enhancements for AMD and NVIDIA GPUs. The language allows developers to efficiently implement complex AI operations while focusing on productivity and optimization.

Saved by tldr-importer · Last saved October 29, 2025 · 5 min read

gpu ✓ performance ✓ + language + programming + open-source

[no-title]

Polars, a DataFrame library designed for performance, has introduced GPU execution that can run up to 70% faster than its CPU execution. The enhancement is particularly beneficial for data processing tasks and is aimed at data engineers and analysts looking to optimize their workflows.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

+ polars gpu ✓ + data-processing performance ✓ + speed-up

'I paid for the whole GPU, I am going to use the whole GPU': A high-level guide to GPU utilization

GPUs are critical for high-performance computing, particularly for neural network inference workloads, but achieving optimal GPU utilization can be challenging. This guide outlines three key metrics of GPU utilization—allocation, kernel, and model FLOP/s utilization—and discusses strategies to improve efficiency and performance in GPU applications. Modal's solutions aim to enhance GPU allocation and kernel utilization, helping users achieve better performance and cost-effectiveness.
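As a rough illustration of the three metrics named above, the sketch below treats each one as a simple ratio. The exact definitions are an assumption based on this summary (time running application code, time running kernels, and achieved versus peak FLOP/s), and the `GpuUsage` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GpuUsage:
    """Hypothetical usage record for one billing window."""
    paid_seconds: float    # GPU-seconds paid for
    app_seconds: float     # GPU-seconds spent running application code
    kernel_seconds: float  # GPU-seconds spent executing kernels
    achieved_flops: float  # measured FLOP/s of the model
    peak_flops: float      # hardware peak FLOP/s for the same precision


def allocation_utilization(u: GpuUsage) -> float:
    """Share of paid GPU time during which application code was running."""
    return u.app_seconds / u.paid_seconds


def kernel_utilization(u: GpuUsage) -> float:
    """Share of paid GPU time during which kernels were executing."""
    return u.kernel_seconds / u.paid_seconds


def model_flops_utilization(u: GpuUsage) -> float:
    """Achieved FLOP/s as a share of the hardware peak."""
    return u.achieved_flops / u.peak_flops


if __name__ == "__main__":
    # Placeholder numbers, not measurements.
    usage = GpuUsage(paid_seconds=3600, app_seconds=2700, kernel_seconds=1800,
                     achieved_flops=78e12, peak_flops=312e12)
    print(allocation_utilization(usage))
    print(kernel_utilization(usage))
    print(model_flops_utilization(usage))
```

Keeping the three ratios separate makes it easier to see where capacity is lost: idle allocated GPUs, time spent outside kernels, or kernels that run well below peak throughput.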

Saved by tldr-importer · Last saved October 29, 2025 · 6 min read

gpu ✓ + utilization performance ✓ + neural-networks + inference

How to Spot (and Fix) 5 Common Performance Bottlenecks in pandas Workflows | NVIDIA Technical Blog

The article discusses five common performance bottlenecks in pandas workflows, providing solutions for each issue, including using faster parsing engines, optimizing joins, and leveraging GPU acceleration with cudf.pandas for significant speed improvements. It also highlights how users can access GPU resources for free on Google Colab, allowing for enhanced data processing capabilities without code modifications.
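The "no code modifications" idea can be sketched as follows: `cudf.pandas` is enabled once, before pandas is imported, and the rest of the script stays unchanged. This is a minimal sketch that assumes the RAPIDS `cudf` package may or may not be installed; the helper name `enable_gpu_pandas` is hypothetical, and the script falls back to plain CPU pandas when `cudf` is absent.

```python
import importlib.util


def enable_gpu_pandas() -> bool:
    """Turn on the cudf.pandas accelerator if cudf is installed.

    Returns True when the accelerator was installed, False when the
    script will run on ordinary CPU pandas instead.
    """
    if importlib.util.find_spec("cudf") is None:
        return False  # cudf not available: keep CPU pandas
    import cudf.pandas
    cudf.pandas.install()  # must run before pandas is first imported
    return True


if __name__ == "__main__":
    accelerated = enable_gpu_pandas()
    print("GPU acceleration enabled" if accelerated else "running on CPU pandas")
    # From here on, existing code is unchanged, for example:
    #   import pandas as pd
    #   df = pd.read_csv("data.csv")   # "data.csv" is a placeholder path
```

In a notebook, the equivalent step is the `%load_ext cudf.pandas` magic, and a whole script can be run with `python -m cudf.pandas script.py`.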

Saved by tldr-importer · Last saved October 29, 2025 · 5 min read

+ pandas performance ✓ gpu ✓ + data-processing + acceleration

3 pandas Workflows That Slowed to a Crawl on Large Datasets—Until We Turned on GPUs | NVIDIA Technical Blog

Many pandas workflows slow down significantly on large datasets, which frustrates data analysts. With NVIDIA's GPU-accelerated cuDF library, common tasks such as analyzing stock prices, processing text-heavy job postings, and building interactive dashboards can run up to 20 times faster. Additionally, Unified Virtual Memory allows processing of datasets larger than the GPU's memory, which simplifies the workflow for users.

Saved by tldr-importer · Last saved October 29, 2025 · 3 min read

+ pandas gpu ✓ + cudf + data-analysis performance ✓