Links
This article explains Spark Declarative Pipelines (SDP), a framework for creating data pipelines in Spark. It covers key concepts like flows, datasets, and pipelines, and how to implement them in Python and SQL. The guide also includes installation instructions and a walkthrough of the command-line interface.
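In SDP's Python API, datasets are defined as decorated functions (e.g. decorators such as `@dp.materialized_view` on functions returning DataFrames), and Spark resolves the dependencies between them into a flow graph. As a toy, Spark-free illustration of that declarative style — the `dataset` decorator and `run` helper below are hypothetical, not SDP's API — the idea can be sketched as:

```python
# Toy declarative pipeline: datasets are functions registered by a
# decorator; dependencies are declared by name and resolved at run time.
# (Conceptual sketch only -- not SDP's actual API.)
registry = {}

def dataset(*deps):
    def wrap(fn):
        registry[fn.__name__] = (deps, fn)
        return fn
    return wrap

def run(name, cache=None):
    """Materialize a dataset, recursively materializing its dependencies."""
    cache = {} if cache is None else cache
    if name not in cache:
        deps, fn = registry[name]
        cache[name] = fn(*(run(d, cache) for d in deps))
    return cache[name]

@dataset()
def raw_orders():
    return [{"id": 1, "amount": 30}, {"id": 2, "amount": 70}]

@dataset("raw_orders")
def big_orders(orders):
    return [o for o in orders if o["amount"] > 50]

print(run("big_orders"))  # → [{'id': 2, 'amount': 70}]
```

The point is the inversion of control: you declare *what* each dataset is and what it depends on, and the framework decides the execution order — in real SDP, with Spark handling planning, incremental processing, and materialization.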
Daft is a distributed query engine for large-scale data processing in Python or SQL, built in Rust. It offers a familiar interactive DataFrame API, powerful query optimization, and integration with data catalogs and multimodal data types, making it suitable for complex data operations in cloud environments — scaling from interactive use on a single machine to distributed execution across large clusters.
Python's pandas library is moving away from NumPy in favor of the faster PyArrow for its underlying data representation. This shift aims to improve performance and memory efficiency when handling large datasets, and marks a significant change in how data manipulation is approached in Python environments.
Semlib is a Python library that facilitates the construction of data processing and analysis pipelines using large language models (LLMs), employing natural language descriptions instead of traditional code. It enhances data processing quality, feasibility, latency, cost efficiency, security, and flexibility by breaking down complex tasks into simpler, manageable subtasks. The library combines functional programming principles with the capabilities of LLMs to optimize data handling and improve results.
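Semlib's own API differs, but the core idea — functional combinators like map and reduce whose "operation" is a natural-language instruction executed by an LLM — can be sketched with a stubbed model call (both `llm` and `sem_map` below are hypothetical names for illustration):

```python
# Toy illustration of the LLM-as-combinator pattern (not Semlib's API).
def llm(instruction, text):
    # Stub standing in for a real LLM call: here it just takes the
    # clause before the first comma as a crude "summary".
    if instruction == "summarize in a few words":
        return text.split(",")[0].strip()
    raise NotImplementedError(instruction)

def sem_map(instruction, items):
    """Apply a natural-language transformation to every item."""
    return [llm(instruction, item) for item in items]

reviews = ["great battery, weak camera", "fast, loud fan"]
summaries = sem_map("summarize in a few words", reviews)
print(summaries)  # → ['great battery', 'fast']
```

Decomposing a big analysis into many small per-item prompts like this is what gives the approach its quality and cost benefits: each subtask fits comfortably in a model's context, and independent calls can be parallelized, cached, and priced individually.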