Quit Emailing Yourself

# data-engineering → pyspark

4 links tagged with all of: data-engineering + pyspark

Links

Introducing Apache Spark® 4.1

This article introduces the features of Apache Spark 4.1, highlighting advancements like Spark Declarative Pipelines for easier data transformation, Real-Time Mode for low-latency streaming, and improved PySpark performance with Arrow-native UDFs. It also covers enhancements in SQL capabilities and Spark Connect for better stability and scalability.

Saved by tldr-importer · Last saved February 14, 2026 · 6 min read

Tags: spark, data-engineering, streaming, sql, pyspark

Semantic Operators Meet Dataframes: Building Context for Agents with FENIC

Kostas Pardalis discusses Fenic, an open-source DataFrame engine inspired by PySpark, aimed at enhancing data engineering for AI applications. He highlights how Fenic incorporates semantic operators to improve data transformation and management, addressing the limitations of traditional data infrastructure in the AI era.

Saved by tldr-importer · Last saved February 14, 2026 · 4 min read

Tags: fenic, dataframes, data-engineering, llm, pyspark

High Performance Map Transformations in PySpark | by Abayomi Latunde | Feb, 2026 | Data Engineer Things

This article explains how to use built-in PySpark functions to efficiently manipulate map data types in data pipelines. It covers functions like `transform_keys`, `map_filter`, and `map_contains_key`, highlighting their utility in cleaning and transforming semi-structured data.

Saved by tldr-importer · Last saved February 14, 2026 · 5 min read

Tags: pyspark, data-engineering, maps, transformations, big-data

The Chameleon Architecture: Mastering Schema Evolution with Apache Hudi | by Shaik Sameer | Medium

This article explains how Apache Hudi manages schema evolution in data lakehouses, allowing for seamless changes in data structures without disrupting pipelines. It covers practical implementation using PySpark and highlights the benefits of agility, backward compatibility, and pipeline reliability.

Saved by tldr-importer · Last saved February 14, 2026 · 5 min read

Tags: schema-evolution, apache-hudi, data-lakehouse, pyspark, data-engineering