Links
This article introduces the features of Apache Spark 4.1, highlighting advancements like Spark Declarative Pipelines for easier data transformation, Real-Time Mode for low-latency streaming, and improved PySpark performance with Arrow-native UDFs. It also covers enhancements in SQL capabilities and Spark Connect for better stability and scalability.
This article explains how DLT-META, a metadata-driven framework, helps automate and standardize Spark Declarative Pipelines. It addresses common data engineering challenges like scaling, maintenance, and logic consistency, allowing teams to onboard new data sources quickly and efficiently.
Tuning Spark shuffle partitions (the `spark.sql.shuffle.partitions` setting, default 200) is essential for optimizing data-processing performance, since it controls how many partitions a DataFrame has after a shuffle. By adjusting the partition count and leveraging Adaptive Query Execution, which can coalesce small shuffle partitions at runtime, users can significantly improve the efficiency of their Spark jobs. Experimenting with partition settings can reveal notable differences in runtime, underscoring the importance of performance tuning in Spark applications.
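As a minimal PySpark sketch of the tuning described above: the snippet sets a static shuffle-partition baseline and enables Adaptive Query Execution so Spark can coalesce small shuffle partitions at runtime. The configuration keys are standard Spark SQL properties; the app name, partition count of 64, and the `% 100` grouping column are illustrative choices, and running it requires a local PySpark installation.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("shuffle-partition-tuning")  # illustrative name
    # Static baseline: shuffle stages produce 64 partitions instead of the default 200.
    .config("spark.sql.shuffle.partitions", "64")
    # Let Adaptive Query Execution coalesce small shuffle partitions at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
    .getOrCreate()
)

df = spark.range(1_000_000)
# groupBy forces a shuffle, so the aggregate's partitioning reflects the settings above.
agg = df.groupBy((df.id % 100).alias("bucket")).count()
print(agg.rdd.getNumPartitions())
```

With AQE enabled, the observed partition count after the shuffle may be lower than the configured 64, because small partitions get merged; disabling AQE makes the static setting take effect exactly, which is a quick way to compare the two behaviors.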