To efficiently insert large datasets into a Postgres database, combining Spark's parallel processing with Postgres's COPY command, driven from Python, can significantly improve performance. By repartitioning the data and running multiple concurrent writers, the author inserted 22 million records in under 14 minutes, leveraging Postgres's bulk-loading path rather than traditional row-by-row JDBC inserts.
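The article's exact code isn't reproduced here, but a minimal sketch of the pattern might look like the following, assuming psycopg2 as the Postgres driver; the DSN, table name, column list, and writer count are all placeholders:

```python
import io

from pyspark.sql import SparkSession

NUM_WRITERS = 16  # assumed: one concurrent COPY stream per partition
DSN = "postgresql://user:pass@host:5432/db"  # placeholder connection string


def copy_partition(rows):
    """Stream one partition into Postgres via COPY instead of per-row INSERTs."""
    # Imports live inside the function so they resolve on the executors.
    import io
    import psycopg2

    buf = io.StringIO()
    for row in rows:
        # Tab-separated text matches COPY's default format; this assumes the
        # data contains no tabs or newlines (use CSV options otherwise).
        buf.write("\t".join("" if v is None else str(v) for v in row) + "\n")
    buf.seek(0)

    conn = psycopg2.connect(DSN)
    try:
        with conn, conn.cursor() as cur:
            cur.copy_expert(
                "COPY my_table (col_a, col_b, col_c) FROM STDIN", buf
            )
    finally:
        conn.close()


spark = SparkSession.builder.appName("bulk-copy").getOrCreate()
df = spark.read.parquet("s3://bucket/source/")  # placeholder source

# Repartition so each writer handles a similar share of the data, then run
# one COPY per partition in parallel across the executors.
df.repartition(NUM_WRITERS).rdd.foreachPartition(copy_partition)
```

Opening one connection per partition keeps the number of concurrent COPY streams equal to the partition count, which is why the repartition step matters: it is effectively the knob that sets write parallelism against the database.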
LinkedIn optimized its Sales Navigator search pipeline by migrating from MapReduce to Spark, cutting execution time from 6-7 hours to roughly 3 hours. The effort spanned more than 100 data manipulation jobs and involved pruning job graphs, identifying bottlenecks, and addressing data skew, so users now see updated search results significantly sooner.
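The summary doesn't detail how the skew was resolved; key salting is one common Spark remedy and is shown below only as an illustrative sketch, not LinkedIn's actual code. It assumes a hypothetical skewed join key `member_id` and placeholder datasets:

```python
from pyspark.sql import SparkSession, functions as F

SALT_BUCKETS = 32  # assumed: spread each hot key across this many sub-keys

spark = SparkSession.builder.appName("skew-salting").getOrCreate()
facts = spark.read.parquet("facts/")  # large, skewed side (placeholder)
dims = spark.read.parquet("dims/")    # small side (placeholder)

# Add a random salt to the skewed side so hot keys fan out across partitions.
salted_facts = facts.withColumn(
    "salt", (F.rand() * SALT_BUCKETS).cast("int")
)

# Replicate the small side once per salt value so every salted key still
# finds its match in the join.
salts = spark.range(SALT_BUCKETS).select(F.col("id").cast("int").alias("salt"))
salted_dims = dims.crossJoin(salts)

joined = salted_facts.join(salted_dims, on=["member_id", "salt"]).drop("salt")
```

Salting evens out partition sizes at the cost of replicating the smaller side; on Spark 3.x, adaptive query execution (`spark.sql.adaptive.skewJoin.enabled`) can handle many skewed joins automatically without this manual step.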