3 links tagged with all of: databricks + spark
Links
The project provides a custom data source for Apache Spark that reads PDF files into Spark DataFrames. It handles large PDFs efficiently, can apply OCR to scanned documents, and works with a range of Spark versions as well as Databricks. The package is published to the Maven Central Repository and exposes configuration options for controlling how PDFs are parsed.
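A minimal sketch of what using such a data source could look like. The format name "pdf", the option names, and the resulting columns are assumptions based on the summary above, not the project's verified API; consult its documentation for the exact identifiers.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pdf-datasource-demo").getOrCreate()

# Hypothetical usage: "pdf" as the format identifier and the option names
# below are illustrative placeholders, not confirmed settings.
df = (
    spark.read.format("pdf")
    .option("imageType", "BINARY")   # assumed: render mode for scanned pages
    .option("resolution", 300)       # assumed: DPI used before OCR
    .load("/path/to/documents/*.pdf")
)

df.printSchema()        # expect columns such as file path, page number, extracted text
df.show(5, truncate=80)
```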
The blog post introduces the new DataFrame API for table-valued functions in Databricks, which lets Spark applications call table-valued functions directly from DataFrame code instead of embedding them in SQL strings, so SQL-style operations compose naturally with the rest of a DataFrame pipeline. The post includes examples and use cases to illustrate the benefits for developers and data scientists.
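A short sketch of the idea described in the post: the same generator function invoked once inside a SQL string and once as a DataFrame-level call. The `spark.tvf` accessor follows the post's examples (Spark 4.x / recent Databricks runtimes); treat the exact method names and signatures as assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import array, lit

spark = SparkSession.builder.appName("tvf-demo").getOrCreate()

# SQL-string form: the function call lives inside the query text.
sql_df = spark.sql("SELECT explode(array(1, 2, 3)) AS value")

# DataFrame form (assumed accessor): the same table-valued function invoked
# as a method, returning a DataFrame that composes with filters, joins, etc.
tvf_df = spark.tvf.explode(array(lit(1), lit(2), lit(3)))

sql_df.show()
tvf_df.show()
```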
Tuning Spark shuffle partitions is essential for performance: the number of partitions produced by wide transformations determines task size and parallelism. By adjusting the shuffle partition count and leveraging Adaptive Query Execution, which can coalesce small partitions at runtime, users can significantly improve the efficiency of their Spark jobs. Experimenting with partition settings can reveal notable differences in runtime, underscoring the value of this kind of performance tuning.
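A minimal sketch of the tuning knobs discussed: the static shuffle partition count and the Adaptive Query Execution settings that coalesce small partitions at runtime. The configuration keys are standard Spark SQL settings; the values are illustrative starting points, not recommendations for any particular workload.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("shuffle-tuning-demo").getOrCreate()

# Static baseline: number of partitions used for shuffles (Spark's default is 200).
spark.conf.set("spark.sql.shuffle.partitions", "64")

# Let AQE merge small shuffle partitions based on runtime statistics.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "128MB")

# A wide transformation (groupBy) triggers a shuffle whose partitioning is
# governed by the settings above.
df = spark.range(0, 10_000_000).withColumn("bucket", F.col("id") % 100)
agg = df.groupBy("bucket").count()

print(agg.rdd.getNumPartitions())  # observe the post-shuffle partition count
agg.show(5)
```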