3 min read | Saved October 29, 2025
The project provides a custom data source for Apache Spark that reads PDF files into Spark DataFrames. It is designed to read large PDF files efficiently, including scanned documents via OCR, and is compatible with multiple Spark versions as well as Databricks. The package is published to the Maven Central Repository and exposes configuration options that control how PDFs are processed.
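The summary does not say how large files are read efficiently. One plausible approach for a Spark data source, assumed here rather than taken from the project, is to split each PDF into page ranges so that different Spark tasks process different pages in parallel. The sketch below shows only that planning step in plain Python, with no Spark dependency. `PagePartition`, `plan_partitions`, and `pages_per_partition` are hypothetical names used for illustration; they are not classes or options of the actual package.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PagePartition:
    """Hypothetical unit of work: a half-open, 0-based page range of one PDF."""
    path: str
    start_page: int  # inclusive
    end_page: int    # exclusive


def plan_partitions(path: str, num_pages: int, pages_per_partition: int) -> List[PagePartition]:
    """Split one PDF into contiguous page ranges (illustrative helper, not the package's API).

    Each range could be handed to a separate task, so a very large file
    is not processed by a single task from start to finish.
    """
    if pages_per_partition <= 0:
        raise ValueError("pages_per_partition must be positive")
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    return [
        # The last range is clipped to the document's page count.
        PagePartition(path, start, min(start + pages_per_partition, num_pages))
        for start in range(0, num_pages, pages_per_partition)
    ]


if __name__ == "__main__":
    for part in plan_partitions("report.pdf", num_pages=7, pages_per_partition=3):
        print(part.path, part.start_page, part.end_page)
```

With 7 pages and 3 pages per partition, the plan yields the ranges [0, 3), [3, 6), and [6, 7), so the last task handles the single remaining page. Smaller ranges give more parallelism at the cost of more tasks; in a real data source this trade-off would likely be exposed as a read option.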