5 links
tagged with all of: machine-learning + data-processing
Links
Pinterest has developed a Feature Backfill solution to accelerate machine learning feature iteration, overcoming the limitations of traditional forward logging. The approach significantly reduces iteration time and cost, letting engineers integrate new features more efficiently while addressing issues such as data integrity and resource management. The article traces the evolution of the backfill process, including a two-stage method that improves parallel execution and lowers compute costs.
Klaviyo uses Ray, an open-source framework, for data processing, model training, and hyperparameter optimization across large datasets. By employing Ray Data, Ray Train, and Ray Tune, the company streamlines its machine learning workflows, handling and deploying models efficiently while keeping compute costs under control.
Xorq is a batch transformation framework that integrates with multiple engines, such as DuckDB, Snowflake, and DataFusion, allowing for reproducible builds and efficient data processing. It features a YAML-based multi-engine manifest and a compute catalog, and supports scikit-learn for machine learning pipelines. Xorq focuses on deterministic batch execution, enabling easy sharing and serving of compute artifacts across teams.
Pinterest has enhanced its machine learning (ML) infrastructure by extending the capabilities of Ray beyond just training and inference. By addressing challenges such as slow data pipelines and inefficient compute usage, Pinterest implemented a Ray-native ML infrastructure that improves feature development, sampling, and labeling, leading to faster, more scalable ML iteration.
LinkedIn has developed an incremental and online training platform to enhance AI-driven recommendations by enabling rapid model updates and cost-efficient training processes. The platform has demonstrated significant improvements in user interactions and advertisement effectiveness while addressing various engineering challenges such as data ingestion, monitoring, and model calibration. Key infrastructure components, including Kubernetes and Kafka, facilitate seamless integration and operational efficiency in training and serving machine learning models.