Links
This article argues that Clojure may rival Python in the Data Science field due to its general-purpose nature, strong performance on the JVM, and rich library ecosystem. It highlights how Clojure's advantages address Python's limitations, particularly in speed and interop with native code.
This article explains how to use the Pandera library in Python to create data contracts that ensure data quality in pipelines. It highlights the common issues of schema drift and demonstrates how to validate incoming data against defined schemas to prevent errors. The author provides a practical example using marketing leads data.
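The data-contract idea in this summary can be sketched in plain Python. This is only an illustration of the concept and deliberately does not use Pandera's API; the column names, allowed values, and the `validate` helper are hypothetical and not taken from the article.

```python
# Plain-Python sketch of a data contract (NOT Pandera's API).
# Each column maps to (expected type, predicate the value must satisfy).
# Column names and rules are hypothetical examples for marketing leads.
LEADS_CONTRACT = {
    "lead_id": (int, lambda v: v > 0),
    "email": (str, lambda v: "@" in v),
    "source": (str, lambda v: v in {"ads", "organic", "referral"}),
}


def validate(rows, contract):
    """Return a list of violations; an empty list means the batch passes."""
    errors = []
    for i, row in enumerate(rows):
        # Schema drift shows up as missing or unexpected columns.
        for name in sorted(contract.keys() - row.keys()):
            errors.append(f"row {i}: missing column '{name}'")
        for name in sorted(row.keys() - contract.keys()):
            errors.append(f"row {i}: unexpected column '{name}'")
        for name, (dtype, check) in contract.items():
            if name not in row:
                continue
            value = row[name]
            if not isinstance(value, dtype):
                errors.append(f"row {i}: '{name}' should be {dtype.__name__}")
            elif not check(value):
                errors.append(f"row {i}: '{name}' failed its check")
    return errors


if __name__ == "__main__":
    batch = [
        {"lead_id": 1, "email": "a@example.com", "source": "ads"},
        {"lead_id": "7", "email": "no-at-sign", "campaign": "x"},
    ]
    for problem in validate(batch, LEADS_CONTRACT):
        print(problem)
```

Collecting every violation before failing, rather than stopping at the first one, makes it easier to see how far an upstream schema has drifted in a single run.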
The article argues that Python, while popular for data science, is not the best choice for many tasks outside of deep learning. It highlights the frustrations users face due to Python's cumbersome tools and compares its performance to R in data analysis tasks. The author shares personal experiences from a research lab to illustrate these points.
Making Python's Global Interpreter Lock (GIL) optional marks a significant shift in how the language handles multithreading and concurrency. Under PEP 703, CPython can be built with or without the GIL, and the free-threaded build lets threads run Python code in parallel, which reshapes how systems are designed, particularly in data science and AI. This change presents both opportunities and challenges, requiring developers to adapt to new concurrency patterns.
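As a rough illustration of what this means in practice, the sketch below runs a CPU-bound task across threads and reports whether the interpreter has the GIL enabled. It assumes `sys._is_gil_enabled()`, which is available in CPython 3.13 and later, and falls back to "enabled" on older versions; the prime-counting workload and the function names are hypothetical, chosen only to keep the threads busy.

```python
import sys
from concurrent.futures import ThreadPoolExecutor


def gil_enabled() -> bool:
    # sys._is_gil_enabled() exists in CPython 3.13+; on older interpreters
    # the GIL is always on, so we assume True when the function is absent.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else bool(check())


def count_primes(bounds):
    """Count primes in the half-open range [start, stop) by trial division."""
    start, stop = bounds
    total = 0
    for n in range(max(start, 2), stop):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            total += 1
    return total


def parallel_count_primes(limit, workers=4):
    """Split [0, limit) into chunks and count primes in each on a thread."""
    step = max(1, -(-limit // workers))  # ceiling division, at least 1
    chunks = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, chunks))


if __name__ == "__main__":
    print("GIL enabled:", gil_enabled())
    print("primes below 200000:", parallel_count_primes(200_000))
```

The result is the same on either build; only the wall-clock time should differ, since with the GIL enabled the threads take turns executing Python bytecode instead of running simultaneously.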
The article provides a practical guide to causal structure learning using Bayesian methods in Python. It covers essential concepts, techniques, and implementations that enable readers to effectively analyze causal relationships in their data. This resource is tailored for data professionals looking to deepen their understanding of causal inference.
Python data science workflows can be significantly accelerated with GPU-accelerated libraries such as cuDF, cuML, and cuGraph, usually with minimal code changes. The article highlights seven drop-in replacements for popular Python libraries and demonstrates how GPU acceleration improves performance on large datasets while leaving most existing code untouched.
Kedro is an open-source Python framework designed for creating production-ready data science and data engineering pipelines. It emphasizes software engineering best practices to ensure reproducibility, maintainability, and modularity, and offers various features like a project template, data catalog, and flexible deployment options. The framework supports collaboration among teams with diverse software engineering knowledge and is maintained by a growing community of contributors.