Click any tag below to further narrow down your results
+ ai
(6)
+ automation
(6)
+ compliance
(6)
+ data-engineering
(4)
+ analytics
(4)
+ ai-agents
(4)
+ cybersecurity
(3)
+ sustainability
(2)
+ accountability
(2)
+ systems-of-record
(2)
+ business-intelligence
(2)
+ artificial-intelligence
(2)
+ enterprise-architecture
(1)
+ metric-definitions
(1)
+ cloud-technologies
(1)
Links
This article discusses how organizations can effectively implement agentic AI, highlighting real-world examples and offering guidance on development frameworks and integration. It also covers the ethical and technical challenges of using generative AI, with insights from experts on navigating data governance and deployment strategies.
Yelp outlines its approach to processing Amazon S3 server-access logs at scale, addressing challenges like high log volume and storage costs. They now compress logs into Parquet files, greatly reducing storage needs and improving query performance for analytics tasks. This system supports various operational use cases, from debugging to cost analysis.
Material helps organizations protect sensitive data without disrupting collaboration. It automates the discovery and classification of files in Google Drive, monitors sharing behaviors, and enforces data governance policies. The platform also detects potential threats and provides a scorecard to evaluate security posture.
This article outlines the importance of having governed and discoverable data for successful AI projects. It highlights common pitfalls in AI implementation and presents a structured approach to ensure data quality and compliance. A roadmap is provided for creating a reliable data stack that supports effective AI systems.
This article discusses the limitations of traditional BI tools' semantic layers and introduces the Boring Semantic Layer (BSL) as a more pragmatic solution. BSL aims to streamline the process of defining metrics and relationships, making them accessible across various platforms without the complexity of existing tools. It integrates with existing data pipelines and allows for easier governance and multi-modal data access.
This article argues that data teams should transition to context engineering, integrating data governance, engineering, and science to create reliable knowledge sources for AI agents. It highlights the need for a structured context stack to ensure accurate answers and effective performance from these agents.
This article discusses how prompt injection techniques can enhance data governance by alerting users about sensitive information in corporate documents before they interact with AI tools. It highlights experiments with embedding warnings in documents to raise awareness and prevent data leaks through unapproved AI applications.
This article covers recent advancements in technology, including new AI capabilities from IBM and Cisco, as well as updates on cloud revenue driven by generative AI. It also highlights trends in data governance and unified communications.
This article discusses the evolution of data governance from a rigid, compliance-focused approach to a more dynamic, context-driven model. It argues that as AI systems become more autonomous, organizations need to shift from controlling data to ensuring accountability and intentionality in how data is used. The author emphasizes the importance of negotiating meaning and maintaining oversight in increasingly complex socio-technical environments.
This article discusses the evolving role of observability in organizations, highlighting a significant increase in maturity and the challenges of managing costs. It emphasizes the need for businesses to improve reporting on the impact of observability and the importance of democratizing data across various teams.
Richard Glew discusses the importance of improving data quality testing by applying established software testing principles. He highlights the differences between software and data engineering, emphasizing the need for a structured quality strategy and the involvement of non-technical users in the process. The article sets the stage for practical strategies in future installments.
This article discusses how Google integrates AI agents into its cybersecurity operations. It outlines key lessons learned in building these agents, focusing on trust, real problem-solving, performance measurement, and the importance of foundational practices.
This article critiques common misconceptions about data warehousing, emphasizing that mistrust in data stems from semantic issues rather than technology. It argues for a shift in perspective, viewing data warehousing as a tool for business understanding rather than just an IT system.
The article outlines the challenges enterprises will face in scaling AI systems by 2026. It emphasizes the need for robust data governance, vendor independence, and updated infrastructure to handle the demands of AI workloads. Companies not adapting to these changes risk falling behind.
In this episode, Matt Topper discusses the challenges of identity, credentials, and access control in modern data platforms. He offers practical solutions for managing these issues, including the use of JWTs, policy engines, and database proxies, while emphasizing the need for a unified approach to trust and governance across data systems.
This article explores the gap between the potential of Reinforcement Learning (RL) and its actual use in real-world applications. While RL shows promise for product self-improvement and enterprise automation, many companies are still experimenting with it and face challenges like data governance and talent scarcity. It emphasizes the need for tailored approaches rather than relying solely on improving foundational models.
The article argues that while traditional systems of record aren't dying, they are evolving in response to automation and AI agents. A reliable source of truth is still essential for enterprises, but the way that truth is accessed and managed is changing. The author emphasizes the need for clear definitions and governance as workflows become more complex.
This article outlines how Agent Bricks creates tailored AI agents using your organization’s data. It emphasizes automated evaluation, continuous improvement through human feedback, and offers resources for getting started with AI agents effectively.
Google Cloud introduces new AI-powered features in Cloud Storage, including auto annotate and object contexts, to help organizations analyze and derive insights from their unstructured data. These tools automate the generation of metadata and allow users to attach custom tags, facilitating data discovery, curation for AI, and governance at scale. This shift transforms unstructured data from a passive resource into an active asset driving innovation.
As budget season approaches for data leaders, the article emphasizes the importance of strategic investments in 2026 to adapt to rapidly changing AI technologies. It outlines key recommendations, including creating API infrastructures, investing in solution architect roles, and enhancing data governance practices to prepare for a future where AI agents play a significant role in business processes.
Palantir Technologies' Gotham platform significantly enhances the U.S. government's data integration capabilities, allowing for efficient surveillance and profiling of individuals. However, this raises serious concerns about civil liberties, accountability, and the potential for abuse in a system increasingly driven by proprietary algorithms. The partnership between Palantir and various government agencies suggests a shift in governance where data-driven decision-making may undermine traditional legal safeguards.
AWS is establishing an independent European governance structure for its European Sovereign Cloud, launching a new region in Brandenburg, Germany, by the end of 2025. This initiative aims to meet stringent digital sovereignty requirements, ensuring that customer data remains within the EU and is managed by EU-based personnel, while also providing a comprehensive suite of AWS services. The move reflects a broader trend in Europe towards technological sovereignty and reducing reliance on non-European cloud providers.
Medallion Architecture organizes data in layers but lacks the framework for treating data as a product, such as pricing and governance structures. ODPS 4.0 addresses these gaps by introducing reusable components for service level agreements, data quality, and access models, enabling organizations to monetize and govern their data assets effectively. Combining both models allows for the creation of monetized, governed data products that enhance business value.
Systems of record may be perceived as becoming obsolete due to the rise of AI agents that automate tasks and generate data. However, the author argues that these systems will become increasingly essential in governing AI activities, managing data access, and ensuring compliance. The future may see a transformation of systems of record into control layers, focusing on agent governance rather than merely being places where work occurs.
The article discusses the misalignment of data contracts in organizations, emphasizing that they often do not reflect the actual requirements and expectations of data stakeholders. It advocates for the establishment of clear and effective data contracts to enhance data governance and collaboration. The piece highlights the importance of aligning data contracts with organizational goals to improve data management practices.
+ data-contracts
data-governance ✓
+ data-management
+ organizational-alignment
+ stakeholder-engagement
Amazon SageMaker Catalog now includes a data lineage feature that allows users to visually track and understand data flow across various AWS services like Amazon EMR, AWS Glue, and Amazon Redshift. This feature enhances data governance, quality, and collaboration by providing insights into data origins, transformations, and dependencies, while also supporting compliance and troubleshooting efforts through automated lineage capture.
OpenMetadata is an open-source platform that simplifies metadata management, enabling organizations to effectively manage their data assets through a centralized repository. It addresses challenges such as fragmented data sources and enhances data discoverability, governance, and collaboration by providing features like lineage tracking, data quality monitoring, and a user-friendly interface. With extensive connector support and a schema-first approach, OpenMetadata is gaining popularity in the data engineering community.
Data governance in not-for-profits often struggles due to a lack of structure and resources, leading to issues like inconsistent data and compliance risks. Non-Invasive Data Governance (NIDG) offers a solution by integrating governance into existing roles and processes without adding bureaucratic layers, ensuring that organizations can manage their data effectively while focusing on their mission. This approach promotes accountability and compliance in a practical, sustainable manner.
AI is transforming workplace productivity but introduces significant security challenges, as revealed by a survey of security leaders. Key issues include limited visibility into AI tool usage, weak policy enforcement, unintentional data exposure, and unmanaged AI, highlighting the urgent need for enhanced governance and security strategies to mitigate risks associated with AI adoption.
Data engineering best practices are being challenged by modern demands for speed, agility, and purpose-driven architecture. Experts advocate for a shift from traditional centralized models to more flexible, intent-driven approaches that prioritize real business outcomes and guided autonomy. The need for a balance between standardization and freedom is crucial to avoid chaos and technical debt in data platforms.
Distributed cloud computing offers a decentralized approach to data processing, enhancing security and efficiency while minimizing data breach risks. By integrating Privacy Enhanced Technologies (PETs) and Artificial Intelligence (AI), organizations can ensure secure data analysis and foster collaboration without compromising privacy. This article discusses various architectures like hybrid cloud, multi-cloud, and edge computing, and highlights how PETs like Amazon Clean Rooms and Microsoft Azure Purview can safeguard sensitive information during data processing.
Data governance initiatives often falter due to shifting priorities, lack of consistent purpose, and external pressures such as regulatory changes or technological advancements. The authors argue for a redefinition of success in data governance, emphasizing the importance of sustainability and resilience to adapt to evolving organizational needs. They propose that effective data governance should continuously improve and engage with strategic planning to remain relevant in changing environments.
The Data Act aims to enhance the accessibility and sharing of data within the EU, promoting innovation and fostering a more data-driven economy. It establishes a framework for data governance, ensuring that data is used responsibly while balancing the interests of data providers and users. The Act is part of the EU's broader strategy to become a global leader in digital transformation and data management.
Delta Lake enhances data lakes by providing ACID compliance, schema governance, and time-travel capabilities through its structured transaction log, known as _delta_log. This log records all changes to tables, enabling reliable data pipelines, auditability, and version control, while also supporting concurrent writes and failure recovery. Understanding and managing _delta_log is crucial for building resilient and scalable data platforms.
The article outlines five key concepts in data engineering that are essential for professionals in the field. It emphasizes the importance of understanding data architecture, pipeline construction, data governance, scalable systems, and the use of cloud technologies. These concepts are crucial for building efficient and effective data solutions.
The article discusses the architecture and implementation of a Robust Automated Governance (RAG) system for enterprises, focusing on strategies to enhance data management and compliance. It emphasizes the importance of integrating various data sources and maintaining a structured approach to governance to ensure effective operation and decision-making.
Discrepancies in reported monthly active user counts among different teams stemmed from varying definitions and interpretations of what constitutes an "active user." After a thorough audit, a unified definition was established and implemented consistently, leading to more productive leadership meetings focused on actionable insights rather than resolving conflicting data reports.
Data engineering is evolving rapidly due to the integration of artificial intelligence, necessitating professionals to acquire new skills. Key areas of focus include data architecture, machine learning, and data governance, which are essential for harnessing AI's potential in data-driven decision-making. Continuous learning and adaptation are crucial for engineers to stay relevant in this AI-centric landscape.
+ data-engineering
+ artificial-intelligence
+ machine-learning
data-governance ✓
+ skills-development