7 min read
|
Saved February 14, 2026
Do you care about this?
Josh Clemm discusses the development of Dropbox Dash, focusing on how it combines knowledge graphs and indexing to streamline access to work-related content across various apps. He explains the trade-offs between index-based and federated retrieval, along with the context-window and latency costs of MCP and how Dropbox mitigates them.
If you do, here's more
Josh Clemm, Dropbox's VP of Engineering, shared insights on the company's work with knowledge graphs and the creation of Dropbox Dash during a recent online course. He emphasized the challenges users face with numerous open tabs and accounts, which complicate access to work-related content. Traditional LLMs struggle to assist because they lack access to proprietary company data. Dropbox Dash aims to address this by integrating content from various third-party apps into a single platform, enabling users to search and retrieve information more efficiently.
The tech stack behind Dash starts with custom connectors that interface with different applications, each with its own API quirks, rate limits, and permission system. Once data is collected, it undergoes normalization, extraction, and enrichment. For example, document text is indexed, images require media understanding, and videos need scene extraction before they can be searched effectively. This information is then modeled into a knowledge graph that links related documents, transcripts, and individuals, giving users more context.
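To make the knowledge-graph step concrete, here is a minimal Python sketch of linking documents, transcripts, and people so that related context can be looked up from any of them. It is an illustration only, not Dropbox's implementation: the `Node` and `KnowledgeGraph` classes, the relation names, and the sample records are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    kind: str   # hypothetical kinds: "document", "transcript", "person"
    title: str


class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}
        # node id -> set of (relation, other node id)
        self.edges = defaultdict(set)

    def add_node(self, node):
        self.nodes[node.id] = node

    def link(self, src, relation, dst):
        # Store the edge in both directions so context is reachable from either side.
        self.edges[src].add((relation, dst))
        self.edges[dst].add((relation, src))

    def related(self, node_id, kind=None):
        # Collect distinct neighbours, optionally filtered by node kind.
        ids = {other for _, other in self.edges[node_id]}
        found = [self.nodes[i] for i in ids]
        if kind is not None:
            found = [n for n in found if n.kind == kind]
        return sorted(found, key=lambda n: n.id)


# Hypothetical sample data, standing in for normalized connector output.
kg = KnowledgeGraph()
kg.add_node(Node("doc1", "document", "Q3 roadmap"))
kg.add_node(Node("tr1", "transcript", "Roadmap review meeting"))
kg.add_node(Node("p1", "person", "Alex"))
kg.link("doc1", "discussed_in", "tr1")
kg.link("doc1", "authored_by", "p1")
kg.link("tr1", "attended_by", "p1")

related_ids = [n.id for n in kg.related("doc1")]
related_people = [n.title for n in kg.related("doc1", kind="person")]
```

In this sketch, asking for everything related to the roadmap document returns both the meeting transcript and its author, which is the kind of added context the graph is meant to supply alongside a plain search hit.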
Clemm also explained the choice of index-based retrieval over federated retrieval. Federated retrieval is easier to set up and returns fresh data, but it depends on external APIs and responds more slowly. Index-based retrieval allows for pre-processing and the creation of enriched datasets but demands significant upfront work and resources. He also addressed the challenges of using MCP (Model Context Protocol), detailing how it can quickly fill context windows and slow down query responses. To mitigate these issues, Dropbox has developed a "super tool" that consolidates multiple retrieval tools and optimizes token usage by leveraging knowledge graphs.
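The two ideas in this paragraph, a pre-built index and a single consolidated retrieval tool, can be sketched together in Python. This is a simplified illustration under stated assumptions rather than how Dash works: the records, the whitespace tokenizer, and the `search` function with its `sources` filter are hypothetical. The point is that the index is built ahead of query time, and a model sees one tool instead of one tool per app.

```python
from collections import defaultdict

# Hypothetical, already-normalized records from three different connectors.
RECORDS = [
    {"id": "doc1", "source": "docs", "text": "Q3 roadmap for search ranking"},
    {"id": "msg1", "source": "chat", "text": "Roadmap review moved to Friday"},
    {"id": "tkt1", "source": "tickets", "text": "Fix ranking regression"},
]


def build_index(records):
    """Build an inverted index: lowercase token -> set of record ids."""
    index = defaultdict(set)
    for rec in records:
        for token in rec["text"].lower().split():
            index[token].add(rec["id"])
    return index


# Built once, ahead of query time (the upfront cost of index-based retrieval).
INDEX = build_index(RECORDS)
BY_ID = {rec["id"]: rec for rec in RECORDS}


def search(query, sources=None):
    """Single retrieval entry point, in place of one tool per app.

    Returns the sorted ids of records containing every query token,
    optionally restricted to the given set of source names.
    """
    tokens = query.lower().split()
    if not tokens:
        return []
    hits = set.intersection(*(INDEX.get(t, set()) for t in tokens))
    if sources is not None:
        hits = {h for h in hits if BY_ID[h]["source"] in sources}
    return sorted(hits)


all_roadmap = search("roadmap")
ticket_ranking = search("ranking", sources={"tickets"})
```

Because every query goes through one function with one small schema, only a single tool description has to sit in the model's context window, whereas a federated setup would expose a separate tool and a separate API call for each app.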