From 4c0982a903cea246e4679c5620fa94e691885486 Mon Sep 17 00:00:00 2001
From: Shreya Shankar
- New IDE Released!{" "} - - Dec 2, 2024 - - ! Try out our new web-based IDE. -
-- New blog post!{" "} - - September 24, 2024 - -
++ Launched our IDE!{" "} + + Dec 2024 + +
++ New paper on agentic query optimization!{" "} + + Oct 2024 + +
++ New blog post!{" "} + + Sep 2024 + +
+- While traditional database systems excel at structured data - processing, semantic operations powered by LLMs bring - unprecedented expressiveness and flexibility. However, these - operations introduce new challenges: they can be incorrect, - are computationally intensive, and typically rely on remote - API calls. We're reimagining data systems throughout the - stack to address these unique challenges. Here are some - projects we are working on: + Traditional databases are great for structured data, but they + weren't built for the Gen AI era. We're rethinking + how data systems should work with LLMs - making them more + reliable, cost-effective, and actually usable in production. + Here's what we're working on:
- Current LLM-powered systems focus mainly on cost - reduction. But for complex tasks, even well-crafted - operations can produce inaccurate results. The DocETL - optimizer uses LLM agents to automatically rewrite - pipelines, by breaking operations down into smaller, - well-scoped tasks to improve accuracy.{" "} + Most LLM systems just try to cut costs, but accuracy is + the real challenge. Our optimizer uses LLM agents to + automatically break down complex operations into smaller, + more focused tasks, kind of like having a smart teaching + assistant that helps structure your work. Early results + show this approach can significantly improve reliability. + We are also working on finding plans that are both cheap + and accurate.{" "} - Read our paper → + Check out our paper →
- Our users consistently highlight map operations as the - most valuable feature, but these require at least one LLM - call per document—making them prohibitively expensive at - scale. We're exploring novel techniques to - dramatically reduce costs for open-ended map operations - without sacrificing accuracy. + Users consistently highlight map operations as the most + powerful operations, but they can get expensive fast - + imagine paying for an LLM call on every single document if + you have tens of thousands of documents. We're + working on techniques to dramatically reduce these costs + without compromising on quality. Approximate query + processing will have its comeback!
- Semantic operations are highly expressive, but this power - comes with a challenge—they can be fuzzy and ambiguous in - practice. Consequently, users often need many iterations - to get semantic operations right. Through the DocETL IDE, - we're designing interfaces that help users explore - data, refine their intents, and quickly iterate on prompts - and operations. + Prompts are the primary interface between humans and + LLM-powered data systems, but crafting them is more art + than science. Our IDE explores new ways to make prompt + engineering systematic and intuitive, with interactive + tools that help users express their intent clearly and + debug unexpected behaviors.{" "} + + Try it yourself → + {" "} + + (Paper coming January 2025) +
- There are many domain-specific unstructured data processing - needs that can benefit from systems like DocETL. We work with - partners at universities, governments, and institutions to - explore how AI can improve data workflows, especially for - domain experts and those who may not have data or ML - expertise. If you'd like to learn more (e.g., bring - DocETL to your team or join our case studies), please reach - out to{" "} + We're working with universities, governments, and + organizations to solve real-world data challenges - especially + for teams without ML expertise. Want to use DocETL for your + project or be part of our case studies? Drop a line at{" "} (