What OpenAI's Data Agent Article Tells Us About the Future of Enterprise Analytics
Last week, OpenAI published a detailed breakdown of their internal data agent: a system built to help employees query 600+ petabytes across 70,000 datasets. The article provides rare visibility into what it actually takes to make AI work reliably with enterprise data.
For us at Gravity, reading their approach felt validating. We arrived at the same fundamental conclusions independently while building Orion.
The Hard Truth About Text-to-SQL
OpenAI's team learned what we discovered early in Orion's development: you cannot simply point an LLM at a data warehouse and expect reliable results. Even with the most advanced models available, accurate data analysis requires substantial supporting infrastructure.
Their solution involves six layers of context:
- Table metadata and lineage
- Human annotations from domain experts
- Code-level definitions extracted via Codex
- Institutional knowledge retrieved through RAG
- Organizational and personal memory
- Runtime schema inspection
This isn't over-engineering. This is the minimum viable architecture for production-grade AI data analysis.
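To make the layered-context idea concrete, here is a minimal sketch of how those six layers might be gathered and flattened into a single prompt for the model. Every class and field name here is an illustrative assumption, not OpenAI's actual schema or code.

```python
from dataclasses import dataclass, field

@dataclass
class QueryContext:
    """Aggregates the context layers an LLM needs before generating SQL.
    Field names are hypothetical, chosen to mirror the six layers above."""
    table_metadata: dict = field(default_factory=dict)   # schemas and lineage
    annotations: list = field(default_factory=list)      # notes from domain experts
    code_definitions: list = field(default_factory=list) # metric logic mined from code
    retrieved_docs: list = field(default_factory=list)   # RAG hits from institutional docs
    memory: list = field(default_factory=list)           # org/personal corrections
    live_schema: dict = field(default_factory=dict)      # runtime schema inspection

    def to_prompt(self, question: str) -> str:
        """Flatten every non-empty layer into one prompt block."""
        sections = [
            ("Tables", self.table_metadata),
            ("Expert annotations", self.annotations),
            ("Code definitions", self.code_definitions),
            ("Retrieved knowledge", self.retrieved_docs),
            ("Memory", self.memory),
            ("Live schema", self.live_schema),
        ]
        parts = [f"## {name}\n{content}" for name, content in sections if content]
        parts.append(f"## Question\n{question}")
        return "\n\n".join(parts)
```

The point of the sketch is the shape, not the details: each layer is populated by a different subsystem (metadata catalog, RAG index, memory store, live warehouse connection), and the agent only generates SQL after all of them are assembled.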
Where the Problem Gets Harder
OpenAI built their agent for internal use by data teams: people who understand SQL, can validate query logic, and know when results look suspicious. It's designed to be reactive: you ask a question, and it generates SQL and returns results for you to interpret.
That works well when your users are data professionals. But most enterprises need something different.
When a CFO asks, "What was our revenue last quarter?" they need more than raw SQL and a number. They need to know: was that GAAP revenue or ARR? Did it include refunds? How does it compare to the forecast? What drove the variance?
The analysis itself is just the starting point. What matters is understanding what the numbers mean and what to do about them.
Building Orion for Proactive Analysis
We built Orion with a different design goal: make sophisticated data analysis accessible to business teams, not just data specialists.
That required moving beyond reactive query generation to proactive investigation. Orion doesn't wait for users to ask the right questions. It analyzes continuously, connects internal metrics with external signals (market trends, weather patterns, demographic shifts), and surfaces insights before anyone thinks to look.
The system understands your business context through the same kind of layered architecture OpenAI describes:
- Deep integration with your existing data stack (BigQuery, Snowflake, Looker, dbt)
- Institutional knowledge from documentation, Slack, and previous analyses
- Long-term memory that learns from corrections and feedback
- Transparent reasoning so users can verify how conclusions were reached
But Orion goes further. It doesn't just answer "what happened." It investigates why it happened, identifies root causes, and recommends specific actions.
The Real Challenge: Trust at Scale
OpenAI emphasizes something critical in their article: they use golden queries as continuous evaluations to ensure quality doesn't drift. This resonated deeply with our team.
Accuracy isn't a feature. It's the entire foundation. As our CTO Drew Gillson puts it: "AI earns trust slowly and loses it instantly."
We built Orion with this principle as the primary design constraint:
- Every insight links back to the source data
- Analyses are explainable, not black boxes
- The system operates within existing security and access controls
- Results are validated continuously against known-good outputs
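A golden-query regression check of the kind OpenAI describes can be sketched in a few lines. Here `execute_query` and the suite format are hypothetical stand-ins for an agent's real question-to-result pipeline; the mechanism is simply "re-run known questions, diff against known-good answers, flag drift."

```python
def run_golden_checks(golden_suite, execute_query):
    """Re-run each golden question through the agent and compare the
    result against its known-good answer. Returns a list of
    (question, expected, actual) tuples for any query that drifted."""
    failures = []
    for question, expected in golden_suite:
        actual = execute_query(question)
        if actual != expected:
            failures.append((question, expected, actual))
    return failures
```

Run on every model, prompt, or context change, a suite like this turns "accuracy" from a one-time claim into a continuously enforced property.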
In production, this means Orion can automatically deliver personalized analytics to hundreds of clients or internal teams. It transforms what used to take data teams weeks into automated reports that arrive before business users start their day.
What This Means for Enterprise Analytics
OpenAI's article demonstrates that building reliable AI data systems requires serious engineering investment. They're absolutely right about that.
The question for most enterprises isn't whether this infrastructure is necessary. The question is whether building it internally is the best use of limited resources.
Most companies don't have OpenAI's talent density or engineering capacity. They need solutions that work within their existing data infrastructure without requiring ground-up rebuilds.
That's exactly what we designed Orion to be: an autonomous AI analyst that integrates with the tools you already use, learns your business context, and delivers trustworthy insights at scale without requiring a team of ML engineers to maintain it.
Looking Forward
OpenAI's transparency about what it takes to build production AI for data analysis helps set realistic expectations across the industry. The shortcut doesn't exist. Context is foundational. Trust must be earned.
We're grateful they shared their approach. It validates the hard work our team has invested in building Orion the right way.
If you're interested in how Orion can bring this level of analytical capability to your organization, schedule a demo to see it in action.