For OpenAI employees, answering a question even as seemingly simple as how many ChatGPT Pro users the company has in one particular country can be surprisingly difficult, given that the needed data may be spread across multiple data sources, each of which is accessible in a somewhat different way.
Overall, OpenAI has 70,000 different data sets, amassing 600PB of new data daily. About 3,500 employees can access this data, using any one of 15 tools. The company has always kept close count of its users, but as it adds more regions, plans and features, tallying numbers becomes more difficult.
But each new query brought its own challenges, so analysts often found themselves in extended Slack conversations, or even meetings, with their peers just to figure out how to access the data.
“Simple questions shouldn’t be this difficult, or this time-consuming,” said Bonnie Xu, a member of OpenAI’s technical staff on the data productivity team, during a talk at the QCon AI conference, held in New York last month.
Xu was there to discuss a tool the company created, called Kepler, to streamline this process.
Kepler is a helpful agent designed to answer questions for OpenAI employees, hiding the many tasks it must sometimes undertake to find the answer.
Originally, Kepler was designed primarily for the company’s data scientists, but since its launch the user base has expanded to people in finance, human resources and other departments.
One Kepler user, according to Xu, even said it was the closest they had seen an AI system come to AGI.
OpenAI’s Challenges in Internal Data Retrieval
For the business analyst, many of the database tables might look very similar. One table may include logged-out users, while another does not. Some tables include encrypted users, while others do not and must have that data joined into the resulting dataset. Which one to use?
Writing the correct SQL code to extract the data can be difficult as well, especially if it involves joins across different tables.
“Missing one nuance can lead to an answer that is incorrect by an order of magnitude, and this can be catastrophic when making important business decisions,” Xu said.

Is this SQL query correct? Many managers would not be able to tell.
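The join pitfall Xu describes can be shown in a few lines. The sketch below uses Python’s built-in sqlite3 module with a hypothetical schema (`users`, `sessions`) and invented sample rows, not OpenAI’s actual tables. Because each user appears once per session after the join, `COUNT(*)` counts sessions rather than users and overstates the answer by an order of magnitude.

```python
import sqlite3

# Hypothetical schema, invented for illustration: a users table and a
# sessions table where user_id is NULL for logged-out visitors.
SCHEMA = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT, plan TEXT);
CREATE TABLE sessions (session_id INTEGER PRIMARY KEY, user_id INTEGER);
"""

# Joining users to sessions repeats each user once per session (fan-out),
# so COUNT(*) counts sessions, not users.
NAIVE_SQL = """
SELECT COUNT(*) FROM users u JOIN sessions s ON s.user_id = u.user_id
WHERE u.plan = 'pro' AND u.country = 'DE'
"""

# Counting distinct users from the users table answers the actual question.
CORRECT_SQL = """
SELECT COUNT(DISTINCT u.user_id) FROM users u
WHERE u.plan = 'pro' AND u.country = 'DE'
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "DE", "pro"), (2, "DE", "pro"), (3, "DE", "plus"), (4, "FR", "pro")],
    )
    # Users 1 and 2 have ten sessions each; two sessions are logged out.
    sessions = [(i, 1) for i in range(1, 11)] + [(i, 2) for i in range(11, 21)]
    sessions += [(21, 3), (22, 4), (23, None), (24, None)]
    conn.executemany("INSERT INTO sessions VALUES (?, ?)", sessions)
    return conn


conn = build_demo_db()
naive = conn.execute(NAIVE_SQL).fetchone()[0]
correct = conn.execute(CORRECT_SQL).fetchone()[0]
print(f"naive join: {naive}, distinct Pro users: {correct}")  # naive join: 20, distinct Pro users: 2
```

Both queries look plausible and both run without error; only knowing which table holds one row per user tells them apart, which is the kind of table-level knowledge analysts had to chase down in Slack threads and meetings.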
Kepler in Action
Built in-house, Kepler is a data analyst, one that can tap all of OpenAI’s internal data stores to answer questions. OpenAI employees can interact with Kepler through Slack, through an IDE such as Cursor, through an integration with a specific workflow, or through mobile and other remote clients. In the background, Kepler uses GPT-5 to parse the request.
To illustrate how Kepler works, Xu walked through a question about New York taxi travel times. She wanted to know which times of day have the most variance in travel times, as well as which journeys are the “most unreliable,” meaning location pairs (origin and destination) where the spread between the shortest and longest travel times is greatest.
The demo showed the “chain of thought,” or series of evaluations and actions, that Kepler executed to answer this question.
First, it ran an internal knowledge search, identifying two potentially relevant datasets, including a 2016 collection of NYC taxi travel time data that included pick-up and drop-off times, as well as the zip codes of both the departure and destination locations.
The agent then calculated the median time for each zip code, identifying the 95th and 99th percentiles. It made educated guesses about how to write the appropriate queries to get the needed data, tested each one, and soon found one that worked.
“You can imagine, doing this manually yourself takes a lot of time, but the agent is just going through these query and response steps on your behalf,” Xu said. Once the queries appeared to be correct, the agent sorted the results, did some light formatting and even prepared a chart to present the data to the user. (The answers showed that rush hours and late nights are the most unreliable times.)
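The taxi analysis can be approximated in plain Python. This is a minimal sketch, not Kepler’s generated queries: the trip record layout is assumed, and “unreliability” is scored here as the 95th percentile minus the median travel time for each origin/destination pair, one common way to express the spread the demo looked for. `statistics.quantiles` with `method="inclusive"` keeps percentile estimates within the observed range.

```python
from collections import defaultdict
from statistics import median, pvariance, quantiles

# Hypothetical trip record: (pickup_zip, dropoff_zip, pickup_hour, minutes)
Trip = tuple[str, str, int, float]


def percentile(values: list[float], p: int) -> float:
    """p-th percentile (1..99), interpolating within the observed range."""
    return quantiles(values, n=100, method="inclusive")[p - 1]


def variance_by_hour(trips: list[Trip]) -> dict[int, float]:
    """Population variance of travel time for each pickup hour."""
    by_hour: dict[int, list[float]] = defaultdict(list)
    for _origin, _dest, hour, minutes in trips:
        by_hour[hour].append(minutes)
    return {hour: pvariance(times) for hour, times in by_hour.items()}


def unreliable_pairs(trips: list[Trip]) -> list[tuple[tuple[str, str], float]]:
    """Rank origin/destination pairs by p95 minus median travel time."""
    by_pair: dict[tuple[str, str], list[float]] = defaultdict(list)
    for origin, dest, _hour, minutes in trips:
        by_pair[(origin, dest)].append(minutes)
    scores = [
        (pair, percentile(times, 95) - median(times))
        for pair, times in by_pair.items()
        if len(times) >= 2  # a spread needs at least two trips
    ]
    # Most unreliable pairs first, mirroring the sorting step in the demo.
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Using a high percentile rather than the raw maximum keeps a single freak trip from dominating the ranking; the 99th percentile could be swapped in the same way.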
Another demo showed Kepler working through the question of why there was a big spike in ChatGPT users in March 2025. It consulted a dashboard and a related document to find the table that displayed this data. Kepler then wrote different queries to try to pinpoint the sudden increase in usage, such as querying by region. It also looked for errors, such as duplicated log data.
The chain of thought identified a possible reason, namely the launch of a new generative image feature. Kepler then did a web search to cross-reference the hypothesis, finding the release notes and news articles on the launch.
Kepler stores all the questions, so you can pick up follow-up threads later on. When asked a follow-up question about taxi rides on February 14, Valentine’s Day, the agent already knew which tables and queries to use.
You can also interrupt Kepler if you see from its chain of thought that it is heading in the wrong direction.
And since analysts tend to ask the same types of questions, such as product analysis and data validation, Kepler keeps sets of custom workflows for these sorts of questions.
How Kepler Works
At its core, Kepler is a set of APIs that communicate directly with ChatGPT’s underlying model (currently GPT-5). Kepler also has direct connections to a set of pre-processed data, including an internal data knowledge base and an internal documents service. It can also make calls to data warehouses and other data services running on Apache Spark, Airflow and other platforms.
Using the Model Context Protocol (MCP), created by Anthropic, has been “so helpful” for Kepler, Xu said. Kepler uses the internal documents to understand how to query databases or perform other tasks over MCP. If the results don’t come out as expected, it can rerun the query with slight modifications. In effect, the Kepler agent reasons on its own.
“So instead of you giving the feedback, Kepler is running tools, then using the right tools to take the next set of steps, depending on whatever feedback that’s given,” Xu said.
Agents working on their own can return wildly inaccurate results, but with additional context in hand, they can recognize when something is wrong and change their approach.
“So the really nice thing is that Kepler can interactively explore the data itself, and context is carried over the entire time,” she said.
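The run, check and revise cycle described above can be sketched as a small loop. Everything here is an assumption for illustration: `propose` stands in for the model, the `trips` table is invented, and the only checks are “did the query run” and “did it return rows.” A real agent would also sanity-check the values, but the control flow is similar: feed the failure back and try a modified query.

```python
import sqlite3
from typing import Callable, Optional

# Hypothetical stand-in for the model: takes the question plus feedback from
# the previous attempt (None on the first try) and returns a SQL candidate.
Proposer = Callable[[str, Optional[str]], str]


def run_with_retries(
    conn: sqlite3.Connection,
    question: str,
    propose: Proposer,
    max_attempts: int = 3,
) -> tuple[list[tuple], list[tuple[str, str]]]:
    """Try candidate queries until one runs and returns rows."""
    feedback: Optional[str] = None
    history: list[tuple[str, str]] = []
    for _ in range(max_attempts):
        sql = propose(question, feedback)
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            feedback = f"query failed: {exc}"
        else:
            if rows:
                history.append((sql, "ok"))
                return rows, history
            feedback = "query returned no rows"
        history.append((sql, feedback))  # keep the trail, like a chain of thought
    raise RuntimeError(f"no working query after {max_attempts} attempts")


def scripted_proposer(question: str, feedback: Optional[str]) -> str:
    # First guess uses a column that does not exist; the retry corrects it.
    if feedback is None:
        return "SELECT COUNT(*) FROM trips WHERE pickup_zone = '10001'"
    return "SELECT COUNT(*) FROM trips WHERE pickup_zip = '10001'"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (pickup_zip TEXT, minutes REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("10001", 12.0), ("10001", 30.0), ("11201", 18.0)],
)
rows, history = run_with_retries(conn, "How many trips start in 10001?", scripted_proposer)
```

Keeping the `history` of attempts is what lets a user watch the reasoning and, as the article notes, interrupt when it heads the wrong way.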
The Importance of Metadata
Metadata also helps build out the context.
“It’s not enough to look at the table by itself, just as it is. You need to understand how the table was created and where it came from,” Xu said. This is the key to the agent actually understanding the differences between tables.
An offline job runs to compile this information about each table.
Much of this information had already been compiled by the company. Rich metadata about each database table has been captured, such as why it was created, what it is used for and even what its primary keys are.

How OpenAI generates additional metadata from its own documentation.
It also uses Codex to generate metadata from the code itself.
“Since this is all refreshed periodically by an offline job, the context stays fresh without any manual involvement,” Xu said.
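A periodic metadata job of the kind described above might look roughly like the following. The `TableMetadata` fields mirror what the article lists (purpose, primary keys, lineage), but the function names, the docs format and the regex-based lineage extraction are all hypothetical; where the article says OpenAI uses Codex to derive metadata from code, this sketch substitutes a naive FROM/JOIN scan.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TableMetadata:
    name: str
    purpose: str = "unknown"
    primary_keys: list[str] = field(default_factory=list)
    upstream: list[str] = field(default_factory=list)
    refreshed_at: Optional[datetime] = None


# Naive lineage extraction: any identifier after FROM or JOIN. It can be
# fooled by things like EXTRACT(YEAR FROM col), which is fine for a sketch.
_SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def upstream_tables(pipeline_sql: str) -> list[str]:
    return sorted(set(_SOURCE_RE.findall(pipeline_sql)))


def refresh_metadata(
    pipelines: dict[str, str], docs: dict[str, dict]
) -> dict[str, TableMetadata]:
    """Rebuild the catalog from pipeline SQL plus any documentation entries."""
    now = datetime.now(timezone.utc)
    catalog: dict[str, TableMetadata] = {}
    for table, sql in pipelines.items():
        doc = docs.get(table, {})
        catalog[table] = TableMetadata(
            name=table,
            purpose=doc.get("purpose", "unknown"),
            primary_keys=list(doc.get("primary_keys", [])),
            upstream=upstream_tables(sql),
            refreshed_at=now,
        )
    return catalog
```

A scheduler would call `refresh_metadata` on a fixed interval so the catalog never goes stale; the article lists Airflow among Kepler’s connected platforms but does not say what actually runs this job.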
If Kepler or a user finds an error, Kepler saves the correction in memory.
“For us, memory is really the mechanism that helps the agent continuously learn and improve,” Xu said. “Context will get you maybe 80 to 90% of the way there. But sometimes you need those last little corrections that are really hard to just infer.”

How Kepler saves corrections in memory.
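A correction memory can be as simple as notes keyed by table and replayed into the agent’s context whenever a question touches that table. The sketch below is a hypothetical illustration of that idea, with invented table names; the article does not describe how Kepler actually stores or retrieves its memories.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Correction:
    table: str
    note: str
    source: str  # who found it: "user" or "agent"


@dataclass
class CorrectionMemory:
    """Hypothetical store of corrections, keyed by the table they concern."""

    items: list[Correction] = field(default_factory=list)

    def save(self, table: str, note: str, source: str) -> None:
        correction = Correction(table, note, source)
        if correction not in self.items:  # skip exact duplicates
            self.items.append(correction)

    def relevant(self, tables: set[str]) -> list[Correction]:
        return [c for c in self.items if c.table in tables]

    def as_context(self, tables: set[str]) -> str:
        """Render matching corrections as text to prepend to the prompt."""
        lines = [f"- {c.table}: {c.note}" for c in self.relevant(tables)]
        return ("Known corrections:\n" + "\n".join(lines)) if lines else ""


memory = CorrectionMemory()
memory.save(
    "events.sessions",
    "excludes logged-out users; use events.sessions_all for totals",
    "user",
)
print(memory.as_context({"events.sessions", "users"}))
```

This covers the “last little corrections” Xu mentions: facts that cannot be inferred from schemas or docs, only learned once someone notices a wrong answer.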
To evaluate the quality of the answers, OpenAI runs an Eval Grader, which gives a score for each answer tested. It looks at how the delivered results differ from the expected, correct results.
In many cases, a correct answer may come from a slightly different SQL query than the one expected, and the development team plans for this.
“When we compare results, we actually give a little wiggle room for things that don’t meaningfully impact the answer,” Xu said.
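One way to build in that wiggle room is to grade result sets rather than SQL text: ignore row order, allow a small relative tolerance on numbers, and penalize missing or extra rows. The scoring rule and the `rel_tol` default below are assumptions for illustration, not the Eval Grader’s actual logic.

```python
import math
from collections.abc import Sequence

Row = Sequence[object]


def values_match(a: object, b: object, rel_tol: float) -> bool:
    """Numbers match within a relative tolerance; everything else exactly."""
    numeric = (int, float)
    both_numeric = isinstance(a, numeric) and isinstance(b, numeric)
    if both_numeric and not isinstance(a, bool) and not isinstance(b, bool):
        return math.isclose(a, b, rel_tol=rel_tol)
    return a == b


def rows_match(actual: Row, expected: Row, rel_tol: float) -> bool:
    return len(actual) == len(expected) and all(
        values_match(x, y, rel_tol) for x, y in zip(actual, expected)
    )


def grade(
    actual_rows: Sequence[Row], expected_rows: Sequence[Row], rel_tol: float = 1e-3
) -> float:
    """Score in [0, 1]: order-insensitive row matching, penalizing extra rows."""
    if not expected_rows:
        return 1.0 if not actual_rows else 0.0
    unused = list(actual_rows)
    matched = 0
    for expected in expected_rows:
        for i, actual in enumerate(unused):
            if rows_match(actual, expected, rel_tol):
                matched += 1
                del unused[i]  # each actual row can satisfy only one expected row
                break
    return matched / max(len(expected_rows), len(actual_rows))
```

Because two different but equivalent queries can return the same rows in a different order, or with tiny floating-point differences, comparing results this way rewards the answer rather than the exact SQL that produced it.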
Users bring their own credentials to Kepler, ensuring they don’t see any data that they don’t have permission to see.
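The point of bringing your own credentials is that the agent never holds broader access than the person asking. This toy sketch, with a hypothetical `FakeWarehouse` that enforces per-user grants, shows the pattern under that assumption: the agent forwards the caller’s credentials and lets the data layer refuse anything they are not allowed to read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credentials:
    user: str
    token: str  # would be validated by a real warehouse; unused in this toy


class FakeWarehouse:
    """Hypothetical data store that enforces per-user table grants."""

    def __init__(self, grants: dict[str, set[str]], tables: dict[str, list[tuple]]):
        self.grants = grants
        self.tables = tables

    def read_table(self, creds: Credentials, table: str) -> list[tuple]:
        if table not in self.grants.get(creds.user, set()):
            raise PermissionError(f"{creds.user} may not read {table}")
        return list(self.tables[table])


def agent_read(warehouse: FakeWarehouse, creds: Credentials, table: str) -> list[tuple]:
    # The agent forwards the caller's own credentials instead of using a
    # shared service account, so the warehouse's access checks still apply.
    return warehouse.read_table(creds, table)
```

The design choice is to keep authorization where it already lives, in the data layer, rather than re-implementing permission rules inside the agent.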
Future Steps for Kepler
At present, OpenAI has no plans to open source Kepler or offer it as an enterprise product, Xu said, noting that she isn’t in a position to make these decisions.
Nonetheless, an agent-based internal data analysis tool seems to bring a lot of value to the company.
“I think at least from what we’ve heard from our users, directly using Kepler is a lot faster. It’s more productive, just because when you’re looking at different sources, that’s a lot of stuff you have to do, and you have to connect the dots,” Xu said. “Kepler is really that layer on top, that abstraction that does it for you.”
Videos of all the QCon AI talks will be available via a Video Conference pass, starting January 15.