OpenAI Recovers 30,000 CPU Cores With Fluent Bit Tweak


ATLANTA — When systems get big enough, even very small optimizations can lead to very large savings.

This was the lesson that OpenAI Technical Staff Member Fabian Ponce imparted before the keynote crowd at KubeCon+CloudNativeCon North America 2025, being held this week in Atlanta.

OpenAI’s Observability Challenge at Scale

Each iteration of OpenAI’s ChatGPT has brought big improvements, along with more Kubernetes clusters and greater volumes of traffic — “And orders of magnitude more telemetry to keep it all running,” Ponce said.

In order to make it all run smoothly, OpenAI requires “an absolutely massive amount of telemetry and making it fast, queryable and actionable at scale,” he said.

Fluent Bit’s Critical Role in Data Telemetry

OpenAI runs Fluent Bit, an observability tool stewarded by the Cloud Native Computing Foundation, on every Kubernetes node. It digests log files and enriches them with samples of network streams, formats the results and sends them to the appropriate data stores.

With this architecture, Fluent Bit generates 10PB of data a day, stored on ClickHouse.
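OpenAI’s exact configuration wasn’t shown, but a minimal sketch of a node-level Fluent Bit pipeline of that shape, tailing container logs, enriching them with Kubernetes metadata and shipping them onward over HTTP, looks something like this (the paths, tags and output endpoint below are illustrative assumptions, not OpenAI’s setup):

```
# Illustrative node-level Fluent Bit pipeline (classic config format);
# paths, tags and the HTTP endpoint are assumptions, not OpenAI's config.
[SERVICE]
    Flush        5
    Log_Level    info

[INPUT]
    Name         tail
    Path         /var/log/containers/*.log
    Tag          kube.*

[FILTER]
    Name         kubernetes
    Match        kube.*

[OUTPUT]
    Name         http
    Match        *
    Host         logs-gateway.example.internal
    Port         8123
```

An agent like this is typically deployed as a DaemonSet so that every node gets one copy, which is exactly why any per-node overhead multiplies across a fleet of OpenAI’s size.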

The Drive for Resource Efficiency Amidst Massive Growth

OpenAI, Ponce admitted, has an “absolutely insatiable appetite” for GPUs. OpenAI CEO Sam Altman has plans for the company to use over 1 million GPUs by the end of the year, and promises to increase that number 100x.

And all those GPUs will also need CPUs to run.

So despite these gargantuan purchase orders, the company’s observability engineers are still mindful of using resources efficiently. One mission, then, is to make Fluent Bit as “lean as possible.”

Using perf, a Linux tool for gathering performance data, the observability team looked at the CPU cycles Fluent Bit was using. Ponce hypothesized that most of the work Fluent Bit was doing would be in preparing and formatting the incoming data.
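Ponce didn’t walk through the exact commands, but a typical way to collect that kind of profile with perf, assuming the agent runs as a process named fluent-bit, is to sample its on-CPU stacks for a while and then summarize by function; the sampling rate and duration below are arbitrary:

```
# Sample the running agent's call stacks (99 Hz for 60 seconds, both arbitrary).
perf record -F 99 -g -p "$(pidof fluent-bit)" -- sleep 60

# Summarize which functions account for the most CPU samples.
perf report --sort symbol
```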

Uncovering a Surprising CPU Bottleneck With perf

But what amazed Ponce was that this wasn’t the case at all. Instead, at least 35% of the CPU cycles were chewed up by a single function (fstatat64) whose purpose was to figure out how large log files were before reading them.

So the team turned off this capability — and the results were immediately apparent:

“The results speak for themselves,” Fabian Ponce told the crowd. “We have a new load shape here that uses about half as much CPU while doing exactly the same work.”

Every time a new record is written, Fluent Bit executes the fstatat64 call to read the size of the file.

“If the process is continually emitting new logs, line by line, then Fluent Bit is going to race that, and continue to run fstatat64 each time that happens,” Ponce explained. “That is going to burn a ton of extra compute.”

And it turns out the company didn’t really need that information, at least not at that level of nuance.
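Ponce’s explanation maps onto a simple pattern. The following is a minimal C sketch, not Fluent Bit’s actual code: the file name, record count and loop are all hypothetical, but they show why the cost of a per-record size check scales with log volume rather than with the number of files.

```c
/* Sketch, not Fluent Bit source: a tail-style reader that checks the
 * file size on every new log record before reading it. */
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* AT_FDCWD */
#include <stdio.h>
#include <sys/stat.h>  /* fstatat, struct stat */

/* Called once per new log record; on 64-bit Linux this is the fstatat64
 * syscall that showed up in OpenAI's perf profile. */
static off_t current_size(const char *path)
{
    struct stat st;
    if (fstatat(AT_FDCWD, path, &st, 0) != 0)
        return -1;
    return st.st_size;
}

int main(void)
{
    /* A busy service emitting logs line by line means this check runs for
     * every record, so the syscall cost grows with log volume. */
    long long records = 1000000;            /* illustrative volume */
    for (long long i = 0; i < records; i++) {
        off_t sz = current_size("app.log"); /* hypothetical tailed file */
        (void)sz; /* a real reader would use sz to decide how far to read */
    }
    printf("issued %lld size checks for %lld records\n", records, records);
    return 0;
}
```

Seen this way, the fix is intuitive: if the size isn’t needed at that level of nuance, skipping the check removes one syscall per log record.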

The Impact of Disabling a Hungry Function

While the team knew the change would reduce CPU usage, perhaps they could be forgiven for not realizing how much savings would accrue.

In fact, when Fluent Bit was modified system-wide, it ended up “returning about 30,000 CPU cores to our Kubernetes clusters,” Ponce said.

“If we can return a CPU to each node, then maybe that’s one more microservice that we can fit onto a given host,” he said.

The team went on to optimize Fluent Bit in other ways as well, though this one tweak had the biggest overall impact. The company’s engineers are also preparing a patch for Fluent Bit that would let users specify a lower threshold for these notifications.

Key Takeaways for Performance Optimization

The takeaway for Ponce was clear: There is always value in breaking out your “profiler of choice, and seeing what is happening under the hood.”

As famed Golang programmer Rob Pike once advised in his Five Rules of Programming: “You can’t tell where a program will spend its time. Bottlenecks happen in surprising places.”

And in large distributed systems, those small bottlenecks can be costly unless they are uncorked.

You can enjoy the full talk here:

YOUTUBE.COM/THENEWSTACK


