Data teams that thrived during the last wave of Software as a Service (SaaS) platform scale weren’t the ones that chased hype. They were the ones that made a few smart decisions: They adopted cloud-first operations, made cost and capacity visible, and chose architectures that could quickly adapt to changing conditions.
As it turns out, those are precisely the same practices the agentic era now demands.
If you look at how data teams are managing their AI transition, a clear pattern emerges: The ones with firm control over capacity and spend had the easiest time supporting agents. They were already practicing tenant isolation. They were already making online changes during business hours. They were already using object-storage-backed recovery. Everything agents required of them, they were already doing. They simply applied the same principles they were already practicing to a new kind of user.
Let’s start from that premise: that agents are your new users. We’ll take a look at what makes them different from other users and how best to support them. Along the way, we’ll discuss four architectural factors that affect the ability to operate at scale. And we’ll close with a checklist you can use to assess the suitability of your current platform for agent-driven workloads.
Agents Are Your New ‘Users’
Most data platforms were built for humans and services with relatively stable, predictable demands. Agentic systems are quite different. They spin up short-lived apps, run experiments, trigger migrations, branch off new datasets and tear it all down, often in parallel, and unpredictably.
We’ve seen this firsthand with companies like Manus. It offers a general-purpose agentic AI platform whose “wide research” agent swarms spin up thousands of short-lived workloads every day. It’s no longer managing a single monolithic database, but instead orchestrating millions of small, ephemeral branch-like environments behind the scenes.
At scale, what agents need isn’t a monolithic, ever-growing database. It’s effectively millions of small, isolated databases or branches popping in and out of existence. Once you accept that premise, four requirements for agentic architectures naturally follow:
- Isolation by default: Per-tenant or per-agent boundaries keep experiments from becoming everybody’s problem.
- Online change during business hours: Schemas and indexes must be adjustable while keeping p95/p99 latencies within service-level objectives (SLOs).
- Placement and quotas: Hot data needs to be kept near low-latency compute; cold data can be kept in cheap storage; and noisy tenants need to be isolated.
- Life cycle automation: You need the ability to create and retire environments in seconds with clean metadata hygiene and cost attribution.
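To make the life cycle requirement concrete, here is a minimal sketch of branch life cycle automation: each agent task gets an isolated branch, usage is charged to that branch, and expired branches are retired with their cost rolled up per tenant. The names (`Branch`, `BranchManager`), the TTL policy and the cost units are illustrative assumptions, not any vendor’s real API.

```python
import time
import uuid
from dataclasses import dataclass

@dataclass
class Branch:
    """A short-lived, isolated database branch created for one agent task."""
    branch_id: str
    tenant: str
    created_at: float
    cost_units: float = 0.0

class BranchManager:
    """Sketch of life cycle automation: create, track and retire per-agent
    branches with metadata hygiene and per-tenant cost attribution."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self.active = {}  # branch_id -> Branch

    def create(self, tenant: str) -> Branch:
        b = Branch(branch_id=uuid.uuid4().hex, tenant=tenant, created_at=time.time())
        self.active[b.branch_id] = b  # isolation boundary: one branch per task
        return b

    def charge(self, branch_id: str, units: float) -> None:
        self.active[branch_id].cost_units += units  # cost attribution per branch

    def retire_expired(self, now: float = 0.0) -> dict:
        """Retire branches past their TTL; return cost per tenant for billing."""
        now = now or time.time()
        expired = [b for b in self.active.values() if now - b.created_at > self.ttl]
        costs = {}
        for b in expired:
            costs[b.tenant] = costs.get(b.tenant, 0.0) + b.cost_units
            del self.active[b.branch_id]  # no orphaned metadata left behind
        return costs
```

The point of the sketch is the shape of the loop: creation, attribution and retirement are one automated path, so abandoned environments never pile up as manual cleanup tickets.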
Here are four architectural choices that support your agentic users’ needs.
1. The Two Separations That Matter for Scalability
Agents can quickly consume shared resources. To keep that from happening, separate compute from storage so you can add query capacity without shifting data. Then separate compute from compute to give online transaction processing (OLTP), analytics and maintenance their own lanes and SLOs.
Separate Compute From Storage
Attaching stateless SQL/compute engines to durable, shared object storage lets you:
- Scale elastically: Add and remove query capacity without the need for high-wire data copies or weekend migrations.
- Recover predictably: New nodes can pull state from storage, warm their caches and start serving without saturating peers.
- Clone quickly: Copy-on-write branches can be built quickly from metadata rather than full physical copies.
What to verify:
- Can you add compute nodes in minutes without rebalancing data?
- Do new nodes draw from object storage rather than peers?
- Are cloning and branching incremental and space-efficient?
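The third check, space-efficient branching, comes down to what a clone actually copies. Here is a toy copy-on-write model, under simplified assumptions (`ObjectStore` and `CowBranch` are invented names): a branch is just a list of keys pointing at immutable segments, so cloning copies metadata only, and writes add new segments without disturbing the parent.

```python
class ObjectStore:
    """Stand-in for S3/GCS/Azure Blob: immutable segments keyed by id."""
    def __init__(self):
        self.segments = {}  # key -> bytes

    def put(self, key: str, data: bytes) -> None:
        self.segments[key] = data

class CowBranch:
    """Copy-on-write branch: the branch owns only a segment list (metadata).
    Cloning duplicates that list; the underlying segments stay shared."""
    def __init__(self, segment_keys):
        self.segment_keys = list(segment_keys)  # metadata only

    def clone(self) -> "CowBranch":
        return CowBranch(self.segment_keys)     # O(metadata), not O(data)

    def write(self, store: ObjectStore, key: str, data: bytes) -> None:
        store.put(key, data)                    # new immutable segment
        self.segment_keys.append(key)           # only this branch sees it
```

Real systems layer versioned manifests and garbage collection on top, but the invariant is the same: a clone costs a metadata copy, never a data copy.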
Separate Compute From Compute
When thousands of agents are branching data, building indexes and sending queries at the same time, SQL frontends, analytical readers, background maintenance (compaction, backfills), backup/restore and control planes need to be scaled and governed independently to keep them running smoothly.
What to verify:
- Can you rate-limit backfills independently of OLTP traffic?
- Do analytical scans have their own resources and guardrails?
- Can you perform version upgrades for one plane without taking a maintenance window on another?
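One common mechanism behind the first check, sketched here under stated assumptions (the class name and rates are illustrative), is a token bucket per plane: backfills draw from their own small budget and yield when it is exhausted, so they cannot starve the OLTP lane.

```python
import time

class TokenBucket:
    """Per-plane rate limiter sketch: background work gets its own budget
    and backs off when the budget runs out, leaving OLTP unaffected."""
    def __init__(self, rate_per_sec: float, burst: float):
        self.rate = rate_per_sec     # tokens refilled per second
        self.capacity = burst        # maximum burst size
        self.tokens = burst
        self.last = time.monotonic()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller (the backfill worker) yields and retries later

# Separate buckets per plane: generous for OLTP, tight for maintenance.
oltp_limiter = TokenBucket(rate_per_sec=10_000, burst=1_000)
backfill_limiter = TokenBucket(rate_per_sec=200, burst=50)
```

A backfill loop then calls `backfill_limiter.try_acquire()` before each batch; tuning the maintenance rate never touches the OLTP bucket.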
2. Make Cost Visible (And Actionable)
Traditional data platforms often idle at 20% to 25% CPU while maintaining extra headroom “just in case.” That’s survivable with human users; it’s untenable in an environment where agents are spinning up thousands of short-lived workloads. The fix is to make the cost per query visible, for example through request-unit (RU) accounting, in the same pane engineers already watch.
That way, engineers know which queries to optimize and what savings to expect. Product and finance can set budgets and caps that map to real work, and platform teams can recommend improvements based on actual spend, not gut feel.
What to verify:
- Can you attribute costs to tenants, apps and query digests?
- Can you enforce budgets and caps automatically?
- Do you have a “Top Five Digests” loop tied to latency and cost regression tests?
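All three checks can be sketched in a few lines. The RU formula below is a placeholder, not any vendor’s real pricing model, and `RUMeter` is an invented name; the point is the shape: every query is charged to a (tenant, digest) pair, caps are enforced at charge time, and the top digests fall out of the same ledger.

```python
from collections import defaultdict

class RUMeter:
    """Sketch of request-unit (RU) accounting with per-tenant budget caps
    and per-digest attribution for a 'Top Five Digests' view."""
    def __init__(self, tenant_caps: dict):
        self.caps = tenant_caps                # tenant -> RU budget
        self.spend = defaultdict(float)        # tenant -> RUs spent
        self.by_digest = defaultdict(float)    # (tenant, digest) -> RUs

    def charge(self, tenant: str, digest: str,
               rows_read: int, bytes_written: int) -> bool:
        # Illustrative weights; real RU models price I/O, CPU and network.
        ru = rows_read * 0.001 + bytes_written * 0.0001
        if self.spend[tenant] + ru > self.caps.get(tenant, float("inf")):
            return False                       # over budget: reject or queue
        self.spend[tenant] += ru
        self.by_digest[(tenant, digest)] += ru
        return True

    def top_digests(self, n: int = 5):
        """Where optimization pays off most, ranked by accumulated RUs."""
        return sorted(self.by_digest.items(), key=lambda kv: -kv[1])[:n]
```

Because enforcement and attribution share one ledger, the cap a finance team sets and the digest an engineer optimizes refer to the same number.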
3. Treat Object Storage as the Backbone
For agentic architectures, using object storage (S3/Google Cloud Storage/Azure Blob) for the data backbone is not optional. It enables context-aware scaling by pulling data from a shared object store and caching hot data locally for ultra-low latency, ensuring the database is always the right size for the moment. During scale-out or recovery, new compute should pull state from durable storage rather than copying from peers. Backups and long-term snapshots should live there, too.
Benefits:
- Predictable scale and recovery: Less cross-node thrash during growth or failover.
- Tiered economics: Hot/warm/cold paths you can reason about and budget for.
- Fast database branching: Database clones become pointer operations plus object-store semantics.
What to verify:
- Are backups, snapshots and branch metadata stored in object storage by default?
- How long does it take for a new node to start serving traffic after a failure?
- Can you garbage-collect abandoned branches and objects automatically?
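The last check hinges on metadata-driven garbage collection: an object in the store is live only while some branch’s metadata still references it. A minimal sketch, assuming an invented `BranchCatalog` that tracks both sides:

```python
class BranchCatalog:
    """Sketch of metadata-driven GC: drop abandoned branches first, then
    sweep any object no surviving branch references."""
    def __init__(self):
        self.branches = {}   # branch name -> set of object keys it references
        self.objects = set() # keys currently present in the object store

    def sweep(self, abandoned: set) -> set:
        for name in abandoned:
            self.branches.pop(name, None)        # retire branch metadata
        # Union of everything still referenced by any live branch.
        live = set().union(*self.branches.values()) if self.branches else set()
        garbage = self.objects - live
        self.objects -= garbage                  # delete unreferenced objects
        return garbage
```

Because segments are shared across copy-on-write branches, deleting a branch alone frees nothing; only the sweep against all surviving references can safely reclaim space, which is why it has to be automated rather than ticket-driven.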
4. Treat Online Change as a First-Class Capability
When agents are your users, change is constant. Schema evolution, indexing, data movement and upgrades must happen online, with clear visibility into what is happening.
Here’s what that looks like in practice:
- Three-phase schema changes (prepare → reorganize → commit) with multiversion concurrency control so reads/writes continue while backfills run.
- Rate-limited maintenance that respects p95/p99 budgets.
- Rolling upgrades with automatic leader election and no maintenance windows.
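The three-phase flow can be sketched as a small state machine. This is a simplification (real systems coordinate the phases with MVCC snapshots and metadata locks, and `OnlineIndexBuild` is an invented name): the index is registered but invisible in prepare, backfilled in small batches that back off when the latency budget is exceeded in reorganize, and made live in commit.

```python
import enum

class Phase(enum.Enum):
    PREPARE = "prepare"
    REORGANIZE = "reorganize"
    COMMIT = "commit"

class OnlineIndexBuild:
    """Sketch of a three-phase online index build with a rate-limited,
    SLO-aware backfill."""
    def __init__(self, rows, batch_size: int = 2):
        self.rows = rows
        self.batch_size = batch_size
        self.index = []          # entries backfilled so far
        self.phase = Phase.PREPARE
        self.cursor = 0

    def step(self, p99_ms: float, budget_ms: float = 50.0) -> Phase:
        if self.phase is Phase.PREPARE:
            self.phase = Phase.REORGANIZE   # index registered, not yet visible
        elif self.phase is Phase.REORGANIZE:
            if p99_ms > budget_ms:
                return self.phase           # back off: protect the SLO first
            batch = self.rows[self.cursor:self.cursor + self.batch_size]
            self.index.extend(batch)        # backfill one bounded batch
            self.cursor += len(batch)
            if self.cursor >= len(self.rows):
                self.phase = Phase.COMMIT   # complete: index goes live
        return self.phase
```

Driving `step()` from a scheduler that feeds it live p99 readings gives exactly the behavior the bullets describe: progress when the system is healthy, backoff when it is not, and no blocking of concurrent reads or writes.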
What to verify:
- Can you add an index to a hot table at peak and hold p95/p99 within the SLO?
- Are metadata locks short and predictable?
- Do you have preflight checks, abort thresholds and a rollback plan baked into the pipeline?
Anti-Patterns To Avoid
So that’s what you should try to do. Here are some things to avoid.
- Sharding complexity: App-level sharding looks simple until you own routing, rebalancing, failover and cross-shard joins forever.
- One big pool: Treating all compute as fungible leads to noisy-neighbor incidents and tail-latency spikes.
- Invisible spend: Billing at the instance level hides per-query waste; remember, you can’t manage what you can’t see.
- Peer copy dependency: Recovery and scale-out processes that depend on busy neighbors are prone to collapsing under pressure.
A Minimal Evaluation Checklist
Use the following checklist to compare platforms for agentic workloads:
- Database provisioning: How many isolated databases, schemas and branches can you create per minute? How are they tracked and retired?
- Two separations: Check compute/storage independence and compute/compute independence under live load.
- Cost model: How well can engineers see per-query costs by tenant/app? What caps exist and how are they enforced?
- Object storage: Demonstrate node join and recovery that draws from object storage. Measure time to serve.
- Online change: Test the ability to add an index during peak; check p95/p99, error rates and abort thresholds.
- Failure drill: Kill a leader or availability zone (AZ); watch election, client retries and tail latency.
- Metadata hygiene: Prove that abandoned branches and objects get garbage-collected without manual tickets.
Agentic systems don’t require a brand-new approach to infrastructure. The right architecture for agents is the right architecture for any large-scale modern use case. But agents are a forcing function.
Data teams don’t have the luxury of sticking with monolithic platforms that are slow to scale and hard to manage. Agents will bring those old architectures to their knees. But as the most successful data teams have found, if you design for flexibility, visibility and capacity using the methods described above, you’ll ship faster with fewer weekend fire drills, even when your “users” number in the millions, and most of them aren’t human.