Platform engineering is gaining traction across industries as organizations grapple with the complexity of cloud native development. Teams want to streamline provisioning, deployment and observability while reducing the burden on application developers. Yet many find that moving beyond a minimum viable platform (MVP) requires a shift in approach.
This case study explores how Cover Whale, a U.S.-based company specializing in truck insurance, built its internal developer platform (IDP) on Kubernetes and AWS, the challenges it faced when scaling beyond an MVP and the lessons learned along the way. From managing sprawling Helm charts to integrating systems like NATS, the journey illustrates the trade-offs many enterprises encounter when growing their platform.
The story also highlights where platform orchestration tools like Kratix provided a middle ground between writing custom operators and stitching together scripts, and what that meant for long-term maintainability.
The Platform Engineering Initiative
Recently, Cover Whale initiated a new insurance platform to improve its customer experience. To streamline the application life cycle of the new platform, an IDP has been implemented to handle provisioning, build, test, deployment and observability of each new microservice workload.
The IDP is built on top of AWS Elastic Kubernetes Service (EKS), which runs a platform Kubernetes cluster controlling a fleet of workload Kubernetes clusters.
All the clusters and underlying infrastructure are provisioned by OpenTofu and Terramate, and are bootstrapped by Argo CD using the App of Apps pattern.
Cluster bootstrapping starts by initializing the cluster with shared tooling, such as Karpenter, external-dns, the external-secret operator and an ingress controller. Then, it deploys an Application for each system within the IDP, which in turn deploys another Application for each workload within that system. This entire choreography is encapsulated in a series of Helm charts loaded by Argo CD into the clusters.
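As an illustration of that hierarchy, the following Python sketch builds the Argo CD Application objects that such a bootstrap would end up creating: one root Application, one per system and one per workload. In the setup described above these objects are rendered by Helm charts rather than by Python; the repository URL, chart paths, naming scheme and the `render_tree` helper are all hypothetical and only serve to show the shape of the tree.

```python
"""Sketch of an App of Apps tree: root -> systems -> workloads.

All names, paths and URLs are placeholders (assumptions), not the
configuration used by the team described in the article.
"""

REPO_URL = "https://git.example.com/platform/idp-charts.git"  # hypothetical
ARGOCD_NAMESPACE = "argocd"  # common default, assumed here


def application(name: str, path: str) -> dict:
    """Build a minimal Argo CD Application manifest as a plain dict."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": ARGOCD_NAMESPACE},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": REPO_URL,
                "path": path,
                "targetRevision": "HEAD",
            },
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": ARGOCD_NAMESPACE,
            },
        },
    }


def render_tree(systems: dict[str, list[str]]) -> list[dict]:
    """Flatten the hierarchy into the list of Applications it produces."""
    apps = [application("root", "charts/root")]
    for system, workloads in systems.items():
        # One Application per system ...
        apps.append(application(f"system-{system}", f"charts/systems/{system}"))
        # ... which in turn owns one Application per workload.
        for workload in workloads:
            apps.append(
                application(
                    f"{system}-{workload}",
                    f"charts/workloads/{system}/{workload}",
                )
            )
    return apps


if __name__ == "__main__":
    for app in render_tree({"billing": ["api", "worker"], "claims": ["api"]}):
        print(app["metadata"]["name"], "->", app["spec"]["source"]["path"])
```

The point of the sketch is the fan-out: every new system or workload adds another Application, and in a Helm-based setup another place where templates live.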
Challenges Encountered
The setup allowed for the quick and efficient development of a minimum viable product.
Deploying and maintaining the entire platform of workloads while eliminating the complexity of creating Kubernetes manifests and build pipelines was a game-changer that proved the initiative’s worth to both developers and management.
Nevertheless, going beyond an MVP required us to reconsider our approach for several reasons, which we detail below.
Scalability
Argo CD doesn’t offer much choice for generating Kubernetes manifests. As a result, we used Helm charts ... a lot of them. To make matters worse, the Helm structure followed the IDP structure of systems and workloads, and had separate charts for what got deployed in the platform cluster or the workload cluster.
As a consequence, resources for a specific concern were spread across several Helm charts. As the IDP grew, the codebase became very complex, with related logic scattered between charts.
For example, the external-secret integration was scattered across three different Helm charts.
As a result, gearing up toward a more scalable setup required a paradigm shift.
Limited Integration With Non-Kubernetes Resources
The MVP relied on a few interactions with non-Kubernetes resources. For example, we create Elastic Container Registry (ECR) repositories and related identity and access management (IAM) roles using AWS Controllers for Kubernetes.
However, since our workloads rely on the messaging system NATS and Synadia Cloud, dynamically generating NATS users was more than desirable, and there isn’t an operator to do so. We could have used Helm hooks, but that felt unreliable, and writing a full-blown operator seemed a larger undertaking than we would have liked at that time.
Backend-Frontend Integration
We used Port.io as a developer portal to present a consistent frontend for developers and enhance our developer experience (DX). The portal provided an overview of deployed workloads, self-service actions that developers could use to manage their workloads and links to observability tools and git repositories.
To accurately represent our systems and workload structure in the portal, we relied on labels and annotations on the Argo CD Applications and ApplicationSet representing the systems. Although functional, this setup was somewhat messy.
For example, a system consists of several Applications and an ApplicationSet. Whenever we had to add information to the frontend, this led to some confusion about where the labels/annotations needed to be added.
Ideally, we would have liked to have a custom resource definition (CRD) representation of the IDP structure that Port.io could read unambiguously. However, maintaining those CRDs solely for that purpose felt like overkill.
On the other hand, using those CRDs to drive the IDP and reduce our reliance on Helm charts made a whole lot of sense!
Introducing Kratix Into Our IDP
Kratix offered a good middle ground between writing our own operator and hooking up some scripts to solve the issues mentioned above.
Kratix monitors the state of our resources and allows for periodic reconciliations, sparing us from having a dedicated workload to watch the kube-apiserver, while enabling us to manage the entire life cycle of our resources.
Using the Promises approach, we could also gather specific concerns together without worrying about the IDP hierarchy.
The Proof of Concept
During the development of the new platform, we had to aggregate APIs from multiple services and expose them externally in a consistent manner, thus creating a good abstraction layer between external and internal services. This included APIs from both the legacy and new customer portals, as well as webhooks for various external services.
We decided to add support for this in our workload manifest, enabling developers to manage their APIs in a distributed manner and have the IDP aggregate all APIs under a single domain name.
To do so, we created two Kratix Promises:
- ApiAggregator, which declares the hostname on which the APIs are exposed.
- ApiAggregatorTarget, which attaches to an ApiAggregator and defines a list of paths and the target service to which requests should be sent.
 
Behind the scenes, Kratix would create all the Gateway API resources to let the magic happen.
We used ytt to generate the manifests within our Promise pipeline, which made the YAML templating process easier and sidestepped YAML quirks by guaranteeing a valid YAML object as a result.
With Argo CD already in the picture, deploying Kratix was fairly easy. We just created a dedicated state store git repository and added workload clusters as a Destination as part of the cluster bootstrapping process.
In no time, we had a working proof of concept, and after a minor fix to support larger request body sizes, we quickly felt comfortable releasing it into production.
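To make the mapping concrete, here is a minimal Python sketch of how one aggregator and one target could translate into a Gateway API HTTPRoute. The article does not publish the Promise schemas, so the field names on `ApiAggregator` and `ApiAggregatorTarget` below (`gateway`, `port`, `paths` and so on) are assumptions, as is the choice of a single HTTPRoute per target; the pipeline described here generates its manifests with ytt, not Python.

```python
"""Sketch: turning an ApiAggregator/ApiAggregatorTarget pair into an HTTPRoute.

The two dataclasses are hypothetical stand-ins for the Promise requests;
only the HTTPRoute layout follows the Gateway API.
"""
from dataclasses import dataclass, field


@dataclass
class ApiAggregator:
    name: str
    hostname: str   # public hostname under which all APIs are exposed
    gateway: str    # assumed: name of the Gateway the routes attach to


@dataclass
class ApiAggregatorTarget:
    aggregator: str  # name of the ApiAggregator this target attaches to
    service: str     # backend Service receiving the requests
    port: int
    paths: list[str] = field(default_factory=list)


def http_route(agg: ApiAggregator, target: ApiAggregatorTarget) -> dict:
    """Build one HTTPRoute sending the target's paths to its service."""
    if target.aggregator != agg.name:
        raise ValueError(
            f"target is attached to {target.aggregator!r}, not {agg.name!r}"
        )
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": f"{agg.name}-{target.service}"},
        "spec": {
            "parentRefs": [{"name": agg.gateway}],
            "hostnames": [agg.hostname],
            "rules": [
                {
                    # One path-prefix match per declared path.
                    "matches": [
                        {"path": {"type": "PathPrefix", "value": p}}
                        for p in target.paths
                    ],
                    "backendRefs": [
                        {"name": target.service, "port": target.port}
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    agg = ApiAggregator("public-api", "api.example.com", "shared-gateway")
    tgt = ApiAggregatorTarget("public-api", "quotes", 8080, ["/quotes", "/v1/quotes"])
    print(http_route(agg, tgt))
```

Because each target only names its aggregator, teams can declare their own paths next to their workload while the hostname stays defined in one place.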
Declarative NATS Support
Following this proof of concept, we decided to use Kratix’s full setup to eliminate the need for manual management of NATS users and credentials.
The task was a bit more complex, as the Promise would have to interact with a SaaS platform to create users, but also safely store their credentials.
We started by developing a script to manage the life cycle of the NATS users in Synadia Cloud and generate credentials. To simplify interaction with the API, we used restish instead of raw curl commands.
Integrating the script with the Kratix Promise and managing the resource status to track the NATS user ID and credentials expirations was relatively straightforward. We ensured that the concerns for saving and loading the state, creating and deleting users, and handling credentials were clearly separated.
Nevertheless, manipulating the credentials was the real challenge. The reason is that Kubernetes manifests are stored in a git repository, and storing raw secrets was a clear no-no.
Manipulating Secrets With Kratix
After exploring multiple options, we settled on using Sealed Secrets. This tool uses an asymmetric encryption algorithm to encrypt the secret using a public key known to the platform cluster and decrypt it using a private key known only to the workload cluster.
This setup worked smoothly, for a few hours! We quickly noticed that our workloads had frequent restarts.
Kratix expects your Promise to be idempotent. Effectively, after a few hours, Kratix runs a reconciliation job. In our case, this reconciliation run created a new NATS authentication token, which updated the existing authentication secret, triggering a rolling update of the workload.
To make matters worse, we can’t decrypt the existing secret or retrieve it from Synadia Cloud. Additionally, not emitting the sealed secret resource would result in the deletion of the secret from the workload cluster.
To work around this issue, we added the sealedsecrets.bitnami.com/patch annotation and generated an empty sealed secret to prevent secret deletion.
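The idempotency requirement can be illustrated with a small Python sketch. It assumes, in line with the status tracking mentioned earlier, that the resource status records the NATS user ID and the credentials' expiry, and it only issues new credentials when none exist or when they are close to expiring. The `Status` shape, the `issue` callback (a stand-in for the script that talks to Synadia Cloud) and the 30-day/3-day durations are all hypothetical, and this is not necessarily how the team's pipeline is written.

```python
"""Sketch of an idempotent reconcile step for short-lived credentials."""
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple


@dataclass
class Status:
    """Assumed shape of what the pipeline persists in the resource status."""
    user_id: Optional[str] = None
    expires_at: Optional[datetime] = None


# issue(existing_user_id) -> (user_id, credentials); hypothetical callback.
Issuer = Callable[[Optional[str]], Tuple[str, str]]


def reconcile(
    status: Status,
    now: datetime,
    issue: Issuer,
    ttl: timedelta = timedelta(days=30),
    renew_before: timedelta = timedelta(days=3),
) -> Tuple[Status, Optional[str]]:
    """Return the new status and fresh credentials, or None if nothing changed."""
    still_valid = (
        status.user_id is not None
        and status.expires_at is not None
        and now < status.expires_at - renew_before
    )
    if still_valid:
        # Repeated runs are no-ops: no new token, so no rolling update.
        return status, None
    user_id, credentials = issue(status.user_id)
    return Status(user_id=user_id, expires_at=now + ttl), credentials


if __name__ == "__main__":
    def fake_issue(existing: Optional[str]) -> Tuple[str, str]:
        return existing or "user-1", "fake-credentials"

    start = datetime(2025, 1, 1, tzinfo=timezone.utc)
    status, creds = reconcile(Status(), start, fake_issue)
    print(status, creds is not None)
    status, creds = reconcile(status, start + timedelta(hours=10), fake_issue)
    print(status, creds is not None)
```

With a guard like this, the periodic reconciliation only changes the secret when the credentials actually need to rotate.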
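A rough Python sketch of what such an output could look like follows. It builds a SealedSecret whose `encryptedData` is empty when there are no new credentials to publish, so the resource is still emitted on every run. Where exactly the team placed the `sealedsecrets.bitnami.com/patch` annotation is not stated; putting it in the SealedSecret's template metadata is an assumption here (the Sealed Secrets documentation describes it as an annotation on the target Secret), and the resource names are placeholders.

```python
"""Sketch: always emit a SealedSecret, empty when nothing changed."""
from typing import Dict, Optional

PATCH_ANNOTATION = "sealedsecrets.bitnami.com/patch"


def sealed_secret(
    name: str,
    namespace: str,
    encrypted: Optional[Dict[str, str]] = None,
) -> dict:
    """Build a SealedSecret; `encrypted` maps keys to already-sealed values.

    Passing None yields an empty SealedSecret, which keeps the resource
    present in the output without carrying new secret keys.
    """
    return {
        "apiVersion": "bitnami.com/v1alpha1",
        "kind": "SealedSecret",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "encryptedData": dict(encrypted or {}),
            "template": {
                "metadata": {
                    "name": name,
                    "namespace": namespace,
                    # Assumed placement: carried over to the generated Secret
                    # so that existing keys are patched rather than replaced.
                    "annotations": {PATCH_ANNOTATION: "true"},
                }
            },
        },
    }


if __name__ == "__main__":
    # No new credentials on this run: emit the empty variant.
    print(sealed_secret("nats-credentials", "payments"))
```

Combined with a reconcile step that returns no credentials when nothing changed, the pipeline can emit the empty variant on most runs and a populated one only on rotation.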
Conclusion
Cover Whale’s journey reflects a challenge that many enterprises face: building an IDP that can grow from proof of concept to a sustainable, scalable platform. An MVP often demonstrates immediate value, but scaling requires rethinking patterns around Helm, CRDs and secret management.
The team found that adopting a “Promise”-based approach with Kratix offered a pragmatic path forward. However, the broader lesson is that enterprises should be prepared to evolve their architecture as new requirements emerge. Key takeaways for other teams include:
- Be wary of Helm sprawl; a CRD-driven approach may provide clearer boundaries and reduce complexity.
- When integrating with SaaS systems, plan for life cycle management and idempotency early.
- Secrets management remains a thorny area; reconciliation and rotation strategies must be tested under real conditions.
 
By reframing the IDP around these lessons, Cover Whale created a more consistent and maintainable platform foundation, one that continues to evolve as developer needs grow.