AWS User Group


Stop Guessing, Start Testing

Performance Engineering

The discipline of designing and building systems with observability and continuous validation to achieve high performance, while understanding the trade-offs behind each decision.

Performance Testing

In each iteration, once the whole system is built, it should be validated and tested. This acts as an assurance policy: it gives visibility into what is going to happen in specific scenarios.

Resiliency vs Reliability vs Efficiency

A reliable system is one that works and has very high uptime. Is it performing correctly? Are there no failures? Resiliency, by contrast, is about how the system recovers once it breaks. And every system will.

Efficiency is the easy one: it is the cost of achieving the system's reliability metrics.

Performance Engineering Playbook

A set of rules for achieving a reliable system.

Performance testing types

  • load - testing the system under usual conditions

  • stress - finding the breaking point, how much load the system can take

  • endurance - how long the system can sustain the load (targets memory leaks and other problems that only appear over time)

  • scalability - the ability to scale up and down over a longer period as the customer base or the demand changes

  • spike - how well the system responds to sudden bursts of load

  • volume - testing the amount of data passing through the system
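The difference between some of these test types is mostly the shape of the load over time. A minimal sketch of that idea follows; the function name, the baseline of 100 users and the step and burst sizes are all hypothetical values chosen for illustration, not numbers from the talk.

```python
def target_users(test_type: str, minute: int, baseline: int = 100) -> int:
    """Return how many simulated users to run at a given minute of the test.

    All shapes and numbers here are illustrative assumptions.
    """
    if test_type == "load":
        # Steady traffic at the usual level.
        return baseline
    if test_type == "stress":
        # Step the load up every 5 minutes until something breaks.
        return baseline * (1 + minute // 5)
    if test_type == "spike":
        # Usual traffic with a short 10x burst between minute 10 and 12.
        return baseline * 10 if 10 <= minute < 12 else baseline
    if test_type == "endurance":
        # Same level as a load test; the difference is that the run lasts hours.
        return baseline
    raise ValueError(f"unknown test type: {test_type}")


if __name__ == "__main__":
    for kind in ("load", "stress", "spike", "endurance"):
        print(kind, [target_users(kind, m) for m in range(0, 20, 5)])
```

A load generator would call something like this once per interval to decide how many virtual users to keep running.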

Testing as Discipline

Unit tests form the base, integration tests build on them, and E2E tests sit on top.

The discipline spans all roles: developers, architects, SREs, QA and managers.

Metrics

Metrics set the baselines and help find the sweet spot of the workload.
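As an illustration of what a baseline could look like, the sketch below derives latency percentiles from a list of measured response times. The choice of p50, p95 and p99 and the function name are assumptions; the post does not say which metrics were used.

```python
import statistics


def latency_baseline(samples_ms: list[float]) -> dict[str, float]:
    """Compute p50/p95/p99 latency from raw samples in milliseconds."""
    # quantiles(n=100) returns the 99 cut points between percentiles 1..99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    # Hypothetical measurements from one test run.
    samples = [120.0, 135.0, 128.0, 410.0, 131.0, 125.0, 990.0, 127.0]
    print(latency_baseline(samples))
```

Comparing these numbers between runs at different load levels is one way to see where latency starts to degrade.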

Frameworks

| Framework | Language   |
|-----------|------------|
| JMeter    | XML        |
| k6        | JavaScript |
| Locust    | Python     |

Distributed Load Testing on AWS

  • single tenant

  • open source

  • fully supported by AWS

  • traffic from AWS

  • works as an orchestrator for global traffic testing

  • integrates with CI/CD

  • spins up 10k containers in a minute

  • 200 million requests in 6 min

    • all requests split per endpoint for results

    • great for understanding bottlenecks

  • 3.6 billion requests in 5 min

  • the test traffic is easily flagged as a DDoS attack and needs unblocking as described in the docs

  • MCP server capability
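To show why a per-endpoint split helps with finding bottlenecks, here is a minimal sketch that groups raw request results by endpoint. The record shape `(endpoint, latency_ms, ok)` and the function name are hypothetical; this is not the report format of the AWS solution.

```python
from collections import defaultdict


def summarize_by_endpoint(results):
    """Group (endpoint, latency_ms, ok) tuples and summarize each endpoint."""
    grouped = defaultdict(list)
    for endpoint, latency_ms, ok in results:
        grouped[endpoint].append((latency_ms, ok))

    summary = {}
    for endpoint, rows in grouped.items():
        count = len(rows)
        errors = sum(1 for _, ok in rows if not ok)
        summary[endpoint] = {
            "requests": count,
            "error_rate": errors / count,
            "avg_latency_ms": sum(latency for latency, _ in rows) / count,
        }
    return summary


if __name__ == "__main__":
    # Hypothetical sample of request results.
    sample = [
        ("/cart", 100.0, True),
        ("/cart", 300.0, False),
        ("/search", 50.0, True),
    ]
    for endpoint, stats in summarize_by_endpoint(sample).items():
        print(endpoint, stats)
```

An endpoint whose error rate or latency stands out from the rest is the first place to look for a bottleneck.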

Karpenter / KEDA - CNCF project

The talk split the problems the Karpenter project targets into these 4 categories:

Scaling but slow to respond

Instead of using Prometheus to pull metrics by scraping each pod in the cluster, they use a sidecar container in each pod to push them. The metrics arrive much more responsively. OTel is supported. One of the keys to responsiveness is setting the limits precisely, based on the right metrics. A GPU node running flat out does not automatically mean it is time to scale up. Scaling vertically might help, but that would cause a disruption. A better indicator would be the job queue.
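The idea of scaling on the job queue instead of GPU utilization can be sketched as below. This is a simplified illustration under stated assumptions, not the algorithm of any specific autoscaler: `jobs_per_worker`, `min_workers` and `max_workers` are hypothetical tuning parameters.

```python
import math


def desired_workers(
    queue_length: int,
    jobs_per_worker: int,
    min_workers: int = 1,
    max_workers: int = 20,
) -> int:
    """Pick a worker count from the backlog, not from how busy the GPUs are."""
    # Enough workers so that each one has at most jobs_per_worker jobs waiting.
    needed = math.ceil(queue_length / jobs_per_worker)
    # Clamp to the allowed range so the cluster never scales to zero or runs away.
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    for backlog in (0, 23, 1000):
        print(backlog, "->", desired_workers(backlog, jobs_per_worker=5))
```

A GPU at 100% with an empty queue yields no scale-up here, while a growing backlog does, which is the behaviour the paragraph above argues for.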

Scaling but expensive

Fewer nodes is better. Not only the count matters but also the type, which comes down to the NodePool design. All of this can be set in Karpenter using Node overlay specs. An extra lever is reserved capacity.

GPU time slicing lets multiple GPU workloads share the same unit.

Scaling but slow to start

Several forms of parallelization are used to speed up the spin-up.

A quote worth mentioning: containers are for code, not models.

Model quantization reduces the model size, and keeping the model in S3 buckets instead of in the image speeds up the image pull even more.
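A rough back-of-the-envelope calculation shows how much quantization can shrink the weights. The 7-billion-parameter model below is a hypothetical example, and the estimate ignores quantization metadata and other file overhead.

```python
def model_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the raw weights in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    params = 7e9  # hypothetical 7B-parameter model
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: ~{model_size_gb(params, bits):.1f} GB")
```

Going from 16-bit to 4-bit weights cuts the download to roughly a quarter, which directly shortens the time a new node needs before it can serve.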

Scaling but breaks

Everything works, but it breaks at some stage. What now? To keep reliability up while nodes are being rebuilt, we can control disruption. This is how node drift is managed; drift will happen, for example, because of control plane updates.
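One way to picture controlled disruption is a budget that caps how many nodes may be replaced at the same time. The sketch below is a simplified model under stated assumptions: the percentage budget, the round-up rule and the function name are illustrative, so check the Karpenter documentation for the exact semantics.

```python
import math


def allowed_disruptions(
    total_nodes: int,
    budget_percent: float,
    already_disrupting: int = 0,
) -> int:
    """How many more nodes may be disrupted right now under a percentage budget."""
    # Assumption: the budget is rounded up so small pools can still roll nodes.
    ceiling = math.ceil(total_nodes * budget_percent / 100)
    # Nodes already being replaced count against the budget.
    return max(0, ceiling - already_disrupting)


if __name__ == "__main__":
    print(allowed_disruptions(20, 10))     # fresh budget on a 20-node pool
    print(allowed_disruptions(20, 10, 2))  # budget already used up
```

With a cap like this, drifted nodes are replaced a few at a time, so the cluster keeps enough capacity to serve traffic during the rollout.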


LinkedIn post

As we move forward, global systems are becoming very dynamic and elastic. To achieve highly reliable systems that can withstand heavy spikes during Black Friday or any other high-demand day, Luis Guirigay presented an AWS distributed solution for testing systems at this scale using global traffic coming from the AWS backbone. In the demo it spun up 10k containers in a minute and generated 200 million requests over a 5-minute run. Another run made 3.6 billion requests in 5 minutes. The result report shows all requests per endpoint for better analysis of bottlenecks. This has a significant impact on performance engineering: testing gives assurance and exposes the throughput bottlenecks.

Thanks to Christian Mendelez we got a little exposure to Karpenter and how it solves some issues in clusters. A cluster may be scaling and still be slow to respond, expensive (inefficient node pool scaling), slow to start (slow cold spin-ups), or break something during the scaling process.

After the meetup, for which appreciation goes to Ronan Guilfoyle for organizing this AWS user group session, I got a look inside the Amazon engineering building, a tour given by my friend and classmate Matteo Mastore, who is an intern at AWS. Thanks for the absolutely stunning evening.