Architecture Musings: Load Testing

This is a simplified recipe for load testing a web-based application such as a web store.

There are three stages in load testing:

Plan Test and Prepare Environment
Run Test and Measure Variables
Extract Results and Produce Reports

Plan Test and Prepare Environment

The load-testing environment should mimic the production environment closely: instance sizes and network topology should match. A 100-node cluster is not practical for load testing, so a cluster of 2-4 nodes is acceptable.

Rather than issuing all HTTP requests from a single client, it is better to use multiple clients, so that the client machine itself does not become the bottleneck.

Request "Mix"

Identify all URLs that the load test will access. If the system is already live, use the access log to identify them.
Identify what percentage of users access which URLs (the request "mix"). Again, these percentages are best derived from the access log.
URLs should be accessed by different users or user sessions, typically identified by browser cookies. A user can access a sequence of URLs that represents a page flow, but the same user should not go through the flow more than once.
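As a rough illustration of deriving the request mix from an access log, here is a minimal Python sketch. It assumes the log is in the common/combined format with a quoted request line; the regular expression, the function name `request_mix`, and the sample lines are all made up for this example and would need adapting to a real log.

```python
import re
from collections import Counter

# Assumption: log lines contain a quoted request such as "GET /cart HTTP/1.1".
REQUEST_RE = re.compile(r'"(?:GET|POST|PUT|DELETE|HEAD) (\S+) HTTP/[\d.]+"')

def request_mix(log_lines):
    """Return {path: percentage of all matched requests}."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            # Drop the query string so /products?page=1 and ?page=2 count together.
            path = match.group(1).split("?", 1)[0]
            counts[path] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {path: 100.0 * n / total for path, n in counts.items()}

# Hypothetical sample lines, for illustration only.
sample = [
    '10.0.0.1 - - [06/Dec/2013:10:00:00 +0000] "GET /products?page=1 HTTP/1.1" 200 512',
    '10.0.0.2 - - [06/Dec/2013:10:00:01 +0000] "GET /products?page=2 HTTP/1.1" 200 498',
    '10.0.0.3 - - [06/Dec/2013:10:00:02 +0000] "GET /cart HTTP/1.1" 200 230',
    '10.0.0.4 - - [06/Dec/2013:10:00:03 +0000] "POST /checkout HTTP/1.1" 302 0',
]

mix = request_mix(sample)
for path, pct in sorted(mix.items()):
    print(f"{path}: {pct:.1f}%")
```

Grouping by path without the query string is a design choice; if different query parameters hit very different code paths, they may deserve separate entries in the mix.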

Data Volume

The data volume should closely match production data.
Transaction, session, and user data should number in the millions of records to test data access properly.

Run Test and Measure Variables

As mentioned above, and repeated here because it is important: the URLs should not be accessed by the same user session.

"Runs"

Clients should access the identified URLs simultaneously, at the identified percentages (the request "mix").
Start with a number of requests per second that the system can easily handle, say 10 requests per second. Let it run for a fixed duration, say 30 minutes.
Reset everything between runs: delete newly created data if there is too much of it, restart systems, and so on.
Run again with an increased number of requests per second, say 12 requests per second for 30 minutes.
Repeat, gradually increasing the number of requests per second until the system fails, say at 50 requests per second.
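The runs described above could be planned along the following lines. This is only a sketch under stated assumptions: the function names `run_schedule` and `plan_requests`, the fixed step size, and the example mix are hypothetical, and every request gets a fresh session ID as a simplification (a real test would give each session a whole page flow). It plans requests but does not send them; that part depends on the HTTP client or load tool in use.

```python
import random
import uuid

def run_schedule(start_rps, step_rps, max_rps):
    """Request rates for successive runs, e.g. 10, 12, 14, ... up to max_rps.

    max_rps is only an upper bound; in practice you stop when the system fails.
    """
    return list(range(start_rps, max_rps + 1, step_rps))

def plan_requests(mix, rps, duration_s, rng):
    """Yield (send_time_s, session_id, url) for one run.

    mix maps url -> weight (percentages work). Each request gets its own
    session ID so that no two requests share a user session.
    """
    urls = list(mix)
    weights = [mix[u] for u in urls]
    for i in range(rps * duration_s):
        url = rng.choices(urls, weights=weights, k=1)[0]
        yield i / rps, uuid.uuid4().hex, url

# Hypothetical mix, for illustration only.
example_mix = {"/products": 50, "/cart": 25, "/checkout": 25}

rates = run_schedule(10, 2, 50)
first_run = list(plan_requests(example_mix, rates[0], 30 * 60, random.Random(42)))
print(rates[:3], len(first_run))
```

Spacing requests evenly at `i / rps` keeps the rate constant within a run, which makes the per-run averages easier to compare across runs.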

Measurements

Ignore the first 5 minutes of each 30-minute run as warm-up time.
For the remaining 25 minutes, measure the average response time and throughput of the requests.
For each server the request passes through (load balancer, web server, database server), measure average CPU load, memory utilization, disk I/O, and network traffic.
For database servers, also measure the average response time of database accesses.
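To make the warm-up rule concrete, here is a small Python sketch that computes the average response time and throughput for one run while discarding the first 5 minutes. The input format (pairs of seconds since run start and response time in milliseconds) and the name `summarize_run` are assumptions for this example.

```python
from statistics import mean

WARMUP_S = 5 * 60   # first 5 minutes are treated as warm-up
RUN_S = 30 * 60     # total run length

def summarize_run(samples, warmup_s=WARMUP_S, run_s=RUN_S):
    """samples: iterable of (seconds_since_run_start, response_time_ms).

    Returns averages over the measured window only, or None if it is empty.
    """
    measured = [rt for t, rt in samples if warmup_s <= t < run_s]
    if not measured:
        return None
    return {
        "avg_response_ms": mean(measured),
        # Completed requests per second over the 25-minute measured window.
        "throughput_rps": len(measured) / (run_s - warmup_s),
    }

# Hypothetical samples: the first two fall inside the warm-up and are ignored.
samples = [(10, 900.0), (299, 800.0), (300, 100.0), (900, 200.0), (1799, 300.0)]
summary = summarize_run(samples)
print(summary)
```

The same windowing would apply to the server-side metrics (CPU, memory, disk I/O, network) so that all averages for a run cover the same 25 minutes.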

Extract Results and Produce Reports

Produce graphs of response time, throughput, CPU load, disk I/O, etc. against the number of requests per second.
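One simple way to feed such graphs is to collect the per-run averages into a table with one row per request rate, which any plotting tool or spreadsheet can then chart. The sketch below assumes each run has been summarized into a dict; the column names and the numbers are invented for illustration and are not measurements.

```python
import csv
import io

COLUMNS = ["rps", "avg_response_ms", "throughput_rps", "cpu_pct"]

def results_csv(runs):
    """Return CSV text with one row per run, ordered by requests per second."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for run in sorted(runs, key=lambda r: r["rps"]):
        writer.writerow({c: run[c] for c in COLUMNS})
    return out.getvalue()

# Made-up values, for illustration only.
runs = [
    {"rps": 12, "avg_response_ms": 130.0, "throughput_rps": 12.0, "cpu_pct": 41.0},
    {"rps": 10, "avg_response_ms": 120.0, "throughput_rps": 10.0, "cpu_pct": 35.0},
]

table = results_csv(runs)
print(table)
```

With requests per second as the first column, each remaining column becomes one graph plotted against it.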

Friday, December 6, 2013
