When generating load artificially in order to test the scalability of a system, the load-generating client needs to keep sending requests independently of the response time.
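A minimal sketch of what "independently of the response time" can look like, assuming an asyncio-based client. `fake_request` is a hypothetical stand-in that just sleeps; a real client would make a network call there. The point is that each send is scheduled against the clock, not against the previous response.

```python
import asyncio


async def fake_request(latency_s: float) -> float:
    """Hypothetical stand-in for a real request: it only sleeps."""
    await asyncio.sleep(latency_s)
    return latency_s


async def open_loop(rate_per_s: float, total: int, latency_s: float) -> dict:
    """Fire `total` requests at a fixed rate without waiting on responses."""
    interval = 1.0 / rate_per_s
    loop = asyncio.get_running_loop()
    start = loop.time()
    tasks = []
    max_in_flight = 0
    for i in range(total):
        # Sleep until request i's scheduled send time, measured from the
        # start so that small delays do not accumulate.
        delay = start + i * interval - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)
        # Start the request but do not await it: a slow response must not
        # hold back the next send.
        tasks.append(asyncio.create_task(fake_request(latency_s)))
        in_flight = sum(1 for t in tasks if not t.done())
        max_in_flight = max(max_in_flight, in_flight)
    await asyncio.gather(*tasks)
    return {
        "sent": len(tasks),
        "max_in_flight": max_in_flight,
        "elapsed_s": loop.time() - start,
    }


def run_demo() -> dict:
    # Assumed numbers: 100 requests/s, 20 requests, 200 ms per response.
    return asyncio.run(open_loop(rate_per_s=100, total=20, latency_s=0.2))


if __name__ == "__main__":
    print(run_demo())
```

A client that waited for each response before sending the next would never have more than one request in flight, and its send rate would drop exactly when the server slows down, which hides the problem you are trying to find.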
Generating load is damn hard. It's hard to get a load that matches normal user patterns. Way harder than you expect.
Consider using actual user load, or recording user load and playing it back at N times the rate.
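As a rough sketch of the playback idea, assuming the recording is a list of `(timestamp_seconds, payload)` pairs sorted by time: dividing each gap by N gives the send schedule. The function name and the record format are made up for illustration.

```python
def replay_schedule(recorded, speedup):
    """Return (offset_seconds, payload) pairs with gaps compressed by `speedup`.

    `recorded` is assumed to be sorted by timestamp.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not recorded:
        return []
    first_ts = recorded[0][0]
    # Offsets are relative to the first recorded request, then divided by N.
    return [((ts - first_ts) / speedup, payload) for ts, payload in recorded]


if __name__ == "__main__":
    log = [(100.0, "GET /a"), (102.0, "GET /b"), (106.0, "GET /c")]
    for offset, payload in replay_schedule(log, speedup=2):
        print(offset, payload)
```

The offsets would then go to a sender that fires each request at its scheduled time, whether or not earlier responses have come back.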
Once your service is running, and if you understand its performance characteristics, it can be a good idea to periodically remove servers until the service dies an ignoble death. Without a lot of work, that tells you what is going to break first under a higher load.
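This is only useful if you can trust your service's dependencies to scale better than the service itself does. Removing servers raises the load on each remaining server, but a shared dependency such as a SQL server still sees the same total traffic, so if you have one, you have to do the hard work of generating load.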
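To put rough numbers on the server-removal trick: assuming traffic is balanced evenly and per-server load is the bottleneck, running on fewer servers is like running the full fleet at a proportionally higher load. The function name below is made up for illustration.

```python
def equivalent_load_factor(total_servers: int, remaining_servers: int) -> float:
    """Load multiplier that the full fleet would need to see for each server
    to carry what a remaining server carries now (assumes even balancing)."""
    if remaining_servers <= 0 or remaining_servers > total_servers:
        raise ValueError("remaining_servers must be between 1 and total_servers")
    return total_servers / remaining_servers


if __name__ == "__main__":
    # Assumed example: a fleet of 10 that survives on 5 servers but not on 4.
    print(equivalent_load_factor(10, 5))  # still healthy at this multiple
    print(equivalent_load_factor(10, 4))  # fails at or before this multiple
```

Under those assumptions, a fleet of 10 that survives on 5 servers but dies on 4 can take about 2x today's load and breaks somewhere before 2.5x. The total request volume never changes during this exercise, which is why it says nothing about dependencies.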
Brian