In real life, it’s frustrating to wait 30 minutes in a restaurant, but in IT, a user starts to feel frustration in only a few seconds.
This is not a science article. It’s only my take on P & L testing.
No matter how great an application is, if response times are slow, users will get frustrated. Therefore, performance and load tests are very important to protect an IT investment. Monitoring response times at different levels in test and production environments is also important for effective troubleshooting.
P & L tests (performance and load tests) aim to eliminate certain risks. They help us make sure that an application has good enough performance and capacity, is stable and robust, and is possible to monitor and troubleshoot.
By application performance I mean speed or response times: throughput in batches and response times in GUIs.
Speed requirements on batch processes are determined by the schedule at hand, e.g. a payment batch that can start at 3 AM at the earliest and has to be completed before 9 AM due to a bank deadline.
When it comes to GUIs, a response time of 3 seconds qualifies as a good requirement. However, all applications suffer from unpredictable, and often unexplainable, peaks with high response times. From that perspective, a more realistic requirement is that 95 % of the transactions must respond in 3 seconds or less. For advanced searches, on the other hand, users will accept more than 3 seconds.
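To make a percentile requirement like that concrete, here is a minimal sketch of checking measured response times against it; the sample numbers and the 3-second threshold are invented for illustration.

```python
import math

def percentile(samples, pct):
    """Return the value at the given percentile (nearest-rank method)."""
    ordered = sorted(samples)
    # Nearest-rank: ceil(pct/100 * n) gives the 1-based rank.
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

# Invented response times (seconds) from a hypothetical test run:
response_times = [0.8, 1.2, 0.9, 2.7, 1.1, 3.4, 1.0, 0.7, 1.3, 1.5,
                  0.9, 1.1, 2.1, 0.8, 1.0, 1.2, 5.2, 1.4, 0.9, 1.0]
p95 = percentile(response_times, 95)
print(f"p95 = {p95:.1f} s, requirement met: {p95 <= 3.0}")
```

With the sample data above, the 95th percentile lands at 3.4 s, so the requirement fails even though most transactions are well under 3 seconds.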
It is necessary to test whether an application has the capacity to serve the expected number of users (or threads, if the user is another system) and the expected number of transactions per second without slower response times. The trick here is to convert the number of users to a number of transactions. Maybe the best approach is to record the main use cases with reasonable think times, and then go from there.
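One way to do that conversion is Little's law: the number of concurrent users equals throughput times the time each user spends per transaction (think time plus response time). A sketch, with invented numbers:

```python
def users_to_tps(users, think_time_s, response_time_s):
    """Little's law rearranged: throughput = users / (think + response)."""
    return users / (think_time_s + response_time_s)

# E.g. 500 concurrent users, 10 s think time between clicks, and a 1 s
# average response time generate roughly 45 transactions per second:
tps = users_to_tps(500, 10.0, 1.0)
print(f"Expected load: {tps:.1f} transactions/second")
```

This is only a first estimate; recorded use cases with real think times will refine it.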
In addition to learning whether there's enough capacity, it is good to know how many users and transactions per second the application can serve without slower response times.
Another positive side effect of capacity tests is that they reveal whether scaling has a negative impact on response times. Scaling should be triggered during a good capacity test.
In order to make sure that the application doesn't degrade in responsiveness over time, it is necessary to execute stability tests. CPU utilization shouldn't increase, memory shouldn't be consumed gradually, and response times shouldn't slow down over time. A 12-hour test should be sufficient.
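A simple way to check for gradual degradation in such a test is to fit a straight line to the measurements and look at the slope. A hypothetical sketch (the hourly numbers are invented):

```python
def trend_slope(samples):
    """Least-squares slope of samples, per sample index."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# Hourly p95 response times (seconds) over a 12-hour run, invented data:
hourly_p95 = [1.0, 1.1, 1.0, 1.2, 1.3, 1.3, 1.5, 1.6, 1.8, 1.9, 2.1, 2.3]
print(f"Trend: {trend_slope(hourly_p95):+.2f} s/hour")
```

A slope close to zero means stable behavior; a clearly positive slope over 12 hours deserves investigation, together with the CPU and memory curves.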
What happens if load suddenly increases? Consider Black Friday as an example. Can the application scale fast enough? How much slower will response times get? The application should definitely not crash, even if slower response times for a short period of time might be acceptable.
Another important question to ask, and test, is whether response times get back to a normal level as soon as the load does.
If the application is equipped with redundant back-end solutions, it is good to know what a fail-over does to the users.
In production, an application should raise an alarm if there are performance issues. That is much better than users (customers) calling in to complain.
This aspect is tested either by provoking the application with overload or by un-tuning, i.e. intentionally creating internal bottlenecks.
When testing, sufficient monitoring is necessary for effective troubleshooting and identification of bottlenecks.
When load gradually increases there are, generally speaking, four different phases of response time behavior: warmup, normal, overload, and extreme overload.
In the warmup phase, response times are slower as caches get populated, connections are established, and initial scaling takes place. In this phase, response time measurements vary a lot and are not suitable for testing.
Response times are much more stable in the normal phase. This phase is perfect for performance and stability tests.
When the load exceeds the capacity, response times often increase exponentially. It's the transition between the normal and the overload phases that capacity tests aim to identify.
After the overload phase, response times usually stabilize as timeouts kick in and the application responds with errors.
Different types of tests for different types of changes
Small code change/fix or change in infrastructure
If the test object is a small change or bug fix in the application code, or a change in the infrastructure, a comparison approach is possible. Examples of infrastructure changes are a DBMS upgrade or a switch of hardware.
First, baseline tests of different aspects (performance, capacity, …) are executed. Then, after the change, another series of tests is executed, and finally the test results can be compared to the baseline results.
With this type of test (a comparison test), load can be simulated according to production statistics, or according to the capacity of the test environment.
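The comparison step itself can be automated. A hypothetical sketch that flags metrics where the new run is more than 10 % worse than the baseline; the metric names and numbers are invented:

```python
def find_regressions(baseline, new_run, tolerance=0.10):
    """Return metrics where new_run exceeds baseline by more than tolerance."""
    return {
        metric: (baseline[metric], new_run[metric])
        for metric in baseline
        if new_run[metric] > baseline[metric] * (1 + tolerance)
    }

# Invented p95 response times (seconds) per transaction type:
baseline = {"login_p95_s": 1.2, "search_p95_s": 2.8, "checkout_p95_s": 1.9}
new_run  = {"login_p95_s": 1.3, "search_p95_s": 3.5, "checkout_p95_s": 1.8}
print(find_regressions(baseline, new_run))
```

Here only search_p95_s is flagged: 3.5 s against a baseline of 2.8 s, while the small login change stays within the tolerance.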
New application or new function in an application
A new application or a new function is a bit trickier to test properly. No production statistics are available, and the test results can't be compared to a baseline. Instead, load patterns have to be estimated and test results have to be evaluated on their own.
In a virtualized world, the idea is that applications should not suffer no matter how their components (middleware servers, virtual Linux servers, communication, storage, etc.) are deployed in the background. So it should be possible, in theory, to have one virtualized infrastructure that hosts both production and test environments.
Although it is possible, it could be very risky for production and very problematic for performance and load tests. Production can suffer from heavy tests, and test results could be unreliable. Despite the intentions, it always boils down to the capacity of the individual components of an application. It doesn't matter whether components are virtual or not; somewhere there are physical components behind the scenes.
Therefore, it is very important that production and test environments are separated from each other. Either completely separated on a hardware level or by quotas of some sort.
What's the best possible conclusion from a successful test? Does a successful test result guarantee successful production? I'm afraid not 🙁
If it is possible to play back user transaction logs from production, and the test environment is an exact copy of production, the best possible conclusion is that production will work with historical load patterns.
If load patterns are simulated and the test environment is an exact copy of production the best possible conclusion is that production will work with the load patterns tested.
If the test environment is not an exact copy of production, or if the test environment has virtual components in it, the best possible conclusion is that the application works in the test environment with the load patterns tested, and that the test hasn't revealed any flaws that would put production at risk.
The industry is moving towards more agile processes, with an increase in the number of delivered software changes. Performance and load testing methods must adapt and become more efficient than they traditionally have been. Manual test runs, compilation, and analysis of results are simply too slow. Sure, tuning the process to make it more effective should always be prioritized, but never at the cost of quality. Agile development has, however, uncovered the need for more effective P & L tests.
I think the answer to being more effective is nightly tests, or daily if you will 🙂
Automated, scheduled tests with compilation of measurements and generated reports. With nightly tests, there are always a couple of baselines at hand, and therefore new code and platform changes can be tested every other day or so. Of course, there will be situations where manual tests are necessary, and there are prerequisites that have to be fulfilled, such as a functionally working application and flawless installation procedures. But with those in place, testing will be faster, I'm sure.
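As a sketch of what such a nightly job could look like: run the test, store a dated result file, and compare against the most recent stored run. The run_load_test() stub, the file layout, and the 10 % threshold are all assumptions for illustration, not a prescription.

```python
import json
from datetime import date
from pathlib import Path

def run_load_test():
    """Stub: in reality this would drive the load tool and collect metrics."""
    return {"login_p95_s": 1.3, "search_p95_s": 2.9}

def nightly_run(results_dir):
    """Run the test, compare with the latest stored result, save a report."""
    results_dir.mkdir(parents=True, exist_ok=True)
    previous = sorted(results_dir.glob("*.json"))
    metrics = run_load_test()
    report = {"date": date.today().isoformat(), "metrics": metrics}
    if previous:
        baseline = json.loads(previous[-1].read_text())["metrics"]
        # Flag metrics more than 10 % worse than the previous nightly run.
        report["regressions"] = {
            m: (baseline[m], v)
            for m, v in metrics.items()
            if m in baseline and v > baseline[m] * 1.10
        }
    (results_dir / f"{report['date']}.json").write_text(json.dumps(report))
    return report
```

Scheduling the job and rendering the report are left to whatever tooling is already in place; the point is that the comparison happens without anyone staying up at night.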
Good luck! And please don't hesitate to e-mail any questions that have popped up. I would have preferred a live presentation; a static text is never as good as a conversation.