Abstract
How often have we heard that a system is not working properly even after delivery? What are we leaving out? Did we prepare our people well enough in advance to ensure our systems work? These questions seem simple to answer, but they are not. We need to track these issues before our systems go live. Systems usually fail at the last checkpoint: operational acceptance tests. IT managers must adopt a metrics-driven approach to Operational Acceptance Testing (OAT) to ensure that the system operates as designed without disrupting the installation, network, or business that uses it.
Recognizing the Face of Risk
A rapidly changing IT landscape has increased infrastructure complexity and cost because of the growing number of applications to be rationalized. Applications may work well in isolation, but when integrated to work as a service they increase the risk of unplanned downtime, which reduces overall revenue, causes non-conformance with regulatory requirements and reputational damage, and makes customers unhappy.
Applications must be carefully tested before a system is released. However, the traditional testing lifecycle deals only with functional and performance aspects. This leaves gaps in operational testing, because requirements grow more complex as the landscape changes. To deal with these challenges, firms need to test infrastructure and applications together to uncover issues such as unplanned outages and recovery failures before go-live. According to an Information Technology and Intelligence Corporation survey, companies cannot achieve zero downtime, yet one out of 10 companies needs greater than 99.999% availability. Operational Acceptance Testing helps reduce downtime and meet the business goals of faster system delivery at a lower cost.
This white paper talks about how companies can deal with downtime and achieve high availability through a metrics-driven approach for Operational Acceptance Testing.
Early Bird Catches the Worm
Operational Acceptance Testing is done before going live to verify that the entire configuration of the production system is accurate. Databases, servers, and code are deployed, and a batch program is then run in the pre-production environment, to avoid hazards in the live environment. Application functionality tests on the base infrastructure are also performed before the system go-live stage.
Exhibit 1: Operational Acceptance Tests
OAT as an Operational Metrics-Driven Methodology
According to the IT Process Institute's Visible Ops Handbook, 80% of unplanned outages are due to changes, configurations, and release integration issues. The Enterprise Management Association also reports that 60% of the problems result from poor configurations. Exhibit 2 lists some of the factors that result in revenue loss due to unplanned downtime.
Exhibit 2: Factors Influencing Revenue Loss Due to Unplanned Downtime
Mean Time Between Failures (MTBF) = Total uptime hours ÷ Total number of failures
MTBF = 694 ÷ 6 = 115.66 hours
Mean Time to Repair (MTTR) = Total downtime hours ÷ Total number of failures
MTTR = 26 ÷ 6 = 4.33 hours
IT Operational Metrics
Operational Acceptance Testing helps IT managers baseline operational metrics by examining systems. Improving on these baseline measures helps avoid unplanned downtime, shorten the downtime window, and minimize the cost of downtime.
Exhibit 3: IT Operational Metrics
Measuring Availability
Availability is the proportion of time a system is functioning. It is often described in terms of nines, with a greater number of nines indicating a higher availability rate (see Table 1 below).
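To make the "nines" concrete, the following minimal sketch converts an availability target into the downtime it permits per year. The function name and the 365-day year are assumptions made for illustration, not part of the original paper.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 8,760 hours, leap years ignored


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime, in hours, permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Print the yearly downtime budget for two to five nines.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_hours(target) * 60
    print(f"{target}% availability -> {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, each additional nine cuts the permitted downtime by a factor of ten, from roughly 87.6 hours a year at 99% to a little over five minutes a year at 99.999%.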
The key metrics involved in measuring the availability of the system are MTBF and MTTR.
Availability = [MTBF ÷ (MTBF + MTTR)] × 100
Availability = [115.66 ÷ (115.66 + 4.33)] × 100 = 96.39
So, the availability of the database system in this scenario is approximately 96%.
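The calculation can be reproduced with a short script. This is only a sketch: the function names are hypothetical, and the inputs (694 uptime hours, 26 downtime hours, 6 failures) are the figures used in the MTBF and MTTR calculations of this paper.

```python
def mtbf(uptime_hours: float, failures: int) -> float:
    """Mean Time Between Failures: uptime hours per failure."""
    return uptime_hours / failures


def mttr(downtime_hours: float, failures: int) -> float:
    """Mean Time to Repair: downtime hours per failure."""
    return downtime_hours / failures


def availability_percent(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), expressed as a percentage."""
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100


# Figures from the scenario in the text.
uptime_hours, downtime_hours, failures = 694, 26, 6

system_mtbf = mtbf(uptime_hours, failures)
system_mttr = mttr(downtime_hours, failures)
availability = availability_percent(system_mtbf, system_mttr)

print(f"MTBF: {system_mtbf:.2f} hours")
print(f"MTTR: {system_mttr:.2f} hours")
print(f"Availability: {availability:.2f}%")
```

Because both metrics share the same failure count, the result is equivalent to uptime divided by total time (694 ÷ 720), which is a quick way to cross-check the figure.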
Measuring Reliability
Reliability is the probability that the system is working properly within the defined time period. Common reliability metrics include:
- Probability of failure—the likelihood that a transaction request will fail
- Rate of fault tolerance, which corresponds to failure intensity
- Maximum downtime
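As an illustration, the reliability metrics listed above could be derived from an observation window as sketched below. The class, its field names, and the sample figures are hypothetical. The sketch also assumes that probability of failure is measured as failed requests over total requests, and failure intensity as failures per operating hour.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObservationWindow:
    """Hypothetical record of what was observed during a test period."""
    total_requests: int
    failed_requests: int
    operating_hours: float
    outage_durations_hours: List[float]

    def probability_of_failure(self) -> float:
        # Assumption: share of transaction requests that failed.
        return self.failed_requests / self.total_requests

    def failure_intensity(self) -> float:
        # Assumption: number of outages per operating hour.
        return len(self.outage_durations_hours) / self.operating_hours

    def maximum_downtime(self) -> float:
        # Longest single outage in the window (0.0 if there were none).
        return max(self.outage_durations_hours, default=0.0)


# Made-up sample data, for illustration only.
window = ObservationWindow(
    total_requests=10_000,
    failed_requests=25,
    operating_hours=720,
    outage_durations_hours=[1.5, 4.0, 0.5],
)

print(f"Probability of failure: {window.probability_of_failure():.4f}")
print(f"Failure intensity: {window.failure_intensity():.4f} failures per hour")
print(f"Maximum downtime: {window.maximum_downtime()} hours")
```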
Measuring Disaster Recovery
- Recovery Time Objective (RTO): How quickly should critical services be restored?
- Recovery Point Objective (RPO): Before the system fails, from what point should the data be available? How much data loss can be accommodated?
- Recovery Cost Objective (RCO): What is your budget and can it meet your RTO and RPO? How much are you willing to spend on disaster recovery?
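The three objectives above can be checked against the outcome of a disaster recovery drill. The sketch below is one possible way to do that. The function name, its parameters, and the drill figures are hypothetical, and it assumes data loss is measured from the last good backup to the moment of failure.

```python
from datetime import datetime, timedelta


def check_recovery_objectives(
    last_backup: datetime,
    failure_time: datetime,
    restored_time: datetime,
    recovery_cost: float,
    rto: timedelta,
    rpo: timedelta,
    rco: float,
) -> dict:
    """Compare one recovery drill against the RTO, RPO, and RCO targets."""
    data_loss_window = failure_time - last_backup     # data written since the last backup
    recovery_duration = restored_time - failure_time  # time taken to restore service
    return {
        "rto_met": recovery_duration <= rto,
        "rpo_met": data_loss_window <= rpo,
        "rco_met": recovery_cost <= rco,
    }


# Made-up drill: backup at 01:00, failure at 01:30, service restored at 05:00.
result = check_recovery_objectives(
    last_backup=datetime(2024, 1, 1, 1, 0),
    failure_time=datetime(2024, 1, 1, 1, 30),
    restored_time=datetime(2024, 1, 1, 5, 0),
    recovery_cost=40_000,
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
    rco=50_000,
)
print(result)
```

In this made-up drill, service returns after 3.5 hours against a 4-hour RTO and 30 minutes of data is at risk against a 1-hour RPO, so all three checks pass.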
Get Granular with a V-Model Approach for OAT
The V-Model is flexible and shortens the total duration of the project lifecycle because many activities run in parallel. This gives it an edge over other models and enables IT managers to tackle operational requirements from the initiation phase through to the test completion phase. The two key phases of the V-Model, verification and validation, execute simultaneously. The verification phase uses documents such as the operational requirements, infrastructure design, and OAT test plan. This phase also keeps a check on progress through inspections, formal reviews, and walkthroughs. In the validation phase, what was planned during verification is carried out in practice: tests are performed on system components, applications, and data. The key deliverables of this phase include the OAT defects summary and the OAT test completion report.
Exhibit 4: V-Model Approach
Connecting the Dots
To perform Operational Acceptance Testing, it is important to know which OAT components need to be tested. Once this is determined, components and applications together must meet high quality standards, which is essential to successful service delivery.
The key prerequisites of OAT include:
- Live and Production environments
- Presence of live applications on the production environment while carrying out OAT
- Latest release versions of infrastructure components, hardware, operating system, database, and software patches
- Similar hardware devices and network conditions in live and production environments
- Skilled and experienced technical staff working on various technologies
Apart from these key prerequisites, careful selection of the environment, test strategy, and tools ensures cost-effective OAT. Factors such as support for applicable platforms, script reusability, and total cost of ownership should also be taken into account. An optimal selection of target infrastructure can maximize test coverage.
How to Achieve Distinction with OAT
Below are some methods that help businesses achieve operational value with OAT:
Operations Focus: Thorough testing of operational aspects enables the OAT team to easily detect critical business operational defects.
Business Readiness Perspective: Testing applications and infrastructure with business readiness in mind helps the OAT team deliver an enhanced user experience after system rollout.
Map Operational Impact: The OAT team should design tests based on the criticality of operational requirements. Such tests help achieve optimal QA coverage with minimal risk.
Measure Operational Metrics: It is important to track operational metrics rather than test program metrics, because doing so builds credibility by quantifying and communicating the value derived from OAT.
Deliver Reliable and Stable Systems with OAT
According to estimates from studies and surveys performed by IT industry analysts, businesses lose on average between $84,000 and $108,000 for every hour of IT system downtime. Industries such as banking and financial services, telecommunications, manufacturing, and energy experience the highest revenue loss during IT downtime. Operational Acceptance Testing ensures that service is delivered with appropriate and proven maintenance and housekeeping processes and procedures. This enables IT managers to meet SLAs/OLAs, deliver reliable and stable systems, increase the customer base, build brand image, reduce revenue loss, and fulfill compliance requirements.
About the Author
Vittal Jadhav holds Certified Agile Tester, CSQA, ISTQB, and ITIL certifications and is an experienced IT professional with more than 12 years of experience in Infrastructure and Application Availability Services Testing. He has wide experience in Airlines Technology, Media Broadcasting, and Banking, Financial Services, and Insurance (BFSI). In addition, he has experience in ETL, Data Warehouse, Non-Functional Testing, Performance Engineering, System Integration Testing, IT Automation, and Functional and Regression Testing. He has extensive knowledge of BFSI testing, with more than six years of experience working with a multinational European bank.