Flaky tests—those intermittent failures that undermine confidence in test suites—are a persistent thorn in the side of development teams. As software complexity grows, so does the challenge of ensuring test reliability. In this episode, Srivishnu Ayyagari, a senior product manager at LambdaTest, offers valuable insights into the root causes of flaky tests and strategies to overcome them.
Edited transcription
The type of application and use case defines the testing strategy, says Srivishnu Ayyagari, senior product manager at cloud-based testing platform LambdaTest. In e-commerce applications, for example, “elements or components stay in the same place,” which requires testers to “run the same test case every day.” In these cases, automated testing has become vital, allowing testers to execute tests on “different browsers and validate their use cases” through test scripts.
Aware of the potential of test automation, the tooling market continues to develop software that extends the reach of what can be automated. Since Srivishnu arrived at LambdaTest, he has been part of the company’s transition from manual to automated testing tools. As he explains, working with “different kinds of open source frameworks like Selenium, Cypress, and Playwright” has allowed LambdaTest to “create a plethora” of valuable tools for QA professionals, including “HyperExecute, Insight and SmartUI for visual regression testing and real-device mobile app testing.”
Understanding and addressing test instability
Flaky tests (tests that intermittently pass or fail without any code changes) are the Achilles’ heel of test automation. The time spent determining whether a failure indicates a real bug or a false positive undercuts the very speed that automation is meant to deliver. Moreover, flaky tests seem unavoidable in modern development because they correlate tightly with current practices and architectural choices.
The rise of flaky tests, Srivishnu explains, goes hand in hand with the popularity of microservices architecture, in which “there’ll be different kinds of service, but there’ll be one single application, client application, which will be talking to different kinds of APIs.” Each of these APIs “might not perform at the same level compared to others;” moreover, different services might have varying reliance on external APIs or systems, which can introduce unpredictability. In addition, asynchronous operations and race conditions can lead to inconsistent results, as can differences between testing environments (local, CI/CD, cloud).
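To make the race-condition point concrete, here is a contrived sketch (all names are illustrative): a test whose assertion depends on the completion order of two simulated API calls will pass or fail from run to run with no code changes.

```typescript
// Two simulated API calls with variable latency write to shared state.
let lastWriter = "";

function fakeApiCall(name: string): Promise<void> {
  const latency = Math.random() * 100; // simulated network jitter
  return new Promise((resolve) =>
    setTimeout(() => {
      lastWriter = name; // whichever call finishes last wins
      resolve();
    }, latency)
  );
}

async function racyTest(): Promise<void> {
  await Promise.all([fakeApiCall("inventory"), fakeApiCall("pricing")]);
  // Flaky assertion: the outcome depends on latency, not on the code under test.
  if (lastWriter !== "pricing") throw new Error("unexpected completion order");
}
```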
Another root cause behind flakiness is the combination of asynchronous loading of page elements and test script execution speed. The time it takes for elements to appear on a webpage can vary due to network conditions, device performance, and other factors, for example, third-party features that have to be retrieved from external websites. Test scripts, in turn, often execute faster than pages load, leading to issues when they try to interact with elements that aren’t fully loaded.
In this case, mitigation strategies involve adding deliberate waits in test scripts to ensure elements are loaded before interacting with them, and configuring the browser to wait for a specific amount of time before timing out, as in the sketch below. Likewise, it also helps to improve website performance and reduce the time it takes for elements to load.
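As a concrete illustration, here is a minimal sketch using the selenium-webdriver JavaScript bindings; the URL and selector are hypothetical. Rather than clicking immediately (and failing whenever the page is slow), the script polls for the element until it appears or the timeout expires.

```typescript
import { Builder, By, until } from "selenium-webdriver";

async function clickCheckoutReliably(): Promise<void> {
  const driver = await new Builder().forBrowser("chrome").build();
  try {
    await driver.get("https://shop.example.com/cart"); // hypothetical page
    // Explicit wait: poll for up to 10 seconds for the element to exist,
    // instead of assuming it has loaded by the time the script reaches it.
    const checkout = await driver.wait(
      until.elementLocated(By.css("#checkout")), // hypothetical selector
      10_000
    );
    // A located element may still be hidden while scripts run; wait again.
    await driver.wait(until.elementIsVisible(checkout), 10_000);
    await checkout.click();
  } finally {
    await driver.quit();
  }
}
```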
Network latency, or the time it takes for data to travel between different locations, is another significant contributor to flaky tests. When applications and their components are distributed across multiple data centers, network performance can dramatically impact test reliability. “For example, a test might pass when running from a location close to the application’s server but fail when executed from a remote location due to increased network latency,” Srivishnu explains. To mitigate network latency, he recommends isolating tests and ensuring adequate network bandwidth. In addition, retrying failed tests helps differentiate between genuine application issues and network-related flakiness.
Whenever a “flaky test is found,” says Srivishnu, “you have to take a retry” to “check whether it is a correct result or is in the false positive result.” Consistent failure across consecutive runs means “it is a hundred percent feature problem” that the developers need to address and fix. If, instead, the test fails only intermittently across several runs, “that means that there’s an issue with the execution of the browser test,” and the solution involves checking “whether that command is being called correctly every time” and adding mitigation steps. Many testing frameworks now include built-in retries to handle such transient failures.
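A minimal sketch of that triage logic, with illustrative names: rerun the failing test several times, treat a failure on every run as a genuine defect, and an intermittent one as flaky.

```typescript
type TestFn = () => Promise<void>;

// Rerun a failing test `attempts` times and classify the outcome.
async function triageFailure(
  test: TestFn,
  attempts = 5
): Promise<"genuine-defect" | "flaky"> {
  let failures = 0;
  for (let i = 0; i < attempts; i++) {
    try {
      await test();
    } catch {
      failures++;
    }
  }
  // Fails on every run: the feature itself is broken.
  // Fails only on some runs: the test execution is unstable.
  return failures === attempts ? "genuine-defect" : "flaky";
}
```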
Addressing flaky tests with LambdaTest
Srivishnu encourages proactively addressing flaky tests in a test suite with a platform such as LambdaTest. Early detection and mitigation are key to maintaining healthy development cycles. “Once your automation tests start growing, it will be very hard to manage and see how many of the tests are giving false positive results,” he points out.
Beyond simply identifying flaky tests, understanding the underlying causes is part of remediation. To this end, LambdaTest features error categorization insights. “Providing users with just information about flaky tests wasn’t enough,” explains Srivishnu. “We wanted to give them deeper insights into the specific errors causing the flakiness.” LambdaTest’s error categorization feature breaks down flaky tests by the type of error encountered, such as “no such element” or “stale element reference,” so developers can see which issues are most prevalent and prioritize their efforts.
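The episode doesn’t detail LambdaTest’s implementation, but a bare-bones version of this kind of categorization could bucket raw failure messages by the WebDriver error text they contain and rank the buckets by frequency:

```typescript
const KNOWN_ERRORS = [
  "no such element",
  "stale element reference",
  "timeout",
  "element click intercepted",
];

// Bucket raw failure messages by error type, most frequent first.
function categorizeErrors(messages: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const msg of messages) {
    const category =
      KNOWN_ERRORS.find((e) => msg.toLowerCase().includes(e)) ?? "other";
    counts.set(category, (counts.get(category) ?? 0) + 1);
  }
  return new Map([...counts.entries()].sort((a, b) => b[1] - a[1]));
}
```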
Additionally, LambdaTest uses AI to identify patterns in test failures and flag potential flaky tests. To find these patterns, the platform tries combinations of different variables, such as operating system, browser, and screen resolution. LambdaTest then reports the frequency and severity of flaky tests across these configurations and flags those that exceed a flakiness threshold. LambdaTest also uses HyperExecute to isolate tests and limit the interference of environmental factors and test dependencies.
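As a rough sketch of the idea (not LambdaTest’s actual API, and with an assumed threshold value), one could run the suite repeatedly across an OS/browser/resolution matrix and flag configurations whose failures are intermittent rather than consistent:

```typescript
interface TestConfig { os: string; browser: string; resolution: string }

// Build the configuration matrix to probe.
const matrix: TestConfig[] = [];
for (const os of ["Windows 11", "macOS Sonoma"])
  for (const browser of ["chrome", "firefox"])
    for (const resolution of ["1920x1080", "1366x768"])
      matrix.push({ os, browser, resolution });

// Placeholder for dispatching the suite to a given environment;
// simulated here so the sketch stays self-contained.
async function runSuite(config: TestConfig): Promise<boolean> {
  return Math.random() > 0.05;
}

const FLAKY_THRESHOLD = 0.1; // assumed: flag configs failing more than 10% of runs

async function flagFlakyConfigs(runs = 20): Promise<TestConfig[]> {
  const flagged: TestConfig[] = [];
  for (const config of matrix) {
    let failures = 0;
    for (let i = 0; i < runs; i++) {
      if (!(await runSuite(config))) failures++;
    }
    const rate = failures / runs;
    // Intermittent failures (above threshold, but not every run) mark the config flaky.
    if (rate > FLAKY_THRESHOLD && rate < 1) flagged.push(config);
  }
  return flagged;
}
```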
The effectiveness of flaky test detection can vary depending on the testing framework used. “The approach to flaky test detection is multifaceted,” says Srivishnu. “Different frameworks have their strengths, and it’s essential to choose the right tools and strategies based on project requirements and team preferences.” As such, while there are common strategies, specific implementations differ across tools. Playwright, for example, employs a retry mechanism to handle potential flakiness: if a test fails, it’s automatically retried a specified number of times, and if it passes on a subsequent attempt, it’s flagged as potentially flaky. For Selenium-based testing, tools like ReportPortal can be integrated to analyze test logs and identify patterns indicating flakiness. Cypress, another popular framework, offers a cloud-based solution for detecting flaky tests and surfacing the specific commands causing issues.
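For Playwright specifically, the retry behavior described above is a one-line configuration, and a test that only passes on a retry is reported with a “flaky” status. A minimal playwright.config.ts:

```typescript
// playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Rerun a failing test up to twice; a test that passes on a retry
  // is reported as "flaky" rather than "passed".
  retries: 2,
  use: {
    actionTimeout: 10_000,     // per-action timeout (clicks, fills, ...)
    navigationTimeout: 30_000, // per-navigation timeout
  },
});
```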
Emerging testing frameworks and trends
Beyond automated testing, the broader landscape of UI technologies has implications for QA strategies. While React remains dominant, other frameworks have gained traction, each offering unique advantages. “We have seen adoption starting recently with the launch of Next.js, where most of the front-facing pages, so let’s say customer-facing pages, are mostly driven with Next.js,” Srivishnu observes. Vue.js, for its part, provides a distinct architectural approach that resonates with many developers.
Concurrently, the testing framework ecosystem is also transforming. Playwright has emerged as a strong contender, distinguishing itself with its WebSocket-based communication, which significantly reduces latency compared to traditional HTTP-based approaches. As Srivishnu explains, “Playwright has seen massive adoption, competing directly with Cypress as well as Selenium, where the major, game-changing thing that they have launched was the WebSocket connection.” Selenium, a long-standing industry stalwart, is responding to these developments by exploring a more interactive, bidirectional approach (WebDriver BiDi) to keep pace with modern testing requirements.
A key question regarding Playwright’s WebSocket-based approach is browser compatibility. Srivishnu clarifies that WebSocket support is indeed provided by the browsers themselves. “These are exposed by the browsers themselves to have a WebSocket-based interaction,” he explains.
Srivishnu strongly advocates for Playwright as the best starting point for teams looking to automate their test suites across various frameworks. “Playwright is JavaScript natively and JavaScript is very easy to code and get to know,” he emphasizes.
Beyond its accessibility, Playwright offers a rich set of features for browser manipulation and testing. Its advanced command set and active open-source community further solidify its position as a preferred choice. “The best thing with Playwright is that all the commands or let’s say the APIs are much more advanced and they can do a lot of browser manipulation, DOM manipulation, like capture a screenshot, do a visualization test…” Srivishnu explains.
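A short Playwright test illustrating those capabilities (the URL is a placeholder):

```typescript
import { test, expect } from "@playwright/test";

test("manipulate the DOM and capture a screenshot", async ({ page }) => {
  await page.goto("https://example.com"); // placeholder URL
  // DOM manipulation: run a script directly in the page context.
  await page.evaluate(() => {
    document.body.style.backgroundColor = "white";
  });
  await expect(page.locator("h1")).toBeVisible();
  // Full-page screenshot, e.g. as a baseline for visual comparison.
  await page.screenshot({ path: "homepage.png", fullPage: true });
});
```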
Another key advantage of Playwright is its speed. Through WebSocket connections, Playwright can significantly reduce test execution time compared to Selenium. “Playwright runs faster than Selenium because of that specific WebSocket feature,” Srivishnu affirms. Faster test execution translates to quicker feedback loops and increased developer productivity.
The bottom line
Visit Lambdatest.com to learn more about the platform and its resources, including a learning hub for testing, webinars, and test automation certifications. Connect with Srivishnu on LinkedIn.