# Flaky Test Detector

Identify and fix non-deterministic tests that intermittently fail without code changes.
## Quick Start

When a user reports flaky tests or asks for test reliability analysis:

- Identify the approach: decide whether to analyze code patterns or test execution results
- Analyze for flakiness: look for common flaky patterns in test code or execution history
- Report findings: list identified flaky tests with specific issues
- Suggest fixes: provide concrete remediation strategies
## What Makes Tests Flaky

Flaky tests fail intermittently without code changes due to:

- Timing issues: race conditions, fixed sleeps, async/await problems
- State management: shared state between tests, improper cleanup
- External dependencies: network calls, database connections, file system access
- Randomness: unseeded random data, UUID generation
- Time dependencies: current time/date, timezone assumptions
- Resource issues: leaks, insufficient cleanup
- Test order: dependencies between tests
- Environment: hardcoded paths, missing environment variables
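For concreteness, here is a hypothetical test that exhibits three of these causes at once; the names and scenario are illustrative, not taken from any real suite:

```python
import random
import time
from datetime import datetime

# Hypothetical flaky test: three sources of non-determinism in one place.
def test_daily_report():
    user_id = random.randint(1, 1000)   # unseeded randomness: different data every run
    time.sleep(0.2)                     # fixed wait: assumes background work always finishes in time
    today = datetime.now().date()       # wall-clock dependency: behavior changes near midnight
    report = {"user": user_id, "date": today.isoformat()}
    assert report["date"] == datetime.now().date().isoformat()
```

Each line passes almost every run, which is exactly what makes the eventual failure hard to reproduce.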
## Detection Methods

### Static Code Analysis

Analyze test code for common flaky patterns.

When to use:

- Reviewing test code for potential issues
- Proactive flakiness prevention
- Code review of new tests
- Refactoring existing tests

Process:

1. Read test files
2. Search for flaky patterns (see flaky-patterns.md)
3. Identify specific issues with line numbers
4. Suggest fixes (see remediation-strategies.md)
Common patterns to detect:

Timing issues:

- `time.sleep()`, `Thread.sleep()`: fixed waits
- Missing `await` in async functions
- Race conditions with threading

State issues:

- Class or global variables in test classes
- Missing setUp/tearDown or fixtures
- Database operations without cleanup

External dependencies:

- `requests.get()`, `http.client`: real network calls
- Database connections to production/external DBs
- File operations without temp directories

Randomness:

- `random` module usage without a seed
- `UUID.randomUUID()` without mocking
- Non-deterministic data generation

Time dependencies:

- `datetime.now()`, `System.currentTimeMillis()`
- Timezone-dependent assertions
- Date comparisons without mocking
### Test Result Analysis

Analyze test execution history to find inconsistent results.

When to use:

- Tests are failing intermittently in CI/CD
- Investigating specific test reliability
- Analyzing test suite health
- Tracking flakiness over time

Process:

1. Collect test results from multiple runs
2. Use `scripts/analyze_test_results.py` to analyze patterns
3. Review flakiness scores and patterns
4. Investigate high-scoring tests

Script usage:

```
python scripts/analyze_test_results.py test_results.json
```
Input format (JSON):

```json
[
  {
    "test_name": "test_user_login",
    "status": "passed",
    "timestamp": "2024-01-01T10:00:00",
    "duration": 1.23
  },
  {
    "test_name": "test_user_login",
    "status": "failed",
    "timestamp": "2024-01-01T11:00:00",
    "duration": 1.45
  }
]
```
Metrics:

- Flakiness score: 0-1; higher means more flaky (based on pass-rate variance)
- Pass rate: percentage of successful runs
- Pattern: recent pass/fail sequence (P = pass, F = fail)
- Alternating: whether the test alternates between pass and fail
- Duration variance: inconsistent execution time indicates issues
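These metrics can be derived directly from the JSON format shown above. The sketch below is one possible implementation for illustration; the actual script's scoring formula may differ:

```python
from collections import defaultdict

def summarize(results: list[dict]) -> dict[str, dict]:
    """Group results by test name and compute per-test flakiness metrics."""
    runs = defaultdict(list)
    for r in sorted(results, key=lambda r: r["timestamp"]):
        runs[r["test_name"]].append(r["status"] == "passed")
    summary = {}
    for name, outcomes in runs.items():
        pass_rate = sum(outcomes) / len(outcomes)
        # 1 - |2p - 1| peaks at a 50% pass rate and is 0 for always-pass/always-fail.
        flakiness = 1 - abs(2 * pass_rate - 1)
        pattern = "".join("P" if ok else "F" for ok in outcomes)
        alternating = len(outcomes) > 1 and all(a != b for a, b in zip(outcomes, outcomes[1:]))
        summary[name] = {
            "pass_rate": pass_rate,
            "flakiness": flakiness,
            "pattern": pattern,
            "alternating": alternating,
        }
    return summary

results = [
    {"test_name": "test_user_login", "status": "passed",
     "timestamp": "2024-01-01T10:00:00", "duration": 1.23},
    {"test_name": "test_user_login", "status": "failed",
     "timestamp": "2024-01-01T11:00:00", "duration": 1.45},
]
print(summarize(results)["test_user_login"]["pattern"])  # PF
```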
## Framework-Specific Guidance

### Python (pytest, unittest)

Common issues:

- Missing fixtures or improper fixture scope
- Shared class variables
- Not using `tmp_path` for file operations
- Missing `@pytest.mark.django_db` for database tests
- Unseeded `random` module usage

Best practices:

- Use fixtures for test data and cleanup
- Use the `tmp_path` fixture for file operations
- Mock external calls with `pytest-mock` or `unittest.mock`
- Use `freezegun` for time mocking
- Seed random with `random.seed()`
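A self-contained `unittest` version of several of these practices (stdlib only, so it runs anywhere; under pytest you would reach for the `tmp_path` fixture and `pytest-mock` instead; `fetch_greeting` is a hypothetical function under test):

```python
import random
import tempfile
import unittest
from pathlib import Path
from unittest import mock

# Hypothetical code under test; in a real suite this lives in your application.
def fetch_greeting(client):
    return client.get("/greeting")

class ReportTest(unittest.TestCase):
    def setUp(self):
        random.seed(42)                       # deterministic random data for every test
        self._tmp = tempfile.TemporaryDirectory()
        self.tmp_path = Path(self._tmp.name)  # isolated per-test directory
        self.addCleanup(self._tmp.cleanup)    # guaranteed cleanup, even on failure

    def test_writes_report(self):
        out = self.tmp_path / "report.txt"
        out.write_text(f"value={random.randint(1, 10)}")
        self.assertTrue(out.read_text().startswith("value="))

    def test_external_call_is_mocked(self):
        client = mock.Mock()
        client.get.return_value = "hello"     # no real network traffic
        self.assertEqual(fetch_greeting(client), "hello")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReportTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Seeding in `setUp` and cleaning up via `addCleanup` means each test starts from the same state regardless of what ran before it.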
### Java (JUnit, TestNG)

Common issues:

- Static variables in test classes
- Missing `@Before`/`@After` cleanup
- Not using `@Transactional` for database tests
- Fixed `Thread.sleep()` calls
- Hardcoded file paths

Best practices:

- Use `@Before`/`@After` for setup/cleanup
- Use `@Transactional` for automatic rollback
- Mock with Mockito
- Use `Clock` for time mocking
- Use try-with-resources for resource management
## Workflow

### 1. Understand the Context

Ask clarifying questions:

- What tests are flaky?
- How often do they fail?
- What's the failure pattern?
- Any recent changes?
- CI/CD or local environment?

### 2. Choose Detection Method

Static analysis if:

- Reviewing code proactively
- No test execution history is available
- You want to prevent flakiness

Result analysis if:

- Test execution history is available
- Tests are failing intermittently
- You need to quantify flakiness
### 3. Analyze for Flakiness

For static analysis:

1. Read test files
2. Search for patterns from flaky-patterns.md
3. Note specific issues with line numbers
4. Categorize by issue type

For result analysis:

1. Run the `analyze_test_results.py` script
2. Review flakiness scores
3. Identify high-risk tests
4. Examine failure patterns
### 4. Report Findings

Structure the report:

- Summary: number of flaky tests found
- High priority: tests with the highest flakiness scores
- By category: group by issue type
- Specific issues: file paths and line numbers

Example format:

```
Found 5 potentially flaky tests:

HIGH PRIORITY:
- test_user_login (flakiness: 0.85)
  - Line 45: time.sleep(2) - fixed wait
  - Line 52: Shared class variable 'user_data'

MEDIUM PRIORITY:
- test_api_call (flakiness: 0.62)
  - Line 23: requests.get() - unmocked network call
```
### 5. Suggest Remediation

For each issue, provide:

- What's wrong: explain the flaky pattern
- Why it's flaky: describe the non-determinism
- How to fix: a concrete code example

Reference remediation-strategies.md for detailed fixes.
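As an example of that "what / why / how" structure applied to the fixed-wait pattern (the `wait_until` helper is illustrative, not taken from remediation-strategies.md):

```python
import time

# What's wrong: `time.sleep(2)` assumes the work always finishes within 2 seconds.
# Why it's flaky: under CI load the work may take longer, so the test fails
# intermittently; on fast machines the test also wastes 2 seconds every run.
# How to fix: poll for the condition with a bounded timeout instead.
def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one final check at the deadline

# Usage: replace `time.sleep(2); assert job.done` with:
# assert wait_until(lambda: job.done)
print(wait_until(lambda: True))  # True
```

The test now passes as soon as the condition holds and only fails when the condition genuinely never becomes true within the timeout.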
## Example Usage Patterns

- User: "Our test_checkout test keeps failing randomly" → Analyze test code for flaky patterns, report findings with fixes
- User: "Find all flaky tests in the test suite" → Scan all test files for common flaky patterns
- User: "This test has a 60% pass rate, why?" → Analyze test code and suggest specific fixes
- User: "Analyze these test results for flakiness" → Use the analyze_test_results.py script on the provided data
- User: "How do I fix this race condition in my test?" → Provide a remediation strategy with code examples
- User: "Review this test for potential flakiness" → Static analysis of the specific test with recommendations
## Best Practices

### Detection

- Look for multiple flaky patterns, not just one
- Consider the test framework's idioms
- Check both test code and test fixtures/setup
- Review recent changes that might introduce flakiness

### Reporting

- Prioritize by severity and frequency
- Provide specific line numbers
- Group related issues together
- Include a confidence level in the assessment

### Remediation

- Suggest framework-appropriate fixes
- Provide complete code examples
- Explain why the fix works
- Consider test maintainability

### Prevention

- Recommend test design patterns
- Suggest CI/CD improvements (retry policies, test isolation)
- Encourage test independence
- Promote proper mocking and fixtures