# Flaky Test Detector

Identify and fix non-deterministic tests that intermittently fail without code changes.
## Quick Start

When a user reports flaky tests or asks for test reliability analysis:

- Identify the approach: decide whether to analyze code patterns or test execution results
- Analyze for flakiness: look for common flaky patterns in test code or execution history
- Report findings: list identified flaky tests with specific issues
- Suggest fixes: provide concrete remediation strategies
## What Makes Tests Flaky

Flaky tests fail intermittently without code changes due to:

- Timing issues: race conditions, fixed sleeps, async/await problems
- State management: shared state between tests, improper cleanup
- External dependencies: network calls, database connections, file system access
- Randomness: unseeded random data, UUID generation
- Time dependencies: current time/date, timezone assumptions
- Resource issues: leaks, insufficient cleanup
- Test order: dependencies between tests
- Environment: hardcoded paths, missing environment variables
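For concreteness, here is a hypothetical test that exhibits three of these causes at once; the names and scenario are illustrative, not taken from any real suite:

```python
import random
import time
from datetime import datetime

# Hypothetical flaky test: three sources of non-determinism in one place.
def test_daily_report():
    user_id = random.randint(1, 1000)   # unseeded randomness: different data every run
    time.sleep(0.2)                     # fixed wait: assumes background work always finishes in time
    today = datetime.now().date()       # wall-clock dependency: behavior changes near midnight
    report = {"user": user_id, "date": today.isoformat()}
    assert report["date"] == datetime.now().date().isoformat()
```

Each line passes almost every run, which is exactly what makes the eventual failure hard to reproduce.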
## Detection Methods

### Static Code Analysis

Analyze test code for common flaky patterns.

When to use:

- Reviewing test code for potential issues
- Proactive flakiness prevention
- Code review of new tests
- Refactoring existing tests

Process:

1. Read test files
2. Search for flaky patterns (see flaky-patterns.md)
3. Identify specific issues with line numbers
4. Suggest fixes (see remediation-strategies.md)
Common patterns to detect:

Timing issues:

- `time.sleep()`, `Thread.sleep()`: fixed waits
- Missing `await` in async functions
- Race conditions with threading

State issues:

- Class or global variables in test classes
- Missing setUp/tearDown or fixtures
- Database operations without cleanup

External dependencies:

- `requests.get()`, `http.client`: real network calls
- Database connections to production/external DBs
- File operations without temp directories

Randomness:

- `random` module usage without a seed
- `UUID.randomUUID()` without mocking
- Non-deterministic data generation

Time dependencies:

- `datetime.now()`, `System.currentTimeMillis()`
- Timezone-dependent assertions
- Date comparisons without mocking
### Test Result Analysis

Analyze test execution history to find inconsistent results.

When to use:

- Tests are failing intermittently in CI/CD
- Investigating specific test reliability
- Analyzing test suite health
- Tracking flakiness over time

Process:

1. Collect test results from multiple runs
2. Use `scripts/analyze_test_results.py` to analyze patterns
3. Review flakiness scores and patterns
4. Investigate high-scoring tests

Script usage:

```
python scripts/analyze_test_results.py test_results.json
```
Input format (JSON):

```json
[
  {
    "test_name": "test_user_login",
    "status": "passed",
    "timestamp": "2024-01-01T10:00:00",
    "duration": 1.23
  },
  {
    "test_name": "test_user_login",
    "status": "failed",
    "timestamp": "2024-01-01T11:00:00",
    "duration": 1.45
  }
]
```
Metrics:

- Flakiness score: 0-1; higher means more flaky (based on pass-rate variance)
- Pass rate: percentage of successful runs
- Pattern: recent pass/fail sequence (P = pass, F = fail)
- Alternating: whether the test alternates between pass and fail
- Duration variance: inconsistent execution time indicates issues
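These metrics can be derived directly from the JSON format shown above. The sketch below is one possible implementation for illustration; the actual script's scoring formula may differ:

```python
from collections import defaultdict

def summarize(results: list[dict]) -> dict[str, dict]:
    """Group results by test name and compute per-test flakiness metrics."""
    runs = defaultdict(list)
    for r in sorted(results, key=lambda r: r["timestamp"]):
        runs[r["test_name"]].append(r["status"] == "passed")
    summary = {}
    for name, outcomes in runs.items():
        pass_rate = sum(outcomes) / len(outcomes)
        # 1 - |2p - 1| peaks at a 50% pass rate and is 0 for always-pass/always-fail.
        flakiness = 1 - abs(2 * pass_rate - 1)
        pattern = "".join("P" if ok else "F" for ok in outcomes)
        alternating = len(outcomes) > 1 and all(a != b for a, b in zip(outcomes, outcomes[1:]))
        summary[name] = {
            "pass_rate": pass_rate,
            "flakiness": flakiness,
            "pattern": pattern,
            "alternating": alternating,
        }
    return summary

results = [
    {"test_name": "test_user_login", "status": "passed",
     "timestamp": "2024-01-01T10:00:00", "duration": 1.23},
    {"test_name": "test_user_login", "status": "failed",
     "timestamp": "2024-01-01T11:00:00", "duration": 1.45},
]
print(summarize(results)["test_user_login"]["pattern"])  # PF
```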
## Framework-Specific Guidance

### Python (pytest, unittest)

Common issues:

- Missing fixtures or improper fixture scope
- Shared class variables
- Not using `tmp_path` for file operations
- Missing `@pytest.mark.django_db` for database tests
- Unseeded `random` module usage

Best practices:

- Use fixtures for test data and cleanup
- Use the `tmp_path` fixture for file operations
- Mock external calls with `pytest-mock` or `unittest.mock`
- Use `freezegun` for time mocking
- Seed random with `random.seed()`
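A self-contained `unittest` version of several of these practices (stdlib only, so it runs anywhere; under pytest you would reach for the `tmp_path` fixture and `pytest-mock` instead; `fetch_greeting` is a hypothetical function under test):

```python
import random
import tempfile
import unittest
from pathlib import Path
from unittest import mock

# Hypothetical code under test; in a real suite this lives in your application.
def fetch_greeting(client):
    return client.get("/greeting")

class ReportTest(unittest.TestCase):
    def setUp(self):
        random.seed(42)                       # deterministic random data for every test
        self._tmp = tempfile.TemporaryDirectory()
        self.tmp_path = Path(self._tmp.name)  # isolated per-test directory
        self.addCleanup(self._tmp.cleanup)    # guaranteed cleanup, even on failure

    def test_writes_report(self):
        out = self.tmp_path / "report.txt"
        out.write_text(f"value={random.randint(1, 10)}")
        self.assertTrue(out.read_text().startswith("value="))

    def test_external_call_is_mocked(self):
        client = mock.Mock()
        client.get.return_value = "hello"     # no real network traffic
        self.assertEqual(fetch_greeting(client), "hello")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReportTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

Seeding in `setUp` and cleaning up via `addCleanup` means each test starts from the same state regardless of what ran before it.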
### Java (JUnit, TestNG)

Common issues:

- Static variables in test classes
- Missing `@Before`/`@After` cleanup
- Not using `@Transactional` for database tests
- Fixed `Thread.sleep()` calls
- Hardcoded file paths

Best practices:

- Use `@Before`/`@After` for setup/cleanup
- Use `@Transactional` for automatic rollback
- Mock with Mockito
- Use `Clock` for time mocking
- Use try-with-resources for resource management
## Workflow

### 1. Understand the Context

Ask clarifying questions:

- What tests are flaky?
- How often do they fail?
- What's the failure pattern?
- Any recent changes?
- CI/CD or local environment?

### 2. Choose Detection Method

Static analysis if:

- Reviewing code proactively
- No test execution history is available
- You want to prevent flakiness

Result analysis if:

- Test execution history is available
- Tests are failing intermittently
- You need to quantify flakiness
### 3. Analyze for Flakiness

For static analysis:

1. Read test files
2. Search for patterns from flaky-patterns.md
3. Note specific issues with line numbers
4. Categorize by issue type

For result analysis:

1. Run the `analyze_test_results.py` script
2. Review flakiness scores
3. Identify high-risk tests
4. Examine failure patterns
### 4. Report Findings

Structure the report:

- Summary: number of flaky tests found
- High priority: tests with the highest flakiness scores
- By category: group by issue type
- Specific issues: file paths and line numbers

Example format:

```
Found 5 potentially flaky tests:

HIGH PRIORITY:
- test_user_login (flakiness: 0.85)
  - Line 45: time.sleep(2) - fixed wait
  - Line 52: Shared class variable 'user_data'

MEDIUM PRIORITY:
- test_api_call (flakiness: 0.62)
  - Line 23: requests.get() - unmocked network call
```
### 5. Suggest Remediation

For each issue, provide:

- What's wrong: explain the flaky pattern
- Why it's flaky: describe the non-determinism
- How to fix: a concrete code example

Reference remediation-strategies.md for detailed fixes.
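As an example of that "what / why / how" structure applied to the fixed-wait pattern (the `wait_until` helper is illustrative, not taken from remediation-strategies.md):

```python
import time

# What's wrong: `time.sleep(2)` assumes the work always finishes within 2 seconds.
# Why it's flaky: under CI load the work may take longer, so the test fails
# intermittently; on fast machines the test also wastes 2 seconds every run.
# How to fix: poll for the condition with a bounded timeout instead.
def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one final check at the deadline

# Usage: replace `time.sleep(2); assert job.done` with:
# assert wait_until(lambda: job.done)
print(wait_until(lambda: True))  # True
```

The test now passes as soon as the condition holds and only fails when the condition genuinely never becomes true within the timeout.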
## Example Usage Patterns

- User: "Our test_checkout test keeps failing randomly" → Analyze test code for flaky patterns, report findings with fixes
- User: "Find all flaky tests in the test suite" → Scan all test files for common flaky patterns
- User: "This test has a 60% pass rate, why?" → Analyze test code and suggest specific fixes
- User: "Analyze these test results for flakiness" → Use the analyze_test_results.py script on the provided data
- User: "How do I fix this race condition in my test?" → Provide a remediation strategy with code examples
- User: "Review this test for potential flakiness" → Static analysis of the specific test with recommendations
## Best Practices

### Detection

- Look for multiple flaky patterns, not just one
- Consider the test framework's idioms
- Check both test code and test fixtures/setup
- Review recent changes that might introduce flakiness

### Reporting

- Prioritize by severity and frequency
- Provide specific line numbers
- Group related issues together
- Include a confidence level in the assessment

### Remediation

- Suggest framework-appropriate fixes
- Provide complete code examples
- Explain why the fix works
- Consider test maintainability

### Prevention

- Recommend test design patterns
- Suggest CI/CD improvements (retry policies, test isolation)
- Encourage test independence
- Promote proper mocking and fixtures