
Fix tests to accommodate wait for container re-download on downgrade #4368

Merged
sairon merged 1 commit into dev from tests-wait-for-cli on Oct 29, 2025
Conversation


@sairon sairon commented Oct 29, 2025

Because the OS downgrade performed in tests now triggers a change in container snapshotters, all containers need to be re-downloaded. Make sure that the CLI container exists and increase the timeout for the time being.

Summary by CodeRabbit

  • Tests
    • Enhanced OS update test reliability with improved container status verification and extended timeout handling.

@sairon sairon requested a review from agners October 29, 2025 07:41
@sairon sairon added the build Build and CI related issues label Oct 29, 2025
coderabbitai bot commented Oct 29, 2025

📝 Walkthrough

This change enhances test reliability by adding container status polling to verify that hassio_supervisor and hassio_cli containers are running after reactivation. The test decorator timeout is increased from 300 to 600 seconds to accommodate these additional checks.

Changes

Container status polling for test robustness — tests/smoke_test/test_os_update.py:
Adds a check_container_running() helper function to poll Docker container status via docker inspect. Introduces a waiting loop after target reactivation to confirm container readiness. Increases the test timeout from 300 to 600 seconds, with a TODO note.
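The polling summarized above can be sketched standalone. This is an approximation, not the exact code from tests/smoke_test/test_os_update.py: the real test runs commands through a labgrid shell fixture's run_check, so here the command runner is injected as a parameter.

```python
# Standalone sketch of the container-status polling described above.
# In the real test the command runs through the labgrid `shell` fixture;
# here `run_cmd` is injected so the logic can be exercised anywhere.

def check_container_running(run_cmd, container_name):
    """Return True if `docker inspect` reports the container as running."""
    # `|| true` keeps the command from failing while the container
    # does not exist yet (e.g. it is still being re-downloaded).
    out = run_cmd(
        f"docker container inspect -f '{{{{.State.Status}}}}' {container_name} || true"
    )
    return "running" in out
```

With the runner injected, the same helper could serve both test_init and test_os_update without duplicating the function body.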

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

  • Helper function implementation—verify polling logic and timeout handling are correct
  • Container readiness wait logic—confirm it doesn't mask underlying issues or create flaky behavior
  • Timeout increase justification—validate that 600 seconds is appropriate and the TODO is tracked

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The PR title "Fix tests to accommodate wait for container re-download on downgrade" accurately reflects the primary changes: adding a container status polling helper, introducing waiting loops to ensure containers are running before proceeding, and increasing the test timeout from 300 to 600 seconds. The title is concise and specific, making it clear to reviewers scanning commit history that this PR addresses test timing issues related to container lifecycle during downgrades.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/smoke_test/test_os_update.py (1)

44-48: Extract duplicated helper to module level.

The check_container_running function is duplicated identically in test_init (lines 13-17). Extract it to module level to eliminate duplication and improve maintainability.

Since the helper needs access to shell.run_check, accept shell as a parameter. Apply this diff to extract the helper:

 _LOGGER = logging.getLogger(__name__)
 
+
+def check_container_running(shell, container_name):
+    """Check if a container is running via docker inspect."""
+    out = shell.run_check(
+        f"docker container inspect -f '{{{{.State.Status}}}}' {container_name} || true"
+    )
+    return "running" in out
+

 @pytest.mark.dependency()
 @pytest.mark.timeout(120)
 def test_init(shell, shell_json):
-    def check_container_running(container_name):
-        out = shell.run_check(
-            f"docker container inspect -f '{{{{.State.Status}}}}' {container_name} || true"
-        )
-        return "running" in out
-
     # wait for important containers first
     while True:
-        if check_container_running("homeassistant") and check_container_running("hassio_supervisor"):
+        if check_container_running(shell, "homeassistant") and check_container_running(shell, "hassio_supervisor"):
             break

         sleep(1)

Then update test_os_update similarly to remove its local definition and call the module-level function.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 4695d10 and 2d5e4f1.

📒 Files selected for processing (1)
  • tests/smoke_test/test_os_update.py (2 hunks)
🔇 Additional comments (1)
tests/smoke_test/test_os_update.py (1)

42-42: LGTM! Timeout increase aligns with PR objectives.

The doubled timeout accommodates container re-downloads during the OS downgrade scenario. Ensure the TODO is tracked so the timeout can be reduced after the 17.0 release.

Comment on lines +72 to +77
    # temporary needed for OS 17.0 -> 16.x path, where all containers must be re-downloaded
    while True:
        if check_container_running("hassio_supervisor") and check_container_running("hassio_cli"):
            break

        sleep(1)

⚠️ Potential issue | 🟠 Major

Add explicit timeout to container wait loop.

The wait loop lacks a timeout or maximum iteration limit. While the pytest @pytest.mark.timeout(600) decorator provides an outer bound, explicit timeouts improve test reliability by providing clearer failure messages and more predictable behavior.

Apply this diff to add an explicit timeout:

     # temporary needed for OS 17.0 -> 16.x path, where all containers must be re-downloaded
+    max_wait = 120  # seconds
+    elapsed = 0
     while True:
         if check_container_running("hassio_supervisor") and check_container_running("hassio_cli"):
             break
 
+        if elapsed >= max_wait:
+            raise TimeoutError(
+                f"Containers hassio_supervisor and hassio_cli did not start within {max_wait}s"
+            )
+        elapsed += 1
         sleep(1)

Alternatively, use a more concise approach with a counter:

     # temporary needed for OS 17.0 -> 16.x path, where all containers must be re-downloaded
-    while True:
+    max_attempts = 120
+    for attempt in range(max_attempts):
         if check_container_running("hassio_supervisor") and check_container_running("hassio_cli"):
             break
 
         sleep(1)
+    else:
+        raise TimeoutError(
+            "Containers hassio_supervisor and hassio_cli did not start within 120s"
+        )
🤖 Prompt for AI Agents
In tests/smoke_test/test_os_update.py around lines 72-77, the infinite loop that
waits for hassio_supervisor and hassio_cli to run needs an explicit timeout;
replace the while True loop with a timed-wait using a max_wait (e.g. 600 seconds
or a shorter sensible value) by recording start = time.time(), looping while
elapsed < max_wait and sleeping briefly, and after the loop raise a clear
failure (pytest.fail or AssertionError) if containers never became ready; ensure
time is imported if not already present.
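The timed-wait pattern the prompt describes can be sketched generically. wait_for and its parameters are illustrative names, not code from this PR; a real fix would raise pytest.fail inside the test instead.

```python
import time

def wait_for(predicate, max_wait=120.0, interval=1.0):
    """Poll `predicate` until it returns True or `max_wait` seconds pass.

    Raises TimeoutError on expiry, so a stuck container produces a clear
    failure message instead of being killed by the outer pytest timeout.
    """
    start = time.monotonic()
    while time.monotonic() - start < max_wait:
        if predicate():
            return
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {max_wait}s")
```

In the test this would be called as wait_for(lambda: check_container_running("hassio_supervisor") and check_container_running("hassio_cli")).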

@agners agners left a comment


LGTM

@sairon sairon merged commit af87ada into dev Oct 29, 2025
3 checks passed
@sairon sairon deleted the tests-wait-for-cli branch October 29, 2025 10:06
@github-actions github-actions bot locked and limited conversation to collaborators Nov 6, 2025

Labels

build (Build and CI related issues), cla-signed, hacktoberfest
