
Host status test #216
Merged: 16 commits, Mar 11, 2019

Conversation

@fbalak (Contributor) commented Nov 30, 2018:

This test now fails because tendrl takes a lot of time to notice dead nodes.

This test uses playbooks from usmqe/usmqe-setup#224

@mbukatov (Contributor) commented:

Is it addressing #199 as well?

@mbukatov (Contributor) left a comment:

When I run this, it fails with a There is not one cluster which includes node with FQDN error, because the playbook that messes with the tendrl daemons is executed before the cluster reuse fixture.

Have you noticed this in your environment?

I guess it would be best to wait for https://gitlab.com/mbukatov/pytest-ansible-playbook/issues/4. Then you will be able to specify setup/teardown playbooks in the workload_stop_nodes fixture, which would help make the ordering clearer.
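As a rough illustration of that suggestion, here is a minimal sketch (not the real implementation) of a workload_stop_nodes fixture that depends on a hypothetical cluster_reuse fixture so that pytest settles the ordering; the inventory path and playbook names are placeholders, and the playbooks are invoked with plain ansible-playbook rather than through the plugin:

import datetime
import subprocess

import pytest


@pytest.fixture
def workload_stop_nodes(cluster_reuse):
    # Sketch only: requesting cluster_reuse (hypothetical name) forces pytest
    # to run the cluster reuse fixture before the playbook that stops the
    # tendrl services, so the two can no longer run in the wrong order.
    subprocess.run(
        ["ansible-playbook", "-i", "conf/inventory.hosts",
         "stop_tendrl_services.yml"],  # placeholder setup playbook
        check=True)
    start = datetime.datetime.utcnow()
    # ... the measured outage window would be recorded here ...
    end = datetime.datetime.utcnow()
    yield {"start": start, "end": end}
    subprocess.run(
        ["ansible-playbook", "-i", "conf/inventory.hosts",
         "start_tendrl_services.yml"],  # placeholder teardown playbook
        check=True)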

Edit: it was failing because of Tendrl/node-agent#863. I use mixed naming unless testing requires otherwise, to catch issues like this ...

procedure, and the number of nodes is used as `result`.
"""
# wait for tendrl to notice that nodes are down
time.sleep(240)
@mbukatov (Contributor) commented on this diff:

Let's add a log line which states what you stated in the comment: wait for tendrl to notice that nodes are down. It would make it clearer what's going on when one checks the logs while the test is running.
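A minimal sketch of that change; the logger setup is an assumption, while the sleep and the comment come from the diff above:

import logging
import time

# assumption: the test suite may already provide its own logger
LOGGER = logging.getLogger(__name__)

# wait for tendrl to notice that nodes are down
LOGGER.info("waiting 240 seconds for tendrl to notice that the nodes are down")
time.sleep(240)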

usmqe_tests/conftest.py (resolved review thread)
@mbukatov (Contributor) commented:

Also, it's failing for me. Will investigate. Is it expected?

usmqe/api/graphiteapi/graphiteapi.py:104: AssumptionFailure
        Data mean should be 4, data mean in Graphite is: 6.0, applicable divergence is 0

usmqe/api/graphiteapi/graphiteapi.py:104: AssumptionFailure
        Data mean should be 0.0, data mean in Graphite is: 2.0, applicable divergence is 0
------------------------------------------------------------
Failed Assumptions: 2, Passed Assumption: 22, Waived Assumption: 0

@fbalak requested a review from mbukatov on January 24, 2019 at 13:52
@fbalak (Contributor, Author) commented Jan 24, 2019:

Tested on a cluster with nodes that have mixed naming. It should be stable now.

@mbukatov (Contributor) left a comment:

I have 2 problems with this test: the logging/reporting needs improvement, and it consistently fails for me:

usmqe/api/graphiteapi/graphiteapi.py:116: AssumptionFailure                     
        Data mean should be 4, data mean in Graphite is: 6.0, applicable divergence is 1
                                                                                
usmqe/api/graphiteapi/graphiteapi.py:116: AssumptionFailure                     
        Data mean should be 0.0, data mean in Graphite is: 2.0, applicable divergence is 1
------------------------------------------------------------                    
Failed Assumptions: 2, Passed Assumption: 22, Waived Assumption: 0

This happens both on mixed and on all-FQDN named clusters.

(targets_used[0],),
workload_stop_nodes["start"],
workload_stop_nodes["end"],
divergence=1)
@mbukatov (Contributor) commented on this diff:

The logging needs to be improved a bit. When this or other checks fail, I see:

[12:42:13,855] [ FAIL     ] pytests_test:: usmqe/api/graphiteapi/graphiteapi.py:116: AssumptionFailure
        Data mean should be 4, data mean in Graphite is: 6.0, applicable divergence is 1

which doesn't tell me what value I have a problem with.
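A minimal sketch of a more descriptive check, assuming the soft assertions come from pytest-assume; the helper name and its arguments are placeholders, not the project's actual API:

import pytest


def check_data_mean(target, expected_mean, actual_mean, divergence):
    # Naming the Graphite target in the message shows which value a failed
    # assumption refers to.
    pytest.assume(
        abs(actual_mean - expected_mean) <= divergence,
        "Data mean for Graphite target '{}' should be {}, data mean in "
        "Graphite is: {}, applicable divergence is {}".format(
            target, expected_mean, actual_mean, divergence))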

@fbalak requested a review from mbukatov on March 11, 2019 at 10:12
@mbukatov (Contributor) left a comment:

Thanks for the improvements in logging. That is crucial for proper debugging. I realized that the failure I noticed previously in my environment was caused by 2 nodes missing from a particular group of my inventory file.
