-
Notifications
You must be signed in to change notification settings - Fork 21
CLOUDP-342858: Running search test under OM 8.2 #467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
94a69a9
to
b053f96
Compare
0e3d326
to
db23821
Compare
b053f96
to
8689e2c
Compare
c882bd5
to
13e1cdb
Compare
6efa62e
to
9d65420
Compare
d70d9e6
to
4363fa2
Compare
4964f58
to
aaa6b41
Compare
6d98958
to
23dc861
Compare
aaa6b41
to
3aa8a72
Compare
23dc861
to
d9f6731
Compare
3aa8a72
to
8f187b9
Compare
d9f6731
to
a415c55
Compare
34a3b86
to
bdabc37
Compare
if ops_manager is not None: | ||
ops_manager.update() | ||
ops_manager.om_status().assert_reaches_phase(Phase.Running, timeout=1200) | ||
ops_manager.appdb_status().assert_reaches_phase(Phase.Running, timeout=600) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would get_ops_manager()
ever return None
provided this step is decorated with skip_if_cloud_manager
? Or maybe I'm not understanding why we're guarding against ops_manager
being None
before calling .update()
but not for the following method calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, removed the if. The annotation is sufficient.
|
||
|
||
@mark.e2e_search_enterprise_tls | ||
def test_wait_for_database_resource_ready2(mdb: MongoDB): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we wait again here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it looks it's redundant. Removing it.
from kubetester.mongodb_user import MongoDBUser | ||
from kubetester.omtester import skip_if_cloud_manager | ||
from kubetester.phase import Phase | ||
from mypyc.irbuild.function import check_native_override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem used.
from tests.common.search.search_tester import SearchTester | ||
from tests.conftest import get_default_operator | ||
from tests.conftest import get_default_operator, get_issuer_ca_filepath | ||
from tests.opsmanager.conftest import custom_om_prev_version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem used.
@mark.e2e_search_enterprise_tlssh | ||
def test_search_assert_search_query_2(mdb: MongoDB): | ||
get_user_sample_movies_helper(mdb).assert_search_query(retry_timeout=60) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should move this as the last step in TestUpgradeMongod
to make sure it doesn't get moved around.
fa19d3d
to
de8fd69
Compare
bdabc37
to
c771f53
Compare
de8fd69
to
8398705
Compare
a7d5aeb
to
37d09a7
Compare
|
||
resource.set_version(get_custom_om_version()) | ||
resource.set_appdb_version(get_custom_appdb_version()) | ||
resource.allow_mdb_rc_versions() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 8.2
an RC version? Is it not available without enabling this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this is not necessary anymore. Thanks for flagging.
37d09a7
to
422790a
Compare
MCK 1.5.0 Release NotesNew Features
Bug Fixes
|
2a7af5e
to
0f77d1f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, but the tests are not easy to follow. Left some suggestions.
docker/mongodb-kubernetes-tests/tests/search/fixtures/enterprise-replicaset-sample-mflix.yaml
Show resolved
Hide resolved
@mark.e2e_search_enterprise_tls | ||
def test_wait_for_database_resource_ready(mdb: MongoDB): | ||
mdb.assert_abandons_phase(Phase.Running, timeout=300) | ||
mdb.assert_reaches_phase(Phase.Running, timeout=300) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: doesn't look like a very useful test. We do nothing here apart from the assertion. In my mind the test should perform some change and then assert some condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is useful, it's checking the state of the world. Not every test must change the world AND observe it as part of the same test case. Especially when you deal with multiple components deployed asynchronously. In this case we create users, search and mongodb which are reconciled by different controllers
Having smaller test cases makes it easier to develop, as you can retry smaller individual functions if needed. Also we should consider all the functions top to bottom as part of one testing steps and story.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of search, after it's deployed, the MongoDB reconciler will pick it up and perform changes to mongod - agents will perform rolling restart of mongods, even without changing to MongoDB CR at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is useful, it's checking the state of the world. Not every test must change the world AND observe it as part of the same test case. Especially when you deal with multiple components deployed asynchronously. In this case we create users, search and mongodb which are reconciled by different controllers
Having smaller test cases makes it easier to develop, as you can retry smaller individual functions if needed. Also we should consider all the functions top to bottom as part of one testing steps and story.
This is an end to end test and we shouldn't be concerned about specific controllers or other implementation details. We should focus on how the system behaves as a whole instead.
Perhaps it is easier to develop/debug the tests like this, but tests structured this way leave the reader guessing. Even if you consider all the functions top to bottom it raises a question - why we do this check here? We already check ready state in test_create_database_resource
- why check it again? You explained this later, but imagine not having this context.
In case of search, after it's deployed, the MongoDB reconciler will pick it up and perform changes to mongod - agents will perform rolling restart of mongods, even without changing to MongoDB CR at all.
Given that explanation I would argue that in the end to end test we should be:
- Deploying everything (the DB resource, user and the search)
- Do the ready state waits and other checks after the waiting.
This is probably how the users will be using it (especially with GitOps). And it eliminates the questions about strange checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your comments made me rethink this and I changed that function to test_wait_for_agents_ready
. After search is deployed, mongod parameters are added to all processes and the agents are doing their work for at least few minutes.
Due to how we report (or rather not doing properly) changes to Pending when reconciling, we cannot see the agents are doing anything by checking the spec.status alone.
Therefore I've added in the test mdb.get_om_tester().wait_agents_ready()
that checks if all the agents reached the goal state (it's exactly the same logic as the operator is waiting on in the reconcile loop).
I hope now it's more clear what's happening.
@mark.e2e_search_enterprise_tls | ||
def test_validate_tls_connections(mdb: MongoDB, mdbs: MongoDBSearch, namespace: str): | ||
validate_tls_connections(mdb, mdbs, namespace) | ||
|
||
|
||
@mark.e2e_search_enterprise_tls | ||
def test_search_restore_sample_database(mdb: MongoDB): | ||
get_admin_sample_movies_helper(mdb).restore_sample_database() | ||
|
||
|
||
@mark.e2e_search_enterprise_tls | ||
def test_search_create_search_index(mdb: MongoDB): | ||
get_user_sample_movies_helper(mdb).create_search_index() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the correlation between test actions and the assertions here and in some other tests. Perhaps it is too granular split into separate tests?
I think it will be easier to follow if there is a test which performs an action and checks the outcome of the action.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is difficult to fix tbh and it would require a more broader alignment how we should write the test...
I don't know how to fix this exact case, but it's just part of the testing story: in order to query the search we need the index in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not going to insist on changing this. But IMO something like this makes more sense:
def test_can_search(mdb: MongoDB):
get_admin_sample_movies_helper(mdb).restore_sample_database()
get_user_sample_movies_helper(mdb).create_search_index()
get_user_sample_movies_helper(mdb).assert_search_query(retry_timeout=60)
You can immediately see that the user should be able to search after creating a search index.
custom_roles = mdb.get_automation_config_tester().automation_config.get("roles", []) | ||
assert len(custom_roles) > 0 | ||
assert "searchCoordinator" in [role["role"] for role in custom_roles] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not clear to me - why we assert it here?
I had to look at the controller code to understand it better. I see that for DB versions prior to 8.2 we create a polyfill role. Would it then make more sense to assert this after creating the resource with MDB_VERSION_WITHOUT_BUILT_IN_ROLE
? E.g. somewhere in test_create_database_resource
?
And then in the upgrade test we can test that there is no polyfill role after the upgrade.
This way test actions are bundled together with a relevant assertions which makes it easier to understand. Also I think a comment with an explanation about the polyfill role wouldn't hurt somewhere around role assertions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some more comments around and reordered the steps a bit - PTAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking at this again I now feel like ideally we should not be checking for implementation details in an E2E test: we should not be concerned about polyfill role. It might be an useful unit/integration test, but on the E2E level we shouldn't wory about it. We care about an answer to the question - "does search work after upgrade?".
Not requesting you to change this (but I'll welcome it) as there is no harm in the additional checks apart from them adding a bit of extra complexity to the test.
def test_mongod_version(self, mdb: MongoDB): | ||
mdb.tester(ca_path=get_issuer_ca_filepath(), use_ssl=True).assert_version(MDB_VERSION_WITHOUT_BUILT_IN_ROLE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what is the value of this test as part of the upgrade test.
Perhaps it is also should be an assertion after initial creation in test_create_database_resource
?
And then in the upgrade test we will be:
- Taking action: upgrading to a new version
- Making assertions: checking running state, custom roles, version, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this assertion alone is of little value as it stands as we deploy this version at the beginning and the assertion could be there.
OTOH, I've prepared this TestUpgradeMongod to be a potentially reusable bit of upgrade test/steps. If you look at this assertion as part of the steps grouped under a single TestClass, then it's OK to test whether we run the test under specific circumstances.
Let me think about the overall structure, perhaps some comments would be needed here.
docker/mongodb-kubernetes-tests/tests/search/search_enterprise_tls.py
Outdated
Show resolved
Hide resolved
docker/mongodb-kubernetes-tests/tests/search/search_enterprise_tls.py
Outdated
Show resolved
Hide resolved
except client.ApiException as e: | ||
if e.status == 404: | ||
logger.debug("project config map is not specified, trying my-project as the source") | ||
src_cm = read_configmap(self.namespace, "my-project", api_client=api_client) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole except
section is redundant due to default value assigned to src_project_config_map_name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored this part a bit.
2d1d13b
to
d8329a6
Compare
d8329a6
to
e3fa3e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments. I'll be happy if you address them, but I'll be ok with the PR merged as is because I do not consider these comments critical.
def test_wait_for_mongod_parameters(mdb: MongoDB): | ||
# After search CR is deployed, MongoDB controller will pick it up | ||
# and start adding searchCoordinator role and search-related | ||
# parameters to the automation config. | ||
def check_mongod_parameters(): | ||
parameters_are_set = True | ||
pod_parameters = [] | ||
for idx in range(mdb.get_members()): | ||
mongod_config = yaml.safe_load( | ||
KubernetesTester.run_command_in_pod_container( | ||
f"{mdb.name}-{idx}", mdb.namespace, ["cat", "/data/automation-mongod.conf"] | ||
) | ||
) | ||
set_parameter = mongod_config.get("setParameter", {}) | ||
parameters_are_set = parameters_are_set and ( | ||
"mongotHost" in set_parameter and "searchIndexManagementHostAndPort" in set_parameter | ||
) | ||
pod_parameters.append(f"pod {idx} setParameter: {set_parameter}") | ||
|
||
return parameters_are_set, f'Not all pods have mongot parameters set:\n{"\n".join(pod_parameters)}' | ||
|
||
run_periodically(check_mongod_parameters, timeout=200) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both this and test_wait_for_agents_ready
are testing implementation details. I understand that test_wait_for_agents_ready
is necessary (at least for now, while we are not reporting Pending
). But do we need to do test_wait_for_mongod_parameters
?
I'm trying to look at the flow from the user perspective. For me as a user it is important that I can:
- Deploy
MongoDB
- Deploy
MongoDBSearch
- Get some indication that everything is ready (we don't have that at the moment unfortunately as you explained in the comment for
test_wait_for_agents_ready
). - Create search index
- And start searching.
Can we just drop test_wait_for_mongod_parameters
and verify "I can search" functionality (you already do that later in the test)?
@mark.e2e_search_enterprise_tls | ||
def test_validate_tls_connections(mdb: MongoDB, mdbs: MongoDBSearch, namespace: str): | ||
validate_tls_connections(mdb, mdbs, namespace) | ||
|
||
|
||
@mark.e2e_search_enterprise_tls | ||
def test_search_restore_sample_database(mdb: MongoDB): | ||
get_admin_sample_movies_helper(mdb).restore_sample_database() | ||
|
||
|
||
@mark.e2e_search_enterprise_tls | ||
def test_search_create_search_index(mdb: MongoDB): | ||
get_user_sample_movies_helper(mdb).create_search_index() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not going to insist on changing this. But IMO something like this makes more sense:
def test_can_search(mdb: MongoDB):
get_admin_sample_movies_helper(mdb).restore_sample_database()
get_user_sample_movies_helper(mdb).create_search_index()
get_user_sample_movies_helper(mdb).assert_search_query(retry_timeout=60)
You can immediately see that the user should be able to search after creating a search index.
custom_roles = mdb.get_automation_config_tester().automation_config.get("roles", []) | ||
assert len(custom_roles) > 0 | ||
assert "searchCoordinator" in [role["role"] for role in custom_roles] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking at this again I now feel like ideally we should not be checking for implementation details in an E2E test: we should not be concerned about polyfill role. It might be an useful unit/integration test, but on the E2E level we shouldn't wory about it. We care about an answer to the question - "does search work after upgrade?".
Not requesting you to change this (but I'll welcome it) as there is no harm in the additional checks apart from them adding a bit of extra complexity to the test.
Summary
This PR adds testing for running search with Ops Manager 8.0.14 and with mongod 8.2.
Changes to search/search_enterprise_tls.py:
Proof of Work
EVG passing for search_enterprise_tls.py in both cloud-qa and om80 variants: evg link
Also the test is only run in om80 variant and not for earlier OM.
Checklist
skip-changelog
label if not needed