Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating force merge operation to force merge only track indices by default #865

Merged
merged 3 commits into from
Jan 23, 2020

Conversation

rohitnair
Copy link
Contributor

This change fixes #835

I have run the tests and ensured they pass. Also tested locally to ensure it does what we expect it to do.

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR! The change itself looks good already but I left some pointers to add docs and one test. Can you please have a look?

@@ -580,6 +580,21 @@ def percent_completed(self):
return self.current_bulk / self.total_bulks


class ForceMergeParamSouce:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please document the new parameter in https://github.com/elastic/rally/blob/master/docs/track.rst#force-merge?

IMHO it should be the first parameter in the list and the docs should say something like:

* ``index`` (optional, defaults to the indices defined in the ``indices`` section or ``_all`` if no indices are defined there): The name of the index that should be force-merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the docs.

@@ -580,6 +580,21 @@ def percent_completed(self):
return self.current_bulk / self.total_bulks


class ForceMergeParamSouce:
def __init__(self, track, params, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add tests in tests/track/params_test.py? You can have a look at DeleteIndexParamSourceTests for inspiration. IMHO it should check all three cases how the index parameter can be set (via the track, fallback via _all and explicitly set via the index parameter on the track).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointer, added tests to cover the three cases as you mentioned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make precommit is failing now with:


08:13:11 + make precommit
08:13:19 ************* Module esrally.track.params
08:13:19 esrally/track/params.py:584:4: W0231: __init__ method from base class 'ParamSource' is not called (super-init-not-called)

can you please add the following in the first line of the constructor:

super().__init__(track, params, **kwargs)

@danielmitterdorfer danielmitterdorfer added :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. enhancement Improves the status quo labels Jan 13, 2020
@danielmitterdorfer
Copy link
Member

@elasticmachine test this please

@@ -580,6 +580,21 @@ def percent_completed(self):
return self.current_bulk / self.total_bulks


class ForceMergeParamSouce:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, can you please derive this class from the class ParamSource? All Rally-internal parameter sources are derived from it and it provides a default implementation for required methods. Currently, the following invocation:

esrally --test-mode --distribution-version=7.4.2 --challenge=append-no-conflicts-index-only --on-error=abort

is failing with:

[ERROR] Cannot race. Error in load generator ('ForceMergeParamSouce' object has no attribute 'partition')

This can be solved by deriving your class from ParamSource as I've outlined above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@rohitnair
Copy link
Contributor Author

Thanks for taking a look @danielmitterdorfer . I've addressed your comments and pushed an update. Let me know if you have any other feedback.

@danielmitterdorfer danielmitterdorfer self-requested a review January 22, 2020 08:06
@danielmitterdorfer
Copy link
Member

@elasticmachine test this please

@danielmitterdorfer
Copy link
Member

Thanks for iterating! I'm gonna have another look soon. Meanwhile I've triggered a fresh CI build for your changes.

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for iterating! The changes look pretty good and we're almost there. We had some minor linter errors in CI (see my review comments for details). I also spotted an issue that we don't pass additionally supported parameters to the runner and suggested an implementation which you can just apply. I think with the next review cycle we should be ready to merge.

@@ -580,6 +580,21 @@ def percent_completed(self):
return self.current_bulk / self.total_bulks


class ForceMergeParamSouce:
def __init__(self, track, params, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make precommit is failing now with:


08:13:11 + make precommit
08:13:19 ************* Module esrally.track.params
08:13:19 esrally/track/params.py:584:4: W0231: __init__ method from base class 'ParamSource' is not called (super-init-not-called)

can you please add the following in the first line of the constructor:

super().__init__(track, params, **kwargs)

@@ -341,7 +341,9 @@ Throughput will be reported as number of indexed documents per second.
force-merge
~~~~~~~~~~~

With the operation type ``force-merge`` you can call the `force merge API <http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html>`_. On older versions of Elasticsearch (prior to 2.1), Rally will use the ``optimize API`` instead. It supports the following parameter:
With the operation type ``force-merge`` you can call the `force merge API <http://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html>`_. On older versions of Elasticsearch (prior to 2.1), Rally will use the ``optimize API`` instead. It supports the following parameters:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Remove this empty line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


self.assertEqual("_all", p["index"])


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make precommit is failing with:

08:13:19 ************* Module tests.track.params_test
08:13:19 tests/track/params_test.py:1718:0: C0305: Trailing newlines (trailing-newlines)

This is because you added two trailing lines but only one is allowed. Can you please remove this last line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this, and made sure make precommit passes.

@@ -580,7 +580,7 @@ def percent_completed(self):
return self.current_bulk / self.total_bulks


class ForceMergeParamSouce:
class ForceMergeParamSouce(ParamSource):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed there is a typo: It says ForceMergeParamSouce but it should be ForceMergeParamSource (the "r" is missing in the word "source").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦‍♂ sorry about that.

else:
default_index = "_all"

self._index_name = params.get("index", default_index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that we'd lose support for the other two parameters max-num-segments and request-timeout (see the ForceMerge runner class). Could you please add the following lines here in the constructor?

        self._max_num_segments = params.get("max-num-segments")
        self._request_timeout = params.get("request-timeout")

and the params method needs to change to:

    def params(self):
        return {
            "index": self._index_name,
            "max-num-segments": self._max_num_segments,
            "request-timeout": self._request_timeout
        }

you might wonder why this needs to be changed now. Previously there was a generic parameter source assigned for the ForceMerge runner which would pass any parameters that are defined on the operation but now that we have a custom parameter source we need to be explicit about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explaining, makes sense. I've updated it as per above, and also added a test that validates all the params.

@ghost
Copy link

ghost commented Jan 23, 2020

Hi @rohitnair, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in your Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile?

@danielmitterdorfer
Copy link
Member

@elasticmachine test this please

Copy link
Member

@danielmitterdorfer danielmitterdorfer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now and I'll merge this soon. Your change will be part of the next Rally release (most likely 1.4.1). Thanks for your contribution!

@danielmitterdorfer danielmitterdorfer merged commit 21347e1 into elastic:master Jan 23, 2020
@danielmitterdorfer danielmitterdorfer added this to the 1.4.1 milestone Jan 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improves the status quo :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update force-merge operation to only force-merge track specific indices
2 participants