InfluxDB send retry after IOError #7247
Conversation
Please rebase it.
Rebased. :)
I don't like that. I think this functionality needs to be handled on the API side and not inside home-assistant; that would be the wrong place. The code looks okay, but it makes things more complex than they should be.
@pvizeli So do you suggest we maintain a fork of the official InfluxDB API? I don't think this will go upstream, especially as it would mean the API has to be aware of Home Assistant's scheduling system.
I don't think this belongs in the influxdb component either. It's a matter of connection reliability, not specific to influxdb. Would it be possible to abstract it into a helper decorator, similar to how we have a throttle decorator, but for retries? That would then be reusable all over Home Assistant and have the same potential configuration.
It would be easy. It just takes away the possibility of having specific log messages, and I'm not sure if it's really worth having a decorator for that. Anyway, it could look like this. In the influx component:

```python
@RetryOnError(hass, retry_limit=max_tries, retry_delay=20)
def _write_data(json_body):
    try:
        influx.write_points(json_body)
    except exceptions.InfluxDBClientError:
        _LOGGER.exception("Error saving event %s to InfluxDB", json_body)
```

In util:

```python
class RetryOnError(object):
    def __init__(self, hass, retry_limit=0, retry_delay=20):
        self.hass = hass
        self.retry_limit = retry_limit
        self.retry_delay = datetime.timedelta(seconds=retry_delay)

    def __call__(self, method):
        @functools.wraps(method)
        def wrapper(*args, **kwargs):
            def scheduled(retry=0, event=None):
                try:
                    method(*args, **kwargs)
                except Exception:
                    target = dt_util.utcnow() + self.retry_delay
                    if retry == self.retry_limit:
                        raise
                    track_point_in_utc_time(
                        self.hass,
                        functools.partial(scheduled, retry + 1),
                        target)
            scheduled()
        return wrapper
```

It still doesn't look very nice, it's more code, and it lacks the specific log messages.
I like the helper decorator approach better. But I think @balloob should have a say.
My concern is that we have ca. 300-400 components/platforms that could have the same problem. If we need 30 lines per component/platform, that gives roughly 10,500 lines we would have to maintain, and it fills our bus and job handler with data, which can create new problems. In the end we cannot handle that; an API can handle it better, and if the API does not support it, there may be a reason for that...
tests/util/test_init.py
Outdated
```python
self.assertEqual(mock_method.call_count, 1)


mock_method.side_effect = Exception()
```
too many blank lines (2)
homeassistant/components/influxdb.py
Outdated
```diff
@@ -97,7 +103,7 @@ def influx_event_listener(event):
     state = event.data.get('new_state')
     if state is None or state.state in (
             STATE_UNKNOWN, '', STATE_UNAVAILABLE) or \
-            state.entity_id in blacklist:
+                state.entity_id in blacklist:
```
continuation line over-indented for visual indent
homeassistant/components/influxdb.py
Outdated
```python
import functools
```
'functools' imported but unused
homeassistant/components/influxdb.py
Outdated
```diff
@@ -5,13 +5,16 @@
 https://home-assistant.io/components/influxdb/
 """
 import logging
+import datetime
```
'datetime' imported but unused
Rebased and refactored the retry into a `RetryOnError` decorator in `homeassistant.util`.
This adds an optional max_retries parameter to the InfluxDB component to specify if and how often the component should retry sending the data if the connection fails due to an IOError. The sending will be scheduled for a retry in 20 seconds, as often as the user specified. This can be handy for flaky network connections between the DB and Home Assistant, or outages like daily DSL reconnects. Signed-off-by: Jan Losinski <losinski@wh2.tu-dresden.de>
Signed-off-by: Jan Losinski <losinski@wh2.tu-dresden.de>
Signed-off-by: Jan Losinski <losinski@wh2.tu-dresden.de>
Signed-off-by: Jan Losinski <losinski@wh2.tu-dresden.de>
This replaces the scheduling logic in the InfluxDB component with the RetryOnError decorator from homeassistant.util Signed-off-by: Jan Losinski <losinski@wh2.tu-dresden.de>
... and fixed another rebase error.
Signed-off-by: Jan Losinski <losinski@wh2.tu-dresden.de>
Sorry, but that is too weird. I.e. you should only schedule new stuff if no other schedule is pending, or you will go back to the old problem. If you endlessly schedule stuff, we run into a memory problem in some cases. Let's make a PR here: https://github.com/influxdata/influxdb-python and fix it at the base. But for now I'll stay silent and wait for what Paulus thinks.
@pvizeli No, I schedule only "max retries" times. I don't see this going upstream in any form, because they don't have any scheduling logic - so blocking would be the only thing they could do, which is not an option, I think. Of course, we can add fancy logic to limit the number of "pending" calls - but I think that would add unnecessary overhead. I have been using the solution from here for a few weeks without any problems. As the calls are always dropped after some retries, it's also not a problem that it grows somehow in memory.
So I don't think that it should per se have to go upstream into influx Python. It's up to the caller of the API to implement retry logic, as the caller knows what the current conditions are.
I do, however, share Pascal's concern that these retry chains are getting orphaned - we do not track them. I think that having a generic decorator is not the right solution, because we would promote the wrong pattern of just fire-and-forget with a retry. This will hurt us in the long term.
For now I would say that the retry logic should stay in the InfluxDB component. Probably also keep track of how many retries are going on and make sure to start dropping events when the list of pending events in delay gets too big.
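For illustration, bounding the pending retries could look roughly like the sketch below; `MAX_PENDING`, `pending_retries`, and `schedule_retry` are hypothetical names, not part of this PR:

```python
import logging

_LOGGER = logging.getLogger(__name__)

MAX_PENDING = 1000   # hypothetical cap on writes waiting for a retry
pending_retries = 0  # counter of writes currently scheduled for a retry


def schedule_retry(write, json_body):
    """Schedule a retry only while the pending queue is small enough."""
    global pending_retries
    if pending_retries >= MAX_PENDING:
        _LOGGER.warning("Too many pending retries, dropping event %s",
                        json_body)
        return
    pending_retries += 1
    # Hand `write` to the scheduler here (e.g. track_point_in_utc_time)
    # and decrement the counter once the write finally succeeds or is
    # dropped after exhausting its retries.
```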
```python
# pylint: disable=broad-except
try:
    method(*args, **kwargs)
except Exception:
```
We should never catch generic exceptions. It should always be a specific one. Add it as an argument to the decorator and default it to IOError
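Applied to the decorator sketched earlier in the thread, that suggestion might look like this; it is only a sketch, and the `retried_exceptions` parameter name is an assumption, not part of the PR:

```python
import datetime


class RetryOnError(object):
    """Retry a method when a specific exception type is raised."""

    def __init__(self, hass, retry_limit=0, retry_delay=20,
                 retried_exceptions=IOError):
        self.hass = hass
        self.retry_limit = retry_limit
        self.retry_delay = datetime.timedelta(seconds=retry_delay)
        # Only this exception (or tuple of exceptions) triggers a retry;
        # everything else propagates to the caller immediately.
        self.retried_exceptions = retried_exceptions

    # ...in the wrapper, `except Exception:` would then become
    # `except self.retried_exceptions:`.
```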
I can change that. I thought it might be good to act on all exceptions that are not handled within the decorated method.
Catching `Exception` is never a good idea. So many things will go unnoticed, including a user pressing Ctrl+C!
```diff
@@ -48,6 +49,7 @@
     vol.Optional(CONF_DB_NAME, default=DEFAULT_DATABASE): cv.string,
     vol.Optional(CONF_PORT): cv.port,
     vol.Optional(CONF_SSL): cv.boolean,
+    vol.Optional(CONF_RETRY_COUNT): cv.positive_int,
```
There is no default so this value can be 0
Ok, so I just set a default here too?
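Presumably the schema entry would then become something like the following (assuming 0, i.e. no retries, as the default):

```python
vol.Optional(CONF_RETRY_COUNT, default=0): cv.positive_int,
```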
```python
in seconds as arguments.
"""

def __init__(self, hass, retry_limit=0, retry_delay=20):
```
20 seems very high, I think by default it should be 0.
And since 0 is the default, you can make `hass` an optional argument.
0 is actually "higher" - as this tells how long to wait before the next retry. So 0 would mean "instant retry", which makes no sense most of the time.
If hass is optional, shall we then throw an exception if someone uses this decorator with a value other than 0 and doesn't pass a hass instance?
Yes
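A sketch of what that guard could look like (assuming `ValueError`; the exact message is illustrative):

```python
import datetime


class RetryOnError(object):
    def __init__(self, hass=None, retry_limit=0, retry_delay=20):
        if hass is None and retry_limit != 0:
            # Retries need the hass instance to schedule the next attempt.
            raise ValueError("hass is required when retry_limit is not 0")
        self.hass = hass
        self.retry_limit = retry_limit
        self.retry_delay = datetime.timedelta(seconds=retry_delay)
```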
Ok, so I change it back from the prior request to make a generic decorator, and just put it in the component. Is this the final decision?
To argue against the orphaned requests: they do not happen. There is a retry limit, so if you exceed the retry limit, the request gets dropped (as it was before, without any retrying). So if you have 10 requests per second, 20 seconds of retry delay, and 4 retries, you never have more than 1000 requests queued, as after 100 seconds the first request gets dropped.
So there is never a "getting too big". Of course, we can add some heuristics to guess when it is necessary to stop retrying and instead ping to see if the server is available again. But I think this would be way too complex for this simple use case.
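The arithmetic behind that bound, spelled out (numbers taken from the comment above):

```python
# Worst case: every write fails until it exhausts its retries.
events_per_second = 10
retry_delay = 20  # seconds between attempts
max_retries = 4

# An event is dropped once its retries are exhausted, i.e. after
# (max_retries + 1) * retry_delay = 100 seconds at the latest.
lifetime = (max_retries + 1) * retry_delay
max_queued = events_per_second * lifetime
print(max_queued)  # 1000 - the queue can never grow beyond this
```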
The requests are still orphaned, because the influxdb integration has no clue how many requests are still in flight or how many will hit the DB in the next X time. As long as you keep it in influxdb, I'm fine with it.
This PR seems to have gone stale. Closing it. You can reopen it when you're ready to finish it.
I can't reopen it by myself.
Closing this again as there hasn't been any activity since re-opening. Also, is this solved by #8209?
It is not solved. I'll work on this, if it's open, as soon as I have time. At the moment it looks like I can get this done in the second half of August. I don't understand why unsolved problems that are fixed and just need some polishing are closed.
So we know what's waiting for review, or ready to be merged. A pull request signifies "My changes are ready, take my changes!", while this is not the case for this PR. I'd suggest continuing to work on the branch, and poke us when you're ready for review. Perspective: if everyone raised a PR for their unfinished change, it would be a nightmare to triage.
Ok. For me it's dismissing work: "go away". Especially as I cannot reopen it, and I don't want to work on closed PRs. It works well, it is tested, and it can be merged; I use it every day. It is in the state "My changes are ready, take my changes!". Is this the new policy, to close all working PRs where small changes are requested?
@janLo In general, stale PRs are closed to keep the codebase and the incoming PRs maintainable. There are quite a few PRs across all the Home Assistant repositories, and the people who review and merge are limited in number, all doing it in their free time, with limited tools (the GitHub GUI). Taking it as "go away" is the last thing you should do, since you've gotten valuable feedback on what some of the core maintainers think needs changing to integrate this into HA. There's nothing that prevents you from working on this in your own fork until the requested changes have been made, and then simply putting a reply on this PR to have it reopened at that time.
As I said, the PR is working, tested, and ready to merge. Only cosmetic changes were requested, which I cannot provide at the moment due to limited time. Closing it means, to me, that you're not interested in the work provided so far and don't want to wait for the requested polishing. How about a flag "polishing in progress" that can be filtered out until the author requests a new review?
Closing without a merge means rejection to me. I'm maintaining my own fork anyway, so it works well for me without a merge to upstream.
If someone else wants to pick it up - feel free.
Description:
This adds an optional max_retries parameter to the InfluxDB component to specify if and how often the component should retry sending the data if the connection fails due to an IOError.

The sending will be scheduled for a retry in 20 seconds, as often as the user specified. This can be handy for flaky network connections between the DB and Home Assistant, or outages like daily DSL reconnects.
Pull request in home-assistant.github.io with documentation (if applicable): home-assistant/home-assistant.io#2485
Example entry for `configuration.yaml` (if applicable):
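A minimal example entry (the host value is illustrative; the option name `max_retries` follows the description above):

```yaml
influxdb:
  host: 192.168.1.100
  max_retries: 3
```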
Checklist:
- If user exposed functionality or configuration variables are added/changed:
- If the code communicates with devices, web services, or third-party tools:
- `tox` runs successfully. Your PR cannot be merged unless tests pass.