new gems are not loaded properly causing td-agent service reload/restart failure #75

Open
nick-james opened this issue Jan 28, 2016 · 7 comments

Comments

@nick-james

When installing td-agent together with the fluent-plugin-elasticsearch plugin, the plugin installs successfully, but the initial restart/reload of the service during the install fails. As a result, td-agent is not running after chef-client has finished, even though it should be. However, when td-agent is restarted manually after the installation, it starts fine (which shows that the Elasticsearch plugin gem was installed and the td-agent config is OK).

Here is the recipe I am running to install td-agent:

# Dependency for the fluent-plugin-elasticsearch gem (fluentd plugin)
case node['platform_family']
when 'debian'
  package 'libcurl3-dev' do
    action :install
  end
when 'rhel'
  package 'libcurl-devel' do
    action :install
  end
else
  Chef::Log.fatal("platform family #{node['platform_family']} not supported")
end

include_recipe 'td-agent::default'

# Listen on HTTP for curl test POSTs
td_agent_source 'test_in_http' do
  type 'http'
  params(port: '8080')
end

# Store incoming events in Elasticsearch
td_agent_match 'test_out_elasticsearch' do
  type 'elasticsearch'
  tag 'test.*'
  params(host: 'localhost',
         port: '9200',
         logstash_format: 'true',
         type_name: 'test_elasticsearch',
         flush_interval: 1)
end

Here are the relevant attributes:

# tell td-agent to use the includes dir
default['td_agent']['includes'] = true

# tell td-agent not to use the default config
default['td_agent']['default_config'] = false

default['td_agent']['plugins'] = ['elasticsearch']

I am running this with Test Kitchen, using the latest Chef client (12.6.0-1) on Ubuntu 14.04.

On the Ubuntu 14.04 machine itself, this is the td-agent log output (all of it written during the Chef run):

cat /var/log/td-agent/td-agent.log 
2016-01-28 13:01:57 +0000 [info]: reading config file path="/etc/td-agent/td-agent.conf"
2016-01-28 13:01:57 +0000 [info]: starting fluentd-0.12.19
2016-01-28 13:01:57 +0000 [info]: gem 'fluent-mixin-config-placeholders' version '0.3.0'
2016-01-28 13:01:57 +0000 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2016-01-28 13:01:57 +0000 [info]: gem 'fluent-plugin-mongo' version '0.7.11'
2016-01-28 13:01:57 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.3'
2016-01-28 13:01:57 +0000 [info]: gem 'fluent-plugin-s3' version '0.6.4'
2016-01-28 13:01:57 +0000 [info]: gem 'fluent-plugin-scribe' version '0.10.14'
2016-01-28 13:01:57 +0000 [info]: gem 'fluent-plugin-td' version '0.10.28'
2016-01-28 13:01:57 +0000 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.1'
2016-01-28 13:01:57 +0000 [info]: gem 'fluent-plugin-webhdfs' version '0.4.1'
2016-01-28 13:01:57 +0000 [info]: gem 'fluentd' version '0.12.19'
2016-01-28 13:01:57 +0000 [info]: using configuration file: <ROOT>
</ROOT>
2016-01-28 13:02:36 +0000 [info]: restarting
2016-01-28 13:02:36 +0000 [info]: reading config file path="/etc/td-agent/td-agent.conf"
2016-01-28 13:02:36 +0000 [info]: shutting down fluentd
2016-01-28 13:02:36 +0000 [info]: process finished code=0
2016-01-28 13:02:36 +0000 [error]: fluentd main process died unexpectedly. restarting.
2016-01-28 13:02:36 +0000 [info]: starting fluentd-0.12.19
2016-01-28 13:02:36 +0000 [info]: gem 'fluent-mixin-config-placeholders' version '0.3.0'
2016-01-28 13:02:36 +0000 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2016-01-28 13:02:36 +0000 [info]: gem 'fluent-plugin-mongo' version '0.7.11'
2016-01-28 13:02:36 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.3'
2016-01-28 13:02:36 +0000 [info]: gem 'fluent-plugin-s3' version '0.6.4'
2016-01-28 13:02:36 +0000 [info]: gem 'fluent-plugin-scribe' version '0.10.14'
2016-01-28 13:02:36 +0000 [info]: gem 'fluent-plugin-td' version '0.10.28'
2016-01-28 13:02:36 +0000 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.1'
2016-01-28 13:02:36 +0000 [info]: gem 'fluent-plugin-webhdfs' version '0.4.1'
2016-01-28 13:02:36 +0000 [info]: gem 'fluentd' version '0.12.19'
2016-01-28 13:02:36 +0000 [info]: adding match pattern="test.*" type="elasticsearch"
2016-01-28 13:02:36 +0000 [error]: config error file="/etc/td-agent/td-agent.conf" error="Unknown output plugin 'elasticsearch'. Run 'gem search -rd fluent-plugin' to find plugins"
2016-01-28 13:02:36 +0000 [info]: process finished code=256
2016-01-28 13:02:36 +0000 [warn]: process died within 1 second. exit.

td-agent is not running after the Chef run:

sudo service td-agent status 
 * td-agent is not running

But if I restart it manually, it works fine:

sudo service td-agent restart
Restarting td-agent:  * td-agent

And there are now no errors in the td-agent log:

2016-01-28 13:10:00 +0000 [info]: reading config file path="/etc/td-agent/td-agent.conf"
2016-01-28 13:10:00 +0000 [info]: starting fluentd-0.12.19
2016-01-28 13:10:00 +0000 [info]: gem 'fluent-mixin-config-placeholders' version '0.3.0'
2016-01-28 13:10:00 +0000 [info]: gem 'fluent-mixin-plaintextformatter' version '0.2.6'
2016-01-28 13:10:00 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '1.3.0'
2016-01-28 13:10:00 +0000 [info]: gem 'fluent-plugin-mongo' version '0.7.11'
2016-01-28 13:10:00 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '1.5.3'
2016-01-28 13:10:00 +0000 [info]: gem 'fluent-plugin-s3' version '0.6.4'
2016-01-28 13:10:00 +0000 [info]: gem 'fluent-plugin-scribe' version '0.10.14'
2016-01-28 13:10:00 +0000 [info]: gem 'fluent-plugin-td' version '0.10.28'
2016-01-28 13:10:00 +0000 [info]: gem 'fluent-plugin-td-monitoring' version '0.2.1'
2016-01-28 13:10:00 +0000 [info]: gem 'fluent-plugin-webhdfs' version '0.4.1'
2016-01-28 13:10:00 +0000 [info]: gem 'fluentd' version '0.12.19'
2016-01-28 13:10:00 +0000 [info]: adding match pattern="test.*" type="elasticsearch"
2016-01-28 13:10:00 +0000 [info]: adding source type="http"
2016-01-28 13:10:00 +0000 [info]: using configuration file: <ROOT>
  <source>
    type http
    port 8080
  </source>
  <match test.*>
    type elasticsearch
    host localhost
    port 9200
    logstash_format true
    type_name test_elasticsearch
    flush_interval 1
  </match>
</ROOT>
@nick-james
Author

Some more information on this: it appears to be the service reload feature that is not working, which may be a problem with the td-agent init script.
If I issue a service restart in the Chef recipe after the td_agent_source or td_agent_match resource, the restart succeeds but the subsequent reload fails. td_agent_match and td_agent_source are configured to trigger a delayed reload (when reload is available), so the reload happens at the end of the Chef run. As a result, td-agent restarts successfully during the Chef run but then fails on the reload at the end, leaving the td-agent service stopped once the run completes.

A workaround is to add a delayed restart notification to the td_agent_source or td_agent_match resource, like so:

td_agent_match 'test_out_elasticsearch' do
  type 'elasticsearch'
  tag 'test.*'
  params(host: 'localhost',
         port: '9200',
         logstash_format: 'true',
         type_name: 'test_elasticsearch',
         flush_interval: 1)
  notifies :restart, "service[td-agent]", :delayed
end

In summary, I think the issue is that "service td-agent reload" does not pick up td-agent gems that were installed after the td-agent service was started.
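The behaviour described above can be illustrated with a small toy model. It is only a sketch under stated assumptions: ToyTdAgent is a hypothetical class, not fluentd or cookbook code, and it assumes (as the 13:02:36 log excerpt suggests, where the gem list printed after "restarting" still lacks fluent-plugin-elasticsearch) that a reload re-reads the config but reuses the gem list from the last full start, while a restart rescans the installed gems.

```ruby
# Toy model only: ToyTdAgent is hypothetical and does not mirror fluentd internals.
# Assumption: reload re-reads the config but reuses the gem list from the last
# full start, while start/restart rescans the gems installed on disk.
class ToyTdAgent
  attr_writer :config_plugins
  attr_reader :running

  def initialize(installed_gems)
    @installed_gems = installed_gems # shared array, so later installs are visible here
    @loaded_gems = []
    @config_plugins = []
    @running = false
  end

  # A fresh process: scan the gems currently installed, then check the config.
  def restart
    @loaded_gems = @installed_gems.dup
    check_config
  end
  alias start restart

  # Reload: only the config is re-read; @loaded_gems is left untouched.
  def reload
    check_config
  end

  private

  # The agent stays up only if every output type in the config has a loaded plugin gem.
  def check_config
    @running = @config_plugins.all? { |type| @loaded_gems.include?("fluent-plugin-#{type}") }
  end
end

gems = ['fluent-plugin-s3']
agent = ToyTdAgent.new(gems)
agent.start                              # td-agent starts before the plugin exists
gems << 'fluent-plugin-elasticsearch'    # the plugin gem is installed afterwards
agent.config_plugins = ['elasticsearch'] # the <match> now uses type elasticsearch
agent.reload
p agent.running # => false (cf. "Unknown output plugin 'elasticsearch'")
agent.restart
p agent.running # => true
```

In this model only a full restart makes the newly installed gem visible, which matches why the manual "service td-agent restart" works.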

@ghTravis

I also ran into this issue yesterday and today.

The reload command seems to reload only the config, not the plugins. I did what you did above and added a notifies :restart to my plugin install resource.

@petetnt

petetnt commented Aug 4, 2016

I think I am running into the same issue, using fluent-plugin-kinesis.

Everything I try fails with

Error executing action `reload` on resource 'service[td-agent]'

on both 2.2.0 and 2.2.1. Unfortunately, @nick-james's workaround did nothing for me.

@nick-james
Author

Hi @petetnt, yes, I also noticed that the original workaround stopped working for us a month or two ago. I'm not entirely sure why, since it stopped working on 2.2.0, the same version I used for the original workaround above, so presumably some underlying dependencies have changed. Anyway, here is how I got it working again (not pretty, I'm afraid!):

td_agent_match 'test_out_elasticsearch' do
  type 'elasticsearch'
  tag 'test.*'
  params(host: 'localhost',
         port: '9200',
         logstash_format: 'true',
         type_name: 'test_elasticsearch',
         flush_interval: 1)
  notifies :run, 'execute[stop_td_agent]', :delayed
end

# More extensive workaround to get td-agent running after reloading the td-agent config
execute 'stop_td_agent' do
  command 'service td-agent stop'
  action :nothing
  notifies :run, 'execute[start_td_agent]', :delayed
  ignore_failure true
end

execute 'start_td_agent' do
  action :nothing
  command 'service td-agent start'
end

I also have this in my metadata file for other reasons, but it could well be one of the underlying dependencies affecting the above:

depends 'ohai', '< 4.0.0'
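Why the stop/start chain ends with a running agent can be sketched with a toy queue. This is a hypothetical model, not Chef's API: ToyDelayedQueue and its methods are made up for illustration. It assumes that Chef runs delayed notifications in the order they were queued once all resources have converged, and that a delayed notification fired while that queue is being processed is appended and also run (which the workaround's behaviour suggests). The relative order of the cookbook's own delayed reload and the added delayed stop is also an assumption; either way, the start comes last.

```ruby
# Hypothetical model of Chef-style delayed notifications; not Chef's real API.
class ToyDelayedQueue
  attr_reader :log

  def initialize
    @queue = []
    @log = []
  end

  # Queue an action to run at the end of the (toy) Chef run.
  def notify(name, &action)
    @queue << [name, action]
  end

  # Run queued actions in order. Array#each also visits elements appended
  # during iteration, so notifications fired here run in the same pass.
  def drain
    @queue.each do |name, action|
      @log << name
      action.call
    end
  end
end

queue = ToyDelayedQueue.new
# td_agent_match changed: the cookbook's delayed reload plus the added delayed stop.
queue.notify('service[td-agent] reload') {}
queue.notify('execute[stop_td_agent]') do
  # stop_td_agent itself notifies start_td_agent with :delayed
  queue.notify('execute[start_td_agent]') {}
end
queue.drain
p queue.log
# => ["service[td-agent] reload", "execute[stop_td_agent]", "execute[start_td_agent]"]
```

Because the start is queued by the stop, it runs after the failing reload, so the last action of the run is a fresh start rather than a reload.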

@repeatedly
Contributor

Hmm... which chef-td-agent code should be fixed?
Could anyone send a patch?

@masutaka

I'm not sure, but td-agent v2.3.0 might not have the problem, whereas v2.3.2 might.

For example, my recipe was the following (I don't use treasure-data/chef-td-agent). It has the problem with v2.3.2:

service 'td-agent' do
  action [:enable, :start]
  supports restart: true, reload: true, status: true
end

cookbook_file '/etc/td-agent/conf.d/app.rb' do
  notifies :reload, 'service[td-agent]'
end

gem_package 'fluent-plugin-zabbix' do
  gem_binary 'td-agent-gem'
  notifies :reload, 'service[td-agent]'
end

I changed 'reload' to 'restart', and now it works with v2.3.2:

service 'td-agent' do
  action [:enable, :start]
  supports restart: true, reload: true, status: true
end

cookbook_file '/etc/td-agent/conf.d/app.rb' do
  notifies :restart, 'service[td-agent]'
end

gem_package 'fluent-plugin-zabbix' do
  gem_binary 'td-agent-gem'
  notifies :restart, 'service[td-agent]'
end

@josqu4red

I can confirm the issue is still there. Are there any plans to fix it?
A possible fix or workaround would be to remove the reload functionality from the cookbook entirely.

The issue seems to be fixed by redefining the reload_available? method like this:

def reload_available?
  false
end

Should it be removed?
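The effect of that override can be sketched as follows, assuming (as described earlier in this thread) that the cookbook's resources notify a delayed reload when reload_available? is true and a restart otherwise. TdAgentServiceAction, service_action and the two resource classes are hypothetical names for illustration; the cookbook's real reload_available? logic is not reproduced here and is simply hard-coded to true.

```ruby
# Hypothetical sketch; these names are illustrative, not the cookbook's real code.
module TdAgentServiceAction
  # Stand-in for the cookbook's own check (hard-coded here for illustration).
  def reload_available?
    true
  end

  # Pick the delayed action to send to service[td-agent].
  def service_action
    reload_available? ? :reload : :restart
  end
end

class DefaultResource
  include TdAgentServiceAction
end

class PatchedResource
  include TdAgentServiceAction

  # The override proposed above: never rely on reload.
  def reload_available?
    false
  end
end

p DefaultResource.new.service_action # => :reload
p PatchedResource.new.service_action # => :restart
```

With the override in place, every config or plugin change would trigger a full restart, which is what the manual workarounds earlier in this thread do by hand.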
