- Overview
- Module Description - What the module does and why it is useful
- Setup - The basics of getting started with this module
- Usage - Configuration options and additional functionality
- Reference - An under-the-hood peek at what the module is doing and how
- Limitations - OS compatibility, etc.
- Development - Guide for contributing to the module
This Puppet module manages the installation and configuration of Cloudera Manager, a management application for Apache Hadoop, on the operating systems officially supported by Cloudera.
This module manages the installation of Cloudera Manager, a management application for Apache Hadoop. It follows the standards written in the Cloudera Manager Installation Guide "Installation Path B - Installation Using Your Own Method". By default, this module assumes that parcels will be used to deploy Cloudera's Distribution of Apache Hadoop (CDH) and related software. If parcels are not desired, this module can also manage the installation of CDH including HDFS & MapReduce, Impala, Sentry, Search, Spark, HBase, and LZO compression. The module can also configure TLS security of the Cloudera Manager communications channels, and set up Cloudera Manager to use an alternative to the embedded database.
This module is certified on Cloudera 5.
- Installs the Cloudera software repository for CM.
- Installs Oracle Java Development Kit (JDK) 7.
- Optionally installs the Oracle Java Cryptography Extensions.
- Installs the CM agent.
- Configures the CM agent to talk to a CM server.
- Starts the CM agent.
- Sets the kernel vm.swappiness to 0.
- Disables the kernel transparent hugepage compaction.
- Separately installs the CM server and database connectivity (by default to the embedded database server).
- Separately starts the CM server.
- Optionally installs the Cloudera software repository for CDH.
- Optionally installs most components of CDH 5 including HBase, Impala, Search, and Spark.
- Optionally installs GPL Extras (LZO).
Please read through the Cloudera Manager Requirements document to discover all of the entities (i.e. operating systems, databases, and browsers) supported by Cloudera Manager. Pay close attention to the Resource Requirements and Networking and Security Requirements sections. There are a number of requirements that this module cannot easily configure for your environment (e.g. no blocking by Security-Enhanced Linux (SELinux)); you must ensure these are correct on your platform.
Most nodes that will be a part of a Hadoop cluster will use this declaration.
class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
}
The node that will be the CM server (i.e. smhost.localdomain) will use this declaration. It should be included on only one node of your environment. By default it will install the embedded PostgreSQL database on the same node. With the correct parameters, it can instead connect to a local or remote MySQL, PostgreSQL, or Oracle RDBMS database.
class { '::cloudera':
  cm_server_host   => 'smhost.localdomain',
  install_cmserver => true,
}
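The two declarations above can be combined in a single site manifest. The following is a minimal sketch, assuming node definitions in site.pp are used for classification; the hostname is the placeholder from the examples above, and an external node classifier would work equally well.

```puppet
# site.pp sketch: the hostname is a placeholder taken from the examples above.
node 'smhost.localdomain' {
  # The single CM server node (it also runs the CM agent).
  class { '::cloudera':
    cm_server_host   => 'smhost.localdomain',
    install_cmserver => true,
  }
}

node default {
  # Every other cluster member runs only the CM agent.
  class { '::cloudera':
    cm_server_host => 'smhost.localdomain',
  }
}
```

Keeping the server declaration in one named node block helps ensure that install_cmserver is only ever true on one host.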
- The default for use_parcels will switch to true before the 1.0.0 release.
This:
class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
}
would become this:
class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => false,
}
- The puppetlabs/mysql dependency will update to version 2 before the 1.0.0 release. Make sure to review its changelog in the case of an upgrade.
- The class ::cloudera::repo will be renamed to ::cloudera::cdh::repo and the Impala repository will be split out into ::cloudera::impala::repo before the 1.0.0 release.
This:
class { '::cloudera::repo':
  cdh_version => '4.1',
  cm_version  => '4.1',
}
would become this:
class { '::cloudera::cdh::repo':
  version => '4.1',
}
class { '::cloudera::impala::repo':
  version => '4.1',
}
- The class parameters and variables yumserver and yumpath have been renamed to reposerver and repopath respectively for the 2.0.0 release. This makes the names more generic, as they apply to APT and Zypprepo as well as YUM package repositories.
This:
class { 'cloudera':
  cm_yumserver => 'http://packageserver.localdomain',
  cm_yumpath   => '/gplextras/',
}
would become this:
class { 'cloudera':
  cm_reposerver => 'http://packageserver.localdomain',
  cm_repopath   => '/gplextras/',
}
- The use_gplextras parameter has been renamed to install_lzo for the 2.0.0 release.
This:
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  use_gplextras  => true,
}
would become this:
class { 'cloudera':
  cm_server_host => 'smhost.example.com',
  install_lzo    => true,
}
- The puppetlabs/postgresql dependency will update to version 3 or newer for the 3.0.0 release. Make sure to review its changelog in the case of an upgrade.
- The herculesteam/augeasproviders modules will replace domcleal/augeasproviders for the 3.0.0 release.
All interaction with the cloudera module can be done through the main cloudera class. This means you can simply toggle the options in ::cloudera to have full functionality of the module.
Level 1: Configuring TLS Encryption only for Cloudera Manager
Level 2: Configuring TLS Authentication of Server to Agents
Level 3: Configuring TLS Authentication of Agents to Server
This module's deployment of TLS provides both level 1 and level 2 configuration (encryption and authentication of the server to the agents). Level 3 is not presently implemented. You will need to provide a TLS certificate and the signing certificate authority for the CM server. See the File resources in the below example for where the files need to be deployed.
There are some settings inside CM that can only be configured manually. See the Level 1 instructions for the details of what to change in the WebUI and use the below values:
| Setting | Value |
|---|---|
| Use TLS Encryption for Agents | (check) |
| Path to TLS Keystore File | /etc/cloudera-scm-server/keystore |
| Keystore Password | The value of server_keypw in Class['::cloudera::cm5::server']. |
| Use TLS Encryption for Admin Console | (check) |
The node that will be the CM agent may use this declaration:
class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_tls        => true,
  install_jce    => true,
}
file { '/etc/pki/tls/certs/cloudera_manager.crt': }
The node that will be the CM agent+server may use this declaration:
class { '::cloudera':
  cm_server_host   => 'smhost.localdomain',
  install_cmserver => true,
  use_tls          => true,
  install_jce      => true,
  server_keypw     => 'myPassWord',
}
file { '/etc/pki/tls/certs/cloudera_manager.crt': }
file { '/etc/pki/tls/certs/cloudera_manager-ca.crt': }
file { "/etc/pki/tls/certs/${::fqdn}-cloudera_manager.crt": }
file { "/etc/pki/tls/private/${::fqdn}-cloudera_manager.key": }
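The empty file declarations above only mark where the PEM files must exist; this module does not deploy them. One way to fill them in is sketched below, written in place of the corresponding empty declarations. The module name site_certs is hypothetical, and the owner, group, and mode values are assumptions to adjust to your own security policy.

```puppet
# 'site_certs' is a hypothetical module that holds the PEM files.
# These replace the matching empty file declarations; do not declare both.
file { '/etc/pki/tls/certs/cloudera_manager.crt':
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0644',
  source => 'puppet:///modules/site_certs/cloudera_manager.crt',
}

file { "/etc/pki/tls/private/${::fqdn}-cloudera_manager.key":
  ensure => file,
  owner  => 'root',
  group  => 'root',
  mode   => '0600', # assumption: keep the private key readable by root only
  source => "puppet:///modules/site_certs/${::fqdn}-cloudera_manager.key",
}
```

The remaining certificate files can be handled the same way.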
If you decide not to use the embedded database, the Cloudera Manager server database configuration can be completed by configuring this module to call the scm_prepare_database.sh script. The external database must be configured and ready for connection with the supplied credentials via some method outside of this module.
class { '::cloudera':
  cm_server_host   => 'smhost.localdomain',
  install_cmserver => true,
  db_type          => 'postgresql',
  db_host          => 'dbhost.localdomain',
  db_port          => '5432',
  db_user          => 'root',
  db_pass          => 'SeCrEt',
}
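The same pattern should apply to an external MySQL server. The following is a minimal sketch, assuming the database host, port, and administrative credentials shown here (all placeholders) already exist; 3306 is the documented default port.

```puppet
# Sketch: point the CM server at an external MySQL database.
# Host name and credentials are placeholders, not real values.
class { '::cloudera':
  cm_server_host   => 'smhost.localdomain',
  install_cmserver => true,
  db_type          => 'mysql',
  db_host          => 'dbhost.localdomain',
  db_port          => '3306',
  db_user          => 'root',
  db_pass          => 'SeCrEt',
}
```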
Parcel is an alternative binary distribution format supported by Cloudera Manager 4.5+ that simplifies distribution of CDH and other Cloudera products. By default, this module assumes software deployment of CDH via parcel. To allow Cloudera Manager to install CDH via RPMs (or DEBs) instead of parcels, just set use_parcels => false.
Nodes that will be cluster members will use this declaration:
class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => false,
}
For more advanced use cases, nodes that will be gateways may use this declaration to install extra parts of CDH:
class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => false,
}
class { '::cloudera::cdh5::mahout': }
class { '::cloudera::cdh5::kite': }
# Install Oozie WebUI support (optional):
class { '::cloudera::cdh5::oozie::ext': }
# Install MySQL support (optional):
class { '::cloudera::cdh5::hue::mysql': }
class { '::cloudera::cdh5::oozie::mysql': }
For more advanced use cases, the node that will be just the CM server may use this declaration: (This will skip installation of the CDH software as it is not required.)
class { '::cloudera::cm5::repo': } ->
class { '::cloudera::java5': } ->
class { '::cloudera::java5::jce': } ->
class { '::cloudera::cm5': } ->
class { '::cloudera::cm5::server': }
Hadoop-specific LZO compression libraries are available in the Cloudera GPL Extras repository. To deploy the Hadoop-specific and also the native libraries on a non-parcel system just add install_lzo => true to the class declaration. Additional configuration in Cloudera Manager will be required to activate the functionality (ignore the mention of parcels in the link to the documentation).
class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => false,
  install_lzo    => true,
}
To deploy the native LZO compression libraries on a parcel system just add install_lzo => true to the class declaration. Additional configuration in Cloudera Manager will be required to activate the functionality.
class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  use_parcels    => true,
  install_lzo    => true,
}
- cloudera: Installs and configures Cloudera Manager. Includes most other classes.
- cloudera::java5: Installs the Oracle Java Development Kit (JDK) from the Cloudera Manager repository.
- cloudera::java5::jce: Installs the Oracle Java Cryptography Extension (JCE) unlimited strength jurisdiction policy files.
- cloudera::cm5
- cloudera::cm5::repo
- cloudera::cm5::server
- cloudera::cdh5
- cloudera::cdh5::repo
- cloudera::gplextras5
- cloudera::gplextras5::repo
- cloudera::java: Installs the Oracle Java Development Kit (JDK) from the Cloudera Manager repository.
- cloudera::java::jce: Installs the Oracle Java Cryptography Extension (JCE) unlimited strength jurisdiction policy files.
- cloudera::cm
- cloudera::cm::repo
- cloudera::cm::server
- cloudera::cdh
- cloudera::cdh::repo
- cloudera::gplextras
- cloudera::gplextras::repo
- cloudera::impala
- cloudera::impala::repo
- cloudera::search
- cloudera::search::repo
- cloudera::lzo
Ensure if present or absent. Default: present
Upgrade package automatically, if there is a newer version. Default: false
Ensure if service is running or stopped. Default: running
Start service at boot. Default: true
URI of the YUM server. Default: http://archive.cloudera.com
The path to add to the $cdh_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific
The version of Cloudera's Distribution, including Apache Hadoop to install. Default: 5
URI of the YUM server. Default: http://archive.cloudera.com
The path to add to the $cm_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific
The version of Cloudera Manager to install. Default: 5
The path to add to the $cm_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific
URI of the YUM server. Default: http://archive.cloudera.com
The path to add to the $ci_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific
The version of Cloudera Impala to install. Default: 1
URI of the YUM server. Default: http://archive.cloudera.com
The path to add to the $cs_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific
The version of Cloudera Search to install. Default: 1
URI of the YUM server. Default: http://archive.cloudera.com
The path to add to the $cg_reposerver URI. Only set this if your platform is not supported or you know what you are doing. Default: auto-set, platform specific
The version of Cloudera GPL Extras to install. Default: 5
Hostname of the Cloudera Manager server. Default: localhost
Port on which the Cloudera Manager server is listening. Default: 7182
Whether to enable TLS on the Cloudera Manager server and agent. Default: false
The file holding the public key of the Cloudera Manager server as well as the chain of signing certificate authorities. PEM format. Default: /etc/pki/tls/certs/cloudera_manager.crt or /etc/ssl/certs/cloudera_manager.crt
Whether to install CDH software via parcels or packages. Default: true
Whether to install the native LZO compression library packages. If use_parcels is false, then also install the Hadoop-specific LZO compression library packages. You must configure and deploy the GPLextras parcel repository if use_parcels is true. Default: false
Whether to install the Cloudera supplied Oracle Java Development Kit. If this is set to false, then an Oracle JDK will have to be installed prior to applying this module. Default: true
Whether to install the Oracle Java Cryptography Extension unlimited strength jurisdiction policy files. This requires manual download of the zip file. See files/README_JCE.md for download instructions. Default: false
Whether to install the Cloudera Manager Server. This should only be set to true on one host in your environment. Default: false
Name of the database to use for Cloudera Manager. Default: scm
Name of the user to use to connect to database_name. Default: scm
Password to use to connect to database_name. Default: scm
Host to connect to for database_name. Default: localhost
Port on db_host to connect to for database_name. Default: 3306
Administrative database user on db_host. Default: root
Password for the administrative database user db_user. Default:
Which type of database to use for Cloudera Manager. Valid options are embedded, mysql, oracle, or postgresql. Default: embedded
The file holding the PEM public key of the Cloudera Manager server certificate authority. Default: /etc/pki/tls/certs/cloudera_manager-ca.crt or /etc/ssl/certs/cloudera_manager-ca.crt
The file holding the PEM public key of the Cloudera Manager server. Default: /etc/pki/tls/certs/${::fqdn}-cloudera_manager.crt or /etc/ssl/certs/${::fqdn}-cloudera_manager.crt
The file holding the PEM private key of the Cloudera Manager server. Default: /etc/pki/tls/private/${::fqdn}-cloudera_manager.key or /etc/ssl/private/${::fqdn}-cloudera_manager.key
The file holding the PEM public key(s) of the Cloudera Manager server intermediary certificate authority. Default: none
The password used to protect the keystore. Default: none
The URL to the proxy server for the YUM repositories. Default: absent
The username for the YUM proxy. Default: absent
The password for the YUM proxy. Default: absent
The directory where parcels are downloaded and distributed. Default: /opt/cloudera/parcels
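As an illustration of the repository parameters described above, the following sketch points both the CM and CDH repositories at an internal mirror. It assumes that cdh_reposerver is accepted by the main class in the same way as cm_reposerver, and that the mirror (a placeholder URL) reproduces the directory layout of archive.cloudera.com so the platform-specific paths can stay at their defaults.

```puppet
# Sketch: install from an internal package mirror instead of archive.cloudera.com.
# The mirror URL is a placeholder; cdh_reposerver as a class parameter is an assumption.
class { '::cloudera':
  cm_server_host => 'smhost.localdomain',
  cm_reposerver  => 'http://packageserver.localdomain',
  cdh_reposerver => 'http://packageserver.localdomain',
  use_parcels    => false,
}
```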
This module targets the operating systems that Cloudera officially supports for CM4 and for CM5.
- RedHat family - tested on CentOS 5.9, CentOS 6.4
- SuSE family - tested on SLES 11SP3
- Debian family - tested on Debian 6.0.7, Debian 7.0, Ubuntu 10.04.4 LTS, and Ubuntu 12.04.2 LTS
- Cloudera Manager - tested with 4.1.2, 4.8.0, and 5.0.0beta2
- CDH - tested with 4.1.2 and 4.5.0, 5.0.0beta2
- Cloudera Impala - tested with 1.0 and 1.2.3
- Cloudera Search - tested with 1.1.0
- Cloudera GPL Extras - tested with 4.3.0 and 5.0.0
- Supports Top Scope variables (i.e. via Dashboard) and Parameterized Classes.
- Based on the Cloudera Manager 5.0.0 Beta 2 Installation Guide
- TLS certificates must be in PEM format and are not deployed by this module.
- When using parcels, the CDH software is not deployed by Puppet. Puppet will only install the Cloudera Manager server/agent. You must then configure Cloudera Manager to deploy the parcels.
- When installing packages and not parcels on SLES, SP2 is required as the hadoop-2.0.0+1518-1.cdh4.5.0.p0.24.sles11.x86_64 package requires netcat-openbsd which is not available on SLES 11SP1.
- Osfamily RedHat 5 requires the EPEL YUM repository when installing LZO support.
- This module does not support upgrading from CDH4 to CDH5 packages, including Impala, Search, and GPL Extras.
- An external module is needed to provide the Oracle Instant Client JDBC.
- When using an external PostgreSQL server that is on the same host as the CM server, PostgreSQL must be configured to accept connections with md5 password authentication.
- Osfamily RedHat 5 requires Python 2.6 from the EPEL YUM repository when installing the Hue service.
See TODO.md for more items.
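The md5 requirement noted above for a PostgreSQL server that shares a host with the CM server is handled outside this module. If the database is itself managed with the puppetlabs/postgresql module, a rule along the following lines may be enough. This is a sketch only: it assumes the postgresql::server class is already declared on that host, that the database and user keep their default name of scm, and that CM connects over the IPv4 loopback address.

```puppet
# Sketch: allow md5 password logins for the CM database over loopback.
# Assumes Class['postgresql::server'] is declared elsewhere on this node.
postgresql::server::pg_hba_rule { 'cloudera manager scm over loopback':
  description => 'Allow md5 password logins for Cloudera Manager',
  type        => 'host',
  database    => 'scm',
  user        => 'scm',
  address     => '127.0.0.1/32',
  auth_method => 'md5',
}
```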
Please see CONTRIBUTING.md for information on how to contribute.
Copyright (C) 2013 Mike Arnold mike@razorsedge.org
Licensed under the Apache License, Version 2.0.