This project is all about installing a CDSW cluster using Director on any of the Big Three Cloud Providers.
The basic idea is to have a single definition of a cluster that is shared across multiple cloud providers, and to make it very simple for a user to say ‘I want this cluster to be on cloud provider X, or cloud provider Y’, confident that the cluster definition is the same. In other words, the cluster configuration that is independent of cloud providers is separated from the configuration that is unique to each provider, and it is easy for the user to indicate which cloud provider to use.
This project is focused on making this easy, not on exposing the end user to every possible configuration alternative.
We support and test on three cloud providers (AWS, Azure and GCP), and the user chooses which cloud provider to use by choosing the top level (provider) conf file: aws.conf, azure.conf or gcp.conf.
Building a cluster this way takes about an hour. AFTER the cluster is ready, it can take up to another HOUR for CDSW to also be ready; I’ve seen this on Azure. On AWS and GCP, CDSW seems to take around 10-20 minutes to get ready to deliver service.
If you want to struggle then stop reading right here, and just whack at it. Lots of people have gone that route and enjoyed the exercise.
If you’d prefer to get your job done and a CDSW cluster constructed, continue reading …
This project is based upon the following workflow. See subsequent sections for details:
- Install pre-requisites (i.e. Director and optionally an MIT KDC on the Director instance)
- Get this repository onto the Director instance (git, scp … however)
- Edit the appropriate files to reflect your environment
- Add the necessary SECRET and ssh key files
- Bootstrap the cluster
It will make more sense if you read the Project Structure section.
For AWS and GCP I simply set up a VM (4 CPUs, 16 GB RAM) and then use install_director.sh to install Director.
For GCP you will need to ensure that the plugin supports rhel7. Do this by adding the following line to your google.conf file. This file should be located in the provider directory /var/lib/cloudera-director-plugins/google-provider-*/etc (where the * matches the version of your plugins, something like 1.0.4). You will likely have to create your own copy of google.conf by copying google.conf.example, located in the same directory. Note that the exact path to the relevant image is obtained by navigating to GCP’s ‘Images’ section and finding the corresponding OS/URL pair.
rhel7 = "https://www.googleapis.com/compute/v1/projects/rhel-cloud/global/images/rhel-7-v20171025"
Assuming gcloud is on your path on the Director instance, this script will do exactly what you need:
# Expand the plugin directory first, so the path still resolves when google.conf does not exist yet
sudo tee "$(echo /var/lib/cloudera-director-plugins/google-provider-*/etc)/google.conf" 1>/dev/null <<EOF
google {
  compute {
    imageAliases {
      centos6="$(gcloud compute images list --filter='name ~ centos-6-v.*' --uri)",
      centos7="$(gcloud compute images list --filter='name ~ centos-7-v.*' --uri)",
      rhel6="$(gcloud compute images list --filter='name ~ rhel-6-v.*' --uri)",
      rhel7="$(gcloud compute images list --filter='name ~ rhel-7-v.*' --uri)"
    }
  }
}
EOF
For Azure, create a Cloudera Director instance using the Azure Marketplace. In keeping with minimizing what you have to do, this project assumes you have chosen the defaults whenever possible (e.g. networking etc.).
You’ll need to note:
- the Resource Group that the Director instance is created in
- the Region that the Resource Group is set up in
- the public domain name prefix of the Director instance (i.e. the hostname and instance name of the Director VM)
- the host FQDN suffix (aka Private DNS domain name). This is the DNS zone in which the Director and cluster will be constructed.
- the private IP address of the Director instance that is created (if you’re going to put an MIT KDC on the Director instance)
If I choose to use MIT Kerberos, I install the MIT KDC on the Director VM, no matter which cloud provider I’m using.
I do that using install_mit_kdc.sh. (There’s also install_mit_client.sh, which creates a client for testing purposes.)
- Choose the cloud provider you’re going to work with and edit the $PROVIDER/*.properties and $PROVIDER/SECRET files appropriately.
- Ensure that all the files (including the SSH key file) are available to Director (i.e. copy or clone as necessary to the Director server machine).
- Ensure that the SECRET files are in place
- Ensure that the $PROVIDER/kerberos.properties file is either absent (you don’t want a kerberized cluster) or is present and correct. In particular, ensure that the KDC_HOST_IP property is set to the IP address of the KDC server host (which should also be the Director host). Note that it’s the IP address you should use here because of a CDSW/Kubernetes defect: DSE-1796.
- Execute a Director bootstrap command using the cloud provider you chose, but make sure you do it from the top directory (i.e. the one where the common.conf file is located).
cloudera-director bootstrap-remote $PROVIDER.conf --lp.remote.username=admin --lp.remote.password=admin
See the “No provider configuration block found” note below for what happens if you’re in the wrong directory.
- Once completed, use your cloud provider’s console to find the public IP address (e.g. 104.92.37.53) of the CDSW instance. Its name in the cloud provider’s console will begin with cdsw-.
- You can reach CDSW at cdsw.104.92.37.53.nip.io. See NIP.io tricks for details about how that nip.io stuff works.
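Because the bootstrap must be run from the top directory with the SECRET files in place, a small pre-flight check can catch the most common mistakes before Director gets involved. This is a minimal sketch, not part of this project: the function name preflight is hypothetical, and it only checks that common.conf, the chosen $PROVIDER.conf and at least one $PROVIDER/SECRET* file exist relative to the current directory.

```shell
# Hypothetical pre-flight check (not part of this project). Run it from the
# directory where you intend to execute cloudera-director bootstrap-remote.
preflight() {
    provider="$1"
    case "$provider" in
        aws|azure|gcp) ;;
        *) echo "unknown provider '$provider' (expected aws, azure or gcp)" >&2
           return 1 ;;
    esac
    # bootstrap must be run from the top directory, where common.conf lives
    if [ ! -f common.conf ]; then
        echo "common.conf not found: cd to the top directory first" >&2
        return 1
    fi
    if [ ! -f "$provider.conf" ]; then
        echo "$provider.conf not found" >&2
        return 1
    fi
    # SECRET files are ignored by Git, so they have to be created by hand
    for f in "$provider"/SECRET*; do
        [ -f "$f" ] && return 0
    done
    echo "no SECRET file found in $provider/" >&2
    return 1
}
```

For example: `preflight aws && cloudera-director bootstrap-remote aws.conf --lp.remote.username=admin --lp.remote.password=admin`.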
All nodes in the cluster will contain the user cdsw. That user’s password is Cloudera1. (If you used my MIT KDC installation scripts from below, you’ll also find that this user’s Kerberos username and password are cdsw and Cloudera1.)
The system comprises a set of files, some common across cloud providers and some specific to a particular cloud provider. The common files (and those which indicate which cloud provider to use) are all in the top level directory; the cloud provider specific files are in cloud provider specific directories.
There are three kinds of files:
- Property files - You are expected to modify these. They match the *.properties shell pattern and use the Java Properties format.
- Conf files - You are not expected to modify these. They match the *.conf shell pattern and use the HOCON format (a superset of JSON).
- SECRET files - These have the prefix SECRET and are used to hold secrets for each provider. The exact format is provider specific.
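One subtlety of these naming rules: aws/SECRET.properties matches both the SECRET prefix and the *.properties pattern, so the SECRET check has to come first. The sketch below, with a hypothetical function name file_kind, classifies a file purely by its name under that assumption; it does not look at file contents.

```shell
# Hypothetical helper: report which of the three kinds a project file is,
# based only on its name.
file_kind() {
    case "$(basename "$1")" in
        SECRET*)      echo secret ;;      # checked first: SECRET.properties is a secret
        *.properties) echo properties ;;  # Java Properties format, you edit these
        *.conf)       echo conf ;;        # HOCON format, leave these alone
        *)            echo other ;;
    esac
}
```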
The intent is that those items that you need to edit are in a format (i.e. *.properties files) that is easy to edit, whereas those items that you don’t need to touch are in the harder to edit HOCON format (i.e. *.conf files).
The top level directory contains the main conf files (aws.conf, azure.conf & gcp.conf). These are the files that indicate which cloud provider is to be used.
The aws, azure and gcp directories contain the files relevant to each cloud provider. We’ll reference the general notion of a provider directory using the $PROVIDER nomenclature, where $PROVIDER takes the value aws, azure or gcp.
The main configuration file is $PROVIDER.conf. This file itself includes the files needed for the specific cloud provider. We will only describe the properties files here:
- $PROVIDER/provider.properties - a file containing the provider configuration for that cloud provider.
- $PROVIDER/ssh.properties - a file containing the details required to configure passwordless ssh access into the machines that Director will create.
- $PROVIDER/kerberos.properties - an optional file containing the details of the Kerberos Key Distribution Center (KDC) to be used for Kerberos authentication. (See Kerberos Tricks below for details on how to easily set up an MIT KDC and use it.) If kerberos.properties is provided then a secure cluster is set up; if it is not provided then an insecure cluster is set up.
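Since the mere presence of kerberos.properties decides between a secure and an insecure cluster, it can help to check which one you are about to get, and that KDC_HOST_IP holds an IP address rather than a hostname (see DSE-1796 above). A minimal sketch, assuming the file uses plain KEY=VALUE lines; the function name kerberos_mode is hypothetical and the IP test is only a rough digits-and-dots check.

```shell
# Hypothetical helper: print "secure" or "insecure" for a provider directory,
# and fail if kerberos.properties exists but KDC_HOST_IP is not an IP address.
kerberos_mode() {
    props="$1/kerberos.properties"
    if [ ! -f "$props" ]; then
        echo insecure
        return 0
    fi
    # take the value after '=' on the first KDC_HOST_IP line (KEY=VALUE assumed)
    ip=$(sed -n 's/^[[:space:]]*KDC_HOST_IP[[:space:]]*=[[:space:]]*//p' "$props" | head -n 1)
    # strip optional surrounding double quotes
    ip=${ip#\"}
    ip=${ip%\"}
    case "$ip" in
        ''|*[!0-9.]*) echo "KDC_HOST_IP must be an IP address, got '$ip'" >&2
                      return 1 ;;
    esac
    echo secure
}
```

For example, `kerberos_mode aws` run from the top directory.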
SECRET files are ignored by Git and you must construct them yourself. We recommend setting their mode to 600, although that is not enforced anywhere.
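One way to follow the mode 600 recommendation is to create the file under a restrictive umask, so the secret is never readable by others even briefly. A minimal sketch; write_secret is a hypothetical helper that writes a single KEY=VALUE line, which suits the AWS and Azure SECRET.properties files but not the GCP SECRET.json.

```shell
# Hypothetical helper: write a single KEY=VALUE secret file readable only by you.
# The function body runs in a subshell, so the umask does not leak to the caller.
write_secret() (
    file="$1" key="$2" value="$3"
    umask 077                                  # newly created files get mode 600
    printf '%s=%s\n' "$key" "$value" > "$file"
    chmod 600 "$file"                          # also tightens a pre-existing file
)
```

For example, `write_secret aws/SECRET.properties AWS_SECRET_ACCESS_KEY "$MY_KEY"` (where MY_KEY is a variable you have set yourself). Bear in mind that a secret typed literally on the command line may end up in your shell history.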
The secret file for AWS is aws/SECRET.properties. It is in Java Properties format and contains the AWS secret access key:
AWS_SECRET_ACCESS_KEY=
Mine, with dots hiding characters from the secret key, looks like:
AWS_SECRET_ACCESS_KEY=53Hrd................r0wiBbKn3
If you fail to set up AWS_SECRET_ACCESS_KEY then you’ll find that cloudera-director silently fails, but grepping for AWS_SECRET in the local log file will reveal all:
[centos@ip-10-0-0-239 ~]$ unset AWS_ACCESS_KEY_ID #just to make sure its undefined!
[centos@ip-10-0-0-239 ~]$ cloudera-director bootstrap-remote filetest.conf --lp.remote.username=admin --lp.remote.password=admin
Process logs can be found at /home/centos/.cloudera-director/logs/application.log
Plugins will be loaded from /var/lib/cloudera-director-plugins
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
Cloudera Director 2.4.0 initializing ...
[centos@ip-10-0-0-239 ~]$
Looks like it’s failed, right, because it doesn’t continue. No error message! But if you execute:
[centos@ip-10-0-0-239 ~]$ grep AWS_SECRET ~/.cloudera-director/logs/application.log
com.typesafe.config.ConfigException$UnresolvedSubstitution: filetest.conf: 28: Could not resolve substitution to a value: ${AWS_SECRET_ACCESS_KEY}
You’ll discover the problem! (Or there’s another problem, and you should look in that log file for details).
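The same check can be wrapped up so that it is easy to run after every apparently silent failure. A minimal sketch: check_client_log is a hypothetical name, and it only looks for the UnresolvedSubstitution message shown in the transcript above, so other failures still require reading the log.

```shell
# Hypothetical helper: surface the substitution error that cloudera-director
# only records in the client log. Prints matching lines and returns 1 if any
# are found, 2 if the log cannot be read, 0 otherwise.
check_client_log() {
    log="${1:-$HOME/.cloudera-director/logs/application.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    if grep -F 'UnresolvedSubstitution' "$log"; then
        return 1
    fi
    return 0
}
```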
Within Azure, applications requiring access to an account are registered in the tenant, and are assigned an authentication key, otherwise known as a client secret. This is documented in the “Use portal to create an Azure Active Directory application and service principal that can access resources” document. Within that document, the section “Get application ID and authentication key” provides the details to get the application ID and authentication key, or client secret.
The secret file for Azure is called SECRET.properties. It contains a single key-value pair, where the key is CLIENTSECRET.
Here’s my azure/SECRET.properties file:
CLIENTSECRET=jhwf4Gf+ ... zD+e3k=
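An empty CLIENTSECRET= line is easy to miss, so a quick check that the key has a value can save a failed bootstrap. A minimal sketch with a hypothetical function name has_value; note that the key is used as a regular expression, which is fine for names like CLIENTSECRET or AWS_SECRET_ACCESS_KEY that contain no special characters.

```shell
# Hypothetical helper: succeed only if a properties file defines KEY with a
# non-empty value, e.g. CLIENTSECRET in azure/SECRET.properties.
has_value() {
    file="$1" key="$2"
    grep -q "^[[:space:]]*$key[[:space:]]*=[[:space:]]*[^[:space:]]" "$file"
}
```

For example: `has_value azure/SECRET.properties CLIENTSECRET || echo "CLIENTSECRET is missing or empty"`.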
The secret file for GCP is called SECRET.json. It contains the full Google service account key, in JSON format, that you obtained when you set up your Google service account.
Mine, with characters of the private key id and lines of the private key replaced by dots, looks like:
{
  "type": "service_account",
  "project_id": "gcp-se",
  "private_key_id": "b27f..................66fea",
  "private_key": "-----BEGIN PRIVATE KEY-----\nMIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQDMUKtOk000wkvJ\np/ZdwfkbpowUGMqpn2a0oQ9eTwIaLnPvrTIP3JcibWU7xkzoPXlD4hiANlkSqDqy . . . . . . UC2sMUZ1rtLCv14qg4iiXuA/RExTs1zRaZZ0r4c\nTDiZwBJEbs0flCAziv7mJ4TZ3LfGKCtrTOhUWRw/jfDHP+uJOpH2isGmytZ7uWVN\ndfllnxLITzHEQEMh0rbc/g3n\n-----END PRIVATE KEY-----\n",
  "client_email": "tobys-service-account@gcp-se.iam.gserviceaccount.com",
  "client_id": "108988546221753267035",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/tobys-service-account%40gcp-se.iam.gserviceaccount.com"
}
There are two logs of interest:
- client log: $HOME/.cloudera-director/logs/application.log on client machine
- server log: /var/log/cloudera-director-server/application.log on server machine
If the cloudera-director client fails before communicating with the server you should look in the client log. Otherwise look in the server log.
The server log can be large. I truncate it frequently (i.e. echo > /var/log/cloudera-director-server/application.log) while the Director server is running, especially before using a new conf file! Don’t simply delete it; doing so will mess up Director (unless the Director server is stopped).
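The truncation trick can be packaged as a small function. A minimal sketch; truncate_log is a hypothetical name. Truncating in place presumably works where deleting does not because the running server keeps the same file open: after an rm it would carry on writing to a file that no longer has a name.

```shell
# Hypothetical helper: empty the Director server log in place instead of deleting it.
truncate_log() {
    log="${1:-/var/log/cloudera-director-server/application.log}"
    if [ ! -f "$log" ]; then
        echo "no such log: $log" >&2
        return 1
    fi
    : > "$log"    # same file, now zero bytes long
}
```

On a real Director server the log will usually be owned by another user, so you would run this as root. sudo truncate -s 0 on the log path achieves the same thing.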
If you see this:
* No provider configuration block found
then you’ve likely executed cloudera-director bootstrap-remote in the $PROVIDER directory. You need to be in the top directory (where the common.conf file is) and execute it there.
If the client fails with this message:
* ErrorInfo{code=PROVIDER_EXCEPTION, properties={message=Mapping for image alias 'rhel7' not found.}, causes=[]}
then you’ve not configured the plugin for GCP, as detailed in the GCP Director Configuration section.
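Before bootstrapping on GCP you can check that the alias is actually there. A minimal sketch: has_image_alias is a hypothetical name, and it is only a rough textual check for a line such as rhel7 = "..." or rhel7="...". It does not parse HOCON, so it will not notice an alias placed outside the imageAliases block.

```shell
# Hypothetical helper: succeed if google.conf defines the given image alias
# (rhel7 by default) at the start of a line, followed by '=' or ':'.
has_image_alias() {
    conf="$1" name="${2:-rhel7}"
    grep -Eq "^[[:space:]]*\"?$name\"?[[:space:]]*[=:]" "$conf"
}
```

For example: `has_image_alias /var/lib/cloudera-director-plugins/google-provider-*/etc/google.conf rhel7`.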
If the client fails thus:
* Requesting an instance for Cloudera Manager ............ done
* Installing screen package (1/1) .... done
* Suspended due to failure ...
and the server log contains something like this:
peers certificate marked as not trusted by the user
then you’ve got a plugin configured, but it’s out of date. Update it, as per the GCP Director Configuration section.
Reaching CDSW this way relies on NIP.io tricks to make it work.
It also requires that the CDSW port be on the public internet.
NIP.io is a public bind server that uses the FQDN given to return an address. A simple explanation is that if you have your KDC at IP address 10.3.4.6, say, then you can refer to it as kdc.10.3.4.6.nip.io and this name will be resolved to 10.3.4.6 (indeed, foo.10.3.4.6.nip.io will likewise resolve to the same actual IP address). (Note that earlier releases of this project used xip.io, but that’s located in Norway, and for me in the USA nip.io, located in the Eastern US, works better.)
This technique is used in two places:
- In the Director conf file, to specify the address of the KDC. Instead of messing around with bind or /etc/hosts in a bootstrap script etc., simply set KDC_HOST to kdc.A.B.C.D.nip.io (choosing appropriate values for A, B, C & D as per your setup).
- When the cluster is built, you will access CDSW at the public IP address of the CDSW instance. Let’s assume that that address is C.D.S.W (appropriate, some might say); then the URL to access that instance would be http://cdsw.C.D.S.W.nip.io
This is great for hacking around with ephemeral devices such as VMs and Cloud images!
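Both uses boil down to gluing a prefix, an IPv4 address and nip.io together. A minimal sketch; nip_host and cdsw_url are hypothetical helpers, and the address check is deliberately rough (four dot-separated groups of digits, with no 0-255 range check).

```shell
# Hypothetical helper: build a nip.io name that resolves to the given IPv4 address.
nip_host() {
    prefix="$1" ip="$2"
    # reject empty input, non-digit characters and empty groups
    case "$ip" in
        ''|*[!0-9.]*|*..*|.*|*.)
            echo "not an IPv4 address: '$ip'" >&2
            return 1 ;;
    esac
    # require exactly four dot-separated groups
    old_ifs=$IFS
    IFS=.
    set -- $ip
    IFS=$old_ifs
    if [ $# -ne 4 ]; then
        echo "not an IPv4 address: '$ip'" >&2
        return 1
    fi
    printf '%s.%s.nip.io\n' "$prefix" "$ip"
}

# Hypothetical helper: the CDSW URL for the CDSW instance's public IP address.
cdsw_url() {
    host=$(nip_host cdsw "$1") || return 1
    printf 'http://%s\n' "$host"
}
```

For example, `nip_host kdc 10.3.4.6` gives a KDC_HOST value of kdc.10.3.4.6.nip.io, and `cdsw_url 104.92.37.53` gives http://cdsw.104.92.37.53.nip.io.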
In /var/kerberos/krb5kdc/kdc.conf:
[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 HADOOPSECURITY.LOCAL = {
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  supported_enctypes = aes256-cts-hmac-sha1-96:normal aes128-cts-hmac-sha1-96:normal arcfour-hmac-md5:normal
  max_renewable_life = 7d
 }
In /var/kerberos/krb5kdc/kadm5.acl I set up any principal with the /admin extension as having full rights:
*/admin@HADOOPSECURITY.LOCAL *
I then execute the following to set up the users etc.:
# -s stashes the master key so that krb5kdc can start without prompting for it
sudo kdb5_util create -s -P Passw0rd!
sudo kadmin.local addprinc -pw Passw0rd! cm/admin
sudo kadmin.local addprinc -pw Cloudera1 cdsw
sudo systemctl start krb5kdc
sudo systemctl enable krb5kdc
sudo systemctl start kadmin
sudo systemctl enable kadmin
Note that the CM username and password are cm/admin@HADOOPSECURITY.LOCAL and Passw0rd! respectively.
In /etc/krb5.conf I have this:
[libdefaults]
 default_realm = HADOOPSECURITY.LOCAL
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 default_tgs_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5
 default_tkt_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5
 permitted_enctypes = aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5

[realms]
 HADOOPSECURITY.LOCAL = {
  kdc = 10.0.0.82
  admin_server = 10.0.0.82
 }
(Note that the IP address used is the private IP address of the Director server; this is stable over reboots, so it works well.)
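If you rebuild the Director host often, the [realms] stanza can be generated from its private IP address rather than edited by hand. A minimal sketch; realm_stanza is a hypothetical helper that only prints the stanza in the layout used above, and it is up to you to merge it into /etc/krb5.conf.

```shell
# Hypothetical helper: print the [realms] stanza for a KDC (and admin server)
# running on the Director host, given the realm and that host's private IP.
realm_stanza() {
    realm="$1" kdc_ip="$2"
    cat <<EOF
[realms]
 $realm = {
  kdc = $kdc_ip
  admin_server = $kdc_ip
 }
EOF
}
```

On most Linux hosts, `hostname -I` lists the machine's addresses, so something like `realm_stanza HADOOPSECURITY.LOCAL "$(hostname -I | awk '{print $1}')"` may work, provided the first address listed is the private one.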
(Deprecated: I found this image to be unstable. It would just stop working after 3 days or so.) I use a public Active Directory AMI set up by Jeff Bean, ami-a3daa0c6, to create an AD instance.
The username/password to the image are Administrator/Passw0rd!
Allow at least 5, maybe 10 minutes for the image to spin up and work properly.
The Kerberos settings (which you’d put into kerberos.properties) are:
krbAdminUsername: "cm@HADOOPSECURITY.LOCAL"
krbAdminPassword: "Passw0rd!"
KDC_TYPE: "Active Directory"
KDC_HOST: "hadoop-ad.hadoopsecurity.local"
KDC_HOST_IP: # WHATEVER THE INTERNAL IP ADDRESS IS FOR THIS INSTANCE
SECURITY_REALM: "HADOOPSECURITY.LOCAL"
AD_KDC_DOMAIN: "OU=hadoop,DC=hadoopsecurity,DC=local"
KRB_MANAGE_KRB5_CONF: true
KRB_ENC_TYPES: "aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5"
(Don’t forget to drop the aes256 encryption if your images don’t have the Java Cryptography Extension (JCE) installed.)
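Dropping aes256 just means removing the aes256-* entries from the space-separated encryption type lists, such as KRB_ENC_TYPES above. A minimal sketch; drop_aes256 is a hypothetical helper that keeps the remaining types in their original order.

```shell
# Hypothetical helper: remove aes256 enctypes from a space-separated list,
# for images without the Java Cryptography Extension (JCE) policy files.
drop_aes256() {
    out=""
    for t in $1; do
        case "$t" in
            aes256-*) ;;                    # skip AES-256 enctypes
            *) out="${out:+$out }$t" ;;     # keep the rest, space-separated
        esac
    done
    printf '%s\n' "$out"
}
```

For example, `drop_aes256 "aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96 arcfour-hmac-md5"` prints aes128-cts-hmac-sha1-96 arcfour-hmac-md5.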
I use the following to create standard users and groups, running this on each machine in the cluster:
sudo groupadd supergroup
sudo useradd -G supergroup -u 12354 hdfs_super
sudo useradd -G supergroup -u 12345 cdsw
echo Cloudera1 | sudo passwd --stdin cdsw
And then I add the corresponding HDFS directory from a single cluster machine:
kinit cdsw
hdfs dfs -mkdir /user/cdsw