The DataRobot automated machine learning platform helps data scientists and business analysts discover the best predictive models for every situation, and then deploy them so they can consistently make smarter and faster business decisions that impact their company's bottom line.
DataRobot brings the power of auto-modeling to SageMaker users, allowing them to quickly identify and use the best machine learning model for their problem. Within minutes, DataRobot can iterate over thousands of combinations of models, data preparation steps, and parameters, work that would take days or weeks to do manually.
To experience the power of DataRobot+SageMaker, you'll need a DataRobot account. If your company has already deployed DataRobot, please request an account from your administrator. Otherwise, please contact us here: https://www.datarobot.com/contact-us/
- While logged in to the DataRobot interface, click the profile icon in the top right corner of the screen.
- Select Profile from the drop-down menu.
- Your API token is shown in the top section of your profile page. Copy it so you can paste it into your notebooks.
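If you would rather not paste the token into every notebook, the DataRobot Python client can also read its credentials from a config file. The helper below is a minimal sketch. It assumes that your version of the client looks for ~/.config/datarobot/drconfig.yaml with endpoint and token keys. It also assumes that you use DataRobot's managed cloud endpoint https://app.datarobot.com/api/v2; on-premise installations will need their own URL. The function name write_drconfig is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: store the DataRobot API token in the client config file
# so notebooks do not need the token pasted into them.
# Assumption: the datarobot Python client reads ~/.config/datarobot/drconfig.yaml.
# Usage: write_drconfig TOKEN [ENDPOINT]
write_drconfig() {
  local token="$1"
  # Assumed default: DataRobot managed cloud. Pass your own URL for on-premise installs.
  local endpoint="${2:-https://app.datarobot.com/api/v2}"
  local config_dir="${HOME}/.config/datarobot"

  mkdir -p "$config_dir"
  # Subshell so the restrictive umask (a newly created file is readable only
  # by its owner, since it holds a secret) does not leak into the caller.
  (
    umask 077
    printf "endpoint: '%s'\ntoken: '%s'\n" "$endpoint" "$token" \
      > "${config_dir}/drconfig.yaml"
  )
}

# Example (run from a terminal on the notebook instance, as the notebook user):
# write_drconfig 'YOUR_API_TOKEN'
```

With the file in place, calling datarobot.Client() with no arguments in a notebook should pick up the stored endpoint and token. If it does not, pass them explicitly as before.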
Statistics on whether a flight was delayed, and for how long, are available from government databases for all the major carriers. It would be useful to predict, before scheduling a flight, whether it is likely to be delayed. In the example notebooks below, we will use DataRobot to model whether a flight will be delayed, based on information such as the scheduled departure time and whether it rained on the day of the flight.
Before starting these notebooks, make sure you have read the Installing Dependencies section of this document.
- Basic Introduction walks the user through the basics of using DataRobot from a SageMaker notebook instance. It covers data preparation, uploading the dataset to DataRobot, starting auto-modeling, and getting predictions from the top-ranked model.
- Diving Deeper into Modeling shows the user how to explore the models created by the auto-modeling process in more detail, for example by examining how the models perform on the training data.
- Exploring Reasons for Prediction Results examines additional DataRobot functionality that provides more insight into prediction results. For certain project types, DataRobot can produce per-row explanations of its predictions.
These notebooks require extra dependencies to be installed on the notebook instance. You can install packages directly from the running Jupyter instance, but this is not ideal: every time the instance is restarted, those modifications are lost. To better support customizing the notebook instance environment, Amazon provides Lifecycle Configurations, which are shell scripts that can be configured to run each time a notebook instance starts. To learn more, see Amazon's documentation or their blog post on the subject.
Below are some simple steps for creating a new notebook instance with a lifecycle configuration that prepares the instance to work with DataRobot. Unfortunately, SageMaker does not allow you to attach a lifecycle configuration to an existing notebook instance, so we will launch a new one:
- In the SageMaker console, click the Create notebook instance button.
- Fill in the appropriate fields until you reach the Lifecycle configuration drop-down, then select Create a new lifecycle configuration. This opens a new modal panel.
- Give the configuration a descriptive name (e.g. DataRobot-Standard), then click the Start notebook tab under the Scripts section.
- Paste the script below into the script editor:
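#!/bin/bash
# Exit on the first failing command; a failed lifecycle script stops the instance from starting
set -e
# Put the instance's Anaconda distribution (and its conda command) first on the PATH
export PATH=/home/ec2-user/anaconda3/bin/:$PATH
# Install DataRobot client package for Python 2
conda install -n python2 -c conda-forge datarobot -y -q
# Install DataRobot client package for Python 3
conda install -n python3 -c conda-forge datarobot -y -q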
- Click the Create configuration button at the bottom of the page to save the new Lifecycle Configuration.
- Finish filling in the remaining options for your instance and click the Create notebook instance button at the bottom of the page to launch your new instance.
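Once the new instance is running, you may want to confirm that the lifecycle script actually installed the client. The function below is a hypothetical check to run from a terminal on the instance. It asks conda whether the datarobot package is present in a given environment. The Anaconda path and the environment names python2 and python3 are copied from the lifecycle script. The function name check_datarobot_env is an illustration, and newer notebook images may use different environment names.

```shell
#!/bin/bash
# Hypothetical post-launch check: is the datarobot package installed in a conda env?
# Usage: check_datarobot_env CONDA_BINARY ENV_NAME
check_datarobot_env() {
  local conda_bin="$1" env_name="$2"
  # "conda list -n ENV REGEX" prints only matching packages; requiring whitespace
  # after the name avoids matching similarly named packages.
  if "$conda_bin" list -n "$env_name" datarobot 2>/dev/null | grep -q '^datarobot[[:space:]]'; then
    echo "${env_name}: datarobot installed"
  else
    echo "${env_name}: datarobot MISSING"
    return 1
  fi
}

# Example, using the Anaconda path and environments from the lifecycle script:
# for env in python2 python3; do
#   check_datarobot_env /home/ec2-user/anaconda3/bin/conda "$env"
# done
```

Taking the conda binary as an argument keeps the check usable on images where Anaconda lives elsewhere. It also makes the function easy to exercise with a stand-in command.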
In the future, when you launch new notebook instances, you can reuse the Lifecycle Configuration created in the steps above rather than creating a new one. Note that lifecycle scripts cannot run for longer than 5 minutes. If a script exceeds this limit, it fails and the notebook instance is not created or started.
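If the conda installs ever approach the 5-minute limit, a common workaround is to start the slow step in the background with nohup so that the lifecycle script itself returns quickly. The helper below is a minimal sketch of that pattern. The function name run_detached and the log path in the example are hypothetical choices, not part of the original setup.

```shell
#!/bin/bash
# Hypothetical helper: start a long-running command detached from the
# lifecycle script, so the script can exit well within the 5-minute limit.
# Usage: run_detached LOG_FILE COMMAND [ARGS...]
# Prints the PID of the background job.
run_detached() {
  local log_file="$1"
  shift
  # nohup makes the job ignore SIGHUP so it is not killed when the launching
  # session ends; all of its output goes to the log file for later inspection.
  nohup "$@" > "$log_file" 2>&1 &
  echo "$!"
}

# Example "Start notebook" script body, reusing the install commands from the
# lifecycle script (the log path is an arbitrary choice):
# run_detached /home/ec2-user/datarobot-install.log bash -c '
#   set -e
#   export PATH=/home/ec2-user/anaconda3/bin/:$PATH
#   conda install -n python2 -c conda-forge datarobot -y -q
#   conda install -n python3 -c conda-forge datarobot -y -q'
```

The trade-off is that the notebook instance may report as ready before the installation has finished. Check the log file before importing datarobot in a notebook.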