The Retrieve and Rank demo application shows how to use a machine learning model on top of Solr to provide improved results. Here's a quick demo.
The Retrieve and Rank service helps users find the most relevant information for their query by using a combination of search and machine learning algorithms to detect "signals" in the data. The service is built on top of Apache Solr. Developers load their data into the service, train a machine learning model based on known relevant results, and then use this model to provide improved results to their end users based on their question or query.
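As a purely conceptual illustration of the "retrieve, then rank" idea described above, the sketch below reorders a list of retrieved candidates by a model score. It does not call the service, and everything in it is an assumption made for illustration: the `Candidate` class, the two signals, and the weighted-sum "model" are hypothetical and say nothing about the signals or model that the service actually uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class RerankSketch {
    /** A retrieved document with two made-up signals (illustrative only). */
    static class Candidate {
        final String id;
        final double searchScore; // hypothetical: score from the search stage
        final double overlap;     // hypothetical: query/document term overlap

        Candidate(String id, double searchScore, double overlap) {
            this.id = id;
            this.searchScore = searchScore;
            this.overlap = overlap;
        }
    }

    /** Stand-in for a trained model: a weighted sum of the signals. */
    static double modelScore(Candidate c, double[] weights) {
        return weights[0] * c.searchScore + weights[1] * c.overlap;
    }

    /** Returns candidate ids ordered by descending model score. */
    static List<String> rerank(List<Candidate> retrieved, final double[] weights) {
        List<Candidate> sorted = new ArrayList<>(retrieved);
        Collections.sort(sorted, new Comparator<Candidate>() {
            @Override
            public int compare(Candidate a, Candidate b) {
                return Double.compare(modelScore(b, weights), modelScore(a, weights));
            }
        });
        List<String> ids = new ArrayList<>();
        for (Candidate c : sorted) {
            ids.add(c.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Candidate> retrieved = new ArrayList<>();
        retrieved.add(new Candidate("doc-a", 0.9, 0.1));
        retrieved.add(new Candidate("doc-b", 0.7, 0.8));
        retrieved.add(new Candidate("doc-c", 0.5, 0.2));
        double[] weights = {0.3, 0.7}; // hypothetical "learned" weights
        System.out.println(rerank(retrieved, weights)); // prints [doc-b, doc-a, doc-c]
    }
}
```

In this toy input, the search stage alone would put `doc-a` first, while the model score promotes `doc-b`.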
The application is configured to use the Cranfield data set, which is a public domain data set.
Ensure that you have the following prerequisites before you start:
- An IBM Bluemix account. If you don't have one, sign up for it here. For more information about the process, see Developing Watson applications with Bluemix.
- Java Development Kit 1.7 or later releases
- Eclipse IDE for Java EE Developers
- Apache Maven 3.1 or later releases
- Git
- WebSphere Liberty Profile server, if you want to run the app in your local environment.
In order to run the Retrieve and Rank demo app, you need to configure an instance of the Retrieve and Rank service. The following steps will guide you through the process. The instructions use Eclipse, but you can use the IDE of your choice.
- Clone the retrieve-and-rank-java repository from GitHub by issuing one of the following commands in your terminal:
  `git clone https://github.com/watson-developer-cloud/retrieve-and-rank-java.git`
  `git clone git@github.com:watson-developer-cloud/retrieve-and-rank-java.git`
- Add the newly cloned repository to your local Eclipse workspace.
- Log in to Bluemix and navigate to the Dashboard on the top panel.
- Create your app.
  - Click CREATE AN APP.
  - Select WEB.
  - Select the starter Liberty for Java, and click CONTINUE.
  - Type a unique name for your app, such as `rnr-sample-app`, and click Finish.
  - Select CF Command Line Interface. If you do not already have it, click Download CF Command Line Interface. This link opens a GitHub repository. Download and install it locally.
Bluemix allows you to either create a new service instance to bind to your app or bind to an existing instance. Complete one of the following sets of steps to add an instance of the Retrieve and Rank service:
Creating a new service instance to bind to your app
- Log in to Bluemix and navigate to the Dashboard on the top panel. Find the app that you created in the previous section, and click it.
- Click ADD A SERVICE OR API.
- Select the Watson category, and select the Retrieve and Rank service. Note: initially, the Retrieve and Rank service is housed in the Labs section, which requires you to click the Bluemix Labs Catalog link at the bottom of the service selection page.
- Ensure that your app is specified in the App dropdown on the right-hand side of the pop-up window under Add Service.
- Type a unique name for your service in the Service name field, such as `rnr-sample-service`.
- Click CREATE. The Restage Application window is displayed.
- Click RESTAGE to restart your app. If the app is not started, click START.
Binding to an existing service instance
- Log in to Bluemix and navigate to the Dashboard on the top panel. Locate and click on the app you created in the previous section.
- Click BIND A SERVICE OR API.
- Select the existing Retrieve and Rank service that you want to bind to your app, and click ADD. The Restage Application window is displayed.
- Click RESTAGE to restart your app.
Now that we have a Retrieve and Rank instance bound to the app, we can use the credentials we received in the previous step to configure and train the service. The following document, Getting started with the Retrieve and Rank service, explains how to configure the service, create a document collection, upload a corpus of data, and create and train a ranker. Please make sure to carefully follow the steps in the document, taking note of the following artifacts:
- Cluster ID
- Collection name
- Ranker ID
To run the Retrieve and Rank demo application on Bluemix, three environment variables are required:
- CLUSTER_ID: This is the Apache Solr Cluster ID that is generated by the service. The cluster ID should have been noted while configuring the service in Stage 2 of the Training the Service section above.
- COLLECTION_NAME: This is the Apache Solr collection name that is provided to the service. The collection name should have been noted while configuring the service in Stage 3 of the Training the Service section above.
- RANKER_ID: This is the Ranker ID that is returned by the service. The Ranker ID should have been noted while configuring the service in Stage 4 of the Training the Service section above.
Navigate to the application dashboard in Bluemix, and click the application you created previously. Open the Environment Variables section of the UI, and switch to the USER-DEFINED tab. Add the three environment variables described above: CLUSTER_ID, whose value is the cluster ID assigned to the Solr cluster; COLLECTION_NAME, whose value is the name of the Solr document collection; and RANKER_ID, whose value is the system-generated ID of the trained ranker.
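A quick way to confirm that all three variables are visible to a Java process is sketched below. This is a minimal, hypothetical helper (the class name `RnrEnvCheck` is not part of the demo app), and it makes no claim about how the demo itself reads its configuration; it only reports which of the three keys named above are missing or blank.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class RnrEnvCheck {
    // The three variable names come from the list above.
    static final String[] REQUIRED = {"CLUSTER_ID", "COLLECTION_NAME", "RANKER_ID"};

    /** Returns the names of required variables that are missing or blank. */
    static List<String> missing(Map<String, String> env) {
        List<String> result = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = env.get(key);
            if (value == null || value.trim().isEmpty()) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> missing = missing(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("All Retrieve and Rank variables are set.");
        } else {
            System.out.println("Missing variables: " + missing);
        }
    }
}
```

Passing the environment in as a `Map` keeps the check easy to exercise with test values instead of the real process environment.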
To view the home page of the app, open https://yourAppName.mybluemix.net where yourAppName is the name of your app.
This project is configured to be built with Maven. To deploy the app, complete the following steps in order:
- In your Eclipse window, expand the retrieve-and-rank-java project that you cloned from GitHub.
- Right-click the project and select `Maven -> Update Project` from the context menu to update Maven dependencies.
- Keep the default options, and click OK.
- Navigate to the location of your default deployment server. For WebSphere Liberty, it would be something like ../LibertyRuntime/usr/servers/. Open the `server.env` file (create one if it doesn't exist), and update the following entries:
  * VCAP_SERVICES. This entry should contain the JSON object obtained from the Environment Variables section of your application on Bluemix. When entering the JSON in the server.env file, make sure that it is formatted on one line.
  * CLUSTER_ID. Specify the ID of the Solr cluster that you created as part of the Training the Service section above (the cluster ID is a long alphanumeric string).
  * COLLECTION_NAME. Specify the name of the Solr document collection that you created as part of the Training the Service section above.
  * RANKER_ID. Specify the ID of the ranker that you created as part of the Training the Service section above (the ranker ID is a long alphanumeric string).
Finally, the server.env should look something like this:
```
VCAP_SERVICES={"retrieve_and_rank": [{"name": "Watson Retrieve and Rank","label": "retrieve_and_rank","plan": "standard","credentials": {"url": "https://gateway.watsonplatform.net/retrieveandrank/api", "username": "system_generated_username", "password": "system_generated_password" } } ] }
CLUSTER_ID=system_generated_cluster_id
COLLECTION_NAME=user_provided_collection_name
RANKER_ID=system_generated_ranker_id
```
- Switch to the navigator view in Eclipse, right-click the pom.xml file, and select `Run As -> Maven Install`. The Maven build begins. During the build, the following tasks are done:
  * The JS code is compiled. That is, the various AngularJS files are aggregated, uglified, and compressed. Other pre-processing is performed on the web code, and the output is copied to the `retrieve-and-rank-java/src/main/webapp/dist` folder in the project.
  * The Java code is compiled, and JUnit tests are executed against the Java code. The compiled Java and JavaScript code and the other artifacts that are required by the web project are copied to a temporary location, and a `.war` file is created.
The WAR file, which resides in the /retrieve-and-rank-java/target directory, is used to deploy the application on Bluemix in the next section.
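Because the `server.env` entries described in the steps above are easy to get wrong (in particular, the VCAP_SERVICES JSON must stay on one line), it can help to sanity-check the file before starting a local server. The following is a minimal sketch under stated assumptions, not part of the demo app: the class name `ServerEnvCheck` is hypothetical, lines starting with `#` are treated as comments, and the brace check is deliberately rough (it does not account for braces inside JSON string values).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ServerEnvCheck {
    /** Parses KEY=VALUE lines, splitting on the first '=' only. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> entries = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            // Blank lines and '#' lines are skipped (comment handling is an assumption).
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Not a KEY=VALUE line: " + trimmed);
            }
            entries.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return entries;
    }

    /** Rough check that braces and brackets balance, i.e. the JSON was not cut off. */
    static boolean bracesBalanced(String json) {
        int depth = 0;
        for (char ch : json.toCharArray()) {
            if (ch == '{' || ch == '[') depth++;
            if (ch == '}' || ch == ']') depth--;
            if (depth < 0) return false;
        }
        return depth == 0;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to your server.env file.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Map<String, String> entries = parse(lines);
        String[] expected = {"VCAP_SERVICES", "CLUSTER_ID", "COLLECTION_NAME", "RANKER_ID"};
        for (String key : expected) {
            System.out.println(key + (entries.containsKey(key) ? " present" : " MISSING"));
        }
        String vcap = entries.get("VCAP_SERVICES");
        if (vcap != null && !bracesBalanced(vcap)) {
            System.out.println("VCAP_SERVICES looks truncated; is the JSON on one line?");
        }
    }
}
```

If the JSON was accidentally split across several lines, the continuation lines typically fail the KEY=VALUE check, so `parse` reports the first offending line.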
You can run the application on a local server or on Bluemix. Choose one of the following methods, and complete the steps:
- Start Eclipse, and click `Window -> Show View -> Servers`.
- In the Servers view, right-click and select `New -> Server`. The Define a New Server window is displayed.
- Select the WebSphere Application Server Liberty Profile, and click Next.
- Configure the server with the default settings.
- In the Available list in the Add and Remove dialog, select the retrieve-and-rank-java project, and click Add >. The project is added to the runtime configuration for the server in the Configured list.
- Click Finish.
- Copy the server.env file which was edited previously from retrieve-and-rank-java/src/it/resources/server.env to the root folder of the newly defined server (i.e. wlp/usr/defaultserver/server.env).
- Start the new server, and open http://localhost:serverPort/rnr-demo/dist/index.html#/ in your favorite browser, where serverPort is the HTTP port of your local server.
- Execute the queries against the service!
Deploy the WAR file that you built in the previous section by using Cloud Foundry commands.
- Open the command prompt.
- Navigate to the directory that contains the WAR file that you generated by running the following command in the terminal:
  `cd retrieve-and-rank-java/target`
- Connect to Bluemix by running the following command:
  `cf api https://api.ng.bluemix.net`
- Log in to Bluemix by running the following command, where yourUsername is your Bluemix ID, yourOrg is your organization name in Bluemix, and yourSpace is your space name in Bluemix:
  `cf login -u <yourUsername> -o <yourOrg> -s <yourSpace>`
- Deploy the app to Bluemix by running the following command, where yourAppName is the name of your app:
  `cf push <yourAppName> -p rnr-demo.war`
- Navigate to Bluemix to make sure the app is started. If not, click START.
- To view the home page of the app, open https://yourAppName.mybluemix.net/rnr-demo/dist/index.html#/ where yourAppName is the specific name of your app.
- Execute the queries against the service!
- Retrieve and Rank service documentation
- Configuring the Retrieve and Rank service
- Retrieve and Rank API reference