kevinwinahradsky edited this page May 13, 2013 · 23 revisions

Copyright 2013 State University of New York at Oswego

Licensed under the Apache License, Version 2.0 (the "License");  you may not use this file except in compliance with the License.  You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software  distributed under the License is distributed on an "AS IS" BASIS,  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the License for the specific language governing permissions and  limitations under the License.

Overview

hViewO is a visualization tool and alternate interface for H2O. It can be used to show data, random forest information, and tree visualizations.

UI

The application utilizes a split view display to concurrently present a dataset and the confusion matrix data for a random forest model. The provided confusion matrix indicates the quality of the associated model.

Application Overview

In addition to the confusion matrix, the operator can view the individual trees which make up the model's forest. The decision trees presented in this view each show one possible path through the model's data.

Application Tree View

Architecture

Architecture Overview

hViewO is a browser-based application built with the Google Web Toolkit (GWT). We use GWT to render our pages and to manage communication between the browser and the web server. In the browser we also use the Sigma.js JavaScript library to render the tree visualizations. On the server side we use the Google Gson library to convert H2O JSON responses into Java objects.

H2O Overview

H2O utilizes Hadoop to provide access to public and private data sources. Once a dataset is uploaded to an H2O server, the REST interface can be used to produce and view a random forest model of the data.

In Depth

The GWT-generated JavaScript uses GIN for dependency injection and implements the Model-View-Presenter (MVP) pattern using GWT's UiBinder classes. Additionally, GWT's activities and places are used throughout the implementation.

Unit testing for the underlying Java code is implemented using JUnit and Mockito.

Separation of Model and View - Model, View, Presenter (MVP) and GWT UiBinder

The UI implementation makes use of the GWT UiBinder framework. UiBinder separates the UI layout from the program logic that sends data to and receives data from that layout. This is desirable because it allows the look of the UI to be changed more easily, even by those with less programming experience. UI elements which are implemented with UiBinder consist of the following:

  1. A Presenter interface with an inner View interface.
  2. A Presenter implementation.
  • The presenter class should be used to pass data to the view, which will then update its view fields. The presenter should process data in a way which minimizes data processing in the view class.
  • Presenter interfaces and classes are located in package edu.oswego.csc480_hci521_2013.client.presenters .
  3. A View implementation class which implements the View interface from the Presenter (no. 1, above). The implementation extends a GWT widget.
  • The View implementation contains @UiField annotated fields which correspond to elements in the XML view file. These fields are also GWT widgets.
  • The View implementation contains the logic for setting and getting data in its fields.
  4. A View ui.xml file.
  • The View is laid out in an XML file similar to an HTML layout.
  • Special GWT elements are used to represent the fields corresponding to the View implementation (no. 3, above). For example, <gwt:ListBox ui:field="classVars" width="100%" visibleItemCount="1" multipleSelect="false" /> represents a GWT ListBox widget corresponding to the field called "classVars" in the implementation.
  • Properties of the widget can be set directly in the XML file, as the width and visibleItemCount attributes in the example show.
  • View implementations and XML files are located in package edu.oswego.csc480_hci521_2013.client.ui .
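
The division of labor between the pieces listed above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not hViewO's actual code: the names ColumnPickerPresenter, ColumnPickerPresenterImpl, and InMemoryColumnPickerView are hypothetical, and a plain class stands in for the GWT widget and its ui.xml layout so that the example runs without GWT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Presenter interface with an inner View interface (hypothetical names).
interface ColumnPickerPresenter {
    interface View {
        void setPresenter(ColumnPickerPresenter presenter);
        void setClassVars(List<String> names);
        String getSelectedClassVar();
    }

    void showColumns(List<String> rawColumnNames);
    void onSubmit();
}

// Presenter implementation: cleans the data so the view only has to display it.
class ColumnPickerPresenterImpl implements ColumnPickerPresenter {
    private final ColumnPickerPresenter.View view;
    private String submitted;

    ColumnPickerPresenterImpl(ColumnPickerPresenter.View view) {
        this.view = view;
        view.setPresenter(this);
    }

    @Override
    public void showColumns(List<String> rawColumnNames) {
        List<String> cleaned = new ArrayList<>();
        for (String name : rawColumnNames) {
            if (name != null && !name.trim().isEmpty()) {
                cleaned.add(name.trim());
            }
        }
        view.setClassVars(cleaned);
    }

    @Override
    public void onSubmit() {
        submitted = view.getSelectedClassVar();
    }

    String getSubmitted() {
        return submitted;
    }
}

// View implementation. A GWT view would extend a widget and declare @UiField
// fields bound to the ui.xml layout; this plain class is an assumption that
// stands in for it.
class InMemoryColumnPickerView implements ColumnPickerPresenter.View {
    private ColumnPickerPresenter presenter;
    private List<String> classVars = new ArrayList<>();
    private int selectedIndex;

    @Override
    public void setPresenter(ColumnPickerPresenter presenter) {
        this.presenter = presenter;
    }

    @Override
    public void setClassVars(List<String> names) {
        classVars = new ArrayList<>(names);
        selectedIndex = 0;
    }

    @Override
    public String getSelectedClassVar() {
        return classVars.isEmpty() ? null : classVars.get(selectedIndex);
    }

    List<String> getClassVars() {
        return classVars;
    }

    // Simulates the user picking an entry and pressing a submit button.
    void userSelects(int index) {
        selectedIndex = index;
        presenter.onSubmit();
    }
}

class MvpSketch {
    public static void main(String[] args) {
        InMemoryColumnPickerView view = new InMemoryColumnPickerView();
        ColumnPickerPresenterImpl presenter = new ColumnPickerPresenterImpl(view);
        presenter.showColumns(Arrays.asList(" name ", "", "economy (mpg)"));
        view.userSelects(1);
        System.out.println(presenter.getSubmitted());
    }
}
```

Because the presenter depends only on the View interface, the same presenter could be exercised in a JUnit test with a stand-in view like the one above, while the GWT-backed view is used in the browser.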

The application is split to present a pair of panels; the data is on the left, with the model results to the right. The associated view and presenter classes for these panels are denoted in the diagram below.

MVP Code Model

Custom H2O Modifications

The first modification we made to H2O was to alter the RFTreeView API call. We added a "tree" element to the JSON response that describes the basic structure of the tree: split nodes carry a field, a condition, a value, and a list of children, while leaf nodes carry only a class label. Here is an example of the tree element:

{
   "field":"name",
   "condition":"<=",
   "value":81.5,
   "children":[
      {
         "label":"Class 1"
      },
      {
         "field":"economy (mpg)",
         "condition":"<=",
         "value":30.8,
         "children":[
            {
               "label":"Class 4"
            },
            {
               "label":"Class 1"
            }
         ]
      }
   ]
}
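
One way to hold this structure on the Java side is a small recursive class whose field names mirror the JSON keys, so that a mapper that binds by field name (Gson does this by default) could populate it. The sketch below is an assumption for illustration, not the class hViewO uses: TreeNode, leaf(), split(), countLeaves(), and render() are hypothetical names, and the tree is built by hand so the example runs without any JSON library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical POJO whose fields mirror the keys of the "tree" element.
class TreeNode {
    String field;       // split column; null on leaves
    String condition;   // comparison operator, e.g. "<="
    Double value;       // split threshold; null on leaves
    List<TreeNode> children = new ArrayList<>();
    String label;       // predicted class; null on split nodes

    static TreeNode leaf(String label) {
        TreeNode node = new TreeNode();
        node.label = label;
        return node;
    }

    static TreeNode split(String field, String condition, double value, TreeNode... children) {
        TreeNode node = new TreeNode();
        node.field = field;
        node.condition = condition;
        node.value = value;
        node.children = new ArrayList<>(Arrays.asList(children));
        return node;
    }

    boolean isLeaf() {
        return children == null || children.isEmpty();
    }

    // Counts the leaves (class labels) reachable from this node.
    int countLeaves() {
        if (isLeaf()) {
            return 1;
        }
        int total = 0;
        for (TreeNode child : children) {
            total += child.countLeaves();
        }
        return total;
    }

    // Renders the tree as indented text, one node per line.
    String render() {
        StringBuilder out = new StringBuilder();
        render(out, 0);
        return out.toString();
    }

    private void render(StringBuilder out, int depth) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        if (isLeaf()) {
            out.append(label).append('\n');
            return;
        }
        out.append(field).append(' ').append(condition).append(' ').append(value).append('\n');
        for (TreeNode child : children) {
            child.render(out, depth + 1);
        }
    }
}

class TreeSketch {
    public static void main(String[] args) {
        // Same shape as the JSON example above.
        TreeNode root = TreeNode.split("name", "<=", 81.5,
                TreeNode.leaf("Class 1"),
                TreeNode.split("economy (mpg)", "<=", 30.8,
                        TreeNode.leaf("Class 4"),
                        TreeNode.leaf("Class 1")));
        System.out.print(root.render());
    }
}
```

The sketch deliberately does not evaluate the conditions, since the JSON alone does not say which child corresponds to a true comparison.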

The second modification we made was to add a ColumnEnumValues API call. This call returns the list of enum values for an enum data column, so that the UI can show string values rather than index values for enum data.

GWT / H2O Interaction

All interaction with H2O is done through the GWT Async service, H2OServiceAsync. This service class proxies all requests from hViewO to H2O. This is needed to work around the same-origin policy security restrictions in web browsers.

Inside the servlet the H2O JSON response is parsed and converted into POJOs using the Google Gson JSON conversion library. This library was chosen because it was found to be more robust than the built-in GWT JSON functionality. The GWT JSON functionality seemed to have problems with the variations in the possible responses from H2O; it seems to work best with very well-defined and consistent JSON responses. Specifically, the biggest issue was its inability to deal with the tabular data responses, which can contain quite a bit of variability.

Once the response is received from H2O and parsed, some minor error handling is done and any H2O errors are converted into exceptions. Then the POJO generated from the response, or the exception created from the H2O error, is serialized back to the UI using the built-in GWT RPC functionality and handled in the onSuccess() or onFailure() method of GWT's AsyncCallback class.
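
The error-to-exception step and the two callback paths can be sketched as follows. This is a simplified illustration, not the hViewO servlet: the Callback interface only mirrors the shape of GWT's AsyncCallback so the example runs without GWT, the ParsedResponse class and its "error" field are assumptions about what a parsed H2O reply might contain, and deliver() collapses the GWT RPC round trip into a direct method call.

```java
// Local stand-in with the same two methods as GWT's AsyncCallback<T>.
interface Callback<T> {
    void onFailure(Throwable caught);
    void onSuccess(T result);
}

// Hypothetical POJO; assumes failures are reported in an "error" field.
class ParsedResponse {
    String error;
    String modelKey;

    static ParsedResponse ok(String modelKey) {
        ParsedResponse response = new ParsedResponse();
        response.modelKey = modelKey;
        return response;
    }

    static ParsedResponse failed(String error) {
        ParsedResponse response = new ParsedResponse();
        response.error = error;
        return response;
    }
}

// Hypothetical exception type carrying the H2O error message.
class H2OServiceException extends Exception {
    H2OServiceException(String message) {
        super(message);
    }
}

class ServiceSketch {
    // Server-side step: convert an error reported by H2O into an exception.
    static ParsedResponse check(ParsedResponse response) throws H2OServiceException {
        if (response.error != null) {
            throw new H2OServiceException(response.error);
        }
        return response;
    }

    // Stands in for GWT RPC: routes the outcome to one of the two callback methods.
    static void deliver(ParsedResponse response, Callback<ParsedResponse> callback) {
        ParsedResponse checked;
        try {
            checked = check(response);
        } catch (H2OServiceException e) {
            callback.onFailure(e);
            return;
        }
        callback.onSuccess(checked);
    }

    public static void main(String[] args) {
        Callback<ParsedResponse> printer = new Callback<ParsedResponse>() {
            @Override
            public void onFailure(Throwable caught) {
                System.out.println("failure: " + caught.getMessage());
            }

            @Override
            public void onSuccess(ParsedResponse result) {
                System.out.println("success: " + result.modelKey);
            }
        };
        deliver(ParsedResponse.ok("model_1"), printer);
        deliver(ParsedResponse.failed("key not found"), printer);
    }
}
```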

UI / H2OService Interaction

H2O requests are transmitted by the UI components to the local H2OService via a standard GWT event bus. The UI sends data and model requests, and receives the replies, over the event bus. The remaining GWT architecture used by hViewO follows the MVP pattern discussed earlier. The presenter classes in this pattern fire the relevant request events on the event bus and receive the replies as asynchronous events. The presenters are started by activity classes, which are created by the application's entry method. The application's package structure reflects these relationships.
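
The request and reply flow described above can be sketched with a small hand-written bus. Everything here is a hypothetical stand-in so the example runs without GWT: SketchEventBus is far simpler than GWT's event bus, the event and class names are invented, the made-up model key is a placeholder for a real H2O reply, and the reply is delivered synchronously rather than asynchronously.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Simplified stand-in for an event bus: handlers are keyed by event class.
class SketchEventBus {
    private final Map<Class<?>, List<Consumer<Object>>> handlers = new HashMap<>();

    <E> void addHandler(Class<E> type, Consumer<E> handler) {
        handlers.computeIfAbsent(type, key -> new ArrayList<>())
                .add(event -> handler.accept(type.cast(event)));
    }

    void fire(Object event) {
        List<Consumer<Object>> registered = handlers.get(event.getClass());
        if (registered == null) {
            return;
        }
        // Iterate over a copy so handlers may register others while running.
        for (Consumer<Object> handler : new ArrayList<>(registered)) {
            handler.accept(event);
        }
    }
}

// Hypothetical request and reply events.
class ModelRequestEvent {
    final String dataKey;
    ModelRequestEvent(String dataKey) { this.dataKey = dataKey; }
}

class ModelResponseEvent {
    final String dataKey;
    final String modelKey;
    ModelResponseEvent(String dataKey, String modelKey) {
        this.dataKey = dataKey;
        this.modelKey = modelKey;
    }
}

// Stands in for the service side: answers each request with a reply event.
class ServiceStub {
    ServiceStub(SketchEventBus bus) {
        bus.addHandler(ModelRequestEvent.class, request ->
                bus.fire(new ModelResponseEvent(request.dataKey, request.dataKey + ".model")));
    }
}

// Presenter: fires requests and listens for replies on the same bus.
class ModelPresenterSketch {
    private final SketchEventBus bus;
    private String lastModelKey;

    ModelPresenterSketch(SketchEventBus bus) {
        this.bus = bus;
        bus.addHandler(ModelResponseEvent.class, reply -> lastModelKey = reply.modelKey);
    }

    void requestModel(String dataKey) {
        bus.fire(new ModelRequestEvent(dataKey));
    }

    String getLastModelKey() {
        return lastModelKey;
    }
}

class EventBusSketch {
    public static void main(String[] args) {
        SketchEventBus bus = new SketchEventBus();
        new ServiceStub(bus);
        ModelPresenterSketch presenter = new ModelPresenterSketch(bus);
        presenter.requestModel("cars.hex");
        System.out.println(presenter.getLastModelKey());
    }
}
```

Routing both directions through the bus means the presenter never holds a reference to the service, which is what lets presenters and the service be replaced or tested independently.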

Diagrams

Future Enhancement Opportunities

  • The random forest algorithm is parameterized using the RfParametersViewImpl.java and RfParametersViewImpl.ui.xml files. A basic implementation is provided using a subset of the full collection of algorithm parameters. Additional parameters may be added if necessary. The associated user interface changes should be made to the RfParametersViewImpl files to facilitate such additions.

  • Use Case: Use model to classify new samples

    • Primary Actor: Data Analyst or Secondary User
    • Precondition: Model with acceptable predictive ability has been produced.
    • Main Success Scenarios:
      1. User obtains a new sample with a subset of features available in the existing model.
      2. User submits the new sample against the existing model.
      3. The application classifies the sample.
    • The RFScore API has been coded and can be used to implement this use case.

Known Issues

  • Customized version of H2O is required.
  • The connection parameters for the H2O instance are not configurable, so H2O must be available on localhost:54321.
  • Data sets to be processed must be loaded via the H2O interface.
  • The interface is reset after a browser refresh; all models are removed.