Code Development Notes
jpatanooga edited this page Nov 22, 2012 · 9 revisions
- Found some research to support our plan for parallelizing SGD
- Sketched out the algorithm on paper
- Implemented algorithm in unit test "simulation"
- Transposed test algorithm into Iterative Reduce primitives
- Which Gave Us
  - com.cloudera.knittingboar.sgd
    - Contains most of the core logic for running the SGD process
    - Based on core code from the Mahout Multinomial Logistic Regression / SGD implementation
    - Interesting Classes
      - ParallelOnlineLogisticRegression
        - Modified Mahout SGD / LR
      - POLRMasterDriver
        - Simulated Master node, used in unit test simulations to craft the initial parallel SGD logic
      - POLRWorkerDriver
        - Simulated Worker node, used in unit test simulations to craft the initial parallel SGD logic
  - com.cloudera.knittingboar.sgd.iterativereduce
    - Contains the Iterative Reduce based nodes to run on YARN
    - Interesting Classes
      - POLRMasterNode
        - Master Node Iterative Reduce implementation for SGD
        - Processes the "super step"
      - POLRWorkerNode
        - Processes a shard/split of input data per epoch/iteration
        - Sends updates to master after each iteration
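
The master/worker split described above can be sketched as a single-process simulation, in the spirit of the unit test "simulation" step. This is a minimal sketch, not the Knitting Boar code: it assumes the master combines worker results by averaging their weight vectors during the "super step", and the class and method names (`ParallelSgdSketch`, `workerPass`, `masterAverage`, `train`) and the toy data are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch; none of these names come from the Knitting Boar code base. */
public class ParallelSgdSketch {

    /** Logistic function. */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Worker step: one SGD pass over a shard, starting from the current global weights. */
    static double[] workerPass(double[] global, double[][] x, int[] y, double learningRate) {
        double[] w = Arrays.copyOf(global, global.length);
        for (int i = 0; i < x.length; i++) {
            double z = 0.0;
            for (int j = 0; j < w.length; j++) {
                z += w[j] * x[i][j];
            }
            double error = y[i] - sigmoid(z);
            for (int j = 0; j < w.length; j++) {
                w[j] += learningRate * error * x[i][j];
            }
        }
        return w;
    }

    /** Master step (assumed here to be plain averaging of the workers' weight vectors). */
    static double[] masterAverage(double[][] workerWeights) {
        double[] avg = new double[workerWeights[0].length];
        for (double[] w : workerWeights) {
            for (int j = 0; j < avg.length; j++) {
                avg[j] += w[j] / workerWeights.length;
            }
        }
        return avg;
    }

    /** Runs the epochs: every worker processes its shard, then the master merges the results. */
    static double[] train(double[][][] shardX, int[][] shardY, int epochs, double learningRate) {
        double[] global = new double[shardX[0][0].length];
        for (int e = 0; e < epochs; e++) {
            double[][] updates = new double[shardX.length][];
            for (int s = 0; s < shardX.length; s++) {
                updates[s] = workerPass(global, shardX[s], shardY[s], learningRate);
            }
            global = masterAverage(updates);
        }
        return global;
    }

    public static void main(String[] args) {
        // Two toy shards; each row is {bias, x} and the label is 1 when x > 0.
        double[][][] shardX = {
            {{1, -2}, {1, 1.5}, {1, -0.5}},
            {{1, 2}, {1, -1}, {1, 0.5}}
        };
        int[][] shardY = {{0, 1, 0}, {1, 0, 1}};
        double[] weights = train(shardX, shardY, 50, 0.5);
        System.out.println(Arrays.toString(weights));
    }
}
```

In this sketch the workers run sequentially inside one loop; in an Iterative Reduce setting each `workerPass` call would correspond to a worker node handling its own split, with the merge happening on the master.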
- We ended up writing quite a few unit tests
- Not all unit tests are for regression testing; some were used to hash out chunks of parallel SGD functionality
- Specifically this namespace: