[SPARK-2199] [mllib] topic modeling #1269
Conversation
Jenkins, add to whitelist.
Jenkins, test this please.
QA tests have started for PR 1269. This patch merges cleanly.
QA results for PR 1269:
QA tests have started for PR 1269. This patch merges cleanly.
QA results for PR 1269:
@akopich Thanks for working on PLSA! This is a big feature and it introduces many public traits/classes. Could you please summarize the public methods? Some of them may be unnecessary to expose to end users and we should hide them. The other issue is about the code style. Please follow the guide at https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide and update the PR, for example:
QA tests have started for PR 1269. This patch merges cleanly.
Thank you for your recommendations. Sorry about the code style; I believed it was OK since sbt/sbt scalastyle found no issues. I hope it's OK now. I've removed the "created by ..." comments generated by IntelliJ, organized imports, added docs for every public class/trait and method, fixed indentation, and hidden all the classes/traits that end users don't need.
QA tests have started for PR 1269. This patch merges cleanly.
QA results for PR 1269:
QA results for PR 1269:
I probably need your help. I have no idea why the tests fail somewhere in spark.streaming. Could you please have a look at Jenkins' log and give me a hint?
Jenkins, retest this please. |
testPLSA(plsa)
  }
}
Add a blank line
Is this a problem? You mentioned it several times but I don't think a trailing newline is required in Java or Scala.
It seems that git needs a newline at the end of the file.
QA tests have started for PR 1269. This patch merges cleanly.
@akopich The failed tests might be irrelevant to this PR. It would be nice if you could make the public interfaces minimal and provide a summary of them. For example, you can make "robust" a parameter of
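One way to read this suggestion: instead of exposing PLSA and RobustPLSA as separate public entry points, keep a single trainer and fold robustness into its parameters. A hypothetical sketch (all names here are invented for illustration, not taken from the PR):

```scala
// Hypothetical API sketch: one public trainer, with "robust" folded into
// the parameters instead of a separate public class. PLSAParams and
// TopicModel are made-up names, not the PR's actual API.
case class PLSAParams(numTopics: Int, maxIterations: Int = 30, robust: Boolean = false)

class TopicModel(val topicsOverWords: Array[Array[Double]])

object PLSA {
  // docs: per document, word id -> count
  def train(docs: Seq[Map[Int, Int]], params: PLSAParams): TopicModel =
    if (params.robust) trainRobust(docs, params) else trainPlain(docs, params)

  // Stubs standing in for the real EM loops.
  private def trainPlain(docs: Seq[Map[Int, Int]], p: PLSAParams): TopicModel =
    new TopicModel(Array.fill(p.numTopics, 4)(1.0 / 4))

  private def trainRobust(docs: Seq[Map[Int, Int]], p: PLSAParams): TopicModel =
    trainPlain(docs, p) // a robust variant would add noise/background components
}
```

With this shape, end users see one entry point and the robust/plain distinction stays an implementation detail behind the flag.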
QA results for PR 1269:
import com.esotericsoftware.kryo.io.{Input, Output}
import com.esotericsoftware.kryo.{Kryo, Serializer}
import gnu.trove.map.hash.TObjectIntHashMap
This can be replaced with breeze.util.Index
Thank you very much! I wish I knew earlier about breeze.util.Index.
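For readers unfamiliar with it, breeze.util.Index keeps a bidirectional mapping between objects and dense integer ids, which is what the Trove map provided here. A rough sketch, assuming breeze's mutable Index API (treat the method names as indicative, not authoritative):

```scala
import breeze.util.Index

// Sketch only: replacing TObjectIntHashMap[String] with breeze.util.Index.
// Assumption: Index.index(t) assigns a fresh dense id on first sight and
// returns the existing id on later calls, so it doubles as the word-to-id map.
val wordIndex = Index[String]()
val tokens = Seq("topic", "model", "topic")
val ids = tokens.map(wordIndex.index)   // repeated words reuse their id
// wordIndex.get(i) maps an id back to its word, giving both directions
// without maintaining a second map.
```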
It looks like the higher the requested number of topics, the larger the documents have to be, or else perplexity goes to
QA tests have started for PR 1269 at commit
@chazchandler Probably it's possible to handle this situation in a better manner. I'll think about it. All suggestions are welcome.
QA tests have finished for PR 1269 at commit
@mengxr Unfortunately, I'm not sure I understand what you suggest for robust PLSA. Do you mean there should be some kind of facade for PLSA and RobustPLSA?
QA tests have started for PR 1269 at commit
QA tests have started for PR 1269 at commit
QA tests have finished for PR 1269 at commit
QA tests have finished for PR 1269 at commit
Is there a straightforward way to get the

Also, once infer has been run, what is the next expected step in the workflow for evaluating the relevancy of additional documents to the corpus and/or training the corpus on new documents? That is, how would one take the results of infer and use them for these purposes?
QA tests have started for PR 1269 at commit
I've been looking at the various topic modeling PRs (3 currently) to try to get a sense of how they compare in terms of accuracy and speed. By "scaling," I really meant speed, or comparing running times across implementations to get a sense of what is fastest & why. I'm envisioning the comparison on a small cluster at least; I'm hoping to run some such tests myself. Computing scaling curves as the # of machines increases would be awesome but should probably come later. For the test failures, I'll wait a little bit and then re-run the tests. |
How do you compare accuracy? Perplexity means nothing but perplexity: topic models can be reliably compared only via an application task (e.g. classification, recommendation...). Should I add the dataset for a "perplexity sanity check" to the repo? I am about to use 1000 arXiv papers. This dataset is about 20 MB (5.5 MB zipped).
Yes, "accuracy" meant some kind of metric like perplexity. I agree perplexity does not correlate exactly with human perception, but it's as good as it gets (assuming no one here has the resources to run real experiments right now). You don't need to add a dataset to the repo, but posting any results you get would be helpful. I'll post some results once I have some. |
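For concreteness, the perplexity being compared in these comments is conventionally defined as exp of the negative average log-likelihood per token, where p(w|d) mixes the topic-word and document-topic distributions. A generic in-memory sketch (not the PR's code; the function shape is an assumption for illustration):

```scala
// Perplexity as typically defined for topic models:
//   perplexity = exp( - sum_{d,w} n(d,w) * log p(w|d) / totalTokens )
// where p(w|d) = sum_t phi(w|t) * theta(t|d).
// docs: per document, word id -> count; wordGivenDoc: p(w|d) by doc position.
def perplexity(
    docs: Seq[Map[Int, Int]],
    wordGivenDoc: (Int, Int) => Double
): Double = {
  var logLik = 0.0
  var tokens = 0L
  for ((doc, d) <- docs.zipWithIndex; (w, n) <- doc) {
    logLik += n * math.log(wordGivenDoc(w, d))
    tokens += n
  }
  math.exp(-logLik / tokens)
}
```

A quick sanity check of the definition: a model that assigns every word a uniform probability 1/V has perplexity exactly V, so lower values mean the model concentrates probability on the words actually observed.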
I've performed a sanity check on the dataset I've described above.

PLSA: the tm project obtains perplexity of

RobustPLSA:

Seems to be sane.
Test build #24647 has finished for PR 1269 at commit
And tests fail again in an obscure manner... |
@akopich I've filed a JIRA to investigate that test failure, since it looks like a flaky streaming test: https://issues.apache.org/jira/browse/SPARK-4905 |
I've fixed perplexity for robust PLSA and updated the perplexity value in the comment above. Now they are almost the same.
By the way, maybe it's off topic, but this is related to initial approximation generation. Suppose one has

But

What's a proper way?
collection length computation was wrong
Test build #24650 has finished for PR 1269 at commit
@akopich The right way to do pseudo-randomness is to do:
You can find examples in, e.g., mllib/tree/impl/BaggedPoint.scala |
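The pattern in BaggedPoint.scala is, roughly: create one RNG per partition inside mapPartitionsWithIndex, deriving its seed from a base seed plus the partition index, so the result is reproducible and workers never share a driver-side Random. A hedged sketch of applying that to generating initial random topic mixtures (the function and variable names are made up for illustration):

```scala
import scala.util.Random
import org.apache.spark.rdd.RDD

// Sketch of per-partition seeding (cf. mllib/tree/impl/BaggedPoint.scala):
// each partition gets its own deterministic RNG, seeded from the base seed
// and the partition index.
def withRandomTopics(docs: RDD[Array[Int]], numTopics: Int, seed: Long)
  : RDD[(Array[Int], Array[Double])] =
  docs.mapPartitionsWithIndex { (partIdx, iter) =>
    val rng = new Random(seed + partIdx)   // deterministic per partition
    iter.map { doc =>
      val theta = Array.fill(numTopics)(rng.nextDouble())
      val s = theta.sum
      (doc, theta.map(_ / s))              // normalized random topic mixture
    }
  }
```

The key point is that the seed depends only on `seed` and the partition index, not on task scheduling, so re-running the job reproduces the same initial approximation.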
@akopich I had hoped to get this into MLlib, but after more consideration, I believe it is too complex. Currently, what would be ideal is a simple implementation of LDA since that is all that most users need. While generalizations like robust PLSA may outperform LDA with proper tuning, it’s somewhat of a research area, and it may be better to go with LDA since it has been very widely tested and used. However, I am sure some users would want to use your implementation of Robust PLSA, so it would be valuable for you to make it available as a package for Spark.

The best path right now, I believe, will be to create a simple PR with a minimal public API, where that API should be extensible with (a) extra parameters/features and (b) alternate optimization/learning algorithms. I've posted a public design doc on the LDA JIRA here, and I’m going to submit such a PR. I would of course appreciate your feedback on it. Thanks very much for your understanding.

When we merge the initial LDA PR, @mengxr will be sure to include all of those who have participated as authors of Spark LDA PRs: @akopich @witgo @yinxusen @dlwh @EntilZha @jegonzal CC: @mengxr
@jkbradley, @mengxr, please include @IlyaKozlov as an author too. He's helped a lot with the implementation.
@akopich We'll make sure to do that. Thanks for letting us know. |
@IlyaKozlov Would you like your email included in the git commit for the initial LDA PR? If so, please let me (or @mengxr ) know ASAP. Thanks! |
@akopich How do you assign document ids?
Thanks.
@renchengchang |
@akopich Since this is no longer an active PR, could you please close it? It was very helpful to have this PR as a major basis for the initial LDA PR. If you do end up using the merged LDA or future versions which may be added, it would be great to get your input about further improvements, especially if they can be added incrementally. There are people actively working on online variational Bayes and on Gibbs sampling, which should have very different behavior from EM. |
I have implemented Probabilistic Latent Semantic Analysis (PLSA) and Robust PLSA with support for additive regularization (which effectively means I've implemented Latent Dirichlet Allocation too).
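For context, the EM iteration at the heart of plain PLSA fits in a few lines: the E-step computes p(t|d,w) ∝ phi(w|t)·theta(t|d), and the M-step re-normalizes the expected counts. This is a generic in-memory sketch for illustration, not the PR's distributed implementation:

```scala
// One EM pass of plain PLSA (sketch; no smoothing or regularization).
//   E-step: p(t | d, w) proportional to phi(w | t) * theta(t | d)
//   M-step: re-estimate phi and theta from the expected counts.
object PlsaSketch {
  // docs: per document, word id -> count.
  // phi: numTopics x vocabSize rows of p(w|t); theta: numDocs x numTopics rows of p(t|d).
  def emStep(
      docs: Array[Map[Int, Int]],
      phi: Array[Array[Double]],
      theta: Array[Array[Double]]
  ): (Array[Array[Double]], Array[Array[Double]]) = {
    val numTopics = phi.length
    val vocab = phi(0).length
    val phiNew = Array.fill(numTopics, vocab)(0.0)
    val thetaNew = Array.fill(docs.length, numTopics)(0.0)
    for (d <- docs.indices; (w, n) <- docs(d)) {
      val joint = Array.tabulate(numTopics)(t => phi(t)(w) * theta(d)(t))
      val z = joint.sum
      for (t <- 0 until numTopics) {
        val expected = n * joint(t) / z   // n(d,w) * p(t|d,w)
        phiNew(t)(w) += expected
        thetaNew(d)(t) += expected
      }
    }
    // Normalize each row back into a probability distribution.
    (phiNew.map(row => { val s = row.sum; row.map(_ / s) }),
     thetaNew.map(row => { val s = row.sum; row.map(_ / s) }))
  }
}
```

A production version would add smoothing for empty rows, and additive regularization would add per-distribution penalty terms to the M-step counts before normalizing.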