
Cannot use vector as input struct type due to: java.lang.ClassCastException: scala.collection.convert.Wrappers$JListWrapper cannot be cast to ml.combust.mleap.tensor.Tensor #3

Open
make opened this issue Nov 30, 2018 · 7 comments


make commented Nov 30, 2018

I am trying to deploy a bundled Spark ML NaiveBayesModel with sagemaker-sparkml-serving-container.

I am running sagemaker-sparkml-serving-container with the following command:

SCHEMA='{"input":[{"name":"features","type":"double","struct":"vector"}],"output":{"name":"prediction","type":"double"}}'
BUNDLE=/tmp/naivebayes_bundle
docker run -p 8080:8080 -e SAGEMAKER_SPARKML_SCHEMA="$SCHEMA" -v $BUNDLE:/opt/ml/model sagemaker-sparkml-serving:2.2 serve

When calling /invocations with:

curl -i -H "content-type:application/json" http://localhost:8080/invocations -d '{"data":[[1.0,2.0,3.0]]}'
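(As an aside, the same request body can be built programmatically; a minimal Python sketch, independent of the container:)

```python
import json

# Build the /invocations payload: "data" carries one row of feature
# values matching the single "features" column declared in the schema.
payload = json.dumps({"data": [[1.0, 2.0, 3.0]]})
print(payload)  # {"data": [[1.0, 2.0, 3.0]]}
```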

The following error is thrown:

java.lang.ClassCastException: scala.collection.convert.Wrappers$JListWrapper cannot be cast to ml.combust.mleap.tensor.Tensor
	at ml.combust.mleap.runtime.transformer.classification.NaiveBayesClassifier$$anonfun$1.apply(NaiveBayesClassifier.scala:19) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.Row$class.udfValue(Row.scala:241) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.ArrayRow.udfValue(ArrayRow.scala:17) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.Row$class.withValues(Row.scala:225) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.ArrayRow.withValues(ArrayRow.scala:17) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1$$anonfun$apply$3$$anonfun$4.apply(DefaultLeapFrame.scala:79) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1$$anonfun$apply$3$$anonfun$4.apply(DefaultLeapFrame.scala:79) ~[sparkml-serving-2.2.jar:2.2]
	at scala.collection.immutable.Stream.map(Stream.scala:418) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1$$anonfun$apply$3.apply(DefaultLeapFrame.scala:79) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1$$anonfun$apply$3.apply(DefaultLeapFrame.scala:78) ~[sparkml-serving-2.2.jar:2.2]
	at scala.util.Success$$anonfun$map$1.apply(Try.scala:237) ~[sparkml-serving-2.2.jar:2.2]
	at scala.util.Try$.apply(Try.scala:192) ~[sparkml-serving-2.2.jar:2.2]
	at scala.util.Success.map(Try.scala:237) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1.apply(DefaultLeapFrame.scala:77) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.DefaultLeapFrame$$anonfun$withColumns$1.apply(DefaultLeapFrame.scala:72) ~[sparkml-serving-2.2.jar:2.2]
	at scala.util.Success.flatMap(Try.scala:231) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.DefaultLeapFrame.withColumns(DefaultLeapFrame.scala:71) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.frame.MultiTransformer$class.transform(Transformer.scala:121) ~[sparkml-serving-2.2.jar:2.2]
	at ml.combust.mleap.runtime.transformer.classification.NaiveBayesClassifier.transform(NaiveBayesClassifier.scala:13) ~[sparkml-serving-2.2.jar:2.2]
	at com.amazonaws.sagemaker.utils.ScalaUtils.transformLeapFrame(ScalaUtils.java:44) ~[sparkml-serving-2.2.jar:2.2]
	at com.amazonaws.sagemaker.controller.ServingController.processInputData(ServingController.java:176) ~[sparkml-serving-2.2.jar:2.2]
	at com.amazonaws.sagemaker.controller.ServingController.transformRequestJson(ServingController.java:118) ~[sparkml-serving-2.2.jar:2.2]

I created the bundle with the following dependencies:

org.apache.spark:spark-core_2.11:2.4.0
org.apache.spark:spark-mllib_2.11:2.4.0
ml.combust.mleap:mleap-spark_2.11:0.12.0

Kotlin code that creates the bundle:

val model = NaiveBayes()
        .setModelType("multinomial")
        .fit(data)
SimpleSparkSerializer().serializeToBundle(model, "file:/tmp/naivebayes_bundle", model.transform(data))
orchidmajumder (Contributor) commented Dec 1, 2018

Hey, thanks for using sagemaker-sparkml-serving. From the stack trace you posted, it looks like your model is returning an output of type Array instead of a single value.

Please change the schema to output an Array instead of a single value and see if that gives you a valid output. You may need to extract some information from the response depending on your underlying use case.

The schema should be changed like this:

SCHEMA='{"input":[{"name":"features","type":"double","struct":"vector"}],"output":{"name":"prediction","type":"double","struct":"array"}}'
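As a quick sanity check (a minimal Python sketch, not part of the container), the schema string must parse as valid JSON, which means every key, including struct, has to be quoted:

```python
import json

# The output column declared as an array rather than a single value;
# "struct" must be quoted like every other JSON key.
schema = ('{"input":[{"name":"features","type":"double","struct":"vector"}],'
          '"output":{"name":"prediction","type":"double","struct":"array"}}')

parsed = json.loads(schema)
print(parsed["output"]["struct"])  # array
```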

make (Author) commented Dec 3, 2018

Thanks for the fast response. Unfortunately, your suggestion doesn't fix the problem: it throws exactly the same exception and stack trace.

It seems that the input features are passed to the prediction as a JListWrapper instead of a Tensor:
https://github.com/combust/mleap/blob/master/mleap-runtime/src/main/scala/ml/combust/mleap/runtime/transformer/classification/NaiveBayesClassifier.scala#L19

orchidmajumder (Contributor) commented

It looks like your bundle was created with Spark 2.4 and MLeap 0.12.0. At this point, MLeap does not support anything beyond Spark 2.3, and this container is only tested with Spark 2.2 and MLeap 0.9.6.

Since NaiveBayes is available in Spark 2.2 as well, it will be easier for me to replicate if you can switch to Spark 2.2.1 and MLeap 0.9.6 and try to reproduce the same error.

jorgeglezlopez commented

@make I encountered the same problem you did, and after a lot of debugging I figured it out. It has nothing to do with the version of Spark or MLeap; it happens because inside DataConversionHelper, the function convertInputDataToJavaType assumes that whenever the DataStructureType is not empty or BASIC, it is an array.

Therefore, the code as it stands will never create a Vector, and it will not work with any pipeline whose entry point requires a Vector (such as any trained estimator that expects features). I fixed the code and will try to open a pull request over the weekend.
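The behavior described above can be sketched roughly like this (a minimal Python sketch; the real DataConversionHelper is Java, and the function names and return shapes here are illustrative assumptions, not the actual code):

```python
# Hypothetical sketch of the dispatch in convertInputDataToJavaType.
# Tags stand in for the Java types; "tensor" stands in for
# ml.combust.mleap.tensor.Tensor.

def convert_input_buggy(values, struct):
    # Buggy behavior: anything that is not basic becomes a plain list,
    # so a "vector" input never becomes a Tensor and the downstream
    # cast to Tensor fails with a ClassCastException.
    if struct is None or struct == "basic":
        return ("scalar", values[0])
    return ("list", list(values))  # also hit when struct == "vector"

def convert_input_fixed(values, struct):
    # Fixed behavior: "vector" gets its own branch and is wrapped in a
    # tensor-like structure before the frame is transformed.
    if struct is None or struct == "basic":
        return ("scalar", values[0])
    if struct == "vector":
        return ("tensor", tuple(values))
    return ("list", list(values))
```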

hdamani09 commented Nov 4, 2019

@jorgeglezlopez Hi there, I'm using MLeap 0.14.0 with Spark 2.4.3. I deployed a model to a SageMaker endpoint and am still facing the same issue. Do you know when the changes with the updated Docker image for Spark 2.4 support will be pushed? Thanks

timf-bonobos commented

There is a fix for this that should be merged into master #11

prashantprakash commented

I have been trying to use the latest code here and am getting a similar error.

Commands:

git clone https://github.com/aws/sagemaker-sparkml-serving-container.git

cd sagemaker-sparkml-serving-container

docker build -t sagemaker-sparkml-serving:2.4 .

docker run -p 8080:8080 -e SAGEMAKER_SPARKML_SCHEMA='{"input":[{"name":"features","type":"double","struct":"vector"}],"output":{"name":"probability","type":"double","struct":"vector"}}' -v /Users/prasprak/mldocker/open_models/mleap_model/tar/logreg/:/opt/ml/model sagemaker-sparkml-serving:2.4 serve

Note: my input is of type vector and my output is also of type vector.

For invocations:

curl -i -H "Accept: application/jsonlines;data=text" -H "content-type:application/json" -d '{"data":[[-1.0, 1.5, 1.2]]}' http://localhost:8080/invocations

java.lang.ClassCastException: scala.collection.convert.Wrappers$JListWrapper cannot be cast to ml.combust.mleap.tensor.Tensor
at ml.combust.mleap.runtime.transformer.classification.LogisticRegression$$anonfun$1.apply(LogisticRegression.scala:19) ~[sparkml-serving-2.4.jar:2.4]
at ml.combust.mleap.runtime.frame.Row$class.udfValue(Row.scala:241) ~[sparkml-serving-2.4.jar:2.4]

Also, I see that the fix made here is not merged to master. I tried pulling the branch where the fix is provided, but I get a different error with it.
