How to filter genotypeRDD on sample names? org.apache.spark.SparkException: Task not serializable? #891
Comments
The full error message is:
Interesting. I'm not seeing this on my side:
I'll play around with this on our cluster later. You may not want to explicitly instantiate an …
Hi Frank. Thank you for the information. I am running Spark 1.5.2. I'll try tomorrow without explicitly instantiating the ADAMContext.
Hi @fnothaft. Not explicitly instantiating the … I don't fully understand how this works, not instantiating …
Closes bigdatagenomics#891. In org.bdgenomics.adam.rdd.ADAMContext, the val sparkContext: SparkContext is not marked as @transient, which causes serialization issues. SparkContexts are not serializable and should only be called from the driver.
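As a rough illustration of the fix described above, here is a minimal sketch that uses plain Java serialization instead of Spark. `FakeContext`, `LeakyWrapper`, `SafeWrapper`, and `TransientSketch` are hypothetical names invented for this example; they are not ADAM or Spark classes, and the sketch assumes the wrapper itself is serializable.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for SparkContext: deliberately NOT Serializable.
class FakeContext

// Mirrors the problem: a serializable wrapper holding a non-serializable field.
class LeakyWrapper(val ctx: FakeContext) extends Serializable

// Mirrors the fix: @transient tells Java serialization to skip the field.
class SafeWrapper(@transient val ctx: FakeContext) extends Serializable

object TransientSketch {
  // Returns true if Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    println(serializes(new LeakyWrapper(new FakeContext)))
    println(serializes(new SafeWrapper(new FakeContext)))
  }
}
```

In this sketch the first wrapper should fail to serialize and the second should succeed. Note that a transient field is `null` after deserialization, which is consistent with the point that the context should only be used on the driver.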
Hi @NeillGibson! I've opened a PR that should allow your old code to work: #894. The singleton object for …
Hi,
I am trying to filter a genotypeRDD based on a set of sample_names.
The error I get is
with the following code
Somehow it seems that the adamContext/sparkContext is included in the filter statement.
First I thought it came in through the list of sample names, but maybe it is included through the genotype objects?
But why does it only show up when I define an external list of names in the filter statement?
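One plausible explanation, sketched here without Spark: a closure that refers to a value defined outside itself captures the object that owns that value, and everything else that object holds comes along when the closure is serialized. `StubContext`, `Session`, and `ClosureSketch` are hypothetical names for illustration only; `Session` stands in for whatever enclosing scope holds both the context and the list of names, and this is an assumption about the cause, not a confirmed diagnosis.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a SparkContext-like object: NOT Serializable.
class StubContext

// Hypothetical enclosing scope that holds both the context and the sample names.
class Session(val ctx: StubContext) {
  val sampleNames: Set[String] = Set("sample_1", "sample_2")

  // Refers to the field `sampleNames`, so the closure captures `this`
  // (the whole Session, including the non-serializable ctx).
  def leakyFilter: String => Boolean =
    name => sampleNames.contains(name)

  // Copies the field into a local val first; the closure captures only the Set.
  def safeFilter: String => Boolean = {
    val names = sampleNames
    name => names.contains(name)
  }
}

object ClosureSketch {
  // Returns true if Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    val session = new Session(new StubContext)
    println(serializes(session.leakyFilter))
    println(serializes(session.safeFilter))
  }
}
```

If this is what happens in the original code, a filter that uses only the genotype's own fields would not capture anything from the outer scope, which would match the error appearing only once the external list is referenced.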