In Scala, I can read a single VCF file as follows:
val allGenotypes: RDD[Genotype] = sc.loadGenotypes(genotypeFile).rdd
However, I'm wondering if I can read multiple VCF files to create a single RDD[Genotype] object?
Hello @rezacsedu!
You may either use globbed paths on load
val genotypes = sc.loadGenotypes("hdfs://data/sample*.vcf")
val rdd: RDD[Genotype] = genotypes.rdd
or union multiple GenotypeRDDs together
val genotypes0 = sc.loadGenotypes("sample0.vcf")
val genotypes1 = sc.loadGenotypes("sample1.vcf")
val union = genotypes0.union(genotypes1)
val rdd: RDD[Genotype] = union.rdd
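If the file names do not fit a single glob pattern, the same idea can be extended to an arbitrary list of paths. The following is only a rough sketch, not something from the ADAM docs: `loadAllGenotypes` is a hypothetical helper name, and the `ADAMContext` import path is an assumption that may differ between ADAM versions. It uses only `loadGenotypes` and `.rdd` as shown above, plus Spark's own `SparkContext.union`.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
// Assumption: this package path matches the ADAM version in use;
// it brings the implicit SparkContext => ADAMContext conversion into scope.
import org.bdgenomics.adam.rdd.ADAMContext._
import org.bdgenomics.formats.avro.Genotype

// Hypothetical helper (not part of ADAM): load each VCF separately,
// then combine the underlying Spark RDDs into one RDD[Genotype].
def loadAllGenotypes(sc: SparkContext, paths: Seq[String]): RDD[Genotype] = {
  require(paths.nonEmpty, "at least one VCF path is required")

  // One RDD[Genotype] per input file, via the same call used for a single file.
  val perFile: Seq[RDD[Genotype]] = paths.map(path => sc.loadGenotypes(path).rdd)

  // SparkContext.union concatenates the RDDs without deduplicating records.
  sc.union(perFile)
}

// Example usage with hypothetical file names:
// val rdd: RDD[Genotype] = loadAllGenotypes(sc, Seq("sample0.vcf", "sample1.vcf"))
```

Note that this works on the plain Spark RDDs, so whatever else the `GenotypeRDD` wrapper carries is not merged here. If you need to keep working with a `GenotypeRDD`, chaining the `union` call shown above on the wrappers themselves is probably the better route.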
@heuermh thanks so much for the prompt reply. Really appreciated!