adam2vcf -sort_on_save flag broken #940
I think I know a fix (and the fix should be related to #933). Do you have a VCF on the cluster that reproduces this?
Yup, it's in HDFS in my home directory. It's called
@massie is looking at this.
@andrewmchen I'm looking at this now. Thanks for sending the script and files for your job. When running the script, I get the following error from Parquet:
Looking at the metadata for the file, I see:
which is Parquet version 1.7.0, whereas the other ADAM file in the directory (the one without the "filtered" suffix) has the following creator:
We switched from Parquet 1.7.0 -> 1.8.1 in July last year. How hard would it be to regenerate that ADAM file using a newer version of ADAM? It might be worth a try while I debug the root cause of the exception.
@andrewmchen I just checked the Avro and Parquet schemas and they are identical, so there's likely little use in recreating that file (unless it's trivial to do).
You mean the .filtered file? I can recreate it without any hassle, and I'll do it when I get a chance. It seems very peculiar that they'd have different Parquet versions, because I built the .filtered file about a month ago. Could it be because Avocado is on a different version of ADAM/Parquet?
Sorry, I can see why that wasn't clear. Yes, the "*.filtered" file was created using Parquet 1.7.x. That's odd. As long as Avocado is using ADAM version 0.17.1 or newer, it should be writing Parquet 1.8.x files. Avocado started using ADAM 0.17.1 in August of last year, so as long as you have a relatively recent version of Avocado, you should be fine.
@andrewmchen Can you verify the version of Avocado that you're using? If it is less than six months old, it shouldn't be saving in Parquet 1.7.x format, as far as I can tell from the pom files.
That makes a ton of sense. I should probably rebase my Avocado. The commit hash I branched off of was 2e6504f01004cd13c22f36198e6aea490bb94130.
@andrewmchen I just submitted pull request #949, which fixes this issue. When you have a moment, can you verify that it fixes your problem? I've run your test case, but it's always good to have more than one set of eyes.
Sure. I'll do it later tonight. Thanks for resolving this issue so quickly!
This seems to have solved it. Just curious, how did this line ever work in the past? https://github.com/bigdatagenomics/adam/pull/949/files#diff-514d6d86034c4dd8aa9ee737c8637a7eL130
Fixed by commit 0975e30.
Hi all. I tried to run adam2vcf with the -sort_on_save flag and got this error:
adam2vcf worked without the flag, so I suspect the problem only occurs when sorting. I attached the full log as well:
log.txt