Assertion Error in '_check_column_types' #79

Closed
j0n-a opened this issue Jan 26, 2023 · 6 comments · Fixed by #80
Labels
bug Something isn't working user-query User queries & requests

Comments

j0n-a commented Jan 26, 2023

I am trying to run pgsc_calc on all of the scores in the PGS Catalog, but I get the same error for a number of them (PGS002619, PGS000908, PGS002704, PGS002678, etc.). The pipeline has run to completion on over 300 other scores, so I'm not sure what is different about these or how to fix it. The assertion error also carries no details, so it is hard to know where to start.

Any advice would be greatly appreciated.

Here is the code I submit:

nextflow run pgscatalog/pgsc_calc \
    -profile singularity \
    --input <file_path>/pgsc_calc_samplesheet.csv \
    --target_build GRCh38 \
    --outdir <output_directory_path> \
    --pgs_id PGS002678 \
    --min_overlap 0

Note: I set the minimum overlap to zero so that output files are always generated; I can filter them out later if the overlap is too small.
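
For readers unfamiliar with the option, the following is a minimal sketch of what a minimum-overlap filter does, assuming the threshold is compared against the fraction of scoring-file variants that matched the target data (the log in this thread reports this as "% variants match"). The function name and the variant counts are hypothetical and are not taken from pgsc_calc.

```python
def passes_min_overlap(n_matched: int, n_total: int, min_overlap: float) -> bool:
    """Return True if the matched fraction meets the threshold.

    Hypothetical helper for illustration only; not part of pgsc_calc.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_matched / n_total >= min_overlap


# Hypothetical counts: 79 of 80 variants matched, i.e. a 98.75% match rate.
print(passes_min_overlap(79, 80, 0.75))
# With a threshold of 0, even a score with no matched variants passes.
print(passes_min_overlap(0, 80, 0.0))
```

With a threshold of zero nothing is filtered at this step, which is why any filtering on overlap has to happen afterwards.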

Here is the error:

pgscatalog_utils.match.filter: 2023-01-25 19:20:48 DEBUG Score PGS002678_hmPOS_GRCh38 passes minimum matching threshold (98.75% variants match)
pgscatalog_utils.match.write: 2023-01-25 19:20:48 DEBUG Checking column types
Traceback (most recent call last):
  File "/venv/bin/combine_matches", line 8, in <module>
    sys.exit(combine_matches())
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/match/combine_matches.py", line 36, in combine_matches
    log_and_write(matches=matches, scorefile=scorefile, dataset=dataset, args=args)
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/match/match_variants.py", line 92, in log_and_write
    write_scorefiles(valid_matches, args.split, dataset)
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/match/write.py", line 27, in write_scorefiles
    _check_column_types(matches)
  File "/venv/lib/python3.10/site-packages/pgscatalog_utils/match/write.py", line 71, in _check_column_types
    assert col_types == correct_schema
AssertionError
INFO: Cleaning up image...
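
The bare `assert col_types == correct_schema` in the traceback fails without saying which column is at fault. As a rough illustration of how such a check could report the difference, here is a minimal sketch that compares two plain dictionaries mapping column names to type names. It is not the pgscatalog_utils implementation; the function names, column names, and type names are all hypothetical.

```python
def describe_schema_mismatch(actual: dict, expected: dict) -> list:
    """Return one readable line per column that differs from the expected schema.

    Hypothetical helper for illustration; both arguments map column name -> type name.
    """
    problems = []
    for col in sorted(set(actual) | set(expected)):
        if col not in actual:
            problems.append(f"missing column {col!r} (expected {expected[col]})")
        elif col not in expected:
            problems.append(f"unexpected column {col!r} ({actual[col]})")
        elif actual[col] != expected[col]:
            problems.append(f"column {col!r}: got {actual[col]}, expected {expected[col]}")
    return problems


def check_column_types(actual: dict, expected: dict) -> None:
    """Raise a TypeError that lists every mismatching column, instead of a bare assert."""
    problems = describe_schema_mismatch(actual, expected)
    if problems:
        raise TypeError("Schema mismatch:\n  " + "\n  ".join(problems))


# Hypothetical schemas: one column was read with the wrong type.
expected = {"chr_name": "Utf8", "chr_position": "Int64"}
actual = {"chr_name": "Utf8", "chr_position": "Utf8"}
for line in describe_schema_mismatch(actual, expected):
    print(line)
```

An error raised this way names the offending columns in the log, which gives a starting point that a plain `AssertionError` does not.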

@smlmbrt smlmbrt added the user-query User queries & requests label Jan 26, 2023
Member

smlmbrt commented Jan 26, 2023

Going to try a couple of these scores on our own datasets and see if I can replicate the error. Could you let us know the version of the pipeline you're using?

Author

j0n-a commented Jan 26, 2023

I am using Nextflow version 22.10.4 and pgsc_calc version 1.3.0.

Member

smlmbrt commented Jan 26, 2023

@j0n-a: I've been able to replicate the bug and will try to figure out what's causing it.

Author

j0n-a commented Jan 26, 2023

Thank you for looking into this!

@smlmbrt smlmbrt linked a pull request Jan 26, 2023 that will close this issue
Member

smlmbrt commented Jan 29, 2023

Hi @j0n-a, if you add the -latest flag it should update and run correctly with the new patch (v1.3.2):

nextflow run pgscatalog/pgsc_calc -latest \
    -profile singularity \
    --input <file_path>/pgsc_calc_samplesheet.csv \
    --target_build GRCh38 \
    --outdir <output_directory_path> \
    --pgs_id PGS002678 \
    --min_overlap 0

Author

j0n-a commented Jan 29, 2023

Thank you! That solution worked!
