guppy v6 #64

aafshinfard · 2022-07-27T18:11:13Z

Just wanted to ask if there are any plans on releasing a guppy >= v6 base calling of the reads?
Thanks.

skoren · 2022-08-01T20:09:27Z

No immediate plans since we're not actively working on CHM13 and we've not found much benefit going to guppy 6+ with our hybrid assembly method.

aafshinfard · 2022-08-02T06:12:05Z

Thanks for the response @skoren

hasindu2008 · 2022-08-11T03:45:48Z

Given that I recently downloaded the whole raw signal dataset, I am planning to re-basecall it with Guppy 6. If it succeeds (I am not sure how long it will take) and if your AWS storage can host more data, @skoren, I can pass the results on for hosting there.

aafshinfard · 2022-08-11T15:45:30Z

@hasindu2008 That would be awesome!

hasindu2008 · 2022-08-29T14:23:56Z

@aafshinfard I have recently converted all the raw data to BLOW5 format and basecalled it using the Guppy 6.1.3 hac model. Given the large size of the files, I am not sure how I could share them. Any suggestions?

aafshinfard · 2022-08-29T17:04:55Z

@hasindu2008 Nice to hear you did it. How large are the files?

aafshinfard · 2022-08-29T17:25:24Z

@hasindu2008 Would be nice if the T2T team can host this (@skoren), but another option would be Zenodo. I heard they support up to 50GB and even more in special cases...
https://www.youtube.com/watch?v=S1qK_TA52e4&t=251s

arangrhie · 2022-08-29T17:54:32Z

@aafshinfard how big is the total file size?

aafshinfard · 2022-08-31T19:46:30Z

@arangrhie, I opened the issue and @hasindu2008 kindly did the job; waiting for them to respond about the size of the dataset.

hasindu2008 · 2022-09-01T04:08:31Z

@arangrhie @aafshinfard

The gzipped basecalled FASTQ files are relatively small and I think they can be easily hosted.
288G hg2_merged_pass.fastq.gz
39G hg2_merged_fail.fastq.gz

The raw signal data converted to BLOW5 are 3.4 TB. I had to convert those 5TB+ compressed FAST5 tarballs to BLOW5; otherwise, base-calling from FAST5 would have taken a few weeks. It would be useful for the future if those BLOW5 files could be hosted, to allow direct base-calling from locally mounted S3 storage as well as partial download of certain genomic regions when necessary (see #63). Compressed FAST5 tarballs for a dataset this large are not easily accessible, which in my opinion diminishes the value of a useful dataset like this.
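For readers who want to reproduce the FAST5-to-BLOW5 conversion mentioned above, the following is a minimal sketch, not the exact commands used in this thread. It assumes slow5tools is installed and that the tarballs have already been extracted; the directory and file names (`fast5_extracted`, `blow5_parts`, `merged.blow5`) are hypothetical placeholders. By default the script only prints the commands; set `RUN=1` to execute them.

```shell
#!/bin/sh
# Hypothetical paths (assumptions, not from the thread); override via environment.
FAST5_DIR="${FAST5_DIR:-fast5_extracted}"
BLOW5_DIR="${BLOW5_DIR:-blow5_parts}"
MERGED="${MERGED:-merged.blow5}"

# Print each command; execute it only when RUN=1 (dry run by default).
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

convert_to_blow5() {
  # Step 1: convert a directory of FAST5 files into per-file BLOW5 files.
  run slow5tools f2s "$FAST5_DIR" -d "$BLOW5_DIR"
  # Step 2: merge the per-file BLOW5 files into a single BLOW5 file.
  run slow5tools merge "$BLOW5_DIR" -o "$MERGED"
}

convert_to_blow5
```

For a multi-terabyte dataset the conversion would normally be run per tarball or per flowcell and parallelised; the sketch leaves that out.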

hasindu2008 · 2022-11-14T05:27:35Z

@aafshinfard You may download the merged Guppy 6 basecalls for the whole dataset here:

https://slow5test.s3.amazonaws.com/tmp/chm13_merged_pass.fastq.gz
https://slow5test.s3.amazonaws.com/tmp/chm13_merged_fail.fastq.gz

Note that this is not free S3 storage like the one used for hosting CHM13, so I would be grateful if you could let me know once you have downloaded the files so that I can delete them. Otherwise, AWS keeps on charging.

@skoren CHM13 maintainers, feel free to copy these files into your free S3 storage if you think they will be useful to anyone in future.

Software and versions used for the basecalling are explained below:
Nanopore raw signal data were downloaded, extracted and then converted to BLOW5 format using slow5tools. They were then basecalled using buttery-eel with Guppy 6.3.7 in high accuracy mode. A Qscore of 7 was used as the pass/fail cut-off.

Base-calling commands:

# basecall gridION data
buttery-eel -i min_grid.blow5 --guppy_bin /install/ont-guppy-6.3.7/bin/ --config dna_r9.4.1_450bps_hac.cfg -x cuda:all -q 7 -o reads_min_grid.fastq --port 5555 --use_tcp

# basecall promethION data
buttery-eel -i prom.blow5 --guppy_bin /install/ont-guppy-6.3.7/bin/ --config dna_r9.4.1_450bps_hac_prom.cfg -x cuda:all -q 7 -o reads_prom.fastq --port 5556 --use_tcp
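To illustrate what a Qscore 7 pass/fail cut-off means, here is a small sketch that recomputes a per-read mean Qscore from a FASTQ stream. It rests on assumptions: 4-line FASTQ records, Phred+33 quality encoding, and the common convention of averaging per-base error probabilities rather than the Q values themselves. The basecaller's own calculation may differ in detail, so treat the output as an approximation, not as a reproduction of Guppy's classification. The function name `mean_qscore` is made up for this example.

```shell
# mean_qscore: read FASTQ on stdin, print "<read_id><TAB><mean_qscore>" per read.
# Assumptions: 4-line records, Phred+33 qualities, mean taken over error probabilities.
mean_qscore() {
  awk '
    # Build a character -> ASCII code lookup table (awk has no ord()).
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
    # Header line: keep the read ID without the leading "@".
    NR % 4 == 1 { id = substr($1, 2) }
    # Quality line: average the error probabilities 10^(-q/10), then convert back to Q.
    NR % 4 == 0 {
      n = length($0); sum = 0
      for (i = 1; i <= n; i++) {
        q = ord[substr($0, i, 1)] - 33
        sum += exp(-q / 10 * log(10))
      }
      if (n > 0) printf "%s\t%.2f\n", id, -10 * log(sum / n) / log(10)
    }'
}

# Example (hypothetical file name): count reads below the cut-off of 7
#   zcat reads.fastq.gz | mean_qscore | awk '$2 < 7' | wc -l
```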

aafshinfard · 2022-11-15T04:25:43Z

@hasindu2008 Awesome, thank you so much!

aafshinfard · 2022-11-23T01:57:11Z

@hasindu2008 Just started downloading; should be done tonight. Will confirm after it has finished. Thanks again.

aafshinfard · 2022-11-28T23:42:53Z

@hasindu2008 Just confirming that my download was completed. Thank you so much for your help.

hasindu2008 · 2022-11-29T23:09:50Z

@aafshinfard
No problem, glad to help. If this becomes useful in your work, please consider citing BLOW5, which allowed us to do this basecalling on a very small budget; it would otherwise have cost a fortune.

aafshinfard · 2022-11-29T23:54:59Z

Sure thing, thank you @hasindu2008

skoren · 2024-06-11T17:51:58Z

Thanks for contributing these, sorry this dropped off my radar. I have now put up a link to the NCBI-hosted files for both.

skoren closed this as completed Jun 11, 2024
