There are too few interaction signals to manually adjust them in Juicebox #3

Open
2benaszq opened this issue Dec 26, 2024 · 8 comments
Labels
enhancement New feature or request

Comments

@2benaszq

Hello,

First of all, thank you very much for developing such an excellent phasing and scaffolding tool. However, I have encountered some issues while using it.

I first assembled hap1.fa and hap2.fa using hifiasm, each approximately 1 GB in size. Then, I merged them into a single file named output.fa. Subsequently, I used the following commands:

cphasing pipeline -f output.fa -pcd porec.fq.gz -t 100 -n 18:2 -hcr
ln -sf cphasing_output/porec.pairs.gz ./
ln -sf cphasing_output/4.scaffolding/groups.agp ./
cphasing pairs2mnd porec.pairs.gz -o porec.mnd.txt
cphasing utils agp2assembly groups.agp > groups.assembly
docker run -i --rm -w ${PWD} -u $(id -u):$(id -g) -v /calculate:/calculate -v /data:/data hic:v2.1 /software/3d-dna-201008/visualize/run-assembly-visualizer.sh groups.assembly porec.mnd.txt

I then loaded the resulting groups.assembly and groups.hic files into Juicebox for manual adjustment, but I found that the interaction signals were too sparse to make adjustments effectively.
Here is a screenshot of the Juicebox view:
image

Additionally, the scaffolding plot generated directly by cphasing (without manual adjustment) looks as follows:
image

The results seem excellent, and I only need to adjust a few contigs to achieve a nearly perfect phased genome. However, due to the low signal visibility in Juicebox, I am unable to complete this task.

As mentioned earlier, the merged genome is 2 GB in size, and the porec data is only about 25 GB. Is the low data volume in phasing mode causing the weak signal in Juicebox, or is there a way to enhance the signal so that I can adjust misassembled contigs? Alternatively, how much porec data would this genome need to display strong interaction signals in Juicebox?

Thanks!

@wangyibin
Owner

Sorry for my late reply,
Firstly, thank you for your feedback on this issue.

The heatmap plotted by cphasing shows that 25 Gb of Pore-C data is enough to adjust the assembly in Juicebox.

Your genome's high homozygosity may cause most Pore-C fragments to have a mapping quality below 1.
By default, cphasing pairs2mnd removes interactions with a mapping quality below 1; you can set -q 0 to load all Pore-C interactions into Juicebox.

cphasing pairs2mnd porec.pairs.gz -o porec.mnd.txt -q 0
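
To see how many contacts the quality filter would drop before rerunning the conversion, one could tally the mapping qualities in the pairs file. The following is only a rough sketch, not part of cphasing: it assumes the file follows the 4DN-style .pairs layout with a `#columns:` header line that names a `mapq` column, and the function name `count_low_mapq` is made up for illustration.

```python
from typing import Iterable, Tuple


def count_low_mapq(lines: Iterable[str], min_mapq: int = 1,
                   mapq_column: str = "mapq") -> Tuple[int, int]:
    """Return (pairs with MAPQ below min_mapq, total pairs).

    Assumption: the column layout is declared in a '#columns:' header
    line and contains a column called `mapq_column`.
    """
    mapq_idx = None
    low = 0
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Header lines; only '#columns:' tells us where MAPQ lives.
            if line.startswith("#columns:"):
                names = line[len("#columns:"):].split()
                if mapq_column in names:
                    mapq_idx = names.index(mapq_column)
            continue
        if mapq_idx is None:
            raise ValueError("no '%s' column declared in the header" % mapq_column)
        fields = line.split("\t")
        total += 1
        if int(fields[mapq_idx]) < min_mapq:
            low += 1
    return low, total
```

For a gzipped file this could be called as `count_low_mapq(gzip.open("porec.pairs.gz", "rt"))`. If most pairs fall below the threshold, that would be consistent with the sparse Juicebox view described above.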

Best regards,
Yibin

@2benaszq
Author

Dear Dr. Yibin,

Thank you for your response.
With your help, I was indeed able to achieve the results I wanted.
image

However, I have one more question that I didn’t mention last time.
I noticed that CPhasing seems to call the program Partig, which appears to have a limitation on the contig length, requiring contig lengths not to exceed 2**27 bp, approximately 134 Mb. My genome, however, has a contig as long as 136 Mb. To address this, I manually split it into 100 Mb and 36 Mb segments, completed the scaffolding, and then merged them back together. While this approach works, it feels a bit inconvenient.
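
The manual split described above can be sketched as a small coordinate calculation. This is only an illustration of the workaround, not something cphasing provides: `split_contig` and its `chunk` parameter are hypothetical, the 2**27 bp limit is the one reported in this thread, and the 100 Mb chunk size simply mirrors the split I chose by hand.

```python
from typing import List, Tuple

# Contig length limit reported in this thread: 2**27 = 134,217,728 bp.
PARTIG_LIMIT = 2 ** 27


def split_contig(length: int, limit: int = PARTIG_LIMIT,
                 chunk: int = 100_000_000) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) segments for one contig.

    Contigs no longer than `limit` are returned unsplit; longer ones are
    cut into consecutive pieces of at most `chunk` bp.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 < chunk <= limit:
        raise ValueError("chunk must be positive and no larger than limit")
    if length <= limit:
        return [(0, length)]
    return [(start, min(start + chunk, length))
            for start in range(0, length, chunk)]
```

For a 136 Mb contig, `split_contig(136_000_000)` gives one 100 Mb segment followed by one 36 Mb segment; the segments still have to be cut from the FASTA and merged back after scaffolding.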

Is there any way to avoid this issue, or will this minor limitation be fixed in future updates?

Best regards,
Iron Man

@wangyibin wangyibin added the enhancement New feature or request label Jan 2, 2025
@wangyibin
Owner

Thank you for your suggestion.

We will fix this limitation in the future or add a function to split long contigs.

Best regards,
Yibin.

@2benaszq
Author

2benaszq commented Jan 2, 2025

Dear Dr. Yibin,
Thank you for your reply. I look forward to the next update of CPhasing!

Best regards,
Iron Man

@2benaszq
Author

Dear Dr. Yibin:
I have recently been trying CPhasing version 0.2.5.r291. I am running CPhasing with Hi-C data in haploid mode for scaffolding, because my genome has only one set of pseudo-haplotypes. After adjusting the assembly in Juicebox, I obtained the groups.review.assembly file. I wanted to regenerate the interaction heatmap from the adjusted results, so I ran the following commands:

cphasing utils assembly2agp groups.review.assembly -n 18
cphasing agp2fasta groups.review.agp genome.contigs.fasta > groups.review.chr.fasta
cphasing-rs pairs-break -o corrected.pairs.gz pass.pairs.gz groups.review.corrected.agp
cphasing pairs2cool corrected.pairs.gz cphasing_output/genome.contigs.contigsizes pass.q1.10k.cool -q 1 -bs 10k
cphasing plot -a groups.review.agp -m pass.q1.10k.cool -o groups.q1.500k.wg.png -bs 500k -oc

The resulting groups.q1.500k.wg.png is as shown below:

Image

However, the interaction signals in Juicebox are very good, as shown here:

Image

I am not sure where the issue lies or whether I made a mistake in my commands, but when I used phasing mode previously, the generated plots did not seem to have any issues. I look forward to your reply.

Best regards,
Iron Man

@2benaszq
Author

Sorry, I misspoke earlier. The data are still porec, not Hi-C, but I am using --mode haploid. The command is as follows:

cphasing pipeline -f ../../00.data/genome.contigs.fasta -pcd pass.fq.gz -t 60 --mode haploid

@wangyibin
Owner

I am very sorry for replying so late.

The lost contacts are caused by a bug in the cphasing pairs2cool command. As a workaround, add the --low-memory parameter when generating a new .cool file. We will fix this in the next release.

One other note: pass.q1.10k.cool is generated from corrected.pairs.gz, so when you run the plot command you should replace groups.review.agp with groups.review.corrected.agp.

Best wishes.

@2benaszq
Author

2benaszq commented Mar 3, 2025

Dear Dr. Yibin:
Thank you for your reply. The --low-memory parameter of cphasing pairs2cool is indeed useful; I obtained the results I wanted with it. However, when I tried to plot the final interaction heatmap with groups.review.assembly, it seemed that I still had to use cphasing plot -a groups.review.agp to generate the correct heatmap.

Since most of my genomes were manually fragmented during scaffolding, I was confused for some time about whether to use the groups.review.agp file or the groups.review.corrected.agp file for plotting. Taking the above case as an example, when I used the groups.review.corrected.agp file, the resulting heatmap was as follows:

Image

It is obvious that the chromosome sizes do not match those in the Juicebox screenshot above.

However, when I used the groups.review.agp file, the heatmap appeared normal, as shown below:

Image

So, should groups.review.agp indeed be used, or does the workflow of cphasing not align with your expected output?
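
To make the size mismatch concrete, the chromosome lengths implied by the two AGP files can be compared directly. This is a minimal sketch that relies only on the standard AGP layout (tab-separated, object name in column 1 and 1-based object end in column 3); the helper names `agp_object_lengths` and `differing_objects` are made up for illustration and are not part of cphasing.

```python
from typing import Dict, Iterable, Optional, Tuple


def agp_object_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return the length of each object (chromosome/group) in an AGP.

    The length is taken as the largest object_end (column 3) seen for
    that object; comment lines starting with '#' are skipped.
    """
    lengths: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        name, end = fields[0], int(fields[2])
        lengths[name] = max(lengths.get(name, 0), end)
    return lengths


def differing_objects(
    a: Dict[str, int], b: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return objects whose length differs (None means absent in that AGP)."""
    return {name: (a.get(name), b.get(name))
            for name in sorted(set(a) | set(b))
            if a.get(name) != b.get(name)}
```

Running `agp_object_lengths(open("groups.review.agp"))` and the same on groups.review.corrected.agp, then passing both results to `differing_objects`, would list which groups differ in length between the two files.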

Best wishes.
Iron Man
