alntmscore output is wrong #312

Closed
ekiefl opened this issue Jul 23, 2024 · 11 comments

@ekiefl

ekiefl commented Jul 23, 2024

Expected Behavior

I expect alntmscore to be different from ttmscore.

Current Behavior

All alntmscore values are equal to ttmscore.

Steps to Reproduce (for bugs)

Here is a zipped directory of 75 structures in pdb format:

structures.zip

Unzip this directory. Then perform a search and convert the alignment.

rm -rf tmp
foldseek createdb ./structures/ targetDB
foldseek search targetDB targetDB alnDB tmp -a --exhaustive-search
foldseek convertalis targetDB targetDB alnDB output.txt --format-mode 4 --format-output query,target,qlen,tlen,alnlen,qtmscore,ttmscore,alntmscore,cigar,qseq,tseq

In a Python environment with pandas, confirm that every row's alntmscore equals its ttmscore, then print 10 randomly sampled hits:

import pandas as pd

df = pd.read_csv("output.txt", sep="\t")

# This assert passes, meaning each row's ttmscore equals its alntmscore
assert (df.ttmscore == df.alntmscore).all()

print(
    df[["query", "target", "qlen", "tlen", "alnlen", "qtmscore", "ttmscore", "alntmscore"]]
    .sample(20)
    .tail(10)
    .to_markdown()
)

Output of script:

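|      | query  | target |   qlen |   tlen |   alnlen |   qtmscore |   ttmscore |   alntmscore |
|-----:|:-------|:-------|-------:|-------:|---------:|-----------:|-----------:|-------------:|
| 1187 | B0RXV1 | V7BU96 |    227 |    563 |      206 |     0.5245 |     0.2354 |       0.2354 |
| 5406 | U5U2L0 | V4KAC2 |    550 |    599 |      560 |     0.6095 |     0.5638 |       0.5638 |
| 3335 | Q5F9Z5 | Q9XZT6 |    206 |    250 |      234 |     0.6303 |     0.5362 |       0.5362 |
|  861 | B0BUU8 | Q20230 |    203 |    191 |      210 |     0.5333 |     0.5591 |       0.5591 |
|  842 | B0BUU8 | B0RXV1 |    203 |    227 |      199 |     0.8489 |     0.7633 |       0.7633 |
| 2810 | K0F1X4 | B1JTS0 |    178 |    206 |      211 |     0.6423 |     0.5692 |       0.5692 |
| 2927 | L8IGY7 | P48769 |    256 |    260 |      254 |     0.9265 |     0.9125 |       0.9125 |
| 4925 | Q8R9S6 | Q834T6 |    206 |    226 |      245 |     0.4625 |     0.4305 |       0.4305 |
| 3861 | W5N438 | B8F7G0 |    264 |    208 |      255 |     0.5412 |     0.6642 |       0.6642 |
|  960 | B0K119 | C5A558 |    203 |    190 |      218 |     0.5143 |     0.5417 |       0.5417 |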

Foldseek Output (for bugs)

Full Foldseek standard out:

targetDB exists and will be overwritten
createdb ./structures/ targetDB

MMseqs Version:        	9.427df8a
Path to ProstT5
Chain name mode        	0
Write mapping file     	0
Mask b-factor threshold	0
Coord store mode       	2
Write lookup file      	1
Input format           	0
File Inclusion Regex   	.*
File Exclusion Regex   	^$
Threads                	14
Verbosity              	3

Output file: targetDB
[=================================================================] 100.00% 75 0s 11ms
Time for merging to targetDB_ss: 0h 0m 0s 1ms
Time for merging to targetDB_h: 0h 0m 0s 1ms
Time for merging to targetDB_ca: 0h 0m 0s 1ms
Time for merging to targetDB: 0h 0m 0s 1ms
Ignore 0 out of 75.
Too short: 0, incorrect: 0, not proteins: 0.
Time for processing: 0h 0m 0s 65ms
Create directory tmp
search targetDB targetDB alnDB tmp -a --exhaustive-search

MMseqs Version:              	9.427df8a
Seq. id. threshold           	0
Coverage threshold           	0
Coverage mode                	0
Max reject                   	2147483647
Max accept                   	2147483647
Add backtrace                	true
TMscore threshold            	0
TMalign hit order            	0
TMalign fast                 	1
Preload mode                 	0
Threads                      	14
Verbosity                    	3
LDDT threshold               	0
Sort by structure bit score  	1
Alignment type               	2
Exact TMscore                	0
Substitution matrix          	aa:3di.out,nucl:3di.out
Alignment mode               	3
Alignment mode               	0
E-value threshold            	10
Min alignment length         	0
Seq. id. mode                	0
Alternative alignments       	0
Max sequence length          	65535
Compositional bias           	1
Compositional bias           	1
Gap open cost                	aa:10,nucl:10
Gap extension cost           	aa:1,nucl:1
Compressed                   	0
Seed substitution matrix     	aa:3di.out,nucl:3di.out
Sensitivity                  	9.5
k-mer length                 	0
Target search mode           	0
k-score                      	seq:2147483647,prof:2147483647
Max results per query        	1000
Split database               	0
Split mode                   	2
Split memory limit           	0
Diagonal scoring             	true
Exact k-mer matching         	0
Mask residues                	0
Mask residues probability    	0.99995
Mask lower case residues     	1
Minimum diagonal score       	30
Selected taxa
Spaced k-mers                	1
Spaced k-mer pattern
Local temporary path
Exhaustive search mode       	true
Prefilter mode               	0
Search iterations            	1
Remove temporary files       	true
MPI runner
Force restart with latest tmp	false
Cluster search               	0

structurealign targetDB targetDB tmp/15707625884678452062/pref tmp/15707625884678452062/strualn --tmscore-threshold 0 --lddt-threshold 0 --sort-by-structure-bits 1 --alignment-type 2 --exact-tmscore 0 --sub-mat 'aa:3di.out,nucl:3di.out' -a 1 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 10 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 0.5 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0 --pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --zdrop 40 --threads 14 --compressed 0 -v 3

[=================================================================] 100.00% 75 0s 223ms
Time for merging to strualn: 0h 0m 0s 1ms
Time for processing: 0h 0m 0s 282ms
mvdb tmp/15707625884678452062/strualn tmp/15707625884678452062/aln

Time for processing: 0h 0m 0s 2ms
mvdb tmp/15707625884678452062/aln alnDB -v 3

Time for processing: 0h 0m 0s 3ms
Removing temporary files
rmdb tmp/15707625884678452062/pref -v 3

Time for processing: 0h 0m 0s 1ms
output.txt exists and will be overwritten
convertalis targetDB targetDB alnDB output.txt --format-mode 4 --format-output query,target,qlen,tlen,alnlen,qtmscore,ttmscore,alntmscore,cigar,qseq,tseq

MMseqs Version:        	9.427df8a
Substitution matrix    	aa:3di.out,nucl:3di.out
Alignment format       	4
Format alignment output	query,target,qlen,tlen,alnlen,qtmscore,ttmscore,alntmscore,cigar,qseq,tseq
Gap open cost          	aa:10,nucl:10
Gap extension cost     	aa:1,nucl:1
Database output        	false
Preload mode           	0
Threads                	14
Compressed             	0
Verbosity              	3
Exact TMscore          	0

[=================================================================] 100.00% 75 0s 574ms
Time for merging to output.txt: 0h 0m 0s 3ms
Time for processing: 0h 0m 0s 715ms

Your Environment

Include as many relevant details about the environment you experienced the bug in.

  • Git commit used (The string after "MMseqs Version:" when you execute foldseek without any parameters): 9.427df8a
  • Which foldseek version was used (Statically-compiled, self-compiled, Conda, etc.): conda (build: pl5321h79102bb_0, also reproduced on pl5321h8ec77f5_1)
  • For self-compiled and Homebrew: Compiler and Cmake versions used and their invocation: N/A
  • Server specifications (especially CPU support for AVX2/SSE and amount of system memory): N/A
  • Operating system and version: macOS 14.1

EDIT: The behavior is observed in the following versions (all installed via conda):

  • 9.427df8a
  • 8.ef4e960
  • 7.04e0ec8
  • 6.29e2557

The script fails in 5.53465f0 because qtmscore has not been implemented.

Related issues

#221

@ekiefl
Author

ekiefl commented Jul 23, 2024

The behavior is also observed with easy-search:

rm -rf tmp
foldseek easy-search ./structures/ ./structures/ output.txt tmp -a --exhaustive-search --format-mode 4 --format-output query,target,qlen,tlen,alnlen,qtmscore,ttmscore,alntmscore,cigar,qseq,tseq

@austinhpatton

I wanted to dig into this error a bit more, as I've found that in some cases we don't observe the equivalence between ttmscore and alntmscore. I suspected it had something to do with the structure of the --format-output specification. So I tried six combinations, changing the order in which alntmscore, qtmscore, and ttmscore were requested.

The commands used are shown below:

foldseek easy-search ./structures/ ./structures/ v1.m8 tmp_dir_1 --exhaustive-search 1 --alignment-type 2 --format-mode 4 --format-output query,target,qlen,tlen,alnlen,qtmscore,ttmscore,alntmscore
foldseek easy-search ./structures/ ./structures/ v2.m8 tmp_dir_2 --exhaustive-search 1 --alignment-type 2 --format-mode 4 --format-output query,target,qlen,alnlen,tlen,qtmscore,alntmscore,ttmscore
foldseek easy-search ./structures/ ./structures/ v3.m8 tmp_dir_2 --exhaustive-search 1 --alignment-type 2 --format-mode 4 --format-output query,target,tlen,alnlen,qlen,ttmscore,alntmscore,qtmscore
foldseek easy-search ./structures/ ./structures/ v4.m8 tmp_dir_3 --exhaustive-search 1 --alignment-type 2 --format-mode 4 --format-output query,target,tlen,qlen,alnlen,ttmscore,qtmscore,alntmscore
foldseek easy-search ./structures/ ./structures/ v5.m8 tmp_dir_4 --exhaustive-search 1 --alignment-type 2 --format-mode 4 --format-output query,target,alnlen,qlen,tlen,alntmscore,qtmscore,ttmscore
foldseek easy-search ./structures/ ./structures/ v6.m8 tmp_dir_5 --exhaustive-search 1 --alignment-type 2 --format-mode 4 --format-output query,target,alnlen,tlen,qlen,alntmscore,ttmscore,qtmscore
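
The six orderings above were written out by hand. As a rough sketch, the same kind of command set can be generated by permuting the three TM-score columns. The flags are the ones used in the commands above; the helper name, the output names (`order_N.m8`) and the temporary directory names (`tmp_order_N`) are made up for this illustration. Unlike the commands above, this sketch keeps the length columns in one fixed order and gives every run its own temporary directory.

```python
from itertools import permutations

TM_COLUMNS = ("qtmscore", "ttmscore", "alntmscore")
FIXED_COLUMNS = ("query", "target", "qlen", "tlen", "alnlen")


def build_commands(structure_dir="./structures/"):
    """Return one foldseek easy-search argv list per ordering of the TM columns."""
    commands = []
    for i, order in enumerate(permutations(TM_COLUMNS), start=1):
        format_output = ",".join(FIXED_COLUMNS + order)
        commands.append([
            "foldseek", "easy-search", structure_dir, structure_dir,
            f"order_{i}.m8",    # hypothetical output name
            f"tmp_order_{i}",   # hypothetical: one fresh tmp dir per run
            "--exhaustive-search", "1",
            "--alignment-type", "2",
            "--format-mode", "4",
            "--format-output", format_output,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands rather than running them.
    for argv in build_commands():
        print(" ".join(argv))
```

Each argv list could be handed to `subprocess.run(argv, check=True)` to actually run the search.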

I modified @ekiefl's Python script to check whether alntmscore is always equal to either ttmscore or qtmscore in an output file given on the command line:

import pandas as pd
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("input_file", type=str)
args = parser.parse_args()

df = pd.read_csv(args.input_file, sep="\t")

# Check if alntmscore equals ttmscore or qtmscore
if (df.ttmscore == df.alntmscore).all():
    print("All alntmscore values equal ttmscore.")
elif (df.qtmscore == df.alntmscore).all():
    print("All alntmscore values equal qtmscore.")
else:
    print("alntmscore values do not match ttmscore or qtmscore consistently.")

print(
    df[["query", "target", "qlen", "tlen", "alnlen", "qtmscore", "ttmscore", "alntmscore"]]
    .sample(20)
    .tail(10)
    .to_markdown()
)

And using this script, we find the following:

python foldseek_debug.py v1.m8
All alntmscore values equal ttmscore.
|      | query      | target     |   qlen |   tlen |   alnlen |   qtmscore |   ttmscore |   alntmscore |
|-----:|:-----------|:-----------|-------:|-------:|---------:|-----------:|-----------:|-------------:|
| 1694 | Q9PRC5.pdb | V4TDT2.pdb |    230 |    539 |      286 |     0.6175 |     0.2986 |       0.2986 |
|  317 | B0K7K7.pdb | B1AI06.pdb |    203 |    230 |      209 |     0.88   |     0.7825 |       0.7825 |
| 2455 | B1L821.pdb | U5U2L0.pdb |    197 |    550 |      182 |     0.5044 |     0.2132 |       0.2132 |
| 4026 | Q20230.pdb | W5PRH4.pdb |    191 |    277 |      245 |     0.6217 |     0.4638 |       0.4638 |
| 5581 | L1J5L1.pdb | Q54UT2.pdb |    254 |    285 |      256 |     0.6551 |     0.5885 |       0.5885 |
| 2561 | U7PJT1.pdb | W5N0Q0.pdb |    257 |    263 |      288 |     0.3744 |     0.368  |       0.368  |
| 1031 | Q8R9S6.pdb | I3MMU0.pdb |    206 |    277 |      276 |     0.3828 |     0.3006 |       0.3006 |
| 1800 | Q54UT2.pdb | Q54UT2.pdb |    285 |    285 |      285 |     1      |     1      |       1      |
| 1023 | Q8R9S6.pdb | L8IGY7.pdb |    206 |    256 |      252 |     0.4647 |     0.3913 |       0.3913 |
| 4843 | Q4DE18.pdb | K7GFB3.pdb |    351 |    242 |      240 |     0.4184 |     0.5662 |       0.5662 |

python foldseek_debug.py v2.m8
All alntmscore values equal qtmscore.
|      | query      | target     |   qlen |   tlen |   alnlen |   qtmscore |   ttmscore |   alntmscore |
|-----:|:-----------|:-----------|-------:|-------:|---------:|-----------:|-----------:|-------------:|
| 3434 | B1AIY1.pdb | B0U447.pdb |    224 |    217 |      260 |     0.3748 |     0.3864 |       0.3748 |
|  369 | W5N0Q0.pdb | P38493.pdb |    263 |    224 |      274 |     0.3627 |     0.4111 |       0.3627 |
| 1324 | P38493.pdb | B7MTM8.pdb |    224 |    213 |      263 |     0.39   |     0.4051 |       0.39   |
| 4291 | B0U447.pdb | B1L821.pdb |    217 |    197 |      196 |     0.7853 |     0.8619 |       0.7853 |
| 1887 | B0W8G8.pdb | W5N438.pdb |    246 |    264 |      258 |     0.8034 |     0.7519 |       0.8034 |
| 4467 | P48769.pdb | B7MTM8.pdb |    260 |    213 |      258 |     0.5212 |     0.6163 |       0.5212 |
| 5507 | P27707.pdb | W0R9N0.pdb |    260 |    221 |      246 |     0.676  |     0.7849 |       0.676  |
| 4474 | P48769.pdb | Q7VKH4.pdb |    260 |    214 |      265 |     0.5115 |     0.6031 |       0.5115 |
| 4617 | B1L821.pdb | Q20230.pdb |    197 |    191 |      211 |     0.4935 |     0.5054 |       0.4935 |
| 1502 | V4TDT2.pdb | V7BU96.pdb |    539 |    563 |      564 |     0.622  |     0.5973 |       0.622  |

python foldseek_debug.py v3.m8
All alntmscore values equal ttmscore.
|      | query      | target     |   qlen |   tlen |   alnlen |   qtmscore |   ttmscore |   alntmscore |
|-----:|:-----------|:-----------|-------:|-------:|---------:|-----------:|-----------:|-------------:|
| 1729 | B7MTM8.pdb | Q7VKH4.pdb |    213 |    214 |      209 |     0.9559 |     0.9515 |       0.9515 |
| 4020 | B1I165.pdb | L8IGY7.pdb |    221 |    256 |      249 |     0.5575 |     0.4923 |       0.4923 |
| 2513 | B0W8G8.pdb | B8F7G0.pdb |    246 |    208 |      235 |     0.5625 |     0.6493 |       0.6493 |
| 4939 | V4T3S7.pdb | Q1QEQ3.pdb |    470 |    230 |      205 |     0.2217 |     0.4064 |       0.4064 |
| 4871 | Q20230.pdb | V4KAC2.pdb |    191 |    599 |      182 |     0.511  |     0.1944 |       0.1944 |
| 3962 | W5JIH9.pdb | P63807.pdb |    275 |    219 |      265 |     0.3686 |     0.4393 |       0.4393 |
| 4892 | V4T3S7.pdb | I3MMU0.pdb |    470 |    277 |      278 |     0.3968 |     0.6615 |       0.6615 |
| 2351 | Q6FZE3.pdb | B1JI38.pdb |    192 |    212 |      231 |     0.4975 |     0.4619 |       0.4619 |
| 2572 | I3MMU0.pdb | B0W8G8.pdb |    277 |    246 |      256 |     0.701  |     0.7851 |       0.7851 |
| 4191 | Q9PRC5.pdb | Q8Y5W6.pdb |    230 |    224 |      257 |     0.4048 |     0.4136 |       0.4136 |

python foldseek_debug.py v4.m8
All alntmscore values equal qtmscore.
|      | query      | target     |   qlen |   tlen |   alnlen |   qtmscore |   ttmscore |   alntmscore |
|-----:|:-----------|:-----------|-------:|-------:|---------:|-----------:|-----------:|-------------:|
| 1507 | W5PR96.pdb | Q58EI2.pdb |    260 |    264 |      264 |     0.9114 |     0.898  |       0.9114 |
| 3144 | B6U6Y9.pdb | Q3AFE0.pdb |    489 |    289 |      181 |     0.1788 |     0.2795 |       0.1788 |
| 3583 | W5JIH9.pdb | Q20230.pdb |    275 |    191 |      224 |     0.4478 |     0.6059 |       0.4478 |
|  690 | O83373.pdb | B7MTM8.pdb |    208 |    213 |      217 |     0.7442 |     0.7285 |       0.7442 |
| 5293 | U7PMD6.pdb | B2UVL9.pdb |    307 |    191 |      279 |     0.3281 |     0.4828 |       0.3281 |
| 3994 | B4TFH5.pdb | Q9PRC5.pdb |    213 |    230 |      216 |     0.8171 |     0.7617 |       0.8171 |
| 4220 | V4TDT2.pdb | V6DIT2.pdb |    539 |    234 |      275 |     0.4053 |     0.9165 |       0.4053 |
| 2414 | V4T3S7.pdb | M3ZT88.pdb |    470 |    263 |      279 |     0.4116 |     0.7195 |       0.4116 |
| 2393 | Q6FZE3.pdb | B6U6Y9.pdb |    192 |    489 |      200 |     0.4129 |     0.1927 |       0.4129 |
| 2929 | W1NL77.pdb | I1HTV7.pdb |    586 |    517 |      521 |     0.5987 |     0.6735 |       0.5987 |

python foldseek_debug.py v5.m8
alntmscore values do not match ttmscore or qtmscore consistently.
|      | query      | target     |   qlen |   tlen |   alnlen |   qtmscore |   ttmscore |   alntmscore |
|-----:|:-----------|:-----------|-------:|-------:|---------:|-----------:|-----------:|-------------:|
|   22 | B0K7K7.pdb | O83373.pdb |    203 |    208 |      210 |     0.8133 |     0.7951 |       0.8133 |
| 1205 | M8CV53.pdb | U5U2L0.pdb |    526 |    550 |      520 |     0.6098 |     0.5853 |       0.6161 |
| 1249 | M8CV53.pdb | B0K7K7.pdb |    526 |    203 |      198 |     0.211  |     0.479  |       0.4891 |
| 2314 | B1JI38.pdb | Q3AFE0.pdb |    212 |    289 |      209 |     0.3783 |     0.3027 |       0.3827 |
| 4951 | I1HTV7.pdb | M8CV53.pdb |    517 |    526 |      468 |     0.6866 |     0.6754 |       0.7547 |
| 1175 | V6DIT2.pdb | B1J4Z5.pdb |    234 |    207 |      243 |     0.51   |     0.5661 |       0.5661 |
| 5024 | I1HTV7.pdb | Q8R9S6.pdb |    517 |    206 |      305 |     0.2084 |     0.4336 |       0.4336 |
| 3224 | P48769.pdb | B1AIY1.pdb |    260 |    224 |      292 |     0.3047 |     0.3421 |       0.3421 |
| 1199 | V6DIT2.pdb | B1J5H0.pdb |    234 |    228 |      272 |     0.4114 |     0.4199 |       0.4199 |
| 2686 | L8IGY7.pdb | C5A558.pdb |    256 |    190 |      235 |     0.4193 |     0.5343 |       0.5343 |

python foldseek_debug.py v6.m8
alntmscore values do not match ttmscore or qtmscore consistently.
|      | query      | target         |   qlen |   tlen |   alnlen |   qtmscore |   ttmscore |   alntmscore |
|-----:|:-----------|:---------------|-------:|-------:|---------:|-----------:|-----------:|-------------:|
| 3852 | B8F7G0.pdb | W5PR96.pdb     |    208 |    260 |      259 |     0.6604 |     0.545  |       0.6604 |
| 5475 | B1J4Z5.pdb | B1J4Z5.pdb     |    207 |    207 |      207 |     1      |     1      |       1      |
| 2670 | Q9XZT6.pdb | B1J4Z5.pdb     |    250 |    207 |      230 |     0.5031 |     0.5887 |       0.5887 |
| 3688 | I3MMU0.pdb | A0A818UTT8.pdb |    277 |    249 |      243 |     0.6877 |     0.7622 |       0.7804 |
| 3365 | Q4DE18.pdb | B1AIY1.pdb     |    351 |    224 |      267 |     0.2955 |     0.422  |       0.422  |
| 3928 | Q8Y5W6.pdb | I3MMU0.pdb     |    224 |    277 |      278 |     0.4254 |     0.3598 |       0.4254 |
| 4085 | W6ULK1.pdb | Q9UXG7.pdb     |    186 |    189 |      173 |     0.5381 |     0.5309 |       0.5719 |
| 3140 | B0K7K7.pdb | Q8R9S6.pdb     |    203 |    206 |      230 |     0.4541 |     0.4497 |       0.4541 |
| 4465 | Q834T6.pdb | P48769.pdb     |    226 |    260 |      265 |     0.4025 |     0.3607 |       0.4025 |
| 5571 | O83373.pdb | B1AI06.pdb     |    208 |    230 |      219 |     0.7591 |     0.6935 |       0.7591 |

In brief:

  1. When alntmscore is not the first of the three TM-scores requested by --format-output, it always equals either ttmscore or qtmscore
    • Which score it is equal to switches dependent on the order in which columns are requested
  2. When alntmscore is the first of the three TM-scores requested by --format-output, it does not consistently equal one or the other, but often (though not always) is equal to one of them, but not in a manner that is consistently predicted by qlen, tlen, or alnlen
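
Because the script above only reports an all-or-nothing result, a per-row tally may make the mixed cases (v5 and v6) easier to read. The following is a minimal sketch using only the standard library; the function names and the tolerance are assumptions for illustration, and it expects the same tab-separated file with a header row that the pandas script reads.

```python
import csv
from collections import Counter


def tally_matches(rows, tol=1e-6):
    """Count, per row, whether alntmscore equals qtmscore, ttmscore, both, or neither."""
    counts = Counter()
    for row in rows:
        aln = float(row["alntmscore"])
        eq_q = abs(aln - float(row["qtmscore"])) <= tol
        eq_t = abs(aln - float(row["ttmscore"])) <= tol
        if eq_q and eq_t:
            counts["both"] += 1
        elif eq_q:
            counts["qtmscore"] += 1
        elif eq_t:
            counts["ttmscore"] += 1
        else:
            counts["neither"] += 1
    return counts


def tally_file(path):
    """Read a tab-separated foldseek output file with a header row and tally it."""
    with open(path, newline="") as handle:
        return tally_matches(csv.DictReader(handle, delimiter="\t"))
```

Calling `tally_file("v5.m8")` would return a `Counter` with the number of rows in each of the four categories.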

@ekiefl
Author

ekiefl commented Aug 29, 2024

Bumping this issue, @milot-mirdita and @martin-steinegger.

Below is an example of how detrimental this bug is.

I performed a one-versus-many calculation of TM-score with both Foldseek and TMalign. For each TMalign result, the "alntmscore" is calculated manually with the following prescription:


[Image: equation for the manual alntmscore calculation]


Then, the scores are ranked from high to low and the result is shown as the monotonically increasing trace (blue). Using the same ordering, the foldseek alntmscore results are shown as red dots using the following settings:

prefilter_mode=2
alignment_type=1
tmalign_fast=0
exact_tmscore=1

[Image: plot of the TMalign alntmscore trace (blue) with Foldseek alntmscore values (red dots)]

If I instead calculate alntmscore manually from the Foldseek values qlen, qtmscore, tlen and ttmscore according to the above prescription, the TMalign and Foldseek results converge:

[Image: plot of the TMalign alntmscore trace with alntmscore recomputed manually from Foldseek values]

As far as I can tell, everyone who uses the alntmscore output by Foldseek gets results akin to the first plot, as is demonstrated in the MRE I presented above.

@martin-steinegger
Collaborator

martin-steinegger commented Aug 29, 2024

Thank you for the analysis. How do you compute the alntmscore? I checked the code: we use std::min(static_cast(res.backtrace.size()), std::min(res.dbLen, res.qLen)) as the normalization factor.

Normalizing by res.backtrace.size() might be better, maybe we should change this.

@martin-steinegger
Collaborator

martin-steinegger commented Aug 29, 2024

Okay I think I know what is going on. If you print out qtmscore,ttmscore,alntmscore then the alntmscore = ttmscore.
If you print ttmscore,qtmscore,alntmscore then alntmscore = qtmscore;
However, if you print alntmscore,qtmscore,ttmscore then it should work. This is a bug, which I will fix soon.

@ekiefl
Author

ekiefl commented Aug 29, 2024

Thanks for the response.

> How do you compute the alntmscore?

I provided some equations above, perhaps they didn't render. Or is there something more specific you are curious about?

> In case of TMalign, our ranking is done by (qTM+tTM)/2. Might this explain why you see this kind of ranking?

Each comparison, whether calculated by TMalign or Foldseek, is ordered according to TMalign's alntmscore. That's why the TMalign curve monotonically increases. So Foldseek's ranking is irrelevant given how the data has been presented.

> Normalizing by res.backtrace.size() might be better, maybe we should change this.

Given that ttmscore is normalized by tlen, and qtmscore is normalized by qlen, I think I agree that alntmscore should be calculated by normalizing by alnlen.
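
To make the role of the normalization length concrete, here is a minimal sketch of the standard TM-score formula evaluated for a fixed set of aligned-pair distances. It is not Foldseek's or TMalign's code: the function names are made up, the 0.5 Å floor for short lengths is a simplification assumed here, and a real aligner also re-optimizes the superposition, which this sketch ignores. It shows that the normalization length enters twice, once as the divisor and once through the distance scale d0.

```python
def d0(norm_len):
    """Distance scale d0 (in angstroms) of the standard TM-score for a given length."""
    if norm_len <= 21:
        return 0.5  # assumed floor for short chains (simplification)
    return max(1.24 * (norm_len - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances, norm_len):
    """TM-score of aligned-pair distances (angstroms), normalized by norm_len."""
    scale = d0(norm_len)
    return sum(1.0 / (1.0 + (d / scale) ** 2) for d in distances) / norm_len


if __name__ == "__main__":
    # 100 aligned pairs, each 3 angstroms apart (made-up numbers).
    distances = [3.0] * 100
    for length in (100, 150, 200):
        print(length, round(tm_score(distances, length), 4))
```

Because d0 changes with the normalization length, multiplying a score by its length does not give a length-independent raw sum, so scores normalized by qlen, tlen and alnlen are not simple rescalings of one another.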


However, a bug persists even if this were changed, as illustrated in this table output from the MRE. Allow me to explain.

Basically, in the table alntmscore always equals ttmscore. Given that the normalization is min(alnlen, qlen, tlen), this would only be possible if tlen were always smaller than both alnlen and qlen; however, the other columns show that isn't the case.

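|      | query  | target |   qlen |   tlen |   alnlen |   qtmscore |   ttmscore |   alntmscore |
|-----:|:-------|:-------|-------:|-------:|---------:|-----------:|-----------:|-------------:|
| 1187 | B0RXV1 | V7BU96 |    227 |    563 |      206 |     0.5245 |     0.2354 |       0.2354 |
| 5406 | U5U2L0 | V4KAC2 |    550 |    599 |      560 |     0.6095 |     0.5638 |       0.5638 |
| 3335 | Q5F9Z5 | Q9XZT6 |    206 |    250 |      234 |     0.6303 |     0.5362 |       0.5362 |
|  861 | B0BUU8 | Q20230 |    203 |    191 |      210 |     0.5333 |     0.5591 |       0.5591 |
|  842 | B0BUU8 | B0RXV1 |    203 |    227 |      199 |     0.8489 |     0.7633 |       0.7633 |
| 2810 | K0F1X4 | B1JTS0 |    178 |    206 |      211 |     0.6423 |     0.5692 |       0.5692 |
| 2927 | L8IGY7 | P48769 |    256 |    260 |      254 |     0.9265 |     0.9125 |       0.9125 |
| 4925 | Q8R9S6 | Q834T6 |    206 |    226 |      245 |     0.4625 |     0.4305 |       0.4305 |
| 3861 | W5N438 | B8F7G0 |    264 |    208 |      255 |     0.5412 |     0.6642 |       0.6642 |
|  960 | B0K119 | C5A558 |    203 |    190 |      218 |     0.5143 |     0.5417 |       0.5417 |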

@martin-steinegger
Collaborator

martin-steinegger commented Aug 29, 2024

@ekiefl thank you so much. Please see my comment above. Could you please try to print out the tmscores in this order: alntmscore,qtmscore,ttmscore. Does this change anything?

@ekiefl
Author

ekiefl commented Aug 29, 2024

> Okay I think I know what is going on. If you print out qtmscore,ttmscore,alntmscore then the alntmscore = ttmscore.
> If you print ttmscore,qtmscore,alntmscore then alntmscore = qtmscore;
> However, if you print alntmscore,qtmscore,ttmscore then it should work. This is a bug, which I will fix soon.

Exactly. It has something to do with this variable:

tmres = tmaligner->computeTMscore(targetCaData, &targetCaData[res.dbLen], &targetCaData[res.dbLen+res.dbLen], res.dbLen,

Given that the normalization is currently min(alnlen, qlen, tlen), @austinhpatton's comment makes sense:

> When alntmscore is the first of the three TM-scores requested by --format-output, it does not consistently equal one or the other, but often (though not always) is equal to one of them, but not in a manner that is consistently predicted by qlen, tlen, or alnlen


> Could you try to print out the tmscores in this order: alntmscore,qtmscore,ttmscore. Does this change anything?

The specific example stems from a subset of a larger all-vs-all, so it's not easy to re-run the results. But, I have just confirmed the effect that the order alntmscore,qtmscore,ttmscore has on the MRE, which now produces this table:

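|      | query  | target |   qlen |   tlen |   alnlen |   qtmscore |   ttmscore |   alntmscore |
|-----:|:-------|:-------|-------:|-------:|---------:|-----------:|-----------:|-------------:|
| 3599 | B1I165 | U7PMD6 |    221 |    307 |      303 |     0.4044 |     0.315  |       0.4044 |
| 3772 | B1JI38 | Q9UXG7 |    212 |    189 |      209 |     0.7307 |     0.8108 |       0.8108 |
| 2365 | I3MMU0 | B4TFH5 |    277 |    213 |      254 |     0.4645 |     0.5776 |       0.5776 |
| 4929 | Q20230 | P0C1G0 |    191 |    228 |      230 |     0.6095 |     0.5279 |       0.6095 |
| 1568 | Q9UXG7 | Q8R9S6 |    189 |    206 |      217 |     0.4942 |     0.4614 |       0.4942 |
| 5620 | Q7VKH4 | U7PMD6 |    214 |    307 |      299 |     0.4353 |     0.3304 |       0.4353 |
|  760 | W5N0Q0 | V4CH82 |    263 |    229 |      262 |     0.7477 |     0.8545 |       0.8545 |
| 2397 | I3MMU0 | Q3AFE0 |    277 |    289 |      234 |     0.3628 |     0.3507 |       0.4174 |
| 5389 | Q5F9Z5 | B1AIY1 |    206 |    224 |      262 |     0.4457 |     0.4171 |       0.4457 |
| 1115 | W6ULK1 | P63807 |    186 |    219 |      221 |     0.2924 |     0.2573 |       0.2924 |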

This table matches what one expects if alntmscore is normalized by the minimum of the lengths.
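
This claim can be checked row by row. The sketch below assumes the min(alnlen, qlen, tlen) normalization discussed in this thread; the function names and the tolerance are made up for illustration. When qlen is the shortest length, alntmscore should equal qtmscore; when tlen is the shortest, it should equal ttmscore; when alnlen is strictly the shortest, there is no other column to compare against, so the row is not judged.

```python
def expected_source(qlen, tlen, alnlen):
    """Which score alntmscore should equal under min(alnlen, qlen, tlen) normalization."""
    shortest = min(qlen, tlen, alnlen)
    if qlen == shortest:
        return "qtmscore"
    if tlen == shortest:
        return "ttmscore"
    return None  # alnlen is strictly shortest: no column to compare against


def is_consistent(row, tol=1e-6):
    """True if the row agrees with min-length normalization, or makes no prediction."""
    source = expected_source(int(row["qlen"]), int(row["tlen"]), int(row["alnlen"]))
    if source is None:
        return True
    return abs(float(row["alntmscore"]) - float(row[source])) <= tol


if __name__ == "__main__":
    # First row of the table above: qlen is the shortest, so qtmscore is expected.
    row = {"qlen": 221, "tlen": 307, "alnlen": 303,
           "qtmscore": 0.4044, "ttmscore": 0.315, "alntmscore": 0.4044}
    print(expected_source(221, 307, 303), is_consistent(row))
```

Applied to the earlier table, where the columns were requested as qtmscore,ttmscore,alntmscore, the same check flags rows such as U5U2L0 versus V4KAC2, where qlen is the shortest length but alntmscore equals ttmscore.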


When you get around to fixing the bug, may I suggest reporting a rawtmscore that one can normalize how they see fit?

martin-steinegger added a commit that referenced this issue Aug 29, 2024
alntmscore is now normalized by the backtrace length
@martin-steinegger
Collaborator

Thank you Evan. I pushed a fix. Could you retry it with the newest version?
It should fix the order issue and I changed the normalization to the backtrace size as well.

@ekiefl
Author

ekiefl commented Aug 29, 2024

Great. I can test it out if you point me to some instructions for building from source.

gamcil added a commit to steineggerlab/foldmason that referenced this issue Oct 26, 2024
b7a27454 easy-cc description
b25de75f remove tmp files
4ceb9ada Completed to output rep seqs fasta file
560c6e49 temporary Result2repseq
d16ac100 tmp remove
7e5bd089 changed tmp dir
86f5fe9e colsed easycc
412b51d0 header file
fa875fdf making complex header file
fe9865cb small changes
819a75a8 add description
06945aed Parameters
7d34bcbb default parameters
e3defcd4 separated buildCmplDb from filtercomplex
83100b8e Solved complexsearch parameter not applied problem
607c14fc Success command run
9b183905 share status
7e054b6f [DONE] Build successed. [TODO] Default Parameter setting
6ac16224 finally make works
c3a7c959 [TODO] Solve conflicts during make
8dd17b24 Organized shell scripts
a36b8c7e git conflicts
6db40b57 tmp LocalParameters.cpp
dd34d670 still build failed
aebd3fd8 small changes
b9d73150 .cpp files
02a89148 easycc and cc .sh
dbd9b076 To Complexclusterworkflow
f7b9508e Changes
03c635e5 Changed ComplexCluster into FilterComplex
ec234b1d revised parameters for filtercomplex
39b2f062 renamed complexcluster.sh to filtercomplex.sh and finalized
47cfb386 share status
40a0e719 to share status
413faeeb Add filtercomplex parameter for coverage
8667be3d [TODO] Build failed. check localparameters, workflowfiles, etc.
3782b550 Made workflow file
e15a2241 [IN PROGRESS] separated complexcluster and easycomplexcluster but need to organize
84c5279f FoldSeelBase.cpp should be changed though, easy-complexcluster output instruction
017ad0fe Updated LocalParameter files
83a46549 clustered results to flatfiles
38b60958 data/CMakeLists update
46611c48 CMakeLists update
51d29b8c Changed complexcluster.sh to easycomplexcluster.sh
69876616 Merge branch 'master' of https://github.com/rachelse/foldseek
b5c45c37 minor modification
ef00e785 [IN PROGRESS] Draft state complexcluster.sh
5ac175fd erased default -c 0.8
aaf1a6b1 complexclust.sh
e52c527a cleaned code
b7bc37fc -c default 0.8
d81811f9 TODO: select highest aligned alignments among same complex-complex & what if user wants to use -c 0.0?
5adeb999 no errors, not debugged yet
03860d19 has error, but for sharing status. Coverge criteria
47c37e08 Merge branch 'steineggerlab:master' into master
35c5914a Merge branch 'steineggerlab:master' into master
bb7ec93b First version for complex filter

git-subtree-dir: lib/foldseek
git-subtree-split: d2d09b588f50d5f8e2fd7a958377a33b2f725415
gamcil added a commit to steineggerlab/foldmason that referenced this issue Dec 4, 2024
33103374 Add parameter for taxonomy report in easy-search (#389)
3cad1360 Fix cluster reassign + tm-align #383
b43e63d7 Rework residue mapping to combine most gemmi AAs with prev FS AAs #387
8c3e3938 Merge pull request #385 from rachelse/master
7a3a9db3 Merge branch 'steineggerlab:master' into master
214886bb Update citations
d295eec6 Replace rust with rustup in github CI
8485aaf9 Switch mac arm test to github hosted runner
b011b8e4 lddt works when chaintm 0
d6056322 Fix convertalis for FS multimer
d2d09b58 Merge pull request #330 from stromjm/interface_v2.0
4e514a2a Merge branch 'steineggerlab:master' into interface_v2.0
8daecacc except overlap
e1f38a1e Merge pull request #362 from rachelse/master
6b00c5d9 Merge pull request #366 from rachelse/steineggerlab
c18727e4 Deleted search-clust pipeline from README
3d85d5c4 minor
d692f966 there could exist no match against itself
e1f238df error
a078fb9f typo in readme
24c8c92b treat monomer as singleton when scoremultimer uses --monomer-include-mode 1
1d5f9369 fix interface extraction exceptions
c27a629a single chain alignment bug fixed
19c8820c Merge pull request #359 from steineggerlab/multimer
079a5a13 monomer related update done
06275df3 rollback to 43fd26f3d3e043c8f9fd4c2b193a8b68f8781689
4046f00f test rbh filter off
13e9883c Merge branch 'multimer' of https://github.com/steineggerlab/foldseek into multimer
2dadffc0 test rbh filter off
83cc643b update single chain cluster
a17598cd build exceptions for interface mode
43fd26f3 remove tmscore threshold
704c3a82 fix chain cov ratio
1800e6a9 update for single chained alignments
cd26d54c implement complex-tm-threshold
0b1fa423 typo
c74a1a5f replace singlechain mode into  monomer mode
34104148 merged steineggerlab/foldseek
d414d908 merged steineggerlab/foldseek
232a4c43 Merge pull request #353 from rachelse/master
19595fd5 order in LocalParameters
c5412cf7 Merge remote-tracking branch 'upstream/master' into interface_v2.0
c1a6b76c merged steineggerlab/foldseek
88093635 Merge pull request #354 from steineggerlab/test
d267b3d8 bug fixed
7f2c6219 bug fix try4
abff375e bug fix try3
25e9629c bug fix try2
d9b2913b bug fix try1
8711f6b7 minor things
48666e18 typo in Readme
154019b6 Merge remote-tracking branch 'upstream/master'
a2ec51d4 fix parameter explain
306ffb82 add parameter for single chained assignments
b7c58acb Update regression
2c6b809d Update regression and convertalis ALNTMSCORE score
7e6be60a Revise alignment TMscore computation
ab20120e scoremultimer
319144b4 minor
7112fccb minor
e9b0f234 check alinged chain num when interfacelddt
2256e219 Merge pull request #345 from steineggerlab/foldseek_multiple_with_singlechain
9a2b7da4 bug fixed: single elemented vector for single chain alignment
d99d79c1 single chain allowing multimersearch
52029c06 Added BFVD as a foldseek database (#344)
af1e86e5 default parameters
e1d15f64 alnLen seems much better
5b76247b minor
ac0a32b1 minor
06016bdd minor
fd87b160 added parameters in example in Readme
17402748 added coverage in Readme
f1fc6a96 minor
1b39c482 big complex first to prevent big ones to be left and run alone for few hours using 1 thread
08b7e9ce previous version is twice faster
de945b2b simd returns segfault
602ff37b Merge branch ‘steineggerlab:master’ into master
be9fc339 Merge pull request #343 from steineggerlab/foldseek_multimer_bottleneck
3bcdabae checkChainRedundancy with unordered_sets
5a4ad0be foundNeighbors as an unordered_set
b15e236a foldseek-multimer bottleneck solved
b947f688 implement foundNeighbors
89f371bb DBSCAN as non-recursive function
9f603b29 check new scoremultimer
4f70b3f4 Readme
50c1df1b Readme
4f1592a4 Readme
c3093cd3 outputs filtcov too
420038de chaging readme
3fd78777 chaging readme
aa2ced33 all_seqs.fasta not working
d1605fe7 SIMD for tmscore
72f5028c map to vector, complex to multimer, [TODO] check if speed improved
63bca7b2 remove distMap
73e41342 seq3di.clear() in GemmiWrapper
2758b96e Merge branch 'master' into interface_v2.0
9d74a1bf orders of Parameters
94874214 make mergeable
2b731d3c Updated regression
02fb1e58 Updated submodule
17986f4c code styles
6db582ea made filtermultimer to get one argument as output. 'output' and 'outout_info' will be the actual outputs
a6bee293 important issue solved, thread_idx while writing
4bbdca4f minor
aeacb68b changed way to buffer for ustring, tstring
05a80c58 filtcov.tsv to complex_filt_info file. createtsv query query complex_filt_info filtcov.tsv possible
51197117 minor
868bfb1c Update README.md
3ed737c0 Add --tmscore-threshold-mode to allow to switch normalization
552e18dc Fix convertalis and alignment normalization
6b77a4f6 code styles
b40729c1 Fix issue steineggerlab/foldseek#312 alntmscore is now normalized by the backtrace length

git-subtree-dir: lib/foldseek
git-subtree-split: 3310337471fc46880c245508af6a23adcb192cee
@martin-steinegger
Collaborator

We will update conda soon. Also the static compiled binaries should work.
