Attribution map improvements (#75) #78

jannisborn · 2022-02-19T18:40:48Z

Context:
Related to issue #48, which @whitead solved in #75, I am addressing some minor inconsistencies that I detailed here. The PR is ready for review.

Content:
Adjusts how the indices are built in the attribution maps:

In the encoder attributions, we now start counting from 0, just like for the decoder.
The encoder no longer skips over bond tokens.

Example:

import selfies as sf

smiles = 'C=CO'
selfie, attribution = sf.encoder(smiles, attribute=True)
print(selfie)
for a in attribution:
    print(a)

Old result:

[C][=C][O]
('[C]', [(1, 'C')])
('[=C]', [(2, 'C')])
('[O]', [(3, 'O')])

New result:

[C][=C][O]
('[C]', [(0, 'C')])
('[=C]', [(2, 'C')])
('[O]', [(3, 'O')])
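The change from 1, 2, 3 to 0, 2, 3 is easiest to see once `C=CO` is split into the tokens `C`, `=`, `C`, `O`. The sketch below assumes a common regex-based SMILES tokenizer (not the tokenizer selfies uses internally) and reproduces both conventions as this PR describes them: the old one numbers only atom tokens starting from 1, and the new one uses each atom's 0-based position among all tokens, with the bond token included. The function names are hypothetical.

```python
import re

# Assumption: a widely used regex-based SMILES tokenizer. selfies has its own
# internal tokenizer, which may differ on edge cases.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles):
    """Split a SMILES string into tokens; fail loudly if characters are skipped."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not fully tokenize {smiles!r}")
    return tokens


def is_atom(token):
    # Atoms are bracket atoms or start with a letter; bonds, ring-closure
    # digits and parentheses are not.
    return token.startswith("[") or token[0].isalpha()


def old_indices(smiles):
    """Old convention (hypothetical reconstruction): 1-based, non-atom tokens skipped."""
    atoms = [t for t in tokenize(smiles) if is_atom(t)]
    return list(enumerate(atoms, start=1))


def new_indices(smiles):
    """New convention: 0-based position among all SMILES tokens."""
    return [(i, t) for i, t in enumerate(tokenize(smiles)) if is_atom(t)]


print(old_indices("C=CO"))  # [(1, 'C'), (2, 'C'), (3, 'O')]
print(new_indices("C=CO"))  # [(0, 'C'), (2, 'C'), (3, 'O')]
```

For the ring example discussed further down, `new_indices("C1([O-])C=CC=C1Cl")` yields the positions 0, 3, 5, 7, 8, 10 and 12, which are the same indices that appear in the encoder output there.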

Closes #48

whitead · 2022-02-19T18:47:07Z

Thanks for catching this! Maybe you could add a short unit test based on your example here so that we can keep track of this behavior?

jannisborn · 2022-02-19T19:19:07Z

Thanks for the quick feedback @whitead. I expanded your test to guarantee the indices are correct 👍🏼
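Here is a minimal sketch of the kind of test @whitead asked for. It is not the test added in d65d99a. To stay runnable without selfies installed, it checks attribution maps copied from the outputs in this PR instead of calling `encoder`, and the SMILES token lists are an assumed tokenization. A real test would obtain the attribution from `sf.encoder(smiles, attribute=True)` rather than hard-coding it. `CASES` and the test names are hypothetical.

```python
import unittest

# Attribution maps copied from the outputs printed in this PR (new behavior).
# The SMILES token lists are an assumed tokenization that keeps bonds,
# ring-closure digits and parentheses as separate tokens.
CASES = [
    (
        ["C", "=", "C", "O"],
        [("[C]", [(0, "C")]), ("[=C]", [(2, "C")]), ("[O]", [(3, "O")])],
    ),
    (
        ["C", "1", "(", "[O-]", ")", "C", "=", "C", "C", "=", "C", "1", "Cl"],
        [
            ("[C]", [(0, "C")]),
            ("[O-1]", [(3, "[O-]")]),
            ("[Branch1]", [(3, "[O-]")]),
            ("[C]", [(3, "[O-]")]),
            ("[C]", [(5, "C")]),
            ("[=C]", [(7, "C")]),
            ("[C]", [(8, "C")]),
            ("[=C]", [(10, "C")]),
            ("[Ring1]", None),
            ("[=Branch1]", None),
            ("[Cl]", [(12, "Cl")]),
        ],
    ),
]


class TestAttributionIndices(unittest.TestCase):
    def test_indices_point_at_matching_smiles_tokens(self):
        for smiles_tokens, attribution in CASES:
            for selfies_token, attributed in attribution:
                if attributed is None:  # e.g. [Ring1] has no attributed SMILES token
                    continue
                for index, smiles_token in attributed:
                    with self.subTest(selfies_token=selfies_token, index=index):
                        self.assertEqual(smiles_tokens[index], smiles_token)

    def test_c_eq_co_indices(self):
        # 0-based, and the '=' bond token at position 1 is counted.
        _, attribution = CASES[0]
        indices = [index for _, attributed in attribution for index, _ in attributed]
        self.assertEqual(indices, [0, 2, 3])
```

Saved as a `test_*.py` file, it can be run with `python -m unittest`.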

On a separate note, I was a bit surprised by the example in your unit test: the order of the SELFIES tokens in the attribution map does not correspond to their order in the returned string. This does not seem right to me. I checked out an older commit and verified that this behavior is not related to my PR (the indices below reflect my latest changes, but the ordering is identical in both cases). Any thoughts on this?

smiles:  C1([O-])C=CC=C1Cl
selfies:  [C][Branch1][C][O-1][C][=C][C][=C][Ring1][=Branch1][Cl]
('[C]', [(0, 'C')])
('[O-1]', [(3, '[O-]')])
('[Branch1]', [(3, '[O-]')])
('[C]', [(3, '[O-]')])
('[C]', [(5, 'C')])
('[=C]', [(7, 'C')])
('[C]', [(8, 'C')])
('[=C]', [(10, 'C')])
('[Ring1]', None)
('[=Branch1]', None)
('[Cl]', [(12, 'Cl')])

I used this code:

import selfies as sf

smiles = "C1([O-])C=CC=C1Cl"
s, am = sf.encoder(smiles, attribute=True)
print('smiles: ', smiles)
print('selfies: ', s)
for a in am:
    print(a)
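The mismatch can be made concrete by comparing the tokens of the returned SELFIES string with the token column of the attribution map. The library provides `sf.split_selfies` for splitting; this sketch uses a bracket regex as a stand-in so that it runs without selfies installed. Both token sequences are copied from the output above.

```python
import re
from collections import Counter

SELFIES = "[C][Branch1][C][O-1][C][=C][C][=C][Ring1][=Branch1][Cl]"
# SELFIES-token column of the attribution map above, in printed order.
ATTRIBUTED = ["[C]", "[O-1]", "[Branch1]", "[C]", "[C]", "[=C]",
              "[C]", "[=C]", "[Ring1]", "[=Branch1]", "[Cl]"]


def split_selfies_tokens(s):
    # Every SELFIES symbol is bracketed, so a bracket regex splits the string
    # (a stand-in for sf.split_selfies).
    return re.findall(r"\[[^\]]*\]", s)


tokens = split_selfies_tokens(SELFIES)

# Same tokens overall...
same_multiset = Counter(tokens) == Counter(ATTRIBUTED)
# ...but not in the same sequence: find the first position that differs.
first_mismatch = next(
    (i for i, (a, b) in enumerate(zip(tokens, ATTRIBUTED)) if a != b), None
)
print(same_multiset, first_mismatch)  # True 1
```

Position 1 holds `[Branch1]` in the string but `[O-1]` in the attribution map, and because `[C]` occurs four times, the entries cannot be realigned by token text alone.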

whitead · 2022-02-22T03:10:10Z

@jannisborn Thanks - that is intended behavior: encoder attribution is ordered by input SMILES token. I can see now that this would be relatively useless, though, since it's non-trivial to align them. I'll open an issue to add indices on SELFIES tokens and/or sort them.
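If each attribution entry carried its position in the SELFIES string, as proposed here, restoring string order would reduce to a sort. The `IndexedAttribution` record below is hypothetical and not part of the selfies API at the time of this PR. The toy input is the `C=CO` attribution from this PR, deliberately listed out of order.

```python
from typing import List, NamedTuple, Optional, Tuple


class IndexedAttribution(NamedTuple):
    """Hypothetical attribution entry that records its SELFIES token position."""
    selfies_index: int
    token: str
    attribution: Optional[List[Tuple[int, str]]]


def in_selfies_order(entries):
    """Return a new list of entries sorted by their position in the SELFIES string."""
    return sorted(entries, key=lambda e: e.selfies_index)


# Toy input: the 'C=CO' attribution ([C][=C][O]) listed out of order.
entries = [
    IndexedAttribution(2, "[O]", [(3, "O")]),
    IndexedAttribution(0, "[C]", [(0, "C")]),
    IndexedAttribution(1, "[=C]", [(2, "C")]),
]

print([e.token for e in in_selfies_order(entries)])  # ['[C]', '[=C]', '[O]']
```

Because `sorted` returns a new list, the original SMILES-ordered list stays available for callers who want it.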

fix: bugfixes for #75, closes #48 (608c048)
test: expanded unittest with indices (d65d99a)
chore: fix CI pipeline (9d3cce7)

MarioKrenn6240 merged commit fdb1789 into aspuru-guzik-group:master Feb 21, 2022

whitead mentioned this pull request on Feb 22, 2022 in Attribution Map Encoder Ordering #79 (Closed)

whitead mentioned this pull request on May 7, 2022 in Improvements to attribution #84 (Merged)
