Polymer builder overhaul #34

Merged
merged 78 commits into from
Dec 12, 2024
Commits
1adc0b4
Added typehints for SMILES and SMARTS strings
timbernat Dec 4, 2024
5919913
Exposed Smiles and Smarts typehints at subpackage level
timbernat Dec 4, 2024
d5cd8fc
Added function for uniquifying strings (which can preserve character …
timbernat Dec 4, 2024
9a6278e
Updated DOP calculation to check for and yield correct number of mono…
timbernat Dec 4, 2024
c18d394
Established placeholder file + sample fragments for polymer building …
timbernat Dec 4, 2024
4cc360d
Added internal used-monomer-only MonomerGroup which improves accuracy…
timbernat Dec 4, 2024
173990c
Deprecated DOP alias for n_monomers property
timbernat Dec 4, 2024
8eb7267
Expunged all references to "DOP" in favor of clearer terminology
timbernat Dec 4, 2024
88f8b6b
Deprecated filter_text_by_condition()
timbernat Dec 4, 2024
98cec89
Renamed textual.strsearch to textual.substrings, updated docstring
timbernat Dec 4, 2024
de314e7
Implemented function for repeating a string a (possibly fractional) n…
timbernat Dec 4, 2024
3685b39
Added argument for indicating separator between string repeats
timbernat Dec 4, 2024
497791e
Renamed uniquify_str() to unique_string()
timbernat Dec 4, 2024
3c7448e
Renamed "join_indicator" to "joiner" for brevity
timbernat Dec 4, 2024
13c9db7
Fixed bug with parenthesization vs tuplification
timbernat Dec 4, 2024
d3c8be7
Wrote unit tests for textual.substrings
timbernat Dec 4, 2024
487ae6a
Delayed monomer linearity check to only be on the monomer fragments s…
timbernat Dec 4, 2024
ddc20bc
Added range and int typing checks to target_length
timbernat Dec 5, 2024
8714eb8
Added option to register residue names when converting a spec SMARTS …
timbernat Dec 5, 2024
0fe10e9
Renamed "resname_repl" to "resname_map" throughout
timbernat Dec 5, 2024
0bd4355
Implemented mBuild Compound to RDKit converter which preserves confor…
timbernat Dec 5, 2024
d947336
Deprecated irrelevant custom Exceptions, pared down use of "Error" su…
timbernat Dec 5, 2024
7d4b5a1
Implemented support for fractional sequence repeats, with informative…
timbernat Dec 5, 2024
3797c64
Added new custom Exception for end-group dominated chains
timbernat Dec 5, 2024
8c96e48
Separated procrustean sequence determination into dedicated helper fu…
timbernat Dec 5, 2024
f7422bb
Switched order of residue name and head/tail identifier in MonomerGro…
timbernat Dec 5, 2024
e2be34d
Added __post_init__ check for listification of bare SMARTS and for SM…
timbernat Dec 5, 2024
56b4144
Added module-level logger
timbernat Dec 5, 2024
d852ed3
Added internal method for producing end groups for linear polymer bui…
timbernat Dec 5, 2024
9903dd1
Changed MonomerGroup.linear_end_groups from property to vanilla metho…
timbernat Dec 5, 2024
70ee787
Deprecated _has_valid_linear_term_orient, included residue name in li…
timbernat Dec 6, 2024
df12d48
Deferred end group determination to internal implementation in Monomer…
timbernat Dec 6, 2024
f1e6925
Enhanced logging of sequence breakdown, unified logging between whole…
timbernat Dec 6, 2024
dc053af
Added custom Exception for missing package dependency which reduces e…
timbernat Dec 6, 2024
d4f6361
Deleted superfluous imports
timbernat Dec 6, 2024
88182e7
Converted polymers.building into a package, split up functionality am…
timbernat Dec 6, 2024
c66a3f2
Fixed missing "raise" keywords and incorrect package checks
timbernat Dec 6, 2024
e696828
Fiddled with MissingPrerequisitePackage error message format
timbernat Dec 6, 2024
31f0675
Added Exception for unexpectedly-empty copolymer sequences
timbernat Dec 6, 2024
79296ae
Added precheck for empty sequence kernel
timbernat Dec 9, 2024
cb2587a
Expanded PROCRUSTEAN sequencing algorithm into dedicated dataclass
timbernat Dec 9, 2024
742522e
Made LinearCopolymerSequencer serializable to/from JSON
timbernat Dec 9, 2024
013e09b
Added RDKit-driven PDB writer for mbuild Compounds
timbernat Dec 9, 2024
fb3d898
Expanded out unit test modules for .polymers.building
timbernat Dec 9, 2024
677fede
Updated description of the "PROCRUSTEAN" acronym
timbernat Dec 9, 2024
870a1eb
Wrote unit tests for copolymer sequencing
timbernat Dec 9, 2024
ce9c55c
Updates SMILES/SMARTS-related type annotations on validation functions
timbernat Dec 9, 2024
75fccee
Moved fragment data directly into code, as opposed to maintaining sep…
timbernat Dec 9, 2024
59f988a
Removed superfluous mBuild imports
timbernat Dec 9, 2024
29d3198
Added devnote to revisit SMARTS-specification auto-cleaning
timbernat Dec 9, 2024
7afceed
Added devnote for spec compliance checker
timbernat Dec 9, 2024
d6d5880
Added MPD-TMC polyamide fragments for examples
timbernat Dec 9, 2024
0e7749f
Added unit tests for MonomerGroup initialization and core properties
timbernat Dec 10, 2024
e33ca3d
Added polyethylene example to test when fewer than the max 2 end grou…
timbernat Dec 10, 2024
0074e4a
Wrote unit test for end group identification
timbernat Dec 10, 2024
f27263f
Expanded syntax and support for addition/validation of new monomer SM…
timbernat Dec 10, 2024
10b8b8c
Added bug note for validation skipping when accessing monomer attribu…
timbernat Dec 10, 2024
30642dc
Added test for degenerate end group autoassignment (i.e. when NO term…
timbernat Dec 10, 2024
bda08d8
Attempted (unsuccessfully) to get __hash__ working for MonomerGroup
timbernat Dec 10, 2024
32f8d53
Wrote unit tests for linear polymer builder
timbernat Dec 10, 2024
4e31488
Fixed indent on openff_topology_to_openmm() arguments
timbernat Dec 10, 2024
051d128
Removed deprecated local TKREGS import
timbernat Dec 10, 2024
e0cdfc5
Moved unitsys outside of omminter to resolve circular import
timbernat Dec 10, 2024
f416a32
Moved sample monomer fragment sets from unit tests to polymerist proper
timbernat Dec 11, 2024
a770906
Corrected typo in end group autogen warning
timbernat Dec 11, 2024
c991c17
Fixed indent on serialize_openmm_pdb() arguments
timbernat Dec 11, 2024
040a718
Fixed accidental duplication of 3-functional TMC monomer fragment
timbernat Dec 11, 2024
bbd1b85
Added new subpackage for molecule file I/O
timbernat Dec 11, 2024
34d1a7b
Froze SerialAtomLabeller dataclass to avoid unintentional label forma…
timbernat Dec 11, 2024
d9afdc4
Switched PDB atom labeller to dependency-injection based model
timbernat Dec 11, 2024
668c3f9
Renamed "chain" to "polymer" where it occurs to avoid confusion with …
timbernat Dec 11, 2024
a5cc442
Added placeholder unit tests for newly-created `molfiles` subpackage
timbernat Dec 11, 2024
4d128e3
Added residue info injection into mbmol_to_openmm_pdb (PDB outputs ar…
timbernat Dec 11, 2024
f1f8039
Renamed "atom_label_size" to "atom_label_length" for clarity
timbernat Dec 12, 2024
370dc3a
Renamed once more to atom_label_width
timbernat Dec 12, 2024
c61f16a
Fixed non-attribute value in atom_label_width Exception message
timbernat Dec 12, 2024
e40600e
Added string type check for atom element symbols
timbernat Dec 12, 2024
5c282d7
Wrote unit tests for molfiles.pdb.SerialAtomLabeller
timbernat Dec 12, 2024
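Several commits in the list above (de314e7, 3685b39, 7d4b5a1) concern repeating a sequence string a possibly fractional number of times, e.g. reading 5/2 repeats of 'BACA' as 'BACA|BACA|BA'. The following is a minimal standalone sketch of that idea, not the implementation from this PR: the function name `repeat_string_fractional` is hypothetical, and the `joiner` argument is only assumed to correspond to the separator argument the commit titles mention.

```python
from fractions import Fraction


def repeat_string_fractional(string: str, n_repeats: Fraction, joiner: str = '') -> str:
    """Repeat `string` a (possibly fractional) number of times.

    Hypothetical helper for illustration only. Raises ValueError when the
    requested repeat count does not correspond to a whole number of characters.
    """
    if not string:
        raise ValueError('cannot repeat an empty string')
    n_repeats = Fraction(n_repeats)
    if n_repeats < 0:
        raise ValueError('cannot repeat a string a negative number of times')

    n_chars = n_repeats * len(string)  # total number of characters requested
    if n_chars.denominator != 1:
        raise ValueError(
            f'{n_repeats} repeats of {string!r} is not a whole number of characters'
        )

    # Split into whole repeats plus a leading partial repeat, if any
    n_whole, n_extra = divmod(int(n_chars), len(string))
    pieces = [string] * n_whole
    if n_extra:
        pieces.append(string[:n_extra])
    return joiner.join(pieces)


print(repeat_string_fractional('BACA', Fraction(5, 2), joiner='|'))
```

Under these assumptions, 5/2 repeats of 'BACA' give two whole copies followed by 'BA', while 5/3 repeats are rejected because 20/3 is not a whole number of characters.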
Added internal used-monomer-only MonomerGroup which improves accuracy of n_atoms estimate
timbernat committed Dec 4, 2024
commit 4cc360d179bd260c836c68709f70c515fbcf5c19
13 changes: 9 additions & 4 deletions polymerist/polymers/building.py
@@ -94,7 +94,10 @@ def build_linear_polymer(monomers : MonomerGroup, DOP : int, sequence : str='A',
     # ...for example, could allow 5/2 * 'BACA' to be interpreted as 'BACA|BACA|BA'; 5/3 * 'BACA' would still be invalid though
     LOGGER.info(f'Target chain length achievable with {n_seq_reps} block sequence repeat(s) ({n_seq_reps}*{block_size} [{sequence}] middle monomers + {n_terminal} terminal monomers = {DOP} total monomers)')

-    # 2) ADD MIDDLE MONOMERS TO CHAIN
+    # 2) REGISTERING MONOMERS TO BE USED FOR CHAIN ASSEMBLY
+    monomers_used = MonomerGroup() # used to track and estimate sizes of the monomers being used
+
+    ## 2A) ADD MIDDLE MONOMERS TO CHAIN
     chain = MBPolymer()
     for (resname, middle_monomer), sequence_key in zip(
         monomers.iter_rdmols(term_only=False),
@@ -103,8 +106,9 @@ def build_linear_polymer(monomers : MonomerGroup, DOP : int, sequence : str='A',
         LOGGER.info(f'Registering middle monomer {resname} (block identifier "{sequence_key}")')
         mb_monomer, linker_ids = mbmol_from_mono_rdmol(middle_monomer)
         chain.add_monomer(compound=mb_monomer, indices=linker_ids)
+        monomers_used.monomers[resname] = monomers.monomers[resname]

-    # 3) ADD TERMINAL MONOMERS TO CHAIN
+    ## 2B) ADD TERMINAL MONOMERS TO CHAIN
     term_iters = { # need to convert to iterators to allow for generator-like advancement (required for term group selection to behave as expected)
         resname : iter(rdmol_list) # made necessary by annoying list-bound structure of current substructure spec
         for resname, rdmol_list in monomers.rdmols(term_only=True).items()
@@ -114,9 +118,10 @@ def build_linear_polymer(monomers : MonomerGroup, DOP : int, sequence : str='A',
         term_monomer = next(term_iters[resname])
         mb_monomer, linker_ids = mbmol_from_mono_rdmol(term_monomer)
         chain.add_end_groups(compound=mb_monomer, index=linker_ids.pop(), label=head_or_tail, duplicate=False) # use single linker ID and provided head-tail orientation
+        monomers_used.monomers[resname] = monomers.monomers[resname]

-    # 4) ASSEMBLE AND RETURN CHAIN
-    n_atoms_est = estimate_chain_len_linear(monomers, DOP) # TODO: create new MonomerGroup with ONLY the registered monomers to guarantee accuracy
+    # 3) ASSEMBLE AND RETURN CHAIN
+    n_atoms_est = estimate_chain_len_linear(monomers_used, DOP) # TODO: create new MonomerGroup with ONLY the registered monomers to guarantee accuracy
     LOGGER.info(f'Assembling linear {DOP}-mer chain (estimated {n_atoms_est} atoms)')
     chain.build(n_seq_reps, sequence=sequence, add_hydrogens=add_Hs) # "-2" is to account for term groups (in mbuild, "n" is the number of times to replicate just the middle monomers)
     for atom in chain.particles():
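To illustrate why this commit estimates the atom count from `monomers_used` rather than from the full `monomers` group, here is a minimal standalone sketch. It does not use polymerist or mBuild, and the estimate formula is a stand-in (terminal atoms plus the average middle-fragment size times the number of middle monomers), not the one inside `estimate_chain_len_linear`. All names and atom counts below are hypothetical.

```python
from statistics import mean

# Hypothetical atom counts per monomer fragment (not real polymerist data)
ATOM_COUNTS = {'head': 7, 'mid_A': 6, 'mid_B': 12, 'tail': 7}
TERM_NAMES = ('head', 'tail')


def estimate_n_atoms(atom_counts: dict[str, int], term_names: tuple[str, ...], n_monomers: int) -> int:
    """Stand-in estimate: terminal atoms + mean middle size * number of middle monomers."""
    middle_sizes = [n for name, n in atom_counts.items() if name not in term_names]
    n_term_atoms = sum(atom_counts[name] for name in term_names)
    n_middle = n_monomers - len(term_names)
    return n_term_atoms + round(mean(middle_sizes) * n_middle)


def register_used(atom_counts: dict[str, int], middle_names: list[str], sequence: str, term_names: tuple[str, ...]) -> dict[str, int]:
    """Collect only the fragments that would actually be registered for the build."""
    used = {}
    # As in the diff, zipping with the sequence limits the middle monomers
    # to the length of the block sequence
    for name, _sequence_key in zip(middle_names, sequence):
        used[name] = atom_counts[name]
    for name in term_names:
        used[name] = atom_counts[name]
    return used


used = register_used(ATOM_COUNTS, ['mid_A', 'mid_B'], 'A', TERM_NAMES)
print(estimate_n_atoms(ATOM_COUNTS, TERM_NAMES, 10))  # full pool, includes unused mid_B
print(estimate_n_atoms(used, TERM_NAMES, 10))         # only the registered fragments
```

With sequence 'A', only `mid_A` is registered, so the unused `mid_B` fragment no longer inflates the estimate. In this toy setup the full-pool estimate is 86 atoms and the used-only estimate is 62.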