Merge branch "feature cleanup" #38

Merged 20 commits on Jan 17, 2025
f8f7fe1
Added convenience method for fitting strings into a given number of c…
timbernat Jan 6, 2025
05e8cda
Downgraded logging levels on monomer registration and SMARTS cleanup …
timbernat Jan 10, 2025
81d8079
Added functions for enabling and disabling Kekulized molecule drawing
timbernat Jan 10, 2025
f42489b
Made rdkdraw setting accessible at rdutils toplevel
timbernat Jan 10, 2025
8e75774
Added flag to disable kekulization when expanding SMILES
timbernat Jan 10, 2025
5e0b1e7
Overhauled chemdbqueries to implement Strategy pattern with robust lo…
timbernat Jan 14, 2025
ae3a7a0
Added registration of valid namespaces for CIR and PubChem queries (N…
timbernat Jan 14, 2025
eae0080
Corrected typo in PubChemQueryStrategy ("namepaces" -> "namespaces")
timbernat Jan 14, 2025
365d61a
Made total chemical query failure Exceptions more specific than Value…
timbernat Jan 14, 2025
a6625aa
Expanded valid Exception values passed to requires_modules()
timbernat Jan 14, 2025
a465a79
Wrapped cirpy-pubchempy-specific methods with requires_modules() deco…
timbernat Jan 14, 2025
b11564b
Added methods to check if chemical database services are online, chan…
timbernat Jan 16, 2025
adefe8e
Added syntax highlighting to code blocks
timbernat Jan 16, 2025
f54615c
Added classmethod for communicating required dependencies to other co…
timbernat Jan 17, 2025
51fa8e3
Decoupled online check from dependencies in database service query im…
timbernat Jan 17, 2025
b9d2610
Added InChI/InChI key aliases to list of queryable properties from NI…
timbernat Jan 17, 2025
ebd6411
Implemented explicit NoneType conversion for empty containers returne…
timbernat Jan 17, 2025
f31f6c7
Wrote service-specific functional tests for chemdbqueries
timbernat Jan 17, 2025
63aaf18
Added tests for get_chemical_properties() wrapper
timbernat Jan 17, 2025
2961906
Expanded sets of queryable chemical properties (taken from webchem r…
timbernat Jan 17, 2025
16 changes: 8 additions & 8 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ Once you have a package manager installed, you may proceed with one of the provi
A fully-featured install in a safe virtual environment (named "polymerist-env", here) can be obtained by running the following terminal commands:

#### Mamba install (basic)
```sh
```bash
mamba create -n polymerist-env python=3.11
mamba activate polymerist-env
pip install polymerist
Expand All @@ -47,7 +47,7 @@ mamba install -c conda-forge openff-toolkit mbuild openbabel

#### Mamba install (extended)
An extended install with [Jupyter Notebook](https://jupyter.org/) support, molecular visualization capability, and chemical data querying capability can be obtained very similarly:
```sh
```bash
mamba create -n polymerist-env python=3.11
mamba activate polymerist-env
pip install polymerist[interactive,chemdb]
Expand All @@ -56,7 +56,7 @@ mamba install -c conda-forge openff-toolkit mbuild openbabel

#### Conda install (not recommended)
Equivalent commands using `conda` (in case `mamba` has not been installed or the user is too stubborn to use it) are given below. These will perform the same installation, just much more slowly:
```sh
```bash
conda create -n polymerist-env python=3.11
conda activate polymerist-env
pip install polymerist[interactive,chemdb]
Expand All @@ -67,7 +67,7 @@ In either case, the final [openff-toolkit](https://github.com/openforcefield/ope

#### 1.1) Testing installation
To see if the installation was successful, one can run the following short set of commands which should yield the outputs shown:
```sh
```python
mamba activate polymerist-env; python
>>> import polymerist as ps
>>> print(ps.pascal(5))
Expand All @@ -82,7 +82,7 @@ mamba activate polymerist-env; python
Assigning atomic partial charges using some flavor of [AM1-BCC](https://docs.eyesopen.com/toolkits/python/quacpactk/molchargetheory.html#am1bcc-charges) with `polymerist` also requires installation of some supplementary toolkits.

One can mix-and-match installing any combination of the toolkits below to taste or (if impatient or indifferent) opt for a "shotgun" approach and install all 3 with the following commands:
```sh
```bash
mamba activate polymerist-env
mamba install -c openeye openeye-toolkits
mamba install -c conda-forge espaloma_charge openff-nagl
Expand All @@ -102,7 +102,7 @@ This is an OpenFF-specific GNN based on similar architecture to Espaloma with a
Polymerist can also be installed directly from the source code in this repository. To install, execute the following set of terminal commands in whichever directory you'd like the installation to live on your local machine:

#### Mamba install (source)
```sh
```bash
git clone https://github.com/timbernat/polymerist
cd polymerist
mamba env create -n polymerist-env -f devtools/conda-envs/release-build.yml
Expand All @@ -111,7 +111,7 @@ pip install .
```

#### Conda install (source, not recommended)
```sh
```bash
git clone https://github.com/timbernat/polymerist
cd polymerist
conda env create -n polymerist-env -f devtools/conda-envs/release-build.yml
Expand All @@ -122,7 +122,7 @@ Once the source install is complete, you no longer need the clone of the polymer

### Developer installation (for advanced users only)
Those developing for `polymerist` may like to have an editable local installation, in which they can make changes to the source code and test behavior changes in real-time. In this case, one requires an "editable build" which mirrors the source files that live in the site_packages directory of the created environment. This type of installation proceeds as follows:
```sh
```bash
git clone https://github.com/timbernat/polymerist
cd polymerist
mamba env create -n polymerist-dev -f devtools/conda-envs/dev-build.yml
Expand Down
23 changes: 19 additions & 4 deletions polymerist/genutils/importutils/dependencies.py
Expand Up @@ -3,7 +3,7 @@
__author__ = 'Timotej Bernat'
__email__ = 'timotej.bernat@colorado.edu'

from typing import Callable, Optional, ParamSpec, TypeVar
from typing import Callable, Optional, ParamSpec, TypeVar, Union

Params = ParamSpec('Params')
ReturnType = TypeVar('ReturnType')
Expand Down Expand Up @@ -58,7 +58,7 @@ def module_installed(module_name : str) -> bool:

try: # NOTE: opted for this implementation, as it never actually imports the package in question (faster and fewer side-effects)
return find_spec(module_name) is not None
except (ValueError, AttributeError, ModuleNotFoundError): # these could all be raised by
except (ValueError, AttributeError, ModuleNotFoundError): # these could all be raised by a missing module
return False

def modules_installed(*module_names : list[str]) -> bool:
Expand All @@ -80,7 +80,7 @@ def modules_installed(*module_names : list[str]) -> bool:

def requires_modules(
*required_module_names : list[str],
missing_module_error : type[Exception]=ImportError,
missing_module_error : Union[Exception, type[Exception]]=ImportError,
) -> Callable[[TCall[..., ReturnType]], TCall[..., ReturnType]]:
'''
Decorator which enforces optional module dependencies prior to function execution
Expand All @@ -99,12 +99,27 @@ def requires_modules(
Raised if any of the specified packages is not found to be installed
Exception message will indicate the name of the specific package found missing
'''
# meta-check to ensure type of raised Exception is valid
if not isinstance(missing_module_error, Exception):
if not (isinstance(missing_module_error, type) and issubclass(missing_module_error, Exception)):
# DEV: this is potentially brittle, depending on how the specific Exception subtype is implemented?
raise TypeError('Must pass either Exception instance or subtype to "missing_module_error"')

def tailored_exception(module_name : str) -> Exception:
'''Accessory function to generate targeted Exceptions based on the provided
missing_module_error value and the name of a module with no found installation'''
if isinstance(missing_module_error, Exception):
return missing_module_error

if isinstance(missing_module_error, type):
return missing_module_error(f'No installation found for module "{module_name}"')

def decorator(func) -> TCall[..., ReturnType]:
@wraps(func)
def req_wrapper(*args : Params.args, **kwargs : Params.kwargs) -> ReturnType:
for module_name in required_module_names:
if not module_installed(module_name):
raise missing_module_error(f'No installation found for module "{module_name}"')
raise tailored_exception(module_name)
else:
return func(*args, **kwargs)

Expand Down
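The widened `missing_module_error` parameter can be exercised with a small standalone sketch. The code below is a simplified restatement of the decorator shown in this diff, not the package's actual implementation: annotations are trimmed, and the module name `hypothetical_missing_module_xyz`, the decorated functions, and the error message are made up for illustration.

```python
from functools import wraps
from importlib.util import find_spec
from typing import Callable, Union


def module_installed(module_name: str) -> bool:
    '''Check whether a module can be found, without actually importing it'''
    try:
        return find_spec(module_name) is not None
    except (ValueError, AttributeError, ModuleNotFoundError):
        return False


def requires_modules(
        *required_module_names: str,
        missing_module_error: Union[Exception, type] = ImportError,
    ) -> Callable:
    '''Simplified sketch: accept either an Exception instance or an Exception subtype'''
    is_instance = isinstance(missing_module_error, Exception)
    is_subtype = isinstance(missing_module_error, type) and issubclass(missing_module_error, Exception)
    if not (is_instance or is_subtype):
        raise TypeError('Must pass either Exception instance or subtype to "missing_module_error"')

    def tailored_exception(module_name: str) -> Exception:
        if is_instance:
            return missing_module_error  # a pre-built instance is raised unchanged
        return missing_module_error(f'No installation found for module "{module_name}"')

    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def req_wrapper(*args, **kwargs):
            # every required module is checked before the wrapped function runs
            for module_name in required_module_names:
                if not module_installed(module_name):
                    raise tailored_exception(module_name)
            return func(*args, **kwargs)
        return req_wrapper
    return decorator


@requires_modules('json')  # a standard-library module, so the check passes
def parse_json(text: str):
    import json
    return json.loads(text)


@requires_modules('hypothetical_missing_module_xyz')  # default: ImportError subtype
def needs_missing() -> None:
    pass


@requires_modules(
    'hypothetical_missing_module_xyz',
    missing_module_error=RuntimeError('chemdb extras are not installed'),  # instance form
)
def needs_missing_custom() -> None:
    pass
```

Passing a subtype yields a message naming the missing module, while passing an instance lets the caller supply a fully customized message up front; in the instance case the module name is not appended.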
46 changes: 46 additions & 0 deletions polymerist/genutils/textual/prettyprint.py
Expand Up @@ -4,8 +4,54 @@
__email__ = 'timotej.bernat@colorado.edu'

from typing import Any

from textwrap import indent
from enum import StrEnum


class Justification(StrEnum):
    '''For specifying string justification'''
    LEFT = '<'
    CENTER = '^'
    RIGHT = '>'
Just = Justification # alias for the lazy or hurried

def procrustean_string(
        string : str,
        length : int,
        padding : str=' ',
        just : Justification=Justification.LEFT,
    ) -> str:
    '''Takes a string and a target length and returns a new string which begins
    with the same characters as the original string but is clamped to the target length,
    truncating or padding if the original string is too long or short, respectively
    Parameters
    ----------
    string : str
        The string to stretch or cut
    length : int
        The target number of characters in the final string
    padding : str, default=" "
        A single character which should be used as padding
        when strings are too short, by default just a space
        MUST BE EXACTLY ONE CHARACTER!
    just : Justification, default=Justification.LEFT
        Enum specifier of how to justify a padded string
        Options are Justification.LEFT, Justification.CENTER, or Justification.RIGHT
    Returns
    -------
    fmt_str : str
        A string which begins with the same characters as "string" but has
        precisely the specified length, padded and justified as specified
    '''
    if not (isinstance(length, int) and (length >= 0)):
        raise ValueError(f'Target string length must be a non-negative integer (not {length})')
    if not len(padding) == 1:
        raise IndexError(f'Padding string must contain exactly one character (passed "{padding}")')

    return f'{string[:length]:{padding}{just.value}{length}}'

def dict_to_indented_str(dict_to_stringify : dict[Any, Any], level_delimiter : str='\t', line_sep : str='\n') -> str:
'''Generate a pretty-printable string from a (possibly nested) dictionary,
Expand Down
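To make the truncate-or-pad behavior of `procrustean_string()` concrete, here is a minimal standalone sketch of the same idea. It is an illustration rather than the package code: it uses a `str`-mixin `Enum` in place of `StrEnum` (an assumption made so the snippet also runs on Python versions before 3.11), and the sample strings are arbitrary.

```python
from enum import Enum


class Justification(str, Enum):
    '''Alignment characters from the format-spec mini-language'''
    LEFT = '<'
    CENTER = '^'
    RIGHT = '>'


def procrustean_string(
        string: str,
        length: int,
        padding: str = ' ',
        just: Justification = Justification.LEFT,
    ) -> str:
    '''Clamp a string to exactly "length" characters by truncating or padding'''
    if not (isinstance(length, int) and (length >= 0)):
        raise ValueError(f'Target string length must be a non-negative integer (not {length})')
    if len(padding) != 1:
        raise IndexError(f'Padding string must contain exactly one character (passed "{padding}")')

    # slicing handles over-long strings; the nested format spec
    # (fill, alignment, width) handles strings that are too short
    return f'{string[:length]:{padding}{just.value}{length}}'


print(procrustean_string('polymerist', 4))                                       # poly
print(procrustean_string('poly', 8, padding='.'))                                # poly....
print(procrustean_string('poly', 8, padding='.', just=Justification.RIGHT))      # ....poly
print(procrustean_string('poly', 8, padding='-', just=Justification.CENTER))     # --poly--
```

Because truncation always keeps the leading characters, the justification setting only matters when the input is shorter than the target length.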
4 changes: 2 additions & 2 deletions polymerist/polymers/monomers/repr.py
Expand Up @@ -50,10 +50,10 @@ def _add_monomer(self, resname : str, smarts : Smarts) -> None:
if resname in self.monomers:
existing_resgroup = self.monomers[resname]
if isinstance(existing_resgroup, list) and (smarts not in existing_resgroup):
LOGGER.info(f'Extending existing residue category "{resname}" with SMARTS {smarts}')
LOGGER.debug(f'Extending existing residue category "{resname}" with SMARTS {smarts}')
self.monomers[resname].append(smarts)
else:
LOGGER.info(f'Creating new residue category "{resname}", containing singular SMARTS ["{smarts}"])')
LOGGER.debug(f'Creating new residue category "{resname}", containing singular SMARTS ["{smarts}"]')
self.monomers[resname] = [smarts]

def _add_monomers(self, resname : str, smarts_container : Iterable[Smarts]) -> None:
Expand Down
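The practical effect of demoting these messages from `info` to `debug` can be seen with the standard `logging` module alone. The sketch below is illustrative only: the logger name `monomer_registry_demo` and the message texts are hypothetical stand-ins, not taken from the package.

```python
import io
import logging

# hypothetical logger that writes to an in-memory stream for inspection
LOGGER = logging.getLogger('monomer_registry_demo')
LOGGER.propagate = False  # keep the demo output away from the root logger
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(message)s'))
LOGGER.addHandler(handler)

# at INFO level, per-monomer DEBUG chatter is filtered out
LOGGER.setLevel(logging.INFO)
LOGGER.debug('Creating new residue category "PEG"')
LOGGER.info('Loaded 2 monomers')
output_at_info = stream.getvalue()

# opting in to DEBUG brings the registration messages back
LOGGER.setLevel(logging.DEBUG)
LOGGER.debug('Extending existing residue category "PEG"')
output_at_debug = stream.getvalue()

print(output_at_info)
print(output_at_debug)
```

Users who still want to see each registration step can set the relevant logger to `logging.DEBUG`; everyone else gets quieter output at the common `INFO` level.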
32 changes: 24 additions & 8 deletions polymerist/polymers/monomers/specification.py
Expand Up @@ -19,19 +19,28 @@
# CHEMICAL INFO SPECIFICATION
SANITIZE_AS_KEKULE = (Chem.SANITIZE_ALL & ~Chem.SANITIZE_SETAROMATICITY) # sanitize everything EXCEPT reassignment of aromaticity

def expanded_SMILES(smiles : str, assign_map_nums : bool=True, start_from : int=1) -> str:
'''Takes a SMILES string and clarifies chemical information, namely explicit hydrogens, kekulized aromatic bonds, and atom map numbers'''
def expanded_SMILES(
smiles : str,
assign_map_nums : bool=True,
start_from : int=1,
kekulize : bool=True,
) -> str:
'''
Expands and clarifies the chemical information contained within a passed SMILES string,
namely explicit hydrogens and bond orders, and (optionally) kekulized aromatic bonds and atom map numbers
'''
assert(is_valid_SMILES(smiles))

rdmol = Chem.MolFromSmiles(smiles, sanitize=True) # TOSELF : determine values of pros/cons of sanitizations (freedom of specificity vs random RDKit errors)
rdmol = Chem.MolFromSmiles(smiles, sanitize=True)
rdmol = Chem.AddHs(rdmol, addCoords=True)
if assign_map_nums:
rdmol = molwise.assign_ordered_atom_map_nums(rdmol, start_from=start_from)

Chem.Kekulize(rdmol, clearAromaticFlags=True)

if kekulize:
Chem.Kekulize(rdmol, clearAromaticFlags=True)
Chem.SanitizeMol(rdmol)

return Chem.MolToSmiles(rdmol, kekuleSmiles=True, allBondsExplicit=True, allHsExplicit=True)
return Chem.MolToSmiles(rdmol, kekuleSmiles=kekulize, allBondsExplicit=True, allHsExplicit=True)


# REGEX TEMPLATES FOR COMPLIANT SMARTS
Expand Down Expand Up @@ -66,7 +75,14 @@ def chem_info_from_match(match : re.Match) -> dict[str, Union[int, str, None]]:


# SMARTS ATOM QUERY GENERATION
def compliant_atom_query_from_info(atomic_num : int, degree : int, atom_map_num : int, formal_charge : int=0, isotope : int=0, as_atom : bool=False) -> Union[str, QueryAtom]:
def compliant_atom_query_from_info(
atomic_num : int,
degree : int,
atom_map_num : int,
formal_charge : int=0,
isotope : int=0,
as_atom : bool=False
) -> Union[str, QueryAtom]:
'''Construct a monomer-spec compliant atom SMARTS string directly from chemical information'''
if not isotope: # handles when isotope is literal 0 or NoneType
isotope = "" # non-specific isotope is not explicitly written in string (left empty)
Expand Down Expand Up @@ -126,7 +142,7 @@ def compliant_mol_SMARTS(smarts : str) -> str:
count=rdmol.GetNumAtoms() # can't possibly replace more queries than there are atoms
)
if num_repl > 0:
LOGGER.warn(f'Cleaned {num_repl} SMARTS atom query aberrations introduced by RDKit')
LOGGER.debug(f'Cleaned {num_repl} SMARTS atom query aberrations introduced by RDKit')
sanitized_smarts = sanitized_smarts.replace('#0', '*') # replace explicit atom number 0 calls with star (easier to do post-processing, as #0 is easier to implement)

return sanitized_smarts
8 changes: 8 additions & 0 deletions polymerist/rdutils/__init__.py
Expand Up @@ -2,3 +2,11 @@

__author__ = 'Timotej Bernat'
__email__ = 'timotej.bernat@colorado.edu'

from .rdkdraw import (
    set_rdkdraw_size,
    enable_substruct_highlights,
    disable_substruct_highlights,
    enable_kekulized_drawing,
    disable_kekulized_drawing,
)
14 changes: 11 additions & 3 deletions polymerist/rdutils/rdkdraw.py
Expand Up @@ -20,6 +20,10 @@


# GLOBAL PREFERENCES
def set_rdkdraw_size(dim : int=300, aspect : float=3/2):
'''Change image size and shape of RDKit Mol images'''
IPythonConsole.molSize = (int(aspect*dim), dim) # Change IPython image display size

def enable_substruct_highlights() -> None:
'''Turns on highlighting of found substructures when performing substructure matches'''
IPythonConsole.highlightSubstructs = True
Expand All @@ -28,9 +32,13 @@ def disable_substruct_highlights() -> None:
'''Turns off highlighting of found substructures when performing substructure matches'''
IPythonConsole.highlightSubstructs = False

def set_rdkdraw_size(dim : int=300, aspect : float=3/2):
'''Change image size and shape of RDKit Mol images'''
IPythonConsole.molSize = (int(aspect*dim), dim) # Change IPython image display size
def enable_kekulized_drawing() -> None:
'''Turns on automatic kekulization of aromatic bonds before drawing molecules in Jupyter Notebooks'''
IPythonConsole.kekulizeStructures = True

def disable_kekulized_drawing() -> None:
'''Turns off automatic kekulization of aromatic bonds before drawing molecules in Jupyter Notebooks'''
IPythonConsole.kekulizeStructures = False


# SINGLE-MOLECULE DISPLAY OPTIONS
Expand Down