diff --git a/README.md b/README.md
index ae9138e..c480d1d 100644
--- a/README.md
+++ b/README.md
@@ -38,7 +38,7 @@ Once you have a package manager installed, you may proceed with one of the provi
 A fully-featured install in a safe virtual environment (named "polymerist-env", here) can be obtained by running the following terminal commands:
 
 #### Mamba install (basic)
-```sh
+```bash
 mamba create -n polymerist-env python=3.11
 mamba activate polymerist-env
 pip install polymerist
@@ -47,7 +47,7 @@ mamba install -c conda-forge openff-toolkit mbuild openbabel
 
 #### Mamba install (extended)
 An extended install with [Jupyter Notebook](https://jupyter.org/) support, molecular visualization capability, and chemical data querying capability can be obtained very similarly:
-```sh
+```bash
 mamba create -n polymerist-env python=3.11
 mamba activate polymerist-env
 pip install polymerist[interactive,chemdb]
@@ -56,7 +56,7 @@ mamba install -c conda-forge openff-toolkit mbuild openbabel
 
 #### Conda install (not recommended)
 Equivalent commands using `conda` (in case `mamba` has not been installed or the user is too stubborn to use it) are given below. These will perform the same installation, just much more slowly:
-```sh
+```bash
 conda create -n polymerist-env python=3.11
 conda activate polymerist-env
 pip install polymerist[interactive,chemdb]
@@ -67,7 +67,7 @@ In either case, the final [openff-toolkit](https://github.com/openforcefield/ope
 
 #### 1.1) Testing installation
 To see if the installation was successful, one can run the following short set of commands which should yield the outputs shown:
-```sh
+```python
 mamba activate polymerist-env; python
 >>> import polymerist as ps
 >>> print(ps.pascal(5))
@@ -82,7 +82,7 @@ mamba activate polymerist-env; python
 
 Assigning atomic partial charges using some flavor of [AM1-BCC](https://docs.eyesopen.com/toolkits/python/quacpactk/molchargetheory.html#am1bcc-charges) with `polymerist` also requires installation of some supplementary toolkits.
 One can mix-and-match installing any combination of the toolkits below to taste or (if impatient or indifferent) opt for a "shotgun" approach and install all 3 with the following commands:
-```sh
+```bash
 mamba activate polymerist-env
 mamba install -c openeye openeye-toolkits
 mamba install -c conda-forge espaloma_charge openff-nagl
@@ -102,7 +102,7 @@ This is an OpenFF-specific GNN based on similar architecture to Espaloma with a
 Polymerist can also be installed directly from the source code in this repository. To install, execute the following set of terminal commands in whichever directory you'd like the installation to live on your local machine:
 
 #### Mamba install (source)
-```sh
+```bash
 git clone https://github.com/timbernat/polymerist
 cd polymerist
 mamba env create -n polymerist-env -f devtools/conda-envs/release-build.yml
@@ -111,7 +111,7 @@ pip install .
 ```
 
 #### Conda install (source, not recommended)
-```sh
+```bash
 git clone https://github.com/timbernat/polymerist
 cd polymerist
 conda env create -n polymerist-env -f devtools/conda-envs/release-build.yml
@@ -122,7 +122,7 @@ Once the source install is complete, you no longer need the clone of the polymer
 ### Developer installation (for advanced users only)
 Those developing for `polymerist` may like to have an editable local installation, in which they can make changes to the source code and test behavior changes in real-time. In this case, one requires an "editable build" which mirrors the source files that live in the site_packages directory of the created environment.
 This type of installation proceeds as follows:
-```sh
+```bash
 git clone https://github.com/timbernat/polymerist
 cd polymerist
 mamba env create -n polymerist-dev -f devtools/conda-envs/dev-build.yml
diff --git a/polymerist/genutils/importutils/dependencies.py b/polymerist/genutils/importutils/dependencies.py
index fa39f5b..2e555c2 100644
--- a/polymerist/genutils/importutils/dependencies.py
+++ b/polymerist/genutils/importutils/dependencies.py
@@ -3,7 +3,7 @@
 __author__ = 'Timotej Bernat'
 __email__ = 'timotej.bernat@colorado.edu'
 
-from typing import Callable, Optional, ParamSpec, TypeVar
+from typing import Callable, Optional, ParamSpec, TypeVar, Union
 
 Params = ParamSpec('Params')
 ReturnType = TypeVar('ReturnType')
@@ -58,7 +58,7 @@ def module_installed(module_name : str) -> bool:
     try:
         # NOTE: opted for this implementation, as it never actually imports the package in question (faster and fewer side-effects)
         return find_spec(module_name) is not None
-    except (ValueError, AttributeError, ModuleNotFoundError): # these could all be raised by
+    except (ValueError, AttributeError, ModuleNotFoundError): # these could all be raised by a missing module
         return False
 
 def modules_installed(*module_names : list[str]) -> bool:
@@ -80,7 +80,7 @@ def modules_installed(*module_names : list[str]) -> bool:
 
 def requires_modules(
         *required_module_names : list[str],
-        missing_module_error : type[Exception]=ImportError,
+        missing_module_error : Union[Exception, type[Exception]]=ImportError,
     ) -> Callable[[TCall[..., ReturnType]], TCall[..., ReturnType]]:
     '''
     Decorator which enforces optional module dependencies prior to function execution
@@ -99,12 +99,27 @@ def requires_modules(
         Raised if any of the specified packages is not found to be installed
         Exception message will indicate the name of the specific package found missing
     '''
+    # meta-check to ensure type of raised Exception is valid
+    if not isinstance(missing_module_error, Exception):
+        if not (isinstance(missing_module_error, type) and issubclass(missing_module_error, Exception)):
+            # DEV: this is potentially brittle, depending on how the specific Exception subtype is implemented?
+            raise TypeError('Must pass either Exception instance or subtype to "missing_module_error"')
+
+    def tailored_exception(module_name : str) -> Exception:
+        '''Accessory function to generate targeted Exceptions based on the provided
+        missing_module_error value and the name of a module with no found installation'''
+        if isinstance(missing_module_error, Exception):
+            return missing_module_error
+
+        if isinstance(missing_module_error, type):
+            return missing_module_error(f'No installation found for module "{module_name}"')
+
     def decorator(func) -> TCall[..., ReturnType]:
         @wraps(func)
         def req_wrapper(*args : Params.args, **kwargs : Params.kwargs) -> ReturnType:
             for module_name in required_module_names:
                 if not module_installed(module_name):
-                    raise missing_module_error(f'No installation found for module "{module_name}"')
+                    raise tailored_exception(module_name)
             else:
                 return func(*args, **kwargs)
 
diff --git a/polymerist/genutils/textual/prettyprint.py b/polymerist/genutils/textual/prettyprint.py
index 94d9db7..af89a3d 100644
--- a/polymerist/genutils/textual/prettyprint.py
+++ b/polymerist/genutils/textual/prettyprint.py
@@ -4,8 +4,54 @@
 __email__ = 'timotej.bernat@colorado.edu'
 
 from typing import Any
+
 from textwrap import indent
+from enum import StrEnum
+
+
+class Justification(StrEnum):
+    '''For specifying string justification'''
+    LEFT = '<'
+    CENTER = '^'
+    RIGHT = '>'
+Just = Justification # alias for the lazy or hurried
 
+def procrustean_string(
+        string : str,
+        length : int,
+        padding : str=' ',
+        just : Justification=Justification.LEFT,
+    ) -> str:
+    '''Takes a string and a target length and returns a new string which begins
+    with the same characters as the original string but is clamped to the target length,
+    truncating or padding if the original string is too long or short, respectively
+
+    Parameters
+    ----------
+    string : str
+        The string to stretch or cut
+    length : int
+        The target number of characters in the final string
+    padding : str, default=" "
+        A single character which should be used as padding
+        when strings are too short, by default just a space
+        MUST BE EXACTLY ONE CHARACTER!
+    just : Justification, default=Justification.LEFT
+        Enum specifier of how to justify a padded string
+        Options are Justification.LEFT, Justification.CENTER, or Justification.RIGHT
+
+    Returns
+    -------
+    fmt_str : str
+        A string which begins with the same characters as "string" but has
+        precisely the specified length, padded or truncated as specified
+    '''
+    if not (isinstance(length, int) and (length >= 0)):
+        raise ValueError(f'Target string length must be a non-negative integer (not {length})')
+    if not len(padding) == 1:
+        raise IndexError(f'Padding string must contain exactly one character (passed "{padding}")')
+
+    return f'{string[:length]:{padding}{just.value}{length}}'
 
 def dict_to_indented_str(dict_to_stringify : dict[Any, Any], level_delimiter : str='\t', line_sep : str='\n') -> str:
     '''Generate a pretty-printable string from a (possibly nested) dictionary,
diff --git a/polymerist/polymers/monomers/repr.py b/polymerist/polymers/monomers/repr.py
index 0aa0d8d..a378a00 100644
--- a/polymerist/polymers/monomers/repr.py
+++ b/polymerist/polymers/monomers/repr.py
@@ -50,10 +50,10 @@ def _add_monomer(self, resname : str, smarts : Smarts) -> None:
         if resname in self.monomers:
             existing_resgroup = self.monomers[resname]
             if isinstance(existing_resgroup, list) and (smarts not in existing_resgroup):
-                LOGGER.info(f'Extending existing residue category "{resname}" with SMARTS {smarts}')
+                LOGGER.debug(f'Extending existing residue category "{resname}" with SMARTS {smarts}')
                 self.monomers[resname].append(smarts)
         else:
-            LOGGER.info(f'Creating new residue category "{resname}", containing singular SMARTS ["{smarts}"])')
+            LOGGER.debug(f'Creating new residue category "{resname}", containing singular SMARTS ["{smarts}"]')
             self.monomers[resname] = [smarts]
 
     def _add_monomers(self, resname : str, smarts_container : Iterable[Smarts]) -> None:
diff --git a/polymerist/polymers/monomers/specification.py b/polymerist/polymers/monomers/specification.py
index 8561dd4..546fa03 100644
--- a/polymerist/polymers/monomers/specification.py
+++ b/polymerist/polymers/monomers/specification.py
@@ -19,19 +19,28 @@
 # CHEMICAL INFO SPECIFICATION
 SANITIZE_AS_KEKULE = (Chem.SANITIZE_ALL & ~Chem.SANITIZE_SETAROMATICITY) # sanitize everything EXCEPT reassignment of aromaticity
 
-def expanded_SMILES(smiles : str, assign_map_nums : bool=True, start_from : int=1) -> str:
-    '''Takes a SMILES string and clarifies chemical information, namely explicit hydrogens, kekulized aromatic bonds, and atom map numbers'''
+def expanded_SMILES(
+        smiles : str,
+        assign_map_nums : bool=True,
+        start_from : int=1,
+        kekulize : bool=True,
+    ) -> str:
+    '''
+    Expands and clarifies the chemical information contained within a passed SMILES string, namely
+    explicit hydrogens and bond orders, and (optionally) kekulized aromatic bonds and atom map numbers
+    '''
     assert(is_valid_SMILES(smiles))
-    rdmol = Chem.MolFromSmiles(smiles, sanitize=True) # TOSELF : determine values of pros/cons of sanitizations (freedom of specificity vs random RDKit errors)
+    rdmol = Chem.MolFromSmiles(smiles, sanitize=True)
     rdmol = Chem.AddHs(rdmol, addCoords=True)
     if assign_map_nums:
         rdmol = molwise.assign_ordered_atom_map_nums(rdmol, start_from=start_from)
-    
-    Chem.Kekulize(rdmol, clearAromaticFlags=True)
+
+    if kekulize:
+        Chem.Kekulize(rdmol, clearAromaticFlags=True)
     Chem.SanitizeMol(rdmol)
 
 
-    return Chem.MolToSmiles(rdmol, kekuleSmiles=True, allBondsExplicit=True, allHsExplicit=True)
+    return Chem.MolToSmiles(rdmol, kekuleSmiles=kekulize, allBondsExplicit=True, allHsExplicit=True)
 
 
 # REGEX TEMPLATES FOR COMPLIANT SMARTS
@@ -66,7 +75,14 @@ def chem_info_from_match(match : re.Match) -> dict[str, Union[int, str, None]]:
 
 
 # SMARTS ATOM QUERY GENERATION
-def compliant_atom_query_from_info(atomic_num : int, degree : int, atom_map_num : int, formal_charge : int=0, isotope : int=0, as_atom : bool=False) -> Union[str, QueryAtom]:
+def compliant_atom_query_from_info(
+        atomic_num : int,
+        degree : int,
+        atom_map_num : int,
+        formal_charge : int=0,
+        isotope : int=0,
+        as_atom : bool=False
+    ) -> Union[str, QueryAtom]:
     '''Construct a monomer-spec compliant atom SMARTS string directly from chemical information'''
     if not isotope: # handles when isotope is literal 0 or NoneType
         isotope = "" # non-specific isotope is not explicitly written in string (left empty)
@@ -126,7 +142,7 @@ def compliant_mol_SMARTS(smarts : str) -> str:
         count=rdmol.GetNumAtoms() # can't possibly replace more queries than there are atoms
     )
     if num_repl > 0:
-        LOGGER.warn(f'Cleaned {num_repl} SMARTS atom query aberrations introduced by RDKit')
+        LOGGER.debug(f'Cleaned {num_repl} SMARTS atom query aberrations introduced by RDKit')
     sanitized_smarts = sanitized_smarts.replace('#0', '*') # replace explicit atom number 0 calls with star (easier to do post-processing, as #0 is easier to implement)
 
     return sanitized_smarts
diff --git a/polymerist/rdutils/__init__.py b/polymerist/rdutils/__init__.py
index d5a1403..4ffe962 100644
--- a/polymerist/rdutils/__init__.py
+++ b/polymerist/rdutils/__init__.py
@@ -2,3 +2,11 @@
 
 __author__ = 'Timotej Bernat'
 __email__ = 'timotej.bernat@colorado.edu'
+
+from .rdkdraw import (
+    set_rdkdraw_size,
+    enable_substruct_highlights,
+    disable_substruct_highlights,
+    enable_kekulized_drawing,
+    disable_kekulized_drawing,
+)
\ No newline at end of file
diff --git a/polymerist/rdutils/rdkdraw.py b/polymerist/rdutils/rdkdraw.py
index ae11fbc..046dd10 100644
--- a/polymerist/rdutils/rdkdraw.py
+++ b/polymerist/rdutils/rdkdraw.py
@@ -20,6 +20,10 @@
 
 
 # GLOBAL PREFERENCES
+def set_rdkdraw_size(dim : int=300, aspect : float=3/2):
+    '''Change image size and shape of RDKit Mol images'''
+    IPythonConsole.molSize = (int(aspect*dim), dim) # Change IPython image display size
+
 def enable_substruct_highlights() -> None:
     '''Turns on highlighting of found substructures when performing substructure matches'''
     IPythonConsole.highlightSubstructs = True
@@ -28,9 +32,13 @@ def disable_substruct_highlights() -> None:
     '''Turns off highlighting of found substructures when performing substructure matches'''
     IPythonConsole.highlightSubstructs = False
 
-def set_rdkdraw_size(dim : int=300, aspect : float=3/2):
-    '''Change image size and shape of RDKit Mol images'''
-    IPythonConsole.molSize = (int(aspect*dim), dim) # Change IPython image display size
+def enable_kekulized_drawing() -> None:
+    '''Turns on automatic kekulization of aromatic bonds before drawing molecules in Jupyter Notebooks'''
+    IPythonConsole.kekulizeStructures = True
+
+def disable_kekulized_drawing() -> None:
+    '''Turns off automatic kekulization of aromatic bonds before drawing molecules in Jupyter Notebooks'''
+    IPythonConsole.kekulizeStructures = False
 
 
 # SINGLE-MOLECULE DISPLAY OPTIONS
diff --git a/polymerist/smileslib/chemdbqueries.py b/polymerist/smileslib/chemdbqueries.py
index 5631408..bd00b9c 100644
--- a/polymerist/smileslib/chemdbqueries.py
+++ b/polymerist/smileslib/chemdbqueries.py
@@ -6,47 +6,303 @@
 import logging
 LOGGER = logging.getLogger(__name__)
 
-from typing import Optional
-
-from ..genutils.importutils.dependencies import modules_installed, MissingPrerequisitePackage
-if not modules_installed('cirpy'):
-    raise MissingPrerequisitePackage(
-        importing_package_name=__spec__.name,
-        use_case='Querying the NIH CACTUS Chemical Identifier Resolver (CIR)',
-        install_link='https://cirpy.readthedocs.io/en/latest/guide/install.html',
-        dependency_name='cirpy',
-        dependency_name_formal='CIRpy',
-    )
-import cirpy
-
-from ..genutils.importutils.dependencies import modules_installed, MissingPrerequisitePackage
-if not modules_installed('pubchempy'):
-    raise MissingPrerequisitePackage(
-        importing_package_name=__spec__.name,
-        use_case='Querying the PubChem Compound database',
-        install_link='https://pubchempy.readthedocs.io/en/latest/guide/install.html',
-        dependency_name='pubchempy',
-        dependency_name_formal='PubChemPy',
-    )
-import pubchempy as pcp
-
-
-def get_property_from_smiles(smiles : str, prop_name : str='iupac_name') -> Optional[str]: # TODO: abstract each query method via Strategy pattern
-    '''Takes the SMILES string representing a molecule and attempts to fetch its IUPAC name from NIH CACTUS and/or PubChem
-    Returns the fetched IUPAC name as a str, or NoneType if both queries fail'''
-    # Open with NIH query (fastest method), return name if found...
-    LOGGER.debug(f'Attempting query of property "{prop_name}" from NIH CACTUS')
-    iupac_name = cirpy.resolve(smiles, prop_name)
-    if iupac_name is not None:
-        if isinstance(iupac_name, list):
-            return iupac_name.pop(0)
-        return iupac_name
-
-    # ...otherwise, search through PubChem Compound queries for a matching results
-    pc_prop_name = pcp.PROPERTY_MAP.get(prop_name, prop_name)
-    LOGGER.debug(f'Attempting query of property "{pc_prop_name}" from PubChem PUGREST')
-    for prop_query in pcp.get_properties(pc_prop_name, smiles, namespace='smiles'):
-        if pc_prop_name in prop_query:
-            return prop_query[pc_prop_name]
-    else:
-        return None
\ No newline at end of file
+from typing import Any, ClassVar, Container, Iterable, Optional, Sequence
+from abc import ABC, abstractmethod
+
+import requests
+
+from ..genutils.decorators.classmod import register_abstract_class_attrs, register_subclasses
+from ..genutils.importutils.dependencies import requires_modules, MissingPrerequisitePackage
+
+
+# CUSTOM EXCEPTIONS
+class InvalidPropertyError(Exception):
+    '''Raised when attempting to query a property that a chemical database service cannot provide'''
+    pass
+
+class NullPropertyResponse(Exception):
+    '''Raised when a chemical database query doesn't fail BUT returns a NoneType where not allowed'''
+    pass
+
+class ChemicalDataQueryFailed(Exception):
+    '''Raised when a chemical data query is unfulfilled by a service'''
+    pass
+
+# STRATEGIES BASE FOR QUERYING CHEMICAL DATA
+@register_subclasses(key_attr='service_name')
+@register_abstract_class_attrs('service_name')
+class ChemDBServiceQueryStrategy(ABC):
+    '''Implementation of queries from a particular chemical database'''
+    @abstractmethod
+    def _get_property(self, prop_name : str, representation : str, **kwargs) -> Optional[Any]:
+        ...
+
+    @classmethod
+    def dependencies(cls) -> Iterable[str]:
+        '''For internals, allows dynamic checking for package dependencies (useful for automating unit test boilerplate)'''
+        ...
+
+    @classmethod
+    @abstractmethod
+    def is_online(cls) -> bool:
+        '''Check if the service being queried is online and can accept requests'''
+        ...
+
+    @classmethod
+    @abstractmethod
+    def queryable_properties(cls) -> Container[str]:
+        '''List which chemical property names can be queried from the service'''
+        ...
+
+    @classmethod
+    @abstractmethod
+    def queryable_namespaces(cls) -> Container[str]:
+        '''List which chemical identification types can be searched through by the service'''
+        ...
+
+    def validate_property(self, prop_name : str) -> None:
+        '''Pre-check to ensure that a property is queryable from a service before attempting HTTP query'''
+        if prop_name not in self.queryable_properties():
+            prop_options_str = '\n'.join(sorted(self.queryable_properties()))
+            prop_error_msg = f'Cannot query property "{prop_name}" from {self.service_name}'
+            LOGGER.error(prop_error_msg) # log briefer error message in cases where the ensuing InvalidPropertyError is bypassed
+
+            raise InvalidPropertyError(f'{prop_error_msg};\nChoose from one of the following property names:\n{prop_options_str}')
+
+    def get_property(
+            self,
+            prop_name : str,
+            representation : str,
+            namespace : Optional[str],
+            keep_first_only : bool=True,
+            allow_null_return : bool=False,
+            **kwargs
+        ) -> Optional[Any]:
+        '''Fetch a property associated with a molecule from a chemical database query service'''
+        LOGGER.info(f'Sent query request for property "{prop_name}" to {self.service_name}')
+        self.validate_property(prop_name=prop_name)
+
+        prop_val = self._get_property(prop_name=prop_name, representation=representation, namespace=namespace, **kwargs)
+        if not prop_val:
+            prop_val = None # cast empty lists, strings, etc to NoneType
+
+        if isinstance(prop_val, Sequence) and not isinstance(prop_val, str) and keep_first_only: # only index ordered collections; exclude str to avoid returning just the first char of a string response
+            prop_val = prop_val[0]
+
+        if (prop_val is None) and (not allow_null_return): # NOTE: duplicated NoneType check is needed to catch empty containers which are cast to None above
+            null_error_msg = f'{self.service_name} returned NoneType "{prop_name}", which is declared invalid by call signature'
+            LOGGER.error(null_error_msg)
+
+            raise NullPropertyResponse(null_error_msg)
+        LOGGER.info(f'Successfully received property "{prop_name}" from {self.service_name}')
+
+        return prop_val
+
+# CONCRETE IMPLEMENTATIONS OF CHEMICAL DATABASE SERVICE QUERIES
+## NIH CACTUS
+cirpy_error = MissingPrerequisitePackage(
+    importing_package_name=__spec__.name,
+    use_case='Querying the NIH CACTUS Chemical Identifier Resolver (CIR)',
+    install_link='https://cirpy.readthedocs.io/en/latest/guide/install.html',
+    dependency_name='cirpy',
+    dependency_name_formal='CIRpy',
+)
+
+class NIHCACTUSQueryStrategy(ChemDBServiceQueryStrategy):
+    '''
+    Implementation of chemical query requests to the NIH's CADD group
+    Cheminformatics Tools and User Services (CACTUS) Chemical Identifier Resolver (CIR)
+    '''
+    service_name : ClassVar[str] = 'NIH CACTUS CIR'
+
+    @classmethod
+    def dependencies(cls):
+        return ['cirpy']
+
+    @classmethod
+    @requires_modules('cirpy', missing_module_error=cirpy_error)
+    def queryable_properties(cls) -> set[str]:
+        import cirpy
+
+        _CIR_PROPS = { # see official docs for more info: https://cactus.nci.nih.gov/chemical/structure_documentation
+            'smiles',
+            'ficts',
+            'ficus',
+            'uuuuu',
+            'hashisy',
+            'names',
+            'iupac_name',
+            'cas',
+            'chemspider_id',
+            'image',
+            'twirl',
+            'mw',
+            'formula',
+            'h_bond_donor_count',
+            'h_bond_acceptor_count',
+            'h_bond_center_count',
+            'rule_of_5_violation_count',
+            'rotor_count',
+            'effective_rotor_count',
+            'ring_count',
+            'ringsys_count',
+            'inchi',
+            'inchikey',
+            # shortened aliases of InChI-related properties
+            'stdinchi',
+            'stdinchikey',
+            # these were not documented on CACTUS or by cirpy, but scraped from webchem: https://github.com/ropensci/webchem/blob/master/R/cir.R#L168-L174
+            'deprotonable_group_count',
+            'heavy_atom_count',
+            'heteroatom_count',
+            'hydrogen_atom_count',
+            'monoisotopic_mass',
+            'protonable_group_count',
+            'xlogp2',
+        }
+        return _CIR_PROPS | cirpy.FILE_FORMATS
+
+    @classmethod
+    def is_online(cls):
+        response = requests.head('https://cactus.nci.nih.gov/chemical/structure')
+        return response.status_code < 500 # NOTE: could also be more stringent and check == 200 for OK; enough to just check server-side error for now
+
+    @classmethod
+    def queryable_namespaces(cls) -> set[str]:
+        return { # obtained from https://cirpy.readthedocs.io/en/latest/guide/resolvers.html
+            'smiles',
+            'stdinchikey',
+            'stdinchi',
+            'ncicadd_identifier', # (for FICTS, FICuS, uuuuu)
+            'hashisy',
+            'cas_number',
+            'name', # this is not documented but DOES work
+            'name_by_opsin',
+            'name_by_cir',
+        }
+
+    @requires_modules('cirpy', missing_module_error=cirpy_error)
+    def _get_property(self, prop_name : str, representation : str, namespace : Optional[str]=None, **kwargs):
+        import cirpy
+
+        return cirpy.resolve(representation, prop_name, resolvers=([namespace] if namespace is not None else None), **kwargs) # pass no resolver list (rather than [None]) when no namespace is given
+
+## PubChem
+pubchempy_error = MissingPrerequisitePackage(
+    importing_package_name=__spec__.name,
+    use_case='Querying the PubChem Compound database',
+    install_link='https://pubchempy.readthedocs.io/en/latest/guide/install.html',
+    dependency_name='pubchempy',
+    dependency_name_formal='PubChemPy',
+)
+
+class PubChemQueryStrategy(ChemDBServiceQueryStrategy):
+    '''
+    Implementation of chemical query requests to PubChem via the
+    PUG REST API (https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest)
+    '''
+    service_name : ClassVar[str] = 'PubChem'
+
+    @classmethod
+    def dependencies(cls):
+        return ['pubchempy']
+
+    @classmethod
+    def is_online(cls):
+        response = requests.get('https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/aspirin/property/IUPACName/TXT') # sample query which is well-formatted
+        return response.status_code < 500 # NOTE: enough to just check server-side error for now, but could be more stringent and check if ==200
+
+    @classmethod
+    @requires_modules('pubchempy', missing_module_error=pubchempy_error)
+    def queryable_properties(cls) -> set[str]:
+        from pubchempy import PROPERTY_MAP
+
+        return set(PROPERTY_MAP.keys()) | set(PROPERTY_MAP.values()) | {'Fingerprint2D'} # also taken from webchem: https://github.com/ropensci/webchem/blob/master/R/pubchem.R#L377C21-L392C55
+
+    @classmethod
+    def queryable_namespaces(cls) -> set[str]:
+        return { # obtained from https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest#section=Input
+            'cid',
+            'name',
+            'smiles',
+            'inchi',
+            'sdf',
+            'inchikey',
+            'formula',
+            'listkey',
+        }
+
+    @requires_modules('pubchempy', missing_module_error=pubchempy_error)
+    def _get_property(self, prop_name : str, representation : str, namespace : Optional[str]='smiles', **kwargs) -> Optional[Any]:
+        from pubchempy import PROPERTY_MAP, get_properties, PubChemPyError
+
+        official_prop_name = PROPERTY_MAP.get(prop_name, prop_name) # this is done internally, but needed here to extract the property value from the final return dict
+        try:
+            pubchem_result = get_properties(official_prop_name, identifier=representation, namespace=namespace, **kwargs)
+        except PubChemPyError:
+            raise requests.HTTPError # discards some information in return for making Strategy interface oblivious to pubchempy (i.e. in case it is not installed)
+        else:
+            if pubchem_result:
+                pubchem_result = [
+                    query_result[official_prop_name] # extract property value from extraneous CID (and any other) info
+                    for query_result in pubchem_result
+                        if official_prop_name in query_result # skip if return doesn't contain the info we specifically requested (happens occasionally for some reason)
+                ]
+            return pubchem_result
+
+# UTILITY FUNCTIONS EMPLOYING GENERIC STRATEG(Y/IES)
+def get_chemical_property(
+        prop_name : str,
+        representation : str,
+        namespace : str='smiles',
+        keep_first_only : bool=True,
+        allow_null_return : bool=False,
+        fail_quietly : bool=False,
+        services : Optional[Sequence['ChemDBServiceQueryStrategy']]=None,
+        **kwargs,
+    ) -> Optional[Any]:
+    '''Attempt to fetch a molecular property from a variety of chemical database services, either
+    provided manually (in the order they should be checked) or ALL implemented service queries by default
+
+    Will return the first valid returned result or, if all services fail, raise Exception
+    '''
+    # determine services which should be queried
+    if services is None:
+        services = [chem_query_strat_type() for chem_query_strat_type in ChemDBServiceQueryStrategy.subclass_registry.values()]
+    if not services: # check if "services" turns out to be an empty collection (either as-passed or because no subclasses are implemented when defaulting)
+        raise IndexError('Must provide at least one chemical database querying strategy to "services"')
+    n_services_to_try : int = len(services)
+
+    # query services sequentially in order of appearance
+    for i, service in enumerate(services, start=1):
+        ## validate type of service strategies
+        if isinstance(service, type):
+            service = service() # allows ChemDBServiceQueryStrategy types to be passed in lieu of instances
+        if not isinstance(service, ChemDBServiceQueryStrategy):
+            raise TypeError(f'Services must be specified as {ChemDBServiceQueryStrategy.__name__} instances, not objects of type {type(service).__name__}')
+
+        ## attempt to query result from service
+        LOGGER.info(f'Attempting chemical property query to service {i}/{n_services_to_try} ("{service.service_name}"):')
+        try:
+            prop_val = service.get_property(
+                prop_name,
+                representation,
+                namespace,
+                keep_first_only=keep_first_only,
+                allow_null_return=allow_null_return,
+                **kwargs,
+            )
+            return prop_val
+        except requests.HTTPError:
+            LOGGER.error(f'Query to {service.service_name} failed, either due to connection timeout or invalid request')
+            continue
+        except (InvalidPropertyError, NullPropertyResponse): # skip over invalid property names (keep trying other services rather than failing)
+            # log messages are baked into the respective raises for these custom exceptions
+            continue
+    else: # take action when None of the provided services turn up fruitful
+        fail_msg = 'Query could not be fulfilled by any of the provided chemical query services'
+        if fail_quietly:
+            LOGGER.error(f'{fail_msg}; returning NoneType')
+            return None
+        else: # fail vocally if none of the services can fulfill the property request
+            raise ChemicalDataQueryFailed(fail_msg)
\ No newline at end of file
diff --git a/polymerist/tests/genutils/importutils/test_dependencies.py b/polymerist/tests/genutils/importutils/test_dependencies.py
index eea4ee1..844edc1 100644
--- a/polymerist/tests/genutils/importutils/test_dependencies.py
+++ b/polymerist/tests/genutils/importutils/test_dependencies.py
@@ -5,7 +5,7 @@
 
 import pytest
 
-from typing import Any, Callable
+from typing import Any, Callable, Union
 
 from polymerist.genutils.importutils import dependencies
 
@@ -23,24 +23,42 @@ def test_modules_installed(module_names : list[str], expected_found : bool) -> N
     '''Check that module install checker correctly identifies present and absent modules'''
     assert dependencies.modules_installed(*module_names) == expected_found
 
-# Testing requires_modules decorator
-@dependencies.requires_modules('os')
-def should_pass() -> str:
-    '''Dummy function to test requires_modules decorator for dependencies that are present'''
-    return 'I will run!'
-
-@dependencies.requires_modules('fake--module')
-def should_fail() -> str:
-    '''Dummy function to test requires_modules decorator for dependencies that are present'''
-    return 'I will xfail :('
-
 
+# Testing requires_modules() decorator
 @pytest.mark.parametrize(
-    'func',
+    'module_name,missing_module_error',
     [
-        should_pass,
-        pytest.param(should_fail, marks=pytest.mark.xfail(raises=ImportError, reason='The required module shouldn\'t be found in the environment', strict=True)),
+        ('os', ImportError),
+        ('os', ImportError('This is not the default message!')),
+        pytest.param(
+            'os', 42, # note that module IS valid here but the Exception raised is not (meta-error)
+            marks=pytest.mark.xfail(
+                raises=TypeError,
+                reason='Non Exception-like object passed to missing_module_error',
+                strict=True,
+            )
+        ),
+        pytest.param(
+            'fake--module', ImportError,
+            marks=pytest.mark.xfail(
+                raises=ImportError, # NOTE that this ImportError MUST be the same as the import error passed via args
+                reason='The required module shouldn\'t be found in the environment',
+                strict=True,
+ ) + ), + pytest.param( + 'fake--module', AttributeError('something else'), + marks=pytest.mark.xfail( + raises=AttributeError, # NOTE that this AttributeError MUST be the same as the import error passed via args + reason='The required module shouldn\'t be found in the environment', + strict=True, + ) + ), ] ) -def test_requires_modules(func : Callable[..., Any]) -> None: - '''Test that the requires_modules decortor correctly wraps functions''' +def test_requires_modules(module_name : str, missing_module_error : Union[Exception, type[Exception]]) -> None: + '''Test that the requires_modules decorator correctly wraps functions''' + @dependencies.requires_modules(module_name, missing_module_error=missing_module_error) + def func() -> str: + return 'I am pointless' + _ = func() # no assertion needed, xfail cases should raise Exception while working cases will ternimate without Exception \ No newline at end of file diff --git a/polymerist/tests/smileslib/test_chemdbqueries.py b/polymerist/tests/smileslib/test_chemdbqueries.py new file mode 100644 index 0000000..01e6df1 --- /dev/null +++ b/polymerist/tests/smileslib/test_chemdbqueries.py @@ -0,0 +1,295 @@ +'''Unit tests for `chemdbqueries` package''' + +__author__ = 'Timotej Bernat' +__email__ = 'timotej.bernat@colorado.edu' + +import pytest + +from typing import Any +from dataclasses import dataclass, asdict + +from requests import HTTPError + +from polymerist.genutils.importutils.dependencies import modules_installed, MissingPrerequisitePackage +from polymerist.smileslib.chemdbqueries import ( + get_chemical_property, + InvalidPropertyError, + NullPropertyResponse, + ChemicalDataQueryFailed, + ChemDBServiceQueryStrategy, + # NOTE: these strageies are implemented to be defined even if the packages in question aren't installed + # it's just that instances will raise excception on most of their method calls + NIHCACTUSQueryStrategy, + PubChemQueryStrategy, + get_chemical_property, + +) + +CHEMDB_STRATEGY_ONLINE : 
dict[type[ChemDBServiceQueryStrategy], bool] = {} +CHEMDB_STRATEGY_DEPENDENCIES_MET : dict[type[ChemDBServiceQueryStrategy], bool] = {} +for ChemDBStrategy in ChemDBServiceQueryStrategy.__subclasses__(): # dynamically determine criteria for which services should be tested + CHEMDB_STRATEGY_ONLINE[ ChemDBStrategy] = ChemDBStrategy.is_online() + CHEMDB_STRATEGY_DEPENDENCIES_MET[ChemDBStrategy] = modules_installed(*ChemDBStrategy.dependencies()) + +@pytest.mark.parametrize( + 'service_type', + [ + pytest.param( + service_type, + marks=pytest.mark.xfail( + raises=MissingPrerequisitePackage, + reason='Unsatisfied dependency needed for chemical database service to be imported', + strict=True, + ) + ) + for service_type, dependencies_met in CHEMDB_STRATEGY_DEPENDENCIES_MET.items() + if not dependencies_met + ] +) +def test_missing_dependency_xfail(service_type : type[ChemDBServiceQueryStrategy]) -> None: + '''Test whether checks for missing prerequisite dependencies are active''' + _ = service_type.queryable_properties() # at the time of writing, this call is known to require the respective dependencies + + +@dataclass +class ChemDBQueryExample: + '''For encapsulating the many parameters passable to a chemical database service query''' + prop_name : str + representation : str + namespace : str + keep_first_only : bool + allow_null_return : bool + +# SERVICE_QUERY_EXAMPLES : + +EXAMPLES_BY_SERVICE : list[tuple[type[ChemDBServiceQueryStrategy], ChemDBQueryExample, Any]] = [ + # for NIH CACTUS + ( ## simple queries known to work for all services + NIHCACTUSQueryStrategy, + ChemDBQueryExample( 
+ 'mw', + 'benzophenone', + namespace='name', + keep_first_only=True, + allow_null_return=False, + ), + '182.2214' + ), + ( ## testing that returns with multiple data values work + NIHCACTUSQueryStrategy, + ChemDBQueryExample( + 'names', + 'c1ccccc1-C(=S)S', + namespace='smiles', + keep_first_only=False, + allow_null_return=False, + ), + ['Benzenecarbodithioic acid', '121-68-6', 'UPCMLD00WV-104', 'EINECS 204-491-4', 'Benzenecarbodithioic acid', 'Dithiobenzoic acid', 'NSC732246'] + ), + ( ## testing that enabling and disabling None returns is handled properly in both cases + NIHCACTUSQueryStrategy, + ChemDBQueryExample( + 'inchi', + 'bogus-name', # this is obviously fake and should not return anything + namespace='name', + keep_first_only=True, + allow_null_return=True, + ), + None + ), + pytest.param( + NIHCACTUSQueryStrategy, + ChemDBQueryExample( + 'inchi', + 'bogus-name', # this is obviously fake and should not return anything + namespace='name', + keep_first_only=True, + allow_null_return=False, + ), + None, + marks=pytest.mark.xfail( + raises=(NullPropertyResponse, ChemicalDataQueryFailed), + reason='Did not allow response to be NoneType', + strict=True, + ) + ), + pytest.param( ## testing that invalid property values are caught before attempting a query + NIHCACTUSQueryStrategy, + ChemDBQueryExample( + 'in_no_way_a_valid_property', # this should not even be considered a valid property + 'benophenone', + namespace='name', + keep_first_only=True, + allow_null_return=False, + ), + None, + marks=pytest.mark.xfail( + raises=(InvalidPropertyError, ChemicalDataQueryFailed), + reason='Tried to query a property that does not exist', + strict=True, + ) + ), + + # for PubChem + ( ## simple queries known to work for all services + PubChemQueryStrategy, + ChemDBQueryExample( + 'iupac_name', + 'CCO', + namespace='smiles', + keep_first_only=True, + allow_null_return=False + ), + 'ethanol' + ), + ( + PubChemQueryStrategy, + ChemDBQueryExample( + 'inchi', + 
'N-methylformamide', + namespace='name', + keep_first_only=True, + allow_null_return=False, + ), + 'InChI=1S/C2H5NO/c1-3-2-4/h2H,1H3,(H,3,4)', + ), + ( ## testing that different namespaces can be queried + PubChemQueryStrategy, + ChemDBQueryExample( + 'MolecularWeight', + 'InChI=1S/C2H5NO/c1-3-2-4/h2H,1H3,(H,3,4)', + namespace='inchi', + keep_first_only=True, + allow_null_return=False, + ), + '59.07', + ), + ( ## testing that returns with multiple data values work + PubChemQueryStrategy, + ChemDBQueryExample( + 'HeavyAtomCount', + 'CCO', + namespace='smiles', + keep_first_only=False, + allow_null_return=False, + ), + [3], # note that this is wrapped in a list, as are all PubChem queries by default; I couldn't find a good example which returns more than one value, as cirpy does + ), + pytest.param( ## testing sending malformed queries to PubChem + PubChemQueryStrategy, + ChemDBQueryExample( + 'inchi', + 'bogus-name', # this is obviously fake and should not return anything + namespace='smiles', + keep_first_only=True, + allow_null_return=True, + ), + None, + marks=pytest.mark.xfail( + raises=(HTTPError, ChemicalDataQueryFailed), + reason='Invalid request sent to PubChem (queried a name as a SMILES string)', + strict=True, + ) + ), + ( ## testing that enabling and disabling None returns is handled properly in both cases + PubChemQueryStrategy, + ChemDBQueryExample( + 'inchi', + 'bogus-name', # this is obviously fake and should not return anything + namespace='name', + keep_first_only=True, + allow_null_return=True, + ), + None, + ), + pytest.param( + PubChemQueryStrategy, + ChemDBQueryExample( + 'inchi', + 'bogus-name', # this is obviously fake and should not return anything + namespace='name', + keep_first_only=True, + allow_null_return=False, + ), + None, + marks=pytest.mark.xfail( + raises=(NullPropertyResponse, ChemicalDataQueryFailed), + reason='Did not allow response to be NoneType', + strict=True, + ) + ), + pytest.param( ## testing that invalid property values
are caught before attempting a query + PubChemQueryStrategy, + ChemDBQueryExample( + 'in_no_way_a_valid_property', # this should not even be considered a valid property + 'benophenone', + namespace='name', + keep_first_only=True, + allow_null_return=False, + ), + None, + marks=pytest.mark.xfail( + raises=(InvalidPropertyError, ChemicalDataQueryFailed), + reason='Tried to query a property that does not exist', + strict=True, + ) + ), +] + +@pytest.mark.parametrize('service_type,query_example,expected_return', EXAMPLES_BY_SERVICE) +class TestChemicalDatabaseServiceQueries: + def test_direct_service_property_query(self, service_type : type[ChemDBServiceQueryStrategy], query_example : ChemDBQueryExample, expected_return : Any) -> None: + '''Test that a chemical database query through a given service executes completely and returns the expected result''' + # prechecks to skip tests + if not CHEMDB_STRATEGY_DEPENDENCIES_MET[service_type]: + pytest.skip(f'{service_type.service_name} is missing Python dependencies') + + if not CHEMDB_STRATEGY_ONLINE[service_type]: + pytest.skip(f'{service_type.service_name} cannot be connected to') + + # initialize and query service + service = service_type() + assert service.get_property(**asdict(query_example)) == expected_return + + def test_get_chemical_property_wrapper(self, service_type : type[ChemDBServiceQueryStrategy], query_example : ChemDBQueryExample, expected_return : Any) -> None: + '''Test that requests filtered through the get_chemical_property() strategy wrapper are executed faithfully''' + # prechecks to skip tests + if not CHEMDB_STRATEGY_DEPENDENCIES_MET[service_type]: + pytest.skip(f'{service_type.service_name} is missing Python dependencies') + + if not CHEMDB_STRATEGY_ONLINE[service_type]: + pytest.skip(f'{service_type.service_name} cannot be connected to') + + # call get_chemical_property wrapper + # try: + print(asdict(query_example)) + assert get_chemical_property(**asdict(query_example), services=[service_type], fail_quietly=False) == 
expected_return # CRUCIAL that fail_quietly be False; rely on exceptions to match with xfails + # except ChemicalDataQueryFailed: + # print('foo') + + +# @pytest.mark.skipif(not CHEMDB_STRATEGIES_TO_TEST, reason='No chemical data server is available') +# def test_get_chemical_property() -> None: +# a = 4 \ No newline at end of file