Integrate OpenFF / OpenFE protein loaders so that entire system has Molecule representations #1182

Open
jchodera opened this issue Apr 18, 2023 · 2 comments

@jchodera
Member

We should try to integrate these loaders so that we can eventually use tools like OpenFF or Espaloma to parameterize the receptor/biomolecule as well as the ligand.

@richardjgowers

Just to leave some breadcrumbs, the OpenFE approach is to try and follow the PDB-recommended route of matching residue-by-residue mmCIF templates to the raw file to assign bond orders, aromaticity & formal charges. The repo for this is here: https://github.com/OpenFreeEnergy/pdbinf

An example for loading CDK2 (which features a nonstandard residue) is here: https://github.com/OpenFreeEnergy/pdbinf/blob/main/notebooks/tpo_load.ipynb

I've also played around with the question of whether you can still find and apply the correct template if the monomer or its atoms have incorrect labels: https://github.com/OpenFreeEnergy/pdbinf/blob/main/notebooks/tpo_guessing_demo.ipynb

It should currently handle standard AAs, RNA and DNA, and, if you download the chemical component dictionary (or any template), anything which is a "standard" nonstandard component. This all still hinges on the residues being correctly delimited; if, for example, you had a cap that had been merged with the neighbouring residue, this wouldn't be handled well.
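To make the residue-by-residue idea concrete, here is a minimal sketch in plain Python. It is not the pdbinf API: the `TEMPLATES` dictionary, the `assign_intra_residue_bonds` function and the hand-written glycine entry are all hypothetical stand-ins for real mmCIF component templates. It only shows why matching relies on residue names, atom names and correct residue boundaries.

```python
from collections import defaultdict

# Hypothetical, hand-written template for the heavy atoms of glycine in its
# free monomer form. Real templates would be read from mmCIF component files.
TEMPLATES = {
    "GLY": {
        # Only the atom names (keys) are used in this sketch.
        "formal_charges": {"N": 0, "CA": 0, "C": 0, "O": 0, "OXT": 0},
        "bonds": [("N", "CA", 1), ("CA", "C", 1), ("C", "O", 2), ("C", "OXT", 1)],
    },
}


def assign_intra_residue_bonds(atoms, templates):
    """Match each residue to a template by name and copy over its bond orders.

    atoms: iterable of (residue_name, residue_id, atom_name) tuples.
    Returns a list of (residue_id, atom_a, atom_b, order) tuples.
    """
    present_atoms = defaultdict(set)
    residue_names = {}
    for res_name, res_id, atom_name in atoms:
        present_atoms[res_id].add(atom_name)
        residue_names[res_id] = res_name

    bonds = []
    for res_id, present in present_atoms.items():
        res_name = residue_names[res_id]
        if res_name not in templates:
            raise KeyError(f"no template for residue {res_name!r} ({res_id})")
        template = templates[res_name]
        unknown = present - set(template["formal_charges"])
        if unknown:
            # E.g. a cap merged into this residue brings atoms the template lacks.
            raise ValueError(f"atoms {sorted(unknown)} not in template {res_name!r}")
        for atom_a, atom_b, order in template["bonds"]:
            # Template atoms absent from the file (e.g. OXT mid-chain) are skipped.
            if atom_a in present and atom_b in present:
                bonds.append((res_id, atom_a, atom_b, order))
    return bonds


atoms = [("GLY", 1, "N"), ("GLY", 1, "CA"), ("GLY", 1, "C"), ("GLY", 1, "O")]
bonds = assign_intra_residue_bonds(atoms, TEMPLATES)
```

In this toy version, a residue whose name has no template raises `KeyError`, and a residue carrying extra atoms (such as a merged cap) raises `ValueError`, which mirrors the delimiting caveat above.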

The OpenFF approach is to provide SMARTS templates + atom names to Topology.from_polymer_PDB, and it doesn't require correct monomer delimiting. The cons are that it (currently) doesn't have a way to create these templates, and I think its performance is slower as it's not (ab)using the presence of residues to load molecules.

@jchodera
Member Author

> Just to leave some breadcrumbs, the OpenFE approach is to try and follow the PDB-recommended route of matching residue-by-residue mmCIF templates to the raw file to assign bond orders, aromaticity & formal charges. The repo for this is here: https://github.com/OpenFreeEnergy/pdbinf

@richardjgowers: As I understand it, the mmCIF templates are fully protonated forms of the non-polymeric (non-residue) form of each residue, meaning matching must be done based on canonical residue and atom names. Is this the strategy that OpenFE uses?

This is a PDB-recommended approach, but quickly breaks down when you are dealing with molecules not currently in the chemical component dictionary, like small molecules of interest. In this case, there may be no canonical naming for the entities.

Could you elaborate on the philosophy behind this approach that would enable someone to deal with small molecules or polymeric residues not currently in the CCD? Is the expectation that the user will provide a local set of additions to the chemical component dictionary, establishing their own canonical residue and atom naming schemes that do not conflict with the official PDB CCD? What happens if the PDB updates to include residue names that clash?
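As a rough illustration of the name-clash concern, the sketch below merges a set of official templates with user-supplied additions and refuses any overlap in residue names. The `merge_templates` function, the dictionary layout and the `LIG` entry are hypothetical, chosen only for illustration; this does not describe how pdbinf or the CCD tooling actually behaves.

```python
def merge_templates(official, local):
    """Combine official and user-supplied templates, refusing name clashes.

    Both arguments map residue names to template objects (plain strings here).
    """
    clashes = sorted(set(official) & set(local))
    if clashes:
        raise ValueError(f"local template names clash with official ones: {clashes}")
    merged = dict(official)
    merged.update(local)
    return merged


# Toy stand-ins for dictionary contents (hypothetical values).
official_today = {"GLY": "official glycine", "TPO": "official phosphothreonine"}
local_additions = {"LIG": "user-defined ligand"}

merged = merge_templates(official_today, local_additions)

# If a later official release also defined "LIG", the same merge would fail.
official_later = dict(official_today, LIG="official component named LIG")
try:
    merge_templates(official_later, local_additions)
    clash_detected = False
except ValueError:
    clash_detected = True
```

Failing loudly is only one possible policy; a workflow could instead let local names shadow official ones, or keep local entries in a separate namespace.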

I don't think this is a bad approach, but I'd love to better understand how the workflow is envisioned to be usable even under ideal circumstances before diving down into the technical details.
