Additions for version 3 #92

Open
peastman opened this issue Feb 1, 2024 · 3 comments

@peastman
Member

peastman commented Feb 1, 2024

SPICE 2 is close to finished, which means this is a good time to discuss possible additions for version 3. Here are some ideas to get things started.

  • Nonbonded interactions between amino acids. SPICE 2 has a lot of data on nonbonded interactions, but nothing that is specifically for pairs of amino acids. That's very important for proteins. We could base it on PDB structures. Select pairs of amino acids that are close together in space but not bonded to each other, cap them, and calculate their interaction.
  • Dihedral scans or transition states. I'm not sure whether this is needed or not. We can test it by fitting a model to SPICE 2, calculating energies for some of the OpenFF dihedral scans, and seeing how well it reproduces the barrier heights.
  • Active learning. We can use this to identify new conformations for existing molecules that would improve accuracy in poorly sampled areas of the energy surface.
  • Nucleic acids.
  • Lipids. I'm thinking of this mainly in the context of simulating membrane proteins.
  • Enamine building blocks. More chemical diversity is always useful. This didn't happen for version 2. Maybe for version 3?
  • Breaking/forming bonds. All our current data only has fully formed bonds. We don't have any data about barriers during bond formation. If we decide to go in this direction, a good place to start might be detaching hydrogens. This would be useful for simulating constant pH. For example, we could use the DES monomers. For every hydrogen, move it away from the atom it's bonded to in 0.1 Å steps until it is fully detached.
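
As a rough illustration of two of the ideas above (picking residue pairs that are close in space but not bonded, and stepping a hydrogen away from its parent atom in 0.1 Å increments), here is a minimal Python sketch. Everything in it is an assumption made for illustration, not SPICE code: the function names, the use of residue centroids with a 6 Å cutoff, sequence separation as a stand-in for "not bonded" (which ignores disulfides and multiple chains), and a 3 Å maximum stretch as a stand-in for "fully detached". Capping the residues and running the QM calculations are not covered.

```python
import numpy as np


def close_nonbonded_pairs(centroids, cutoff=6.0, min_separation=2):
    """Return residue index pairs (i, j) whose centroids are within `cutoff` Å.

    Assumption: residues closer than `min_separation` positions in sequence
    are treated as bonded and skipped. Cutoff and centroid criterion are
    hypothetical choices, not taken from the issue.
    """
    centroids = np.asarray(centroids, dtype=float)
    # Pairwise centroid distances, shape (n_residues, n_residues).
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    n = len(centroids)
    pairs = []
    for i in range(n):
        for j in range(i + min_separation, n):
            if dist[i, j] < cutoff:
                pairs.append((i, j))
    return pairs


def hydrogen_detachment_scan(positions, h_index, heavy_index,
                             step=0.1, max_stretch=3.0):
    """Return a list of coordinate arrays with one hydrogen pulled away.

    The hydrogen is displaced along the existing bond vector in `step` Å
    increments, from 0 up to `max_stretch` Å beyond its starting position.
    `max_stretch` is a hypothetical definition of "fully detached".
    """
    positions = np.asarray(positions, dtype=float)
    bond = positions[h_index] - positions[heavy_index]
    direction = bond / np.linalg.norm(bond)
    n_steps = int(round(max_stretch / step))
    frames = []
    for k in range(n_steps + 1):
        frame = positions.copy()
        frame[h_index] = positions[h_index] + k * step * direction
        frames.append(frame)
    return frames
```

Each frame from `hydrogen_detachment_scan` would then be one conformation to submit for a QM calculation; all other atoms are left where they started, so no relaxation of the rest of the molecule is implied.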
@peastman
Member Author

peastman commented Feb 1, 2024

Responding to a few comments from #67:

Splinter, a dataset of protein-ligand interactions.

SPICE 2 includes a collection of amino acid / ligand pairs (#72).

Inspired by the AIMNet2 dataset, we could also include molecules with As and Se from PubChem, although I have no idea how common they will be.

I think they're pretty rare in drugs? Se does occasionally appear in proteins, though it's rare enough that protein force fields don't usually include it.

@pavankum
Collaborator

pavankum commented Feb 5, 2024

Database of 4 Million Medicinal Chemistry-Relevant Ring Systems. The abstract says "99.2% of these rings are novel and not included in molecules in the ChEMBL or PubChem databases.", so I thought this might be useful to fill any gaps in chemical space.

@jchodera
Member

We included ligand:amino acid pairs in SPICE 2 (#72), but I just came across this preprint that mentions two interesting datasets:

  • The Splinter dataset is a collection of approximately 1.7 million systematically generated protein-ligand fragment dimers and interaction energies computed using many-body SAPT based on a Hartree-Fock (HF) representation of monomers (i.e. SAPT0). These and all other SAPT computations carried out in this work are performed in an aug-cc-pV(D+d)Z basis set (abbreviated aDZ), which yields good error cancellation.
  • SAPT-PDB-13K: A diverse, realistic dataset of dimers. The 13,216 dimers in SAPT-PDB-13K consist of an entire ligand interacting with one or two capped amino acids. The protein and ligand geometries are taken from crystallographic Protein Data Bank (PDB) entries, making them meaningful and practical test cases.

The SAPT-PDB-13K dataset seems to be pending deposition somewhere:
"Electronic Supplementary Information (ESI) available: SAPT0 interaction energies and Cartesian coordinates of the 13,216 validation set dimers and nine protein-ligand matched pairs. See DOI: 00.0000/00000000"
