Additions for version 3 #92

peastman · 2024-02-01T19:04:52Z

SPICE 2 is close to finished, which means this is a good time to discuss possible additions for version 3. Here are some ideas to get things started.

Nonbonded interactions between amino acids. SPICE 2 has a lot of data on nonbonded interactions, but nothing specifically for pairs of amino acids, and those are very important for proteins. We could base it on PDB structures: select pairs of amino acids that are close together in space but not bonded to each other, cap them, and calculate their interaction energy.
Dihedral scans or transition states. I'm not sure whether this is needed. We can test it by fitting a model to SPICE 2, calculating energies for some of the OpenFF dihedral scans, and seeing how well it reproduces the barrier heights.
Active learning. We can use this to identify new conformations for existing molecules that would improve accuracy in poorly sampled areas of the energy surface.
Nucleic acids.
Lipids. I'm thinking of this mainly in the context of simulating membrane proteins.
Enamine building blocks. More chemical diversity is always useful. This didn't happen for version 2. Maybe for version 3?
Breaking/forming bonds. All our current data has only fully formed bonds. We don't have any data about barriers during bond formation. If we decide to go in this direction, a good place to start might be detaching hydrogens. This would be useful for simulating constant pH. For example, we could use the DES monomers. For every hydrogen, move it away from the atom it's bonded to in 0.1 Å steps until it is fully detached.
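
To make the hydrogen detachment idea concrete, here is a rough sketch of how the displaced geometries could be generated. It only builds coordinates and does not run any quantum chemistry. The function name `hydrogen_scan`, the choice to move the hydrogen along the existing bond vector, and the 3.0 Å cutoff standing in for "fully detached" are all assumptions for illustration, not anything decided for SPICE.

```python
import numpy as np

def hydrogen_scan(positions, h_index, heavy_index, step=0.1, max_extra=3.0):
    """Return geometries with one hydrogen pulled away from its bonded atom.

    positions   : (N, 3) array of coordinates in Angstrom
    h_index     : index of the hydrogen to displace
    heavy_index : index of the atom the hydrogen is bonded to
    step        : displacement per frame in Angstrom (0.1 as suggested above)
    max_extra   : total extra distance to scan in Angstrom; this cutoff is an
                  assumed stand-in for "fully detached"
    """
    positions = np.asarray(positions, dtype=float)
    bond = positions[h_index] - positions[heavy_index]
    length = np.linalg.norm(bond)
    if length == 0.0:
        raise ValueError("hydrogen and bonded atom have identical coordinates")
    direction = bond / length  # unit vector pointing from heavy atom to hydrogen

    n_steps = int(round(max_extra / step))
    frames = []
    for i in range(n_steps + 1):
        frame = positions.copy()
        # Only the selected hydrogen moves; every other atom stays fixed.
        frame[h_index] = positions[h_index] + i * step * direction
        frames.append(frame)
    return frames

# Toy water-like geometry (O, H, H) purely as an example input.
water = np.array([
    [0.000, 0.000, 0.000],
    [0.960, 0.000, 0.000],
    [-0.240, 0.930, 0.000],
])
frames = hydrogen_scan(water, h_index=1, heavy_index=0)
```

Frame 0 is the unmodified geometry, and each later frame extends the bond by one more step. Whether the other atoms should be relaxed at each step, rather than held fixed as here, is an open question.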

peastman · 2024-02-01T19:12:50Z

Responding to a few comments from #67:

Splinter, a dataset of protein-ligand interactions.

SPICE 2 includes a collection of amino acid / ligand pairs (#72).

Inspired by the AIMNET2 dataset we could also include molecules with As and Se from PubChem although I have no idea how common they will be.

I think they're pretty rare in drugs? Se does occasionally appear in proteins, though it's rare enough that protein force fields don't usually include it.

pavankum · 2024-02-05T17:49:29Z

Database of 4 Million Medicinal Chemistry-Relevant Ring Systems. The abstract says "99.2% of these rings are novel and not included in molecules in the ChEMBL or PubChem databases.", so I thought this might be useful to fill any gaps in chemical space.

jchodera · 2024-02-25T21:14:14Z

We included ligand:amino acid pairs in SPICE 2 (#72), but I just came across this preprint that mentions two interesting datasets:

The Splinter dataset is a collection of approximately 1.7 million systematically generated protein-ligand fragment dimers and interaction energies computed using many-body SAPT based on a Hartree-Fock (HF) representation of monomers (i.e. SAPT0). These and all other SAPT computations carried out in this work are performed in an aug-cc-pV(D+d)Z basis set (abbreviated aDZ), which yields good error cancellation.
SAPT-PDB-13K: A diverse, realistic dataset of dimers. The 13,216 dimers in SAPT-PDB-13K consist of an entire ligand interacting with one or two capped amino acids. The protein and ligand geometries are taken from crystallographic Protein Data Bank (PDB) entries, making them meaningful and practical test cases

The SAPT-PDB-13K dataset seems to be pending deposition somewhere:
"Electronic Supplementary Information (ESI) available: SAPT0 interaction energies and Cartesian coordinates of the 13,216 validation set dimers and nine protein-ligand matched pairs. See DOI: 00.0000/00000000"

peastman mentioned this issue Feb 1, 2024

Additions for version 2 #67

Closed
