A novel method to generate 3D polymorphic molecular conformers based on geometry restrictions and modular design
- Junjie Yu
- April 5, 2024
- 12 min read
Introduction
A conformer is a distinct conformation, i.e., a reasonable arrangement of the atoms of a molecule in three-dimensional space. The chemical, biological, and physical properties of a molecule, such as steric constraints, charge density, docking poses [Mark McGann. Fred pose prediction and virtual screening accuracy. Journal of chemical information and modeling, 51(3):578–596, 2011.], shape similarity [Ashutosh Kumar and Kam YJ Zhang. Advances in the development of shape similarity methods and their application in drug discovery. Frontiers in chemistry, 6:315, 2018.], and pharmacophore searching [Christof H Schwab. Conformations and 3d pharmacophore searching. Drug Discovery Today: Technologies, 7(4): e245–e253, 2010.], reflect the combined properties of the molecule’s conformers that are accessible at the temperature of the study. Determining reasonable molecular conformers is therefore very important for studying structure-activity relationships (SAR) and for discovering new materials and drugs. However, determining molecular conformers experimentally is time-consuming and very expensive, and under some conditions the experiments are extremely difficult to carry out. As a consequence, computational methods for generating conformers have been developed over the past few decades; they are traditionally categorized as either stochastic or systematic (rule-based) methods.
Stochastic methods that generate conformers by molecular dynamics or Markov chain Monte Carlo are also computationally expensive, especially for large molecules. Systematic conformer generation methods based on the rigid rotor (RR) approximation were then proposed to enumerate all possible torsions of a molecule. However, the space of possible conformations grows exponentially with the molecule’s size and number of rotatable bonds, which hinders exhaustive enumeration even for relatively small molecules. Rule-based systematic methods were therefore developed, and they represent the state of the art in commercial software such as OMEGA [Friedrich N-O, de Bruyn Kops C, Flachsenberg F, et al. Benchmarking commercial conformer ensemble generators. Journal of chemical information and modeling, 2017, 57(11): 2719-2728.][Paul CD Hawkins. Conformation generation: the state of the art. Journal of Chemical Information and Modeling, 57(8):1747–1756, 2017.]. These methods limit the conformational space and explore conformers using a knowledge base of allowed torsion angles, allowed paths, a library of 3D fragment conformations, and so on, which reduces the size of the conformational space for a molecule.
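The exponential growth mentioned above can be made concrete with a little arithmetic. The sketch below assumes that every rotatable bond is sampled independently on a uniform grid; the 30-degree step and the function name `count_torsion_combinations` are illustrative choices, not values taken from any particular program.

```python
from itertools import product


def count_torsion_combinations(n_rotatable_bonds: int, step_degrees: int = 30) -> int:
    """Number of grid points when each rotatable bond is sampled independently."""
    angles_per_bond = 360 // step_degrees
    return angles_per_bond ** n_rotatable_bonds


# Explicit enumeration for a tiny case agrees with the closed-form count.
coarse_grid = range(0, 360, 120)                      # 3 angles per bond
explicit = list(product(coarse_grid, repeat=2))       # 2 rotatable bonds
assert len(explicit) == count_torsion_combinations(2, step_degrees=120)

# With a 30-degree grid (12 angles per bond) the count is 12 ** n.
for n_bonds in (2, 5, 10):
    print(n_bonds, count_torsion_combinations(n_bonds))
```

Even at this coarse resolution, ten rotatable bonds already give 12^10 candidate torsion combinations, which is why exhaustive enumeration is replaced by rules that prune the grid.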
However, rule-based conformation generation methods depend heavily on the knowledge base, whose allowed torsion angles are usually derived by analyzing torsions in solid-state structures from databases such as the Protein Data Bank (PDB) [Hawkins, P. C. D.; Skillman, A. G.; Warren, G. L.; Ellingson, B. A.; Stahl, M. T. Conformer Generation With OMEGA: Algorithm and Validation Using High Quality Structures from the Protein Data Bank and Cambridge Structural Database. J. Chem. Inf. Model. 2010, 50, 572−584] and the Cambridge Structural Database (CSD) [Allen, F. H. The Cambridge Structural Database: A Quarter of a Million Crystal Structures and Rising. Acta Crystallogr., Sect. B: Struct. Sci. 2002, B58, 380−388]. The data in these databases usually have to be filtered or modified when some conformers are energetically inaccessible [Guba, W.; Meyder, A.; Rarey, M.; Hert, J. Torsion Library Reloaded: a New Version of Expert-Derived SMARTS Rules For Assessing Conformations of Small Molecules. J. Chem. Inf. Model. 2016, 56, 1−5.]. Moreover, the torsion angles in the corresponding fragments are mostly varied independently. If the designer of a conformer generation algorithm does not explicitly know the global interactions between the fragments, it is difficult to generate accurate and reasonable conformers of larger and more flexible molecules. Furthermore, the rules and curated fragments are inadequate for more challenging cases (e.g., transition states or open-shell molecules) [Ganea O, Pattanaik L, Coley C, et al. Geomol: Torsional geometric generation of molecular 3d conformer ensembles. Advances in Neural Information Processing Systems, 2021, 34: 13757-13769.].
In addition, both stochastic and rule-based conformation generation methods need to score and rank the conformations in order to select the most probable ones, using scoring approaches such as knowledge- or probability-based methods or electronic structure methods [Rossi, M.; Chutia, S.; Scheffler, M.; Blum, V. Validation Challenge of Density-Functional Theory for Peptides - Example of Ac-Phe-Ala5-LysH+. J. Phys. Chem. A 2014, 118, 7349−7359].
Both stochastic and systematic methods can be combined with Distance Geometry (DG) techniques to generate the initial 3D conformation. A matrix of 3D interatomic distances is first generated from a specialized model or from a set of constraints. The corresponding 3D atomic coordinates are then determined so that they approximately match the predicted distances [T. F. Havel, I. D. Kuntz, and G. M. Crippen, The Theory and Practice of Distance Geometry, Bull. Math. Biol., 45,665 (1983)][Havel T F. Distance geometry: Theory, algorithms, and chemical applications. Encyclopedia of Computational Chemistry, 1998, 120: 723-742.][Timothy F Havel, Gordon M Crippen, Irwin D Kuntz, and Jeffrey M Blaney. The combinatorial distance geometry method for the calculation of molecular conformation ii. sample problems and computational statistics. Journal of theoretical biology, 104(3):383–400, 1983a.][Timothy F Havel, Irwin D Kuntz, and Gordon M Crippen. The combinatorial distance geometry method for the calculation of molecular conformation. i. a new approach to an old problem. Journal of theoretical biology, 104(3):359–381, 1983b][Riniker and Landrum, 2015]. Although distance geometry generates structures rapidly over a broad conformational sampling space, it does not preferentially provide low-energy structures. Force field (FF) energy minimization usually has to be applied afterwards to refine the structures within a limited sampling range [Blaney J M, Dixon J S. Distance geometry in molecular modeling. Reviews in computational chemistry, 1994: 299-335.][Lagorce, D., Pencheva, T., Villoutreix, B. O., and Miteva, M. A. DG-AMMOS: A New tool to generate 3D conformation of small molecules using Distance Geometry and Automated Molecular Mechanics Optimization for in silico Screening. BMC Chem. Biol., 9:6, 2009. doi: 10.1186/1472-6769-9-6.].
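To illustrate the step in which coordinates are recovered from a distance matrix, here is a minimal sketch of the classical metric embedding (centered Gram matrix followed by an eigendecomposition). It is only the embedding step under the assumption that a complete, exact distance matrix is available; practical DG codes sample distances between lower and upper bounds and refine the result, which is not shown. The function names and the random test points are illustrative.

```python
import numpy as np


def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of rows in coords."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def embed_from_distances(dist: np.ndarray, dim: int = 3) -> np.ndarray:
    """Recover coordinates (up to rotation, translation, reflection) from distances."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double centering turns squared distances into a Gram (inner product) matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)            # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dim]              # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # guard against tiny negatives
    return eigvecs[:, top] * scale


# Hypothetical test case: six random points stand in for atoms.
rng = np.random.default_rng(0)
reference = rng.normal(size=(6, 3))
target = pairwise_distances(reference)

coords = embed_from_distances(target)
max_error = np.abs(pairwise_distances(coords) - target).max()
print(max_error)
```

The recovered coordinates differ from the reference by a rigid motion or a mirror image, so the comparison is made on distances rather than on the coordinates themselves.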
Since FFs are crude approximations of the true molecular potential energy surface and the energy optimization is relatively slow [Ilana Y Kanal, John A Keith, and Geoffrey R Hutchison. A sobering assessment of small-molecule force field methods for low energy conformer predictions. International Journal of Quantum Chemistry, 118(5): e25512, 2018.], errors accumulate during structure refinement and the computational cost of DG-based methods remains high.
Therefore, many commercial programs derived from the above-mentioned conformer generation methods, such as OMEGA, ConfGen, iCon, MOE Stochastic, and MED-3DMC [Friedrich N-O, de Bruyn Kops C, Flachsenberg F, et al. Benchmarking commercial conformer ensemble generators. Journal of chemical information and modeling, 2017, 57(11): 2719-2728.][Sperandio O, Souaille M, Delfaud F, et al. MED-3DMC: A new tool to generate 3D conformation ensembles of small molecules with a Monte Carlo sampling of the conformational space. European journal of medicinal chemistry, 2009, 44(4): 1405-1409.] [***], can usually provide accurate conformations only for small molecules. As molecular size increases, the computational cost of refining or screening molecular conformations grows exponentially. For large molecules with many rotatable bonds, which allow flexible conformational changes and many feasible conformations in nature, generating conformations that are stable and actually occur remains very challenging [Xu M, Luo S, Bengio Y, et al. Learning neural generative dynamics for molecular conformation generation. arXiv preprint arXiv:2102.10240, 2021.].
The development of artificial intelligence (AI) techniques in recent decades provides new approaches for designing molecular conformations based on databases, and some remarkable achievements have been obtained. For example, Google's DeepMind [Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature, 2021, 596(7873): 583-589.] developed the neural network model AlphaFold. The heavy-atom coordinates that AlphaFold predicts for proteins are comparable in accuracy to experimental results in most cases and far more accurate than those of other methods. The latest version of AlphaFold incorporates physical and biological knowledge about proteins and leverages multiple sequence alignments in the design of its deep-learning algorithm. The machine learning method GeoMol, proposed by researchers at MIT (Massachusetts Institute of Technology) [Ganea O, Pattanaik L, Coley C, et al. Geomol: Torsional geometric generation of molecular 3d conformer ensembles. Advances in Neural Information Processing Systems, 2021, 34: 13757-13769.], can generate distributions of low-energy 3D conformations and predicts the local 3D structures and torsion angles of atoms. It avoids unnecessary over-parameterization. In most cases, GeoMol outperforms popular open-source algorithms, algorithms used in commercial software, and recent machine-learning (ML) models in both prediction accuracy and speed. Baidu [Fang X, Liu L, Lei J, et al. Geometry-enhanced molecular representation learning for property prediction. Nature Machine Intelligence, 2022, 4(2): 127-134.] has also made breakthroughs in applying graph neural networks (GNNs) to model the 3D conformations of molecules. A novel geometry-enhanced molecular representation learning method (GEM), which includes several dedicated geometry-level self-supervised learning strategies, was designed on a GNN architecture to learn molecular geometry knowledge.
Evaluations of GEM on molecular property prediction across different datasets show that it significantly outperforms the comparison baselines on many benchmarks.
Both traditional molecular conformation calculation methods and AI-based conformation generation algorithms depend heavily on databases. The strategies, knowledge, and rules for conformation generation are derived from databases of known molecular structures, which were determined under specific experimental conditions. As Fraser and Murcko stated [James S. Fraser and Mark A. Murcko, Structure is beauty, but not always truth, Cell, 2024, 187: 517-520.], a molecular structure provided by existing conformation generation algorithms or software is a model, but not always truth (i.e., not always experimental reality). Even ground-truth structures contain inaccuracies caused by the experiments that determined them, and structures refined against experimental data have larger errors than the experimental data themselves. Generating molecular conformations without depending on experimental data would avoid the above-mentioned problems.
It is well known that the conformations of a molecule are highly correlated with the system in which the molecule is embedded and with the temperature of that system. All the molecular structures collected in the existing databases, which include a variety of small organic molecules, peptides, and proteins with different levels of structure, were measured (determined) under specific conditions, and each molecule is represented in the databases by a single structure. The diversity of molecular conformations generated by algorithms based on these databases is therefore limited. In addition, energy calculations of candidate structures are usually carried out in the process of generating conformations. Freely variable degrees of freedom within a molecule, such as the rotation and bending of bonds, are not sensitive to energy; that is, different molecular conformations may have the same energy. However, when the conformations of a molecule are refined, the molecule is forced into the nearest energy minimum, which is not necessarily a low-energy conformation [Gregory V. Nikiforovich, Katalin E. KÖvér, Wei-Jun Zhang, and Garland R. Marshall, Cyclopentapeptides as Flexible Conformational Templates, J. Am. Chem. Soc., 2000, 122, 3262-3273.]. The study [Giancarlo Zanotti, Michele Saviano, Gabriella Saviano, Teodorico Tancredi, Filomena Rossi, Carlo Pedone and Ettore Benedetti, Structure of cyclic peptides: the crystal and solution conformation of cyclo(Phe-Phe-Aib-Leu-Pro), J. Pept. Res., 1998, 51, 460-466.] also verified this phenomenon. We call conformers in which the distances between non-bonded atoms are close to the bond lengths between the corresponding atoms molecular conformers violating geometry restrictions (MC-VGR for short). Our practice (as shown in Figure 1) illustrates that the energies of MCs-VGR are usually several orders of magnitude higher than those of MCs-AGR (molecular conformers according with geometry restrictions), but that the energy ranges of some MCs-VGR overlap with those of some MCs-AGR.
For MCs-AGR, the spatial relationships between the atoms accord with the geometry restrictions, so their structures are stable and can readily exist in reality. For MCs-VGR, the spatial relationships between the atoms violate the geometry restrictions, so these conformers are unstable and do not exist in reality. However, the energies of some MCs-VGR can be small. When energy calculations are used to screen for reasonable molecular conformers, it is therefore easy to find conformers that have low energy but unreasonable relationships between their atoms; such low-energy structures are not stable and do not exist in reality.
Figure 1 Energy comparison of different conformers of a molecule
In summary, a new approach is needed that generates MCs-AGR without depending on experimental databases. The new algorithms should evaluate and inspect the spatial relationships between non-bonded atoms before applying the energy principle to screen conformers. Energy calculations are then carried out only for MCs-AGR, whose structures are stable and reasonable in terms of the spatial relationships between atoms, which greatly reduces the computational work of generating molecular conformers. New ideas and perspectives are required to realize this aim. In the present work, we propose a strategy for generating 3D MCs-AGR based on the geometry restrictions of atoms and the idea of modular design. We also implemented all algorithms in MATLAB and Python; the code is available online.
Methods
The present work generates molecular conformers based on atomic geometry restrictions and the concept of modular design. We call the method GMC-GR&MD for short. It is well known that a molecular structure can be divided into functional groups. A conformation of a molecule can therefore be regarded as one of the structures obtained by combining these groups in different reasonable ways. Based on this consideration, GMC-GR&MD constructs molecular conformations by adding the groups of a molecule to the first group in sequence. The groups are connected by appropriate translation and rotation, followed by a check of whether the spatial relationships between non-bonded atoms of different groups satisfy the geometry restrictions (GR). In this way, the various possible and reasonable conformations of a molecule are generated. The common chemical groups and cyclic structures are used as the essential elements for generating MCACP in this study; they are listed in table 1 and figure 1 of the supplemental materials. The coordinates of all atoms in each common group are calculated from bond length and bond angle data [Aitken R A, Fodi B, Palmer M H, et al. Experimental and theoretical molecular and electronic structures of the N-oxides of pyridazine, pyrimidine and pyrazine, Tetrahedron, 2012, 68(29): 5845-5851.][Bak B, Christensen D, Dixon W B, et al. The complete structure of furan, Journal of Molecular Spectroscopy, 1962, 9: 124-129.][Bak B, Christensen D, Hansen-Nygaard L, et al. The structure of thiophene, Journal of Molecular Spectroscopy, 1961, 7(1-6): 58-63.][Berezin K, Babkov L, Kovner M. Dynamic and structural models of pyrimidine in the lowest excited singlet state, Journal of structural chemistry, 1997, 38(2): 281-286.][O'Sullivan P S, De la Vega J, Hameka H F. Calculations of bond angles in five-membered conjugated ring systems, Chemical Physics Letters, 1970, 5(9): 576-578.][Owen A N, Zdanovskaia M A, Esselman B J, et al. Semi-Experimental Equilibrium (r eSE) and Theoretical Structures of Pyridazine (o-C4H4N2), The Journal of Physical Chemistry A, 2021, 125(36): 7976-7987.] and stored for use in the process of generating conformations.
In the process of connecting the groups, a geometry restriction must be satisfied: the distance between any two non-bonded atoms of different groups must be greater than a reasonable threshold Vr. Vr is defined as 2k times the van der Waals radius of the hydrogen atom, where k is a factor between 1.5 and 2. We call this geometry restriction the Vr criterion. The reason is that if any two non-bonded atoms of two different groups are too close together, the strong repulsion between them makes the structure unstable, so the structure is not acceptable in terms of the spatial relationships between the atoms of a molecule. Accordingly, when a new group is connected, the spatial relationships between the non-bonded atoms of the previously connected groups and the atoms of the newly added group are evaluated with the Vr criterion to ensure that the distances between these atoms in the resulting structure lie within a reasonable range. If any two non-bonded atoms violate the Vr criterion, some groups in the structure have to be rotated until the distances between all non-bonded atoms of different groups are greater than Vr. When all groups are connected and the spatial relationships between non-bonded atoms of different groups satisfy the Vr criterion, the initial conformation of the molecule has been generated. The idea is illustrated in Figure 2a.
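The Vr criterion can be sketched in a few lines. This is a minimal illustration, not the authors' code: the hydrogen van der Waals radius of 1.20 Å is an assumed value that the text does not state, the function names are hypothetical, and the optional `bonded_pairs` argument is an assumption about how the bond that joins the two groups would be excluded from the check.

```python
import numpy as np

H_VDW_RADIUS = 1.20  # angstrom; assumed value, not given in the text


def vr_threshold(k: float = 1.5, h_radius: float = H_VDW_RADIUS) -> float:
    """Vr = 2k times the van der Waals radius of hydrogen, with k between 1.5 and 2."""
    return 2.0 * k * h_radius


def satisfies_vr(placed: np.ndarray, new_group: np.ndarray, vr: float,
                 bonded_pairs=()) -> bool:
    """True if every non-bonded inter-group distance is greater than vr."""
    dist = np.linalg.norm(placed[:, None, :] - new_group[None, :, :], axis=-1)
    for i, j in bonded_pairs:          # (index in placed, index in new_group)
        dist[i, j] = np.inf            # bonded pairs are exempt from the check
    return bool((dist > vr).all())


vr = vr_threshold(k=1.5)
placed = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
far_group = np.array([[6.0, 0.0, 0.0], [7.0, 0.0, 0.0]])
near_group = np.array([[3.0, 0.0, 0.0]])

far_ok = satisfies_vr(placed, far_group, vr)
near_ok = satisfies_vr(placed, near_group, vr)
print(vr, far_ok, near_ok)
```

Here the far group passes because its closest contact is 4.5 Å, while the near group fails because one contact is only 1.5 Å.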
As shown in Figure 2a, the GMC-GR&MD method takes the SMILES (Simplified Molecular Input Line Entry System) string of a molecule as input. To make conformation generation more convenient, the SMILES information is transformed into a connective matrix that reflects the connection relationships between the atoms of the molecule. When the number of atoms in a molecule is n, the connective matrix of the molecule has dimension n × n. Each off-diagonal entry in a row describes the connection between the atom represented by that row and another atom: an entry of 1 means that the two atoms are connected, and an entry of 0 means that they are not. The diagonal elements of the matrix are the relative atomic masses of the atoms. The detailed algorithm for transforming SMILES into connective matrices is shown in figure 1 of the supplemental materials, and the Python code of the transformation algorithm is available. The molecule is divided into essential groups according to the connection relationships between atoms given by the connective matrix, which determines the sequence, types, and number of functional groups in the molecule as well as the connections between the groups.
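As an illustration of the data structure described above, the sketch below builds a connective matrix from a hand-written atom and bond list. It deliberately skips SMILES parsing, which the supplemental algorithm handles; the function name, the small mass table, and the ethanol example (heavy atoms only) are assumptions made for this example.

```python
import numpy as np

# Standard relative atomic masses for a few common elements.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def connective_matrix(elements, bonds):
    """n x n matrix: 1/0 off the diagonal for bonded/non-bonded, atomic mass on it."""
    n = len(elements)
    matrix = np.zeros((n, n))
    for i, j in bonds:
        matrix[i, j] = matrix[j, i] = 1.0      # connections are symmetric
    for i, symbol in enumerate(elements):
        matrix[i, i] = ATOMIC_MASS[symbol]     # diagonal identifies the atom type
    return matrix


# Heavy atoms of ethanol (SMILES "CCO"), with bonds written out by hand.
elements = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
cm = connective_matrix(elements, bonds)

# Neighbours of the middle carbon, read off its row (diagonal skipped).
neighbors_of_c1 = [j for j in range(len(elements)) if j != 1 and cm[1, j] == 1.0]
print(cm)
print(neighbors_of_c1)
```

Storing the mass on the diagonal lets a single matrix carry both the topology and the element identities, which is what the later group-splitting step needs.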
The second step of GMC-GR&MD in generating the initial conformation of a molecule consists of connecting the groups in sequence and evaluating the spatial relationships between the atoms of the previously connected groups and the atoms of the newly added group. Starting from the second group, a matrix (Da) of the distances between the atoms of the previously connected groups and those of the newly added group is calculated. When all non-diagonal elements of Da are greater than Vr, the distances are reasonable and the spatial relationships between the atoms of the previously connected groups and the atoms of the newly added group accord with the Vr criterion, so the next group can be connected. If some non-diagonal elements of Da are smaller than Vr, the position of the newly added group is unreasonable: some of its atoms are too close to atoms in the previously connected groups, the interaction between these non-bonded atoms would be very strong, and the structure would be unstable and could not exist in reality. In that case the structure is rotated until all non-diagonal elements of Da are greater than Vr; the rotation then stops and the next group can be connected. The second step is repeated until all groups are connected, at which point the initial conformation of the molecule according with the Vr criterion has been generated.
Figure 2a The sketch diagram of generating initial MC-AGR
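The connect-check-rotate loop of the second step can be sketched as follows. This is a simplified illustration under several assumptions that the text does not specify: the new group is rotated about a single given axis (for example the connecting bond), candidate angles are scanned in fixed 10-degree increments, and the distance matrix contains only non-bonded pairs. The function names are hypothetical.

```python
import numpy as np


def rotate_about_axis(points, origin, axis, angle_deg):
    """Rotate points about the line through origin along axis (Rodrigues formula)."""
    axis = axis / np.linalg.norm(axis)
    theta = np.radians(angle_deg)
    shifted = points - origin
    rotated = (shifted * np.cos(theta)
               + np.cross(axis, shifted) * np.sin(theta)
               + np.outer(shifted @ axis, axis) * (1.0 - np.cos(theta)))
    return rotated + origin


def place_group(placed, new_group, origin, axis, vr, step_deg=10.0):
    """Scan rotation angles until every inter-group distance exceeds vr."""
    for angle in np.arange(0.0, 360.0, step_deg):
        candidate = rotate_about_axis(new_group, origin, axis, angle)
        # Da: distances between already placed atoms and atoms of the new group.
        da = np.linalg.norm(placed[:, None, :] - candidate[None, :, :], axis=-1)
        if (da > vr).all():
            return candidate, float(angle)
    return None, None                     # no acceptable orientation on this grid


# Toy case: one placed atom and one new atom that start only 1 angstrom apart.
placed = np.array([[1.0, 3.0, 0.0]])
new_group = np.array([[1.0, 2.0, 0.0]])
origin = np.zeros(3)
axis = np.array([1.0, 0.0, 0.0])          # assumed rotation axis
vr = 3.6                                  # 2k * r_H with k = 1.5 and r_H = 1.20 (assumed)

accepted, angle = place_group(placed, new_group, origin, axis, vr)
print(angle, accepted)
```

In this toy case the clash is resolved once the new atom has been rotated a quarter turn away from the placed atom. A fuller implementation would also need a fallback, such as rotating earlier groups, when no angle on the grid is acceptable.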
To obtain polymorphic molecular conformations, GMC-GR&MD sets different torsion angles for each rotatable bond. The detailed steps for generating polymorphic molecular conformations from the initial conformation are shown in Figure 2b. The transformation between different conformations of a molecule is visualized with the VMD (Visual Molecular Dynamics) program.
Figure 2b The sketch diagram of generating polymorphic MC-AGR
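Setting a torsion angle on a rotatable bond amounts to measuring the current dihedral and rotating the atoms on one side of the bond by the difference. The sketch below shows this for a four-atom chain. The geometry, the angle grid of 60, 180, and 300 degrees, and the function names are illustrative assumptions, and the Vr check that would follow each rotation is omitted.

```python
import numpy as np


def rotate_about_axis(points, origin, axis, angle_deg):
    """Rotate points about the line through origin along axis (Rodrigues formula)."""
    axis = axis / np.linalg.norm(axis)
    theta = np.radians(angle_deg)
    shifted = points - origin
    return (shifted * np.cos(theta)
            + np.cross(axis, shifted) * np.sin(theta)
            + np.outer(shifted @ axis, axis) * (1.0 - np.cos(theta))) + origin


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b1 = (p2 - p1) / np.linalg.norm(p2 - p1)
    v = (p0 - p1) - np.dot(p0 - p1, b1) * b1     # component perpendicular to the bond
    w = (p3 - p2) - np.dot(p3 - p2, b1) * b1
    return np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w)))


def set_torsion(coords, quad, moving, target_deg):
    """Return a copy of coords with the dihedral of quad set to target_deg."""
    i, j, k, l = quad
    current = dihedral_deg(coords[i], coords[j], coords[k], coords[l])
    out = coords.copy()
    out[moving] = rotate_about_axis(coords[moving], coords[k],
                                    coords[k] - coords[j], target_deg - current)
    return out


def wrapped_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


# Toy four-atom chain whose central bond lies along x; it starts at 0 degrees.
coords = np.array([[-0.5, 1.0, 0.0], [0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.0, 0.0]])
targets = (60.0, 180.0, 300.0)
conformers = [set_torsion(coords, (0, 1, 2, 3), [3], t) for t in targets]
measured = [dihedral_deg(*c) for c in conformers]
print(measured)
```

For a molecule with several rotatable bonds, the same routine would be applied once per bond for each combination of target angles, with `moving` listing every atom on the far side of the bond.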