Systems chemistry: using thermodynamically controlled networks to assess molecular similarity
© Saggiomo et al; licensee Chemistry Central Ltd. 2013
Received: 7 December 2012
Accepted: 23 January 2013
Published: 12 February 2013
The assessment of molecular similarity is a key step in the drug discovery process that has thus far relied almost exclusively on computational approaches. We now report an experimental method for similarity assessment based on dynamic combinatorial chemistry.
In order to assess molecular similarity directly in solution, a dynamic molecular network was used in a two-step process. First, a clustering analysis was employed to determine the network’s innate discriminatory ability. A classification algorithm was then trained to enable the classification of unknowns. The dynamic molecular network used in this work was able to identify thin amines and ammonium ions in a set of 25 different, closely related molecules. After training, it was also able to classify unknown molecules based on the presence or absence of an ethylamine group.
This is the first step in the development of molecular networks capable of predicting bioactivity based on an assessment of molecular similarity.
Keywords: Dynamic combinatorial chemistry; Systems chemistry; Molecular networks; Data mining; Clustering analysis
Molecular similarity relates to the extent to which molecules have similar structures or properties. Hence, molecular similarity and any quantification of it are both strongly context dependent. Assessing molecular similarity is a key element in the drug discovery process, as structural similarity is believed to correlate with activity with respect to a given target [1–4]. However, assessing molecular similarity is not trivial. The most common approaches are computational and include the use of molecular fingerprints, simple calculated properties such as solvent-accessible surface area and the numbers of hydrogen-bond donor and acceptor groups [5–7], and shape comparisons [8–10]. Three-dimensional methods, such as CoMFA [11] and CoMSIA [12], map favorable and unfavorable interaction regions around or onto the structure of a molecule and therefore require prior knowledge of the appropriate conformations of that molecule.
We reasoned that a realistic measure of molecular similarity may be obtained by interrogating the molecules experimentally in solution. The existing experimental approaches closest to analysing molecular similarity are sensing systems, in which the objective is usually to detect and quantify a specific analyte or to discriminate between different analytes. Such assays have been set up in array format [13–16] and, more recently, using dynamic combinatorial chemistry [17, 18]. However, we are not aware of any examples in which these approaches have been used to determine similarity.
We now report the adaptation of dynamic combinatorial chemistry for similarity assessment. The central premise of our approach is that the extent of binding of a molecule by a synthetic receptor contains information about the structure of the molecule. While binding by a single receptor will provide only very limited information, a more comprehensive description of the molecular structure may be obtainable by using a systems chemistry [19–24] approach, utilising the binding to multiple receptors. Specifically, we employed a dynamic molecular network containing a variety of potential synthetic receptors. These receptors are connected through reversible covalent bonds and therefore continuously exchange their constituent building blocks. Through work on dynamic combinatorial libraries [25–29], it is well established that dynamic molecular networks will change their composition in response to molecular recognition by an introduced effector, leading to a redistribution of the building blocks in favor of those receptors with affinity for the effector. This effect has so far mainly been exploited as a tool for identifying individual receptors and for constructing sensor networks [30–35]. We now show how such a network can make a rudimentary assessment of molecular similarity. In this approach there is no need to synthesise all the receptors separately; they are generated in one step when preparing the dynamic combinatorial library. Yet it is possible to identify the individual receptors in the mixture using LC-MS.
Results and discussion
We determined the amplification factors (i.e. the ratio of the HPLC peak areas in the presence and absence of the effectors) for the six dominant network members and for all 25 effector molecules (Additional file 1).
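The amplification factor defined above is a simple per-member ratio of peak areas. A minimal Python sketch, assuming the HPLC peak areas have already been integrated and are keyed by hypothetical member labels (the six dominant members are not named in this section), might look like this:

```python
from typing import Dict


def amplification_factors(
    areas_with: Dict[str, float], areas_without: Dict[str, float]
) -> Dict[str, float]:
    """Return, for each network member, the ratio of its HPLC peak area in the
    presence of an effector to its peak area in the effector-free library."""
    factors = {}
    for member, blank_area in areas_without.items():
        if blank_area <= 0:
            # A member absent from the reference library has no defined ratio.
            raise ValueError(f"no reference peak area for {member!r}")
        # A member not detected in the effector library is treated as area 0.
        factors[member] = areas_with.get(member, 0.0) / blank_area
    return factors


# Hypothetical peak areas for two members; the real data are in Additional file 1.
blank = {"member_1": 200.0, "member_2": 50.0}
with_effector = {"member_1": 300.0, "member_2": 25.0}
print(amplification_factors(with_effector, blank))
# {'member_1': 1.5, 'member_2': 0.5}
```

A factor above 1 indicates amplification and a factor below 1 indicates depletion; the factors for each effector form the feature vector that the clustering and classification analyses operate on.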
Inspection of the molecules in each cluster revealed that the network has an innate ability to discriminate the relatively thin amines and ammonium ions from a range of other amines and ammonium ions that are either bulkier or carry a negative (partial) charge (Figure 1). Having established the discriminatory ability of the network, we investigated whether it could be used to classify “unknown” molecules. More specifically, we investigated whether the network’s response could be used to predict whether a molecule contains an ethylamine group. The network seemed well suited to this task since, with the exception of effector 9, all molecules in class A contain an ethylamine group, while, with the exception of effector 18, none of those in class B do.
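The correspondence between the two clusters and the presence of an ethylamine group can be checked with simple set arithmetic. The sketch below uses the effector numbers given in this article (effectors 4–28, of which 4–8, 10, 11 and 18 carry an ethylamine group). Cluster A is inferred from the two exceptions stated above rather than taken directly from the clustering output, so treat it as an assumption.

```python
# Effectors used in this study are numbered 4-28 (25 molecules).
EFFECTORS = list(range(4, 29))

# Effectors carrying an ethylamine group (class A in the classification analysis).
ETHYLAMINE = {4, 5, 6, 7, 8, 10, 11, 18}

# Cluster A, inferred (assumption) from the stated exceptions: every
# ethylamine-containing effector except 18, plus effector 9, which lacks the group.
CLUSTER_A = (ETHYLAMINE - {18}) | {9}


def compare_partitions(effectors, cluster_a, labelled_a):
    """Count effectors whose cluster membership matches their label; list the rest."""
    mismatched = [e for e in effectors if (e in cluster_a) != (e in labelled_a)]
    return len(effectors) - len(mismatched), mismatched


agree, mismatched = compare_partitions(EFFECTORS, CLUSTER_A, ETHYLAMINE)
print(f"{agree}/{len(EFFECTORS)} agree; mismatched: {mismatched}")
# 23/25 agree; mismatched: [9, 18]
```

The two mismatches are the same effectors that the naïve Bayes classifier misassigns in the cross-validation experiment.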
In a classification analysis (supervised learning), unknown objects are classified by comparing their variables with those of a training set whose classes are predefined. We opted for the naïve Bayes classifier. Naïve Bayes is a simple probabilistic classifier that requires only a small amount of training data to estimate its parameters. It assumes that the variables are conditionally independent given the class, so each variable contributes independently to the assignment of an unknown object to one of the predefined classes. For this analysis, the amplification factors of all effectors except one were used as the training set, with class A comprising effectors 4–8, 10, 11 and 18 and class B comprising effectors 9, 12–17 and 19–28; the omitted effector served as the unknown. The amplification factors of the unknown effector were then subjected to naïve Bayes classification in Weka [42] for class assignment, and this was repeated with each effector in turn as the unknown. In 23 out of 25 cases the unknown was assigned to the correct class (92% correct assignment). Only effectors 9 and 18 were misassigned: 9 to class A and 18 to class B, as in the clustering experiment. This leave-one-out cross-validation experiment establishes that the molecular network, when properly trained, is able to classify unknowns.
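The leave-one-out procedure described above can be sketched in plain Python. This is a minimal Gaussian naïve Bayes written for illustration, not a reimplementation of Weka's NaiveBayes (which by default also models numeric attributes with normal distributions but differs in its estimation details). The two-feature data set is hypothetical; the real input would be the six amplification factors per effector.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (amplification factors, class label)
Model = Dict[str, Tuple[float, List[Tuple[float, float]]]]


def fit(train: List[Sample], min_sd: float = 1e-3) -> Model:
    """Estimate a class prior and a per-feature mean and standard deviation."""
    model: Model = {}
    for label in sorted({y for _, y in train}):
        rows = [x for x, y in train if y == label]
        # A floor on the standard deviation avoids division by zero.
        stats = [(mean(col), max(pstdev(col), min_sd)) for col in zip(*rows)]
        model[label] = (len(rows) / len(train), stats)
    return model


def log_normal_pdf(x: float, mu: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def predict(model: Model, x: Sequence[float]) -> str:
    """Pick the class with the highest log posterior (up to a constant)."""
    def score(label: str) -> float:
        prior, stats = model[label]
        # Naive assumption: features are independent given the class,
        # so their log-likelihoods simply add.
        return math.log(prior) + sum(
            log_normal_pdf(v, mu, sd) for v, (mu, sd) in zip(x, stats)
        )
    return max(model, key=score)


def leave_one_out(data: List[Sample]) -> Tuple[int, List[int]]:
    """Hold out each sample in turn, train on the rest and classify it."""
    correct, wrong = 0, []
    for i, (x, y) in enumerate(data):
        model = fit(data[:i] + data[i + 1:])
        if predict(model, x) == y:
            correct += 1
        else:
            wrong.append(i)
    return correct, wrong


# Hypothetical amplification factors for two network members.
data: List[Sample] = [
    ((2.0, 0.5), "A"), ((2.2, 0.4), "A"), ((1.8, 0.6), "A"),
    ((0.9, 1.1), "B"), ((1.0, 1.0), "B"), ((1.1, 0.9), "B"),
]
correct, wrong = leave_one_out(data)
print(f"{correct}/{len(data)} correct; misassigned indices: {wrong}")
# 6/6 correct; misassigned indices: []
```

The toy classes are deliberately well separated, so every held-out sample is recovered; with the real amplification factors the reported outcome was 23 of 25.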
We have shown how a simple molecular network can perform a rudimentary assessment of molecular similarity and can successfully classify unknowns. To the best of our knowledge, this is the first experimental approach to assessing molecular similarity, and these results represent the first step towards developing networks that may be able to discriminate between and assess the similarity of biologically active molecules and drugs, and potentially to predict bioactivity. However, there is still a long road ahead. In the present clustering approach, the similarity that is assessed is dictated by the innate discriminatory ability of the network and is known only after the response of the molecular network to a series of effectors has been analysed. Many more such studies on different dynamic networks are needed before we can design molecular networks that perform well in clustering molecules according to a predefined similarity parameter. In contrast, a classification analysis may yield useful results more readily, as the scientist can choose the parameter on which the classification is based, after which the algorithm selects the data that are most discriminatory for that parameter. We are currently working towards this goal using additional molecular networks with greater structural diversity.
Building blocks 1, 2, 3, and effector 22 were synthesised following literature procedures. All other effectors were obtained from commercial sources and used without further purification. HPLC analysis was performed on Agilent 1050 or 1100 systems coupled to a UV detector. LC-MS analysis was performed using an Agilent XCT ion trap MSD mass spectrometer. Mass spectra (negative ion mode) were acquired in ultra-scan mode using a drying temperature of 350°C, a nebuliser pressure of 35.00 psi, a drying gas flow of 9 L/min, a capillary voltage of 4000 V and an ICC target of 10,000 ions. Agilent ChemStation software (Rev. A.10.02) and Bruker Daltonik LC/MSD Trap software 5.2 (Build 374) were used to operate the LC-MS and analyse the data. For LC and LC-MS, a Zorbax XDB-C8 column (2.1 × 150 mm) was used at 40°C with a gradient (flow rate 0.2 mL/min) from 5% to 95% acetonitrile in water (both solvents containing 0.1% formic acid).
Dynamic network preparation and analysis
Building blocks 1–3 were dissolved together in 50 mM borate buffer (pH 8.0) at a total final building-block concentration of 5 mM. Each effector was added to a separate library at a concentration of 2.5 mM. The libraries were stirred for 3 days and then analysed by HPLC and LC-MS.
Weka (GNU GPL) version 3.7.1 was used on Mac OS X. A text file containing the amplification factors of each effector was used as the input file for k-means clustering in Weka with Euclidean distances, using the parameters “weka.clusterers.SimpleKMeans -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -S 10”. The same file, with one effector removed, was used as the training set for the naïve Bayes classification analysis, and the removed effector was used as the unknown. This analysis used the default parameters (“weka.classifiers.bayes.NaiveBayes”).
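The SimpleKMeans options above request two clusters (-N 2), Euclidean distance over all attributes (-A "weka.core.EuclideanDistance -R first-last"), at most 500 iterations (-I 500) and random seed 10 (-S 10). A minimal pure-Python sketch of the same kind of analysis follows. It assumes Weka's default behaviour of min-max normalising attributes inside EuclideanDistance and seeds the centroids with randomly chosen data points. Because Python's random number generator differs from Weka's, cluster numbering (and, on less well-separated data, possibly the final partition) may not match Weka's output. The six data points are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def min_max_normalise(points: List[Point]) -> List[List[float]]:
    """Rescale every attribute to [0, 1] (assumed to mirror Weka's default)."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]  # guard constant columns
    return [[(v - lo) / s for v, lo, s in zip(p, lows, spans)] for p in points]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k=2, max_iter=500, seed=10):
    """Lloyd's k-means with centroids seeded from k randomly chosen points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment: List[Optional[int]] = [None] * len(points)
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members (keep it if empty).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical amplification factors (two members) for six effectors.
raw = [(3.0, 0.2), (3.1, 0.25), (2.9, 0.15), (0.5, 1.5), (0.6, 1.45), (0.4, 1.55)]
labels, _ = k_means(min_max_normalise(raw), k=2, max_iter=500, seed=10)
print(labels)
```

Which cluster receives label 0 depends on the random initialisation, so only the grouping, not the numbering, is meaningful.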
We thank EPSRC, COST CM0703, COST CM1005, Marie Curie RTN DCC and the Netherlands Organization for Scientific Research (NWO) (V.S.) for financial support, and P.T. Corbett for useful discussions.
- Horvath D, Jeandenans C: Neighborhood Behavior of In Silico Structural Spaces with respect to In Vitro Activity Spaces – A Novel Understanding of the Molecular Similarity Principle in the Context of Multiple Receptor Binding Profiles. J Chem Inf Comput Sci 2003, 43: 680–690. 10.1021/ci025634z
- Bender A, Glen RC: Molecular Similarity: a Key Technique in Molecular Informatics. Org Biomol Chem 2004, 2: 3204–3218. 10.1039/b409813g
- Boström J, Hogner A, Schmitt S: Do Structurally Similar Ligands Bind in a Similar Fashion? J Med Chem 2006, 49: 6716–6725. 10.1021/jm060167o
- Eckert A, Bajorath J: Molecular Similarity Analysis in Virtual Screening: Foundations, Limitations, and novel Approaches. Drug Discov Today 2007, 12: 225–233. 10.1016/j.drudis.2007.01.011
- Martin EJ, Blaney JM, Siani MA, Spellmeyer DC, Wong AK, Moos WH: Measuring Diversity: Experimental Design of Combinatorial Libraries for Drug Discovery. J Med Chem 1995, 38: 1431–1436. 10.1021/jm00009a003
- Amin EA, Welsh WJ: A Preliminary in Silico Lead Series of 2-Phthalimidinoglutaric Acid Analogues Designed as MMP-3 Inhibitors. J Chem Inf Model 2006, 46: 2104–2109. 10.1021/ci0601362
- Jennings A, Tennant M: Selection of Molecules Based on Shape and Electrostatic Similarity: Proof of Concept of "Electroforms". J Chem Inf Model 2007, 47: 1829–1838. 10.1021/ci600549q
- Grant JA, Gallardo MA, Pickup B: A Fast Method of Molecular Shape Comparison: a Simple Application of a Gaussian Description of Molecular Shape. J Comput Chem 1996, 17: 1653–1666. 10.1002/(SICI)1096-987X(19961115)17:14<1653::AID-JCC7>3.0.CO;2-K
- Rush TS, Grant JS, Mosyak L, Nicholls A: A Shape-Based 3-D Scaffold Hopping Method and Its Application to a Bacterial Protein−Protein Interaction. J Med Chem 2005, 48: 1489–1495. 10.1021/jm040163o
- Ballester PG, Richards WG: Ultrafast Shape Recognition to Search Compound Databases for Similar Molecular Shapes. J Comput Chem 2007, 28: 1711–1723. 10.1002/jcc.20681
- Cramer RD III, Patterson DE, Bunce JD: Comparative Molecular Field Analysis (CoMFA). Effect of Shape on Binding of Steroids to Carrier Proteins. J Am Chem Soc 1988, 110: 5959–5967. 10.1021/ja00226a005
- Klebe G, Abraham U, Mietzner T: Molecular Similarity Indices in a Comparative Analysis (CoMSIA) of Drug Molecules to Correlate and Predict their Biological Activity. J Med Chem 1994, 37: 4130–4146. 10.1021/jm00050a010
- Janowski V, Severin K: Carbohydrate Sensing with a Metal-Based Indicator Displacement Assay. Chem Commun 2011, 47: 8521–8523. 10.1039/c1cc12232k
- Shabbir SH, Joyce LA, DeCruz GM, Lynch VM, Sorey S, Anslyn EV: Pattern-Based Recognition for the Rapid Determination of Identity, Concentration and Enantiomeric Excess of Subtly Different Diols. J Am Chem Soc 2009, 131: 13125–13131. 10.1021/ja904545d
- Hewage HS, Anslyn EV: Pattern-Based Recognition of Thiols and Metals Using a Single Squarane Indicator. J Am Chem Soc 2009, 131: 13099–13106. 10.1021/ja904045n
- Nguyen BT, Anslyn EV: Indicator-Displacement Assays. Coord Chem Rev 2005, 250: 3118–3127, and refs therein.
- Rochat S, Severin K: Pattern-Based Sensing with Metal−Dye Complexes: Sensor Arrays versus Dynamic Combinatorial Libraries. J Comb Chem 2010, 12: 595–599. 10.1021/cc1000727
- Montenegro J, Bonvin P, Takeuchi T, Matile S: Dynamic Octopus Amphiphiles as Powerful Activators of DNA Transporters: Differential Fragrance Sensing and Beyond. Chem Eur J 2010, 16: 14159–14166. 10.1002/chem.201001352
- Whitesides GM, Ismagilov RF: Complexity in Chemistry. Science 1999, 284: 89–92. 10.1126/science.284.5411.89
- Ludlow RF, Otto S: Systems Chemistry. Chem Soc Rev 2008, 37: 101–108. 10.1039/b611921m
- Peyralans JJP, Otto S: Recent Highlights in Systems Chemistry. Curr Opin Chem Biol 2009, 13: 705–713. 10.1016/j.cbpa.2009.08.006
- Nitschke JR: Systems Chemistry: Molecular Networks Come of Age. Nature 2009, 462: 736–738. 10.1038/462736a
- Gibb BC: Teetering Towards Chaos and Complexity. Nat Chem 2009, 1: 17–18. 10.1038/nchem.148
- von Kiedrowski G, Otto S, Herdewijn P: Welcome Home, Systems Chemists! J Syst Chem 2010, 1: 1–16. 10.1186/1759-2208-1-1
- Corbett PT, Leclaire J, Vial L, West KR, Wietor J-L, Sanders JKM, Otto S: Dynamic Combinatorial Chemistry. Chem Rev 2006, 106: 3652–3711. 10.1021/cr020452p
- Ladame S: Dynamic Combinatorial Chemistry: on the Road to Fulfilling the Promise. Org Biomol Chem 2008, 6: 219–226. 10.1039/b714599c
- Reek JHR, Otto S: Dynamic Combinatorial Chemistry. Weinheim: Wiley-VCH; 2010.
- Miller BL: Dynamic Combinatorial Chemistry: An Introduction. In Dynamic Combinatorial Chemistry: In Drug Discovery, Bioorganic Chemistry, and Materials Science. Hoboken: Wiley & Sons; 2010.
- Hunt RAR, Otto S: Dynamic Combinatorial Libraries: New Opportunities in Systems Chemistry. Chem Commun 2011, 47: 847–858. 10.1039/c0cc03759a
- Besenius P, Cormack PAG, Ludlow RF, Otto S, Sherrington DC: Affinity Chromatography in Dynamic Combinatorial Libraries: One-Pot Amplification and Isolation of a Strongly Binding Receptor. Org Biomol Chem 2010, 8: 2414–2418. 10.1039/c000333f
- Klein JM, Saggiomo V, Reck L, McPartlin M, Dan Pantoş G, Lüning U, Sanders JKM: A Remarkably Flexible and Selective Receptor for Ba2+ Amplified from a Hydrazone Dynamic Combinatorial Library. Chem Commun 2011, 47: 3371–3373. 10.1039/c0cc04863a
- Buryak A, Pozdnoukhov A, Severin K: Pattern-Based Sensing of Nucleotides in Aqueous Solution with a Multicomponent Indicator Displacement Assay. Chem Commun 2007, 23: 2366–2368.
- Buryak A, Zaubitzer F, Pozdnoukhov A, Severin K: Indicator Displacement Assays as Molecular Timers. J Am Chem Soc 2008, 130: 11260–11261. 10.1021/ja8037118
- Zaubitzer F, Riis-Johannessen T, Severin K: Sensing of Peptide Hormones with Dynamic Combinatorial Libraries of Metal–Dye Complexes: the Advantage of Time-Resolved Measurements. Org Biomol Chem 2009, 7: 4598–4603. 10.1039/b912400d
- Montenegro J, Fin A, Matile S: Comprehensive Screening of Octopus Amphiphiles as DNA Activators in Lipid Bilayers: Implications on Transport, Sensing and Cellular Uptake. Org Biomol Chem 2011, 9: 2641–2647. 10.1039/c0ob00948b
- Otto S, Furlan RLE, Sanders JKM: Dynamic Combinatorial Libraries of Macrocyclic Disulfides in Water. J Am Chem Soc 2000, 122: 12063–12064. 10.1021/ja005507o
- Otto S, Furlan RLE, Sanders JKM: Selection and Amplification of Hosts from Dynamic Combinatorial Libraries of Macrocyclic Disulfides. Science 2002, 297: 590–593. 10.1126/science.1072361
- West K, Bake K, Otto S: Dynamic Combinatorial Libraries of Disulfide Cages in Water. Org Lett 2005, 7: 2615–2618. 10.1021/ol0507524
- Witten IH, Frank E: “Iterative distance-based clustering” in Data Mining. 2nd edition. San Francisco: Elsevier; 2005:137–138.
- The Euclidean distance between two points is defined as the length of the line segment connecting them.
- Witten IH, Frank E: “Clustering for classification” in Data Mining. 2nd edition. San Francisco: Elsevier; 2005:337–338.
- Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA Data Mining Software: an Update. SIGKDD Explorations 2009, 11: 10–19. 10.1145/1656274.1656278
- Staab HA, Kirrstetter RGH: [2.2](2,7)Pyrenophan als Excimeren-Modell: Synthese und Spektroskopische Eigenschaften. Liebigs Ann Chem 1979, 886–898.
- Vial L, Ludlow RF, Leclaire J, Pérez-Fernández R, Otto S: Controlling the Biological Effects of Spermine Using a Synthetic Receptor. J Am Chem Soc 2006, 128: 10253–10257. 10.1021/ja062536b
- Kondo Y, Uematsu R, Nakamura Y, Kusabayashi S: Empirical Analysis on the Constituent Terms of Transfer Enthalpies. J Chem Soc Faraday Trans 1 1988, 84: 111–116. 10.1039/f19888400111
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.