Systems chemistry: using thermodynamically controlled networks to assess molecular similarity

Saggiomo, Vittorio; Hristova, Yana R; Ludlow, R Frederick; Otto, Sijbren

doi:10.1186/1759-2208-4-2

Research article
Open access
Published: 12 February 2013

Journal of Systems Chemistry volume 4, Article number: 2 (2013)

Abstract

Background

The assessment of molecular similarity is a key step in the drug discovery process that has thus far relied almost exclusively on computational approaches. We now report an experimental method for similarity assessment based on dynamic combinatorial chemistry.

Results

In order to assess molecular similarity directly in solution, a dynamic molecular network was used in a two-step process. First, a clustering analysis was employed to determine the network’s innate discriminatory ability. A classification algorithm was then trained to enable the classification of unknowns. The dynamic molecular network used in this work was able to identify thin amines and ammonium ions in a set of 25 different, closely related molecules. After training, it was also able to classify unknown molecules based on the presence or absence of an ethylamine group.

Conclusions

This is the first step in the development of molecular networks capable of predicting bioactivity based on an assessment of molecular similarity.

Background

Molecular similarity relates to the extent to which molecules have similar structures or properties. Hence, molecular similarity and any quantification of it are both strongly context dependent. Assessing molecular similarity is a key element in the drug discovery process, as structural similarity is believed to be correlated with activity with respect to a given target [1–4]. However, assessing molecular similarity is not trivial. The most common approaches involve computational methods, including the use of molecular fingerprints [2], simple calculated properties (such as solvent-accessible surface area or the number of hydrogen-bond donor and acceptor groups) [5–7] or shape comparisons [8–10]. Three-dimensional methods, such as CoMFA [11] and CoMSIA [12], map favorable and unfavorable interaction regions around or onto the structure of a molecule, and therefore require prior knowledge of the appropriate conformations of this molecule.

We reasoned that a realistic measure of molecular similarity might be obtained by interrogating the molecules experimentally in solution. The experimental approaches closest to analysing molecular similarity are sensing systems, whose objective is usually the detection and quantification of a specific analyte or the discrimination between different analytes. Such assays have been set up in array format [13–16] and, more recently, using dynamic combinatorial chemistry [17, 18]. However, we are not aware of any examples of the use of these approaches for determining similarity.

We now report the adaptation of dynamic combinatorial chemistry for similarity assessment. The central premise of our approach is that the extent of binding of a molecule by a synthetic receptor contains information about the structure of the molecule. While binding by a single receptor will provide only very limited information, a more comprehensive description of the molecular structure may be obtainable by using a systems chemistry [19–24] approach, utilising the binding to multiple receptors. Specifically, we employed a dynamic molecular network containing a variety of potential synthetic receptors. These receptors are connected through reversible covalent bonds and therefore continuously exchange their constituent building blocks. Through work on dynamic combinatorial libraries [25–29], it is well established that dynamic molecular networks will change their composition in response to molecular recognition by an introduced effector, leading to a redistribution of the building blocks in favor of those receptors with affinity for the effector. This effect has so far mainly been exploited as a tool for identifying individual receptors and for constructing sensor networks [30–35]. We now show how such a network can make a rudimentary assessment of molecular similarity. In this approach there is no need to synthesise all the receptors separately; they are generated in one step when preparing the dynamic combinatorial library. Yet it is possible to identify the individual receptors in the mixture using LC-MS.

Results and discussion

We selected a set of amines and ammonium ions (4-28), shown in Figure 1, as effector molecules, featuring functional groups that are common in many drugs. We constructed the dynamic molecular network of potential receptors from dithiol building blocks 1-3. These building blocks feature carboxylic acid groups that can potentially recognise the amine and ammonium groups of the effectors through hydrogen bonding and electrostatic interactions. They also contain aromatic rings that may engage in hydrophobic interactions with the set of effector molecules. Each building block features two thiol groups, which can be oxidised to disulfides, giving rise to a mixture of macrocycles that can equilibrate through disulfide exchange [36–38].

Thus, exposing an equimolar solution of 1-3 (5 mM total) in borate buffer (50 mM, pH 8.0) to atmospheric oxygen for three days gave a mixture of disulfide macrocycles dominated by (1)(3)₂, (3)₄, (3)₃, (1)(2)(3), (1)₂(3) and (1)₂(2)₂. We analysed the response of this small molecular network to the introduction of each individual effector (2.5 mM) by LC-MS (representative chromatograms are shown in Figure 2).

We determined the amplification factors (i.e. the ratio of the HPLC peak areas in the presence and absence of the effectors) for the six dominant network members and for all 25 effector molecules (Additional file 1).
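The amplification factor defined above is a simple ratio of peak areas. A minimal Python sketch, assuming the HPLC peak areas of each network member have already been integrated and stored in dictionaries keyed by a member label (the function name, labels and numbers are hypothetical and are not data from this work), could look like this:

```python
"""Sketch: effector-induced amplification factors for network members."""


def amplification_factors(areas_with, areas_without):
    """Return {member: area_with / area_without} for each network member.

    Raises ValueError if a member is missing from the effector-free
    (reference) chromatogram or has zero peak area there.
    """
    factors = {}
    for member, area in areas_with.items():
        ref = areas_without.get(member)
        if ref is None or ref == 0:
            raise ValueError(f"no usable reference peak for {member}")
        factors[member] = area / ref
    return factors


if __name__ == "__main__":
    # Illustrative numbers only; not measured data.
    blank = {"(3)4": 120.0, "(1)2(2)2": 40.0}
    with_effector = {"(3)4": 180.0, "(1)2(2)2": 20.0}
    print(amplification_factors(with_effector, blank))
```

For each effector, the six resulting factors form one vector; the 25 such vectors are the input to the clustering and classification steps described below.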

The approach we took for investigating the ability of the network to assess molecular similarity is summarised in Figure 3. We first probed the network’s innate ability to discriminate between different classes of effectors using a clustering analysis. We then used the thus uncovered classification into clusters to train a classification algorithm. Finally, we tested the performance of the dynamic molecular network by challenging it with “unknowns”.

In a clustering analysis, a given set of measurements is divided into two or more clusters based only on the distances between all the points in n-dimensional space, where n is the number of variables. Without requiring any transformation of the original dataset, this analysis iteratively locates the centroids of two or more clusters according to two main rules: each centroid must be close to as many points as possible and, at the same time, far away from the other centroid(s). We used an unsupervised method, i.e. classes were assigned autonomously during the analysis without requiring any user input.

The dataset composed of the amplification factors of the various receptors upon addition of the different effector molecules 4-28 was subjected to a k-means clustering analysis [39]. K-means is a partitioning algorithm with a chosen number k of cluster centroids that tries to minimise the sum of the within-cluster variances. Each object is assigned to the nearest of the k centroids on the basis of Euclidean distance [40], and each centroid is then recalculated as the average of all points assigned to it. These two steps are repeated iteratively until each centroid lies at the centre of its cluster. The number of centroids starts at two and is increased until a cluster containing only one or two points is found. In our case, the third cluster already contained only two points. Two main clusters were therefore identified: class A (consisting of effectors 4-11) and class B (consisting of effectors 12-28). Figure 4 shows a graphical representation of these clusters reduced to two dimensions.
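The clustering procedure and the rule for choosing the number of clusters can be sketched in plain Python. This is an illustrative Lloyd-style k-means, not Weka's SimpleKMeans implementation: initial centroids are drawn with a seeded random number generator, and the defaults max_iter=500 and seed=10 only echo the -I 500 and -S 10 values given in the Methods without reproducing Weka's initialisation. The helper choose_k (a name introduced here) encodes the stopping rule described above, keeping the last k before any cluster shrinks to two or fewer points.

```python
import math
import random


def euclidean(a, b):
    """Length of the straight line between two points in n dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, max_iter=500, seed=10):
    """Lloyd-style k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initial centroids: k points drawn without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels


def choose_k(points, min_size=3):
    """Raise k from 2 until some cluster holds fewer than min_size points.

    Returns (k, labels) for the last k before that happened, or None if
    even k = 2 already produces such a small cluster.
    """
    best = None
    for k in range(2, len(points) + 1):
        _, labels = kmeans(points, k)
        if min(labels.count(j) for j in range(k)) < min_size:
            break
        best = (k, labels)
    return best
```

Applied to the 25 six-dimensional amplification-factor vectors, a rule of this kind stops at k = 2 whenever the three-cluster solution contains a cluster of only two points, which is the situation reported above.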

Inspection of the nature of the clustered molecules revealed that the network has an innate ability to discriminate the relatively thin amines and ammonium ions from a range of different amines and ammonium ions that are either more bulky or carry negative (partial) charge (Figure 1). Having established the discriminatory ability of the network, we investigated whether we could use the network for the classification of “unknown” molecules. More specifically, we investigated the possibility of using the network’s response to predict whether molecules contain the ethylamine group. Our network seemed highly suitable for this, since, with the exception of effector 9, all molecules in class A contain an ethylamine group, while, with the exception of 18, none of those in class B do.

In a classification analysis (supervised learning), unknown objects are classified by comparing their variables with those of a training set whose classes are predefined. We opted for the naïve Bayes classifier [41], a simple probabilistic classifier that requires only a small amount of training data to estimate its parameters. It treats each variable as contributing independently to the assignment of an unknown object to a predetermined class. For this analysis, a training set was constructed from the amplification factors of all effectors except one, which was used as the unknown. Class A comprised the effectors containing an ethylamine group (4-8, 10-11 and 18) and class B the remaining effectors (9, 12-17 and 19-28). The amplification factors of the unknown effector were then subjected to naïve Bayes classification in Weka [42] to assign its class. In 23 out of 25 cases the unknown was assigned to the correct class (92% correct assignment). Only effectors 9 and 18 were misassigned: 9 was assigned to class A and 18 to class B, as in the clustering experiment. This leave-one-out cross-validation experiment establishes that the molecular network, when properly trained, is able to classify unknowns.
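The leave-one-out procedure can likewise be sketched. The classifier below is a hand-written Gaussian naive Bayes: it assumes that each amplification factor is normally distributed within a class and that the factors are independent given the class. We believe this corresponds to how Weka's NaiveBayes treats numeric attributes by default, but Weka's estimator details (for example its handling of zero variance) differ, so this is not a reimplementation of Weka. The min_std floor and all function names are assumptions made for this illustration.

```python
import math
from collections import defaultdict


def fit(X, y, min_std=1e-3):
    """Estimate class priors and a per-attribute Gaussian (mean, std) for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        stats = []
        for column in zip(*rows):
            mu = sum(column) / len(column)
            var = sum((v - mu) ** 2 for v in column) / len(column)
            # Floor the spread so a class with identical values does not divide by zero.
            stats.append((mu, max(math.sqrt(var), min_std)))
        model[label] = (len(rows) / len(X), stats)
    return model


def log_gauss(x, mu, sd):
    """Log density of a normal distribution N(mu, sd**2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mu) ** 2 / (2 * sd * sd)


def predict(model, row):
    """Pick the class maximising log prior + sum of independent per-attribute log likelihoods."""
    def score(label):
        prior, stats = model[label]
        return math.log(prior) + sum(
            log_gauss(x, mu, sd) for x, (mu, sd) in zip(row, stats))
    return max(model, key=score)


def leave_one_out(X, y):
    """Hold out each object in turn as the 'unknown', train on the rest, and tally results."""
    correct, misassigned = 0, []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(model, X[i]) == y[i]:
            correct += 1
        else:
            misassigned.append(i)
    return correct, misassigned
```

With X holding the 25 amplification-factor vectors and y the class labels, correct / len(X) gives the fraction of correct assignments and misassigned lists the indices of the wrongly classified effectors. The article reports 23 of 25 (92%) with Weka; this sketch is not claimed to reproduce that figure.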

Conclusions

We have shown how a simple molecular network can perform a rudimentary assessment of molecular similarity and can successfully classify unknowns. To the best of our knowledge, this is the first experimental approach to assessing molecular similarity, and these results represent the first step towards developing networks that may be able to discriminate and assess the similarity of biologically active molecules and drugs, and potentially predict bioactivity. However, there is still a long road ahead. In the present clustering approach, the similarity that is assessed is dictated by the innate discriminatory ability of the network, and is only known after the response of the molecular network to a series of effectors has been analysed. Many more such studies on different dynamic networks are needed before we will be able to design molecular networks that perform well in clustering molecules based on a predefined similarity parameter. In contrast, a classification analysis may yield useful results more readily, as the scientist can choose the parameter on which the classification should be based, after which the algorithm selects the data that are most discriminatory for this particular parameter. We are currently working towards this vision using molecular networks of increased structural diversity.

Methods

General methods

Building blocks 1 [37], 2 [43] and 3 [44] and effector 22 [45] were synthesised following literature procedures. All other effectors were obtained from commercial sources and used without further purification. HPLC analysis was performed on Agilent 1050 or 1100 systems coupled to a UV detector. LC-MS analysis was performed using an Agilent XCT ion trap MSD mass spectrometer. Mass spectra (negative ion mode) were acquired in ultra-scan mode using a drying temperature of 350°C, a nebuliser pressure of 35.00 psi, a drying gas flow of 9 L/min, a capillary voltage of 4000 V and an ICC target of 10,000 ions. Agilent ChemStation software (Rev A.10.02) and Bruker Daltonik LC/MSD Trap software 5.2 (Build 374) were used to operate the LC-MS and analyse the data. For LC and LC-MS, a Zorbax XDB-C8 column (2.1 × 150 mm) was used at 40°C with a gradient (flow rate 0.2 mL/min) from 5% to 95% acetonitrile in water (both solvents containing 0.1% formic acid).

Dynamic network preparation and analysis

Building blocks 1-3 were dissolved together in a 50 mM borate buffer solution (pH 8.0) with a total final concentration of 5 mM. Effectors were added separately at a concentration of 2.5 mM. The libraries were stirred for 3 days and then analysed by HPLC and LC-MS.

Statistical methods

Weka (GNU GPL), version 3.7.1, was used on Mac OS X. A text file containing all amplification factors for each effector was used as the input file for the k-means clustering analysis in Weka, using Euclidean distances and the parameters “weka.clusterers.SimpleKMeans -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -S 10”. The same file, minus one effector, was used as the training set for the naïve Bayes classification analysis; the effector removed from the training set was used as the unknown. This analysis was run with the standard parameters of “weka.classifiers.bayes.NaiveBayes”.
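Weka reads its input as ARFF files. The helper below, a hypothetical name introduced for illustration, writes amplification factors in that format, with one numeric attribute per network member and, optionally, a nominal class attribute as the last column. Attribute names are single-quoted so that labels containing parentheses or spaces remain valid; the member names and values in the usage line are placeholders.

```python
def to_arff(relation, attribute_names, rows, labels=None, class_values=("A", "B")):
    """Build an ARFF document: one numeric attribute per network member.

    If labels is given, a nominal 'class' attribute is appended as the last
    column, which is where Weka's command-line classifiers expect the class
    by default.
    """
    lines = [f"@relation {relation}", ""]
    for name in attribute_names:
        # Quote names so labels containing parentheses or spaces stay valid.
        lines.append(f"@attribute '{name}' numeric")
    if labels is not None:
        lines.append("@attribute class {" + ",".join(class_values) + "}")
    lines += ["", "@data"]
    for i, row in enumerate(rows):
        values = [repr(float(v)) for v in row]
        if labels is not None:
            values.append(labels[i])
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical member names and values, for illustration only.
    print(to_arff("network", ["m1", "m2"], [[1.5, 0.5]], labels=["A"]))
```

For the clustering run the class column would be left out (labels=None), since SimpleKMeans would otherwise treat it as one more attribute; for the classification run it is included so that each training effector carries its class.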

References

1. Horvath D, Jeandenans C: Neighborhood Behavior of In Silico Structural Spaces with respect to In Vitro Activity Spaces – A Novel Understanding of the Molecular Similarity Principle in the Context of Multiple Receptor Binding Profiles. J Chem Inf Comput Sci 2003, 43: 680–690. 10.1021/ci025634z
2. Bender A, Glen RC: Molecular Similarity: a Key Technique in Molecular Informatics. Org Biomol Chem 2004, 2: 3204–3218. 10.1039/b409813g
3. Boström J, Hogner A, Schmitt S: Do Structurally Similar Ligands Bind in a Similar Fashion? J Med Chem 2006, 49: 6716–6725. 10.1021/jm060167o
4. Eckert A, Bajorath J: Molecular Similarity Analysis in Virtual Screening: Foundations, Limitations, and Novel Approaches. Drug Discov Today 2007, 12: 225–233. 10.1016/j.drudis.2007.01.011
5. Martin EJ, Blaney JM, Siani MA, Spellmeyer DC, Wong AK, Moos WH: Measuring Diversity: Experimental Design of Combinatorial Libraries for Drug Discovery. J Med Chem 1995, 38: 1431–1436. 10.1021/jm00009a003
6. Amin EA, Welsh WJ: A Preliminary In Silico Lead Series of 2-Phthalimidinoglutaric Acid Analogues Designed as MMP-3 Inhibitors. J Chem Inf Model 2006, 46: 2104–2109. 10.1021/ci0601362
7. Jennings A, Tennant M: Selection of Molecules Based on Shape and Electrostatic Similarity: Proof of Concept of "Electroforms". J Chem Inf Model 2007, 47: 1829–1838. 10.1021/ci600549q
8. Grant JA, Gallardo MA, Pickup B: A Fast Method of Molecular Shape Comparison: a Simple Application of a Gaussian Description of Molecular Shape. J Comput Chem 1996, 17: 1653–1666. 10.1002/(SICI)1096-987X(19961115)17:14<1653::AID-JCC7>3.0.CO;2-K
9. Rush TS, Grant JS, Mosyak L, Nicholls A: A Shape-Based 3-D Scaffold Hopping Method and Its Application to a Bacterial Protein-Protein Interaction. J Med Chem 2005, 48: 1489–1495. 10.1021/jm040163o
10. Ballester PG, Richards WG: Ultrafast Shape Recognition to Search Compound Databases for Similar Molecular Shapes. J Comput Chem 2007, 28: 1711–1723. 10.1002/jcc.20681
11. Cramer RD III, Patterson DE, Bunce JD: Comparative Molecular Field Analysis (CoMFA). Effect of Shape on Binding of Steroids to Carrier Proteins. J Am Chem Soc 1988, 110: 5959–5967. 10.1021/ja00226a005
12. Klebe G, Abraham U, Mietzner T: Molecular Similarity Indexes in a Comparative Analysis (CoMSIA) of Drug Molecules to Correlate and Predict their Biological Activity. J Med Chem 1994, 37: 4130–4146. 10.1021/jm00050a010
13. Janowski V, Severin K: Carbohydrate Sensing with a Metal-Based Indicator Displacement Assay. Chem Commun 2011, 47: 8521–8523. 10.1039/c1cc12232k
14. Shabbir SH, Joyce LA, DeCruz GM, Lynch VM, Sorey S, Anslyn EV: Pattern-Based Recognition for the Rapid Determination of Identity, Concentration and Enantiomeric Excess of Subtly Different Diols. J Am Chem Soc 2009, 131: 13125–13131. 10.1021/ja904545d
15. Hewage HS, Anslyn EV: Pattern-Based Recognition of Thiols and Metals Using a Single Squaraine Indicator. J Am Chem Soc 2009, 131: 13099–13106. 10.1021/ja904045n
16. Nguyen BT, Anslyn EV: Indicator-Displacement Assays. Coord Chem Rev 2005, 250: 3118–3127, and refs therein.
17. Rochat S, Severin K: Pattern-Based Sensing with Metal-Dye Complexes: Sensor Arrays versus Dynamic Combinatorial Libraries. J Comb Chem 2010, 12: 595–599. 10.1021/cc1000727
18. Montenegro J, Bonvin P, Takeuchi T, Matile S: Dynamic Octopus Amphiphiles as Powerful Activators of DNA Transporters: Differential Fragrance Sensing and Beyond. Chem Eur J 2010, 16: 14159–14166. 10.1002/chem.201001352
19. Whitesides GM, Ismagilov RF: Complexity in Chemistry. Science 1999, 284: 89–92. 10.1126/science.284.5411.89
20. Ludlow RF, Otto S: Systems Chemistry. Chem Soc Rev 2008, 37: 101–108. 10.1039/b611921m
21. Peyralans JJP, Otto S: Recent Highlights in Systems Chemistry. Curr Opin Chem Biol 2009, 13: 705–713. 10.1016/j.cbpa.2009.08.006
22. Nitschke JR: Systems Chemistry: Molecular Networks Come of Age. Nature 2009, 462: 736–738. 10.1038/462736a
23. Gibb BC: Teetering Towards Chaos and Complexity. Nat Chem 2009, 1: 17–18. 10.1038/nchem.148
24. von Kiedrowski G, Otto S, Herdewijn P: Welcome Home, Systems Chemists! J Syst Chem 2010, 1: 1–16. 10.1186/1759-2208-1-1
25. Corbett PT, Leclaire J, Vial L, West KR, Wietor J-L, Sanders JKM, Otto S: Dynamic Combinatorial Chemistry. Chem Rev 2006, 106: 3652–3711. 10.1021/cr020452p
26. Ladame S: Dynamic Combinatorial Chemistry: on the Road to Fulfilling the Promise. Org Biomol Chem 2008, 6: 219–226. 10.1039/b714599c
27. Reek JHR, Otto S: Dynamic Combinatorial Chemistry. Weinheim: Wiley-VCH; 2010.
28. Miller BL: Dynamic Combinatorial Chemistry: An Introduction. In Dynamic Combinatorial Chemistry: In Drug Discovery, Bioorganic Chemistry, and Materials Science. Hoboken: Wiley & Sons; 2010.
29. Hunt RAR, Otto S: Dynamic Combinatorial Libraries: New Opportunities in Systems Chemistry. Chem Commun 2011, 47: 847–858. 10.1039/c0cc03759a
30. Besenius P, Cormack PAG, Ludlow RF, Otto S, Sherrington DC: Affinity Chromatography in Dynamic Combinatorial Libraries: One-Pot Amplification and Isolation of a Strongly Binding Receptor. Org Biomol Chem 2010, 8: 2414–2418. 10.1039/c000333f
31. Klein JM, Saggiomo V, Reck L, McPartlin M, Dan Pantoş G, Lüning U, Sanders JKM: A Remarkably Flexible and Selective Receptor for Ba2+ Amplified from a Hydrazone Dynamic Combinatorial Library. Chem Commun 2011, 47: 3371–3373. 10.1039/c0cc04863a
32. Buryak A, Pozdnoukhov A, Severin K: Pattern-Based Sensing of Nucleotides in Aqueous Solution with a Multicomponent Indicator Displacement Assay. Chem Commun 2007, 23: 2366–2368.
33. Buryak A, Zaubitzer F, Pozdnoukhov A, Severin K: Indicator Displacement Assays as Molecular Timers. J Am Chem Soc 2008, 130: 11260–11261. 10.1021/ja8037118
34. Zaubitzer F, Riis-Johannessen T, Severin K: Sensing of Peptide Hormones with Dynamic Combinatorial Libraries of Metal-Dye Complexes: the Advantage of Time-Resolved Measurements. Org Biomol Chem 2009, 7: 4598–4603. 10.1039/b912400d
35. Montenegro J, Fin A, Matile S: Comprehensive Screening of Octopus Amphiphiles as DNA Activators in Lipid Bilayers: Implications on Transport, Sensing and Cellular Uptake. Org Biomol Chem 2011, 9: 2641–2647. 10.1039/c0ob00948b
36. Otto S, Furlan RLE, Sanders JKM: Dynamic Combinatorial Libraries of Macrocyclic Disulfides in Water. J Am Chem Soc 2000, 122: 12063–12064. 10.1021/ja005507o
37. Otto S, Furlan RLE, Sanders JKM: Selection and Amplification of Hosts from Dynamic Combinatorial Libraries of Macrocyclic Disulfides. Science 2002, 297: 590–593. 10.1126/science.1072361
38. West K, Bake K, Otto S: Dynamic Combinatorial Libraries of Disulfide Cages in Water. Org Lett 2005, 7: 2615–2618. 10.1021/ol0507524
39. Witten IH, Frank E: "Iterative distance-based clustering". In Data Mining. 2nd edition. San Francisco: Elsevier; 2005: 137–138.
40. The Euclidean distance of two points is defined as the length of the line segment connecting them.
41. Witten IH, Frank E: "Clustering for classification". In Data Mining. 2nd edition. San Francisco: Elsevier; 2005: 337–338.
42. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH: The WEKA Data Mining Software: an Update. SIGKDD Explorations 2009, 11: 10–19. 10.1145/1656274.1656278
43. Staab HA, Kirrstetter RGH: [2.2](2,7)Pyrenophan als Excimeren-Modell: Synthese und Spektroskopische Eigenschaften [[2.2](2,7)Pyrenophane as an excimer model: synthesis and spectroscopic properties]. Liebigs Ann Chem 1979, 886–898.
44. Vial L, Ludlow RF, Leclaire J, Pérez-Fernández R, Otto S: Controlling the Biological Effects of Spermine Using a Synthetic Receptor. J Am Chem Soc 2006, 128: 10253–10257. 10.1021/ja062536b
45. Kondo Y, Uematsu R, Nakamura Y, Kusabayashi S: Empirical Analysis on the Constituent Terms of Transfer Enthalpies. J Chem Soc Faraday Trans 1 1988, 84: 111–116. 10.1039/f19888400111


Acknowledgements

We thank EPSRC, COST CM0703, COST CM1005, Marie Curie RTN DCC and The Netherlands Organization for Scientific Research (NWO) (V.S.) for funding, and P.T. Corbett for useful discussions.

Author information

R Frederick Ludlow
Present address: Astex Pharmaceuticals, 436 Cambridge Science Park, Cambridge, CB4 0QA, United Kingdom

Authors and Affiliations

Centre for Systems Chemistry, Stratingh Institute, University of Groningen, Nijenborgh 4, 9747 AG, Groningen, The Netherlands
Vittorio Saggiomo & Sijbren Otto
Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, United Kingdom
Yana R Hristova & R Frederick Ludlow


Corresponding author

Correspondence to Sijbren Otto.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

R.F.L. conceived the idea; Y.R.H. prepared and analysed the molecular networks; V.S. performed the data analysis, analysed the results and wrote the paper with S.O. All authors read and approved the final manuscript.

Vittorio Saggiomo and Yana R Hristova contributed equally to this work.

Electronic supplementary material

13322_2012_23_MOESM1_ESM.pdf

Additional file 1: Supporting information contains the effector-induced amplification factors of the six receptors in the molecular network. (PDF 49 KB)


Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Saggiomo, V., Hristova, Y.R., Ludlow, R.F. et al. Systems chemistry: using thermodynamically controlled networks to assess molecular similarity. J Syst Chem 4, 2 (2013). https://doi.org/10.1186/1759-2208-4-2


Received: 07 December 2012
Accepted: 23 January 2013
