Autocatalytic sets in a partitioned biochemical network
 Joshua I Smith^{1},
 Mike Steel^{1} and
 Wim Hordijk^{2}
https://doi.org/10.1186/1759220852
© Smith et al.; licensee Chemistry Central Ltd. 2014
Received: 11 October 2013
Accepted: 23 February 2014
Published: 3 March 2014
Abstract
Background
In previous work, RAF theory has been developed as a tool for making theoretical progress on the origin of life question, providing insight into the structure and occurrence of self-sustaining and collectively autocatalytic sets within catalytic polymer networks. We present here an extension in which there are two “independent” polymer sets, where catalysis occurs within and between the sets, but there are no reactions combining polymers from both sets. Such an extension reflects the interaction between nucleic acids and peptides observed in modern cells and proposed forms of early life.
Results
We present theoretical work and simulations which suggest that the occurrence of autocatalytic sets is robust to the partitioned structure of the network. We also show that autocatalytic sets remain likely even when the molecules in the system are not polymers, and a low level of inhibition is present. Finally, we present a kinetic extension which assigns a rate to each reaction in the system, and show that identifying autocatalytic sets within such a system is an NP-complete problem.
Conclusions
Recent experimental work has challenged the necessity of an RNA world by suggesting that peptide-nucleic acid interactions occurred early in chemical evolution. The present work indicates that such a peptide-RNA world could support the spontaneous development of autocatalytic sets and is thus a feasible alternative worthy of investigation.
Background
Understanding the origin of life on Earth is an important and fascinating problem [1]. In order to shed light on the structure of early replicators and their mechanism of formation, various experimental approaches have been explored [2–5]. Due to the enormity of the task, experimental work alone seems unlikely to answer the question, and this has motivated several theoretical investigations [6–9]. While one goal of theoretical work is to accelerate experimental progress (either in top-down construction of a minimal cell [10], or the spontaneous formation of a self-replicating protocell from abiotic precursor molecules), links between theory and experiment have been scarce. Naturally, theoretical models are simplifications of real chemistry, and while such simplification enables progress, it may limit the conversation between theorists and experimentalists until the models more accurately reflect the complexity of real biochemical systems.
The combinatorial and stochastic aspects of theoretical work on the origin of life mean mathematics has an important role to play. The intuitive analogy between sets of reacting compounds and directed graphs was the motivation for Bollobás and Rasmussen’s work on directed cycles in random graphs [11]. In previous work [8, 12–15], RAF theory has been developed as an effective tool for making progress on theoretical questions about the origin of life, based on initial work by Kauffman [7, 16]. In particular, it appears the emergence of collectively autocatalytic and self-sustaining sets of chemical reactions (RAF sets, defined later) is necessary for the origin of life to occur. Previous work has investigated the structure of such sets and the probability of their formation, leading to theoretical and empirical (simulation-based) results.
The general ideas behind RAF theory are not unique, and there are several related formalisms [6, 9, 17]. However, in some cases, questions within the RAF framework have proven tractable while an equivalent question posed within an alternative formalism has not, perhaps because of the simplicity of the RAF model. On the other hand, it has been suggested that such simplicity limits our ability to draw conclusions about “real” biochemical systems. However, the recent demonstration of the ability of RAF theory to link theoretical and experimental results [4, 18], together with the ongoing development of fresh theoretical ideas [19], suggests that this framework continues to enable progress.
In this paper, we present a biologically relevant extension to the well-studied polymer model, formalising a network of molecules in which there are two “independent” types of polymer, which are able to catalyse each other’s (and their own) reactions, but cannot combine to form hybrid polymers. The motivation for this is the nature of the interaction between peptides and nucleic acids in the metabolic networks of modern cells. The importance of an extension addressing this mutually catalytic arrangement was highlighted in Kauffman’s 1986 paper, in note (vii) (p. 14): “An independent then symbiotic coexistence of autocatalytic protein sets and template replicative polynucleotides would obviously be useful in prebiotic evolution.” (While the present work does not address the templating ability of nucleic acids, this aspect has been studied previously [14, 20].) Moreover, this extension is highly relevant in the light of recent experimental results from Li et al. [5]. In their paper, the authors propose that interactions between polypeptides and polynucleotides occurred very early in chemical evolution, providing an alternative to the hypothesis that life began in an RNA World [21]. The authors state “The striking reciprocity of proteins and RNA in biology is consistent with our proposal: proteins exclusively catalyze nucleic acid synthesis; RNA catalyzes protein synthesis; and genetic messages are interpreted by the small ribosomal subunit, a ribonucleoprotein.” The reciprocity described here provides a clear motivation for theoretical investigation into the properties of these “symbiotic” polymer systems.
We present theoretical results showing that RAF sets are just as likely to emerge in such systems as in those previously studied [14], and it turns out that the result holds even for a more general system in which the molecules are not necessarily polymers, a small amount of inhibition is allowed, and the amount of catalysis varies freely across the reaction network. In previous work, catalysis has been assigned randomly with equal probability between each molecule and each reaction. The current work shows that RAF sets remain highly probable even under heterogeneous catalysis, which is what we might expect to find in real biochemical networks.
As a step toward increased chemical realism, we introduce the concept of a kinetic chemical reaction system, in which every reaction has an associated rate, and all molecules are lost via diffusion into the environment at a constant rate. We can in principle then search for RAFs in the system (as in previous work [8]) with the additional requirement that every molecule in the RAF must be produced at least as fast as it is used up or diffuses away; we call such an RAF a kinetically viable RAF (kRAF).
Definitions
We will use the notation of Hordijk and Steel [8]. Consider a triple $(X,\mathcal{R},F)$, where

X={x_{1},x_{2},… } is a (finite) set of molecular species or molecule types;

F⊂X is a distinguished subset of molecular species known as the food set, the set of all species initially available in the environment;

$\mathcal{R}=\{{r}_{1},{r}_{2},\dots \phantom{\rule{0.3em}{0ex}}\}$ is a (finite) set of chemically allowed reactions;

Each reaction $r\in \mathcal{R}$ is an ordered pair (A,B), where A⊆X is a multiset of reactants and B⊆X is a multiset of products. We can represent a reaction as a_{1}+a_{2}+⋯+a_{ n }→b_{1}+b_{2}+⋯+b_{ m }. Note that the reactants a_{ i } are not necessarily distinct, and neither are the products b_{ i }. Also note that reversible reactions can be modelled as two (formally) separate reactions $(A,B),(B,A)\in \mathcal{R}$.
The triple $(X,\mathcal{R},F)$ is therefore a set of molecular species together with the reactions that occur between them, intuitively visualised as a directed graph. For brevity, we will often use the term “molecule” in place of “molecular species” or “molecule type”. We also define ρ(r) to be the set of all distinct reactants of the reaction r, and π(r) to be the set of all distinct products of r. Then for any subset ${\mathcal{R}}^{\prime}$ of $\mathcal{R}$, $\rho \left({\mathcal{R}}^{\prime}\right):=\bigcup _{r\in {\mathcal{R}}^{\prime}}\rho \left(r\right)$ and $\pi \left({\mathcal{R}}^{\prime}\right):=\bigcup _{r\in {\mathcal{R}}^{\prime}}\pi \left(r\right)$. Another useful concept will be the support of a reaction r, supp(r):=ρ(r)∪π(r). Similarly, $\text{supp}\left({\mathcal{R}}^{\prime}\right):=\rho \left({\mathcal{R}}^{\prime}\right)\cup \pi \left({\mathcal{R}}^{\prime}\right)$ for any subset ${\mathcal{R}}^{\prime}$ of $\mathcal{R}$. Informally, the support of a set of reactions is the set of all molecules consumed or produced by those reactions.
The final important concept is that of the closure of the food set relative to a subset of reactions ${\mathcal{R}}^{\prime}\subseteq \mathcal{R}$, denoted ${\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$ and formally defined as the minimal subset W⊆X which contains F and satisfies ρ(r)⊆W⇒π(r)⊆W for all $r\in {\mathcal{R}}^{\prime}$. Informally, ${\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$ is the set of all molecules that can be built up from the food set using only reactions in ${\mathcal{R}}^{\prime}$ (ignoring catalysis).
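The closure can be computed by a simple fixed-point iteration. The following is a minimal Python sketch, assuming each reaction is stored as a pair (ρ(r), π(r)) of frozensets; the function name `closure` and the toy reactions are illustrative choices and are not taken from the paper or from Figure 1.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# A reaction is represented as (distinct reactants, distinct products).
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(food: Iterable[str], reactions: Iterable[Reaction]) -> Set[str]:
    """Return the smallest set W containing the food set such that
    rho(r) <= W implies pi(r) <= W for every given reaction r."""
    available = set(food)
    reaction_list = list(reactions)
    changed = True
    while changed:
        changed = False
        for reactants, products in reaction_list:
            # A reaction can fire once all its reactants are available
            # (catalysis is ignored, as in the definition).
            if reactants <= available and not products <= available:
                available |= products
                changed = True
    return available


# Hypothetical toy system over binary polymers.
toy_reactions = [
    (frozenset({"0", "1"}), frozenset({"01"})),      # 0 + 1 -> 01
    (frozenset({"01", "1"}), frozenset({"011"})),    # 01 + 1 -> 011
    (frozenset({"101", "0"}), frozenset({"1010"})),  # 101 is never produced
]
toy_closure = closure({"0", "1"}, toy_reactions)
print(sorted(toy_closure))
```

In this toy system the third reaction never fires, because its reactant 101 is neither food nor a product of any reaction that can fire.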
Following [13], we say that a subset ${\mathcal{R}}^{\prime}$ of $\mathcal{R}$ forms a reflexively autocatalytic and food-generated set (an RAF set) for a catalytic reaction system (CRS) $\mathcal{Q}=(X,\mathcal{R},F,C)$, where the catalysation assignment $C\subseteq X\times \mathcal{R}$ contains the pair (x,r) whenever molecule x catalyses reaction r, provided that ${\mathcal{R}}^{\prime}$ is nonempty and that:

(i) All the reactants of each reaction in ${\mathcal{R}}^{\prime}$ are contained in ${\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$ (food-generated);

(ii) For each $r\in {\mathcal{R}}^{\prime}$, there exists (x,r)∈C such that $x\in {\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$ (reflexively autocatalytic).
We commonly use “F-generated” in place of “food-generated”, and “RAF” in place of “RAF set”. Informally, property (i) requires that the reactions in ${\mathcal{R}}^{\prime}$ must be able to sustain themselves from the food set alone. Property (ii) requires that every reaction in ${\mathcal{R}}^{\prime}$ must be catalysed, and furthermore that the catalysts must themselves be generated from the food set by that same set of reactions.
These definitions are intended to capture properties of chemical networks that may have been important in the emergence of early replicators. Uncatalysed reactions in general proceed extremely slowly. We require catalysis so that molecules accumulate in concentrations sufficient to perform useful biochemical tasks. Otherwise, they would diffuse away before being able to play any role in the emergence of the first replicator. Moreover, not only do catalysts greatly increase the reaction rates, they also lead to an equally dramatic reduction in the variance of the rate of reactions (cf. [22], figure six); this last feature would seem to be important for obtaining some degree of synchronicity in both early and present-day metabolism. However, to allow the catalysts to come out of nowhere would be begging the question. So in addition, we require that the reactions generate their own catalysts from the food set (the set of all molecules available in a particular environment on early Earth).
The idea of a set being F-generated requires that no molecules are required as reactants before they have been produced. A set that fails to be F-generated could never have spontaneously built itself up from the molecules available on early Earth (the food set), which is clearly a necessary condition for the development of early replicators from prebiotic chemistry. Note however that while the reflexively autocatalytic requirement guarantees that an RAF set of reactions eventually produces a catalyst for every reaction, the definition of F-generated allows a reaction to proceed prior to the production of any of its catalysts. We consider this to be reasonable (and realistic) for the following reason. Reactions can proceed uncatalysed (albeit at a much lower rate), which may soon lead to the production of a catalyst for the reaction, establishing a positive feedback loop which quickly increases the rate of the reaction (consider the production of the molecule 0011 in Figure 1; this molecule is the sole catalyst for its own production). In previous work [13] we have studied a stronger type of autocatalytic set in which a catalyst must be present before a reaction can proceed at all. These sets, referred to as constructively autocatalytic and F-generated sets (CAFs), have quite different properties to RAFs; indeed, they are less likely to appear spontaneously.
Figure 1 illustrates some ways in which a set can fail to be an RAF. The subset {r_{1},r_{2},r_{3},r_{5},r_{7}} fails to be reflexively autocatalytic (and so fails to be an RAF) since r_{7} is uncatalysed. In the subset {r_{1},r_{2},r_{3},r_{5},r_{6}} all reactions are catalysed; however, the catalyst of r_{6} is outside ${\text{cl}}_{\{{r}_{1},{r}_{2},{r}_{3},{r}_{5},{r}_{6}\}}\left(F\right)$ (the reactions do not collectively generate all of their own catalysts), so this subset also fails to be reflexively autocatalytic. The subset {r_{1},r_{2},r_{3},r_{4},r_{5}} is reflexively autocatalytic (since every reaction is catalysed, and all the catalysts are in ${\text{cl}}_{\{{r}_{1},\dots ,{r}_{5}\}}\left(F\right)$), but it is not F-generated, since the reactant 101 of r_{4} is not in the closure set (it cannot be created from the food set by the reactions {r_{1},…,r_{5}}). However, the subset {r_{1},r_{2},r_{3},r_{5}} is an RAF. In fact, it is the largest RAF in the system, equal to the union of all RAFs in the system. Such an RAF is referred to as the maximal RAF subset or the maxRAF.
Given any catalytic reaction system $\mathcal{Q}=(X,\mathcal{R},F,C)$, there is a fast (polynomial-time) algorithm which determines whether or not $\mathcal{Q}$ contains an RAF, and if so the algorithm constructs the maxRAF [8]. We use this algorithm in section “Simulations of partitioned chemical reactions systems” to study the emergence of RAFs within simulations of the partitioned polymer system, defined in the following section.
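One common way to realise such an algorithm is by iterated reduction: start from the full reaction set, compute the closure of the food set, discard every reaction that has a reactant or lacks a catalyst in that closure, and repeat until nothing changes. The Python sketch below illustrates this idea under stated assumptions; it is not the implementation of [8], and the names `max_raf`, `closure` and the toy system are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (reactants, products)


def closure(food: Iterable[str], reactions: Iterable[Reaction]) -> Set[str]:
    """Molecules obtainable from the food set, ignoring catalysis."""
    available = set(food)
    reaction_list = list(reactions)
    changed = True
    while changed:
        changed = False
        for reactants, products in reaction_list:
            if reactants <= available and not products <= available:
                available |= products
                changed = True
    return available


def max_raf(reactions: Dict[str, Reaction], food: Set[str],
            catalysts: Dict[str, Set[str]]) -> Set[str]:
    """Return the names of the reactions in the maxRAF (empty if none).

    `catalysts[r]` is the set of molecules x with (x, r) in C.
    Each pass removes at least one reaction or stops, so there are at
    most len(reactions) passes.
    """
    current = set(reactions)
    while True:
        cl = closure(food, (reactions[r] for r in current))
        keep = {
            r for r in current
            # F-generated: all reactants lie in the closure;
            # reflexively autocatalytic: some catalyst lies in the closure.
            if reactions[r][0] <= cl and catalysts.get(r, set()) & cl
        }
        if keep == current:
            return current
        current = keep


# Hypothetical toy CRS over binary polymers with food set {0, 1}.
toy = {
    "r1": (frozenset({"0", "1"}), frozenset({"01"})),
    "r2": (frozenset({"01", "1"}), frozenset({"011"})),
    "r3": (frozenset({"011", "0"}), frozenset({"0110"})),  # catalyst never made
    "r4": (frozenset({"101", "0"}), frozenset({"1010"})),  # reactant never made
}
toy_catalysts = {"r1": {"011"}, "r2": {"01"}, "r3": {"111"}, "r4": {"01"}}
toy_max_raf = max_raf(toy, {"0", "1"}, toy_catalysts)
print(sorted(toy_max_raf))
```

Here r3 is discarded because its only catalyst is never produced, and r4 because its reactant 101 is outside the closure, which leaves r1 and r2 as the maxRAF of the toy system.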
Note that the definitions of a CRS and of an RAF do not explicitly include consideration of reaction rates or concentrations. Therefore, the RAF formalism cannot address the more specific question of whether or not a population of molecules can remain stable enough to catalyse its own growth from the food set, and to keep growing over time so as to allow reproduction of the set; these issues are obviously of interest in an origin of life scenario. For example, an RAF might include an exceedingly rare reaction, the rate of which could never support the growth of the system, or a very fast reaction, which depletes an essential molecule. However, this purely algebraic approach has allowed the development of several important results that would not have been easy to deduce from a more detailed model. Nonetheless, once an RAF set is discovered, it can then be checked for dynamical stability: previous work [15, 18] has involved molecular flow simulations of RAF sets using the Gillespie algorithm [23]. Also, in section “Kinetic RAF framework” we consider an extension of the formal RAF framework which does take reaction rates into account.
Partitioned polymer system
Consider a triple $(X,\mathcal{R},F)$ within the polymer model. Let X, $\mathcal{R}$ and F be partitioned as X={X_{1},X_{2}}, $\mathcal{R}=\{{\mathcal{R}}_{1},{\mathcal{R}}_{2}\}$ and F={F_{1},F_{2}}, where

X_{1},X_{2} are disjoint sets of polymers;

F_{1}⊂X_{1} and F_{2}⊂X_{2} are disjoint sets of food molecules;

${\mathcal{R}}_{i}$ is a set of ligation and cleavage reactions such that supp$\left({\mathcal{R}}_{i}\right)\subseteq {X}_{i}$.
A partitioned CRS is now defined as a triple (partitioned as above) together with a catalysation assignment C. We will use the word module to refer to the set of molecules X_{1} together with the associated reactions ${\mathcal{R}}_{1}$, and similarly for X_{2} and ${\mathcal{R}}_{2}$. Hence, a partitioned CRS consists of two modules, and catalysis can occur both within (intramodular) and between (intermodular) the modules (the specific pattern of catalysis will depend on the nature of C). Note however that due to the condition $\text{supp}\left({\mathcal{R}}_{i}\right)\subseteq {X}_{i}$, there can be no reactions involving molecules from both X_{1} and X_{2}. We also allow X_{1} and X_{2} to be sets of polymers over different sized monomer alphabets. For example, let the size of these alphabets be k_{1} and k_{2}: then to model the interaction between a set of peptides (X_{1}) and a set of RNA polymers (X_{2}), set k_{1}=20, k_{2}=4.
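As a concrete illustration of the two-module structure, the following Python sketch builds two small modules over disjoint alphabets and checks that no reaction mixes molecules from both. The alphabets, the maximum lengths and the helper names (`polymers`, `ligations`, `build_module`) are hypothetical and deliberately small; only forward (ligation) reactions are stored, with each cleavage understood as the reverse of a ligation.

```python
from itertools import product
from typing import Set, Tuple

Ligation = Tuple[str, str, str]  # (a, b, ab) represents a + b -> ab


def polymers(alphabet: str, n: int) -> Set[str]:
    """All polymers of length 1..n over the given monomer alphabet."""
    return {
        "".join(monomers)
        for length in range(1, n + 1)
        for monomers in product(alphabet, repeat=length)
    }


def ligations(molecules: Set[str], n: int) -> Set[Ligation]:
    """All ligations a + b -> ab whose product has length at most n."""
    return {
        (a, b, a + b)
        for a in molecules
        for b in molecules
        if len(a) + len(b) <= n
    }


def build_module(alphabet: str, n: int) -> Tuple[Set[str], Set[Ligation]]:
    """Return (X_i, forward reactions of R_i) for one module."""
    molecules = polymers(alphabet, n)
    return molecules, ligations(molecules, n)


# Toy sizes (k_1 = 3, k_2 = 2) instead of k_1 = 20, k_2 = 4.
X1, R1 = build_module("ABC", 3)
X2, R2 = build_module("xy", 4)

# supp(R_i) is contained in X_i, so no reaction involves both modules.
assert all({a, b, ab} <= X1 for a, b, ab in R1)
assert all({a, b, ab} <= X2 for a, b, ab in R2)
print(len(X1), len(R1), len(X2), len(R2))
```

Using disjoint alphabets is one simple way to guarantee that X_{1} and X_{2} are disjoint; any catalysation assignment C can then be layered on top of the two modules without altering the reactions themselves.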
Results and discussion
The probability of RAFs in general catalytic reaction systems
It was shown in [13] that for a CRS within the polymer model, the level of catalysis (expected number of reactions catalysed per molecule) necessary and sufficient to produce RAF sets with a given probability increases linearly with n, the maximum length of polymers in the system. Here we extend this result to a general CRS in which the molecules are not necessarily polymers, and we invoke slightly weaker assumptions by allowing the catalysation rates to vary between reactions; in a later section this approach also allows for a limited degree of inhibition.
For convenience, we will assume that the set of reactions is the disjoint union of two sets ${\mathcal{R}}^{+}$ and ${\mathcal{R}}^{-}$, where every reaction in ${\mathcal{R}}^{+}$ is of the form a+b→c (two reactants and one product), and ${\mathcal{R}}^{-}$ consists entirely of the corresponding reverse reactions c→a+b, so that $\left|{\mathcal{R}}^{+}\right|=\left|{\mathcal{R}}^{-}\right|$. We refer to the reactions in ${\mathcal{R}}^{+}$ as ‘forward’ reactions. Thus pairs of corresponding reactions from ${\mathcal{R}}^{+}$ and ${\mathcal{R}}^{-}$ can be considered as a single reversible reaction. We will also assume that a molecule catalyses $r\in {\mathcal{R}}^{+}$ if and only if that molecule also catalyses the corresponding reverse reaction in ${\mathcal{R}}^{-}$, which reflects the reality of biological catalysis. These assumptions can be weakened, but doing so slightly complicates the statement and proofs of the results that follow, and they apply readily to the partitioned system that we study, as do the further conditions listed below.
In our generalised model we make two main assumptions concerning catalysation:

(C1) The events $\mathcal{E}(x,r)$ that molecule x catalyses (forward) reaction r are independent across all pairs $(x,r)\in X\times {\mathcal{R}}^{+}$.

(C2) For some constant K≥1, the expected number of molecular species that catalyse any reaction is at most K times the expected number of molecular species that catalyse any other reaction.
Note that (C1) allows different molecule types to catalyse different numbers of reactions in expectation, since the probability that molecule type x catalyses reaction r can vary according to both x and r (in [13] it was assumed that the probability of $\mathcal{E}(x,r)$ depends only on x, not on r).
Before stating the main result of this section, we require the following definition. We say that a triple $(X,\mathcal{R},F)$ has a species stratification if and only if there is a nested sequence α_{1}⊆α_{2}⊆⋯⊆α_{ m }=X such that the following conditions hold: (i) F=α_{ t } for some t<m; (ii) If the reaction f→a+b is in $\mathcal{R}$ where f∈F then a and b are also elements of α_{ t }; (iii) The number of forward reactions involving any two food molecules as reactants is at most some fixed constant M; (iv) if we let X(1):=α_{1} and X(s):=α_{ s }−α_{s−1} for s∈{2,…,m} then:

(S1) The number of molecules in α_{ s } grows no faster than geometrically with s. That is, |X(s)|≤k^{ s } for some fixed k≥1, for all s∈{1,…,m};

(S2) Every molecule in X(s) can be constructed from molecules in α_{s−1} by a number of forward reactions that grows at least linearly with s−1. More precisely, for some fixed ν>0, the following holds: For each s∈{t+1,…,m}, and for all x∈X(s) we have: $\left|\left\{r\in {\mathcal{R}}^{+}:x\in \pi \left(r\right)\text{ and }\rho \left(r\right)\subseteq {\alpha}_{s-1}\right\}\right|\ge \nu (s-1).$
We now show that for any triple $(X,\mathcal{R},F)$ the probability that $\mathcal{Q}=(X,\mathcal{R},F,C)$ (where the random assignment C satisfies (C1) and (C2)) has an RAF (denoted $P(\exists \text{RAF for}\mathcal{Q})$) is, under certain conditions, determined by how the average catalysation rate compares to the simple ratio of the total number of forward reactions to the total number of molecules.
The proof of part (a) of the following theorem is presented in the Appendix; part (b) follows immediately from a stronger result stated later (Theorem 2) and the proof of that later result is also in the Appendix.
Theorem 1.
For any triple $(X,\mathcal{R},F)$ that has a species stratification, consider the random CRS $\mathcal{Q}=(X,\mathcal{R},F,C)$ formed by an assignment of catalysation (C) under any stochastic process satisfying (C1) and (C2).

(a) If $\overline{\mu}\le \lambda \cdot \frac{\left|{\mathcal{R}}^{+}\right|}{\left|X\right|}$ then the probability that there exists an RAF for $\mathcal{Q}$ is at most ϕ(λ), where $\varphi \left(\lambda \right)=1-{\left(1-\frac{\lambda}{K}\right)}^{\tau}\to 0$ as λ→0, and where τ is a constant dependent only on k and t.

(b) If $\overline{\mu}\ge \lambda \cdot \frac{\left|{\mathcal{R}}^{+}\right|}{\left|X\right|}$ then the probability that there exists an RAF for $\mathcal{Q}$ is at least 1−ψ(λ), where $\psi \left(\lambda \right)=\frac{k{\left(k{e}^{-\nu \lambda /K}\right)}^{t}}{1-k{e}^{-\nu \lambda /K}}\to 0$ exponentially fast as λ→∞.
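To get a feel for how quickly the two bounds move with λ, they can be evaluated numerically. The Python sketch below reads the bounds as ϕ(λ) = 1 − (1 − λ/K)^τ and ψ(λ) = k(k e^{−νλ/K})^t / (1 − k e^{−νλ/K}); this reading, the function names and every parameter value used are assumptions made for illustration and are not results from the paper.

```python
import math


def phi(lam: float, K: float, tau: float) -> float:
    """Upper bound on P(RAF exists) from part (a), for 0 <= lam <= K."""
    if not 0.0 <= lam <= K:
        raise ValueError("phi is only evaluated here for 0 <= lam <= K")
    return 1.0 - (1.0 - lam / K) ** tau


def psi(lam: float, k: float, nu: float, K: float, t: int) -> float:
    """Part (b) guarantees P(RAF exists) >= 1 - psi(lam).

    The expression is a geometric-series tail, so it is only finite and
    positive when the ratio q = k * exp(-nu * lam / K) is below 1.
    """
    q = k * math.exp(-nu * lam / K)
    if q >= 1.0:
        raise ValueError("bound is vacuous unless k * exp(-nu*lam/K) < 1")
    return k * q ** t / (1.0 - q)


# Hypothetical parameter values, chosen only to show the trend.
for lam in (0.0, 0.01, 0.1):
    print("phi", lam, phi(lam, K=2.0, tau=5.0))
for lam in (5.0, 10.0, 20.0):
    print("psi", lam, psi(lam, k=2.0, nu=1.0, K=1.0, t=2))
```

Under this reading ψ is the tail of a geometric series with ratio k e^{−νλ/K}, which is why the lower bound in part (b) is informative only once λ exceeds (K/ν) ln k.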
The results in section “Simulations of partitioned chemical reactions systems” show that as the level of catalysis is increased past some threshold there is a transition in the probability of the existence of RAFs. This is to be expected as it is well known in combinatorics that every monotone increasing property of subsets of a set has an associated threshold function [24]. Consideration of the definitions of reflexively autocatalytic and F-generated reveals that the RAF property is monotone on the subsets of the set of possible catalysis arcs from molecules to reactions in a CRS, so the RAF property has a threshold function. In the original binary polymer model, the threshold function for catalysis is linear in n (the maximal sequence length). However, in the more general setting considered here, molecules do not come equipped with an intrinsic length. Nevertheless, Theorem 1 shows that the ratio of ‘reactions-to-molecules’ plays essentially the same role as n in a threshold function for the RAF property.
Remarks

The proof of part (b) involves the construction of an RAF involving every molecule in X (that is, $\text{supp}\left({\mathcal{R}}^{\prime}\right)=X$). However, in general, this RAF will involve only a subset of the reactions in ${\mathcal{R}}^{+}$.

In general, the definition of a species stratification seems rather artificial: while a CRS within the simple (unpartitioned) polymer model naturally admits a species stratification (since we just let α_{ s } be the set of all polymers up to length s), it would be a nontrivial exercise to find a species stratification for a CRS with molecules that are not polymers. Nevertheless, Theorem 1 shows that the molecules in a CRS being polymers is sufficient but not necessary, and we will see shortly that in the partitioned polymer model a species stratification also applies.
The probability of RAFs in a partitioned CRS
In light of Theorem 1, in order to show that the same linear catalysis requirement that applies for an unpartitioned CRS holds for a partitioned one, we need only show that a partitioned CRS has a species stratification, and construct a set C satisfying (C1), (C2). In what follows, we will consider a partitioned CRS that satisfies the same assumptions that were made in the proof of Theorem 1 (i.e. $\mathcal{R}={\mathcal{R}}^{+}\cup {\mathcal{R}}^{-}$, and corresponding reactions from ${\mathcal{R}}^{+}$ and ${\mathcal{R}}^{-}$ are always catalysed together). Also, let a reaction $r\in {\mathcal{R}}^{+}$ belong to ${\mathcal{R}}_{1}$ if and only if the corresponding reverse reaction in ${\mathcal{R}}^{-}$ does too, and let ${\mathcal{R}}_{1}^{+}$ denote the subset of all forward reactions in ${\mathcal{R}}_{1}$. Applying similar restrictions to ${\mathcal{R}}_{2}$, we thus consider a partitioned CRS in which $\mathcal{R}$ is the disjoint union of four sets: ${\mathcal{R}}_{1}^{+},{\mathcal{R}}_{1}^{-},{\mathcal{R}}_{2}^{+}$ and ${\mathcal{R}}_{2}^{-}$, so that each module consists of an equal number of forward and reverse reactions together with the associated molecules (of course, the modules may contain different numbers of reactions to each other).
we would expect to observe around ten times more catalysis within modules than between them, and twice as much catalysis of reactions in ${\mathcal{R}}_{1}$ by molecules in X_{2} than of reactions in ${\mathcal{R}}_{2}$ by molecules in X_{1}.
In what follows, consider a partitioned CRS $\mathcal{Q}=(X,\mathcal{R},F,C)$ which is complete: that is, both X_{1} and X_{2} contain every possible polymer up to length n_{1} and n_{2} (respectively), and ${\mathcal{R}}_{1}$ (respectively ${\mathcal{R}}_{2}$) contains every possible forward and reverse reaction between the molecules in X_{1} (respectively X_{2}). Let F_{1} (respectively F_{2}) be all the molecules in X_{1} (respectively X_{2}) up to some length t< min{n_{1},n_{2}}. Finally, for a molecule x∈X, let |x| denote the length of x (i.e. the number of monomer units in x).
and let X_{2}(s) be defined similarly to X_{1}(s). Note that $\left|{X}_{i}\left(s\right)\right|={k}_{i}^{s}$. Defining n_{min}:= min{n_{1},n_{2}} and n_{max}:= max{n_{1},n_{2}}, these stratifications are combined into a single stratification γ_{1},γ_{2},… of the set X as follows:

for 1≤s≤n_{min}, γ_{ s }:=α_{ s }∪β_{ s };

for n_{min}<s≤n_{max},${\gamma}_{s}:=\left\{\begin{array}{cc}{\alpha}_{s},& \text{if}\phantom{\rule{2.77626pt}{0ex}}{n}_{1}>{n}_{2}\\ {\beta}_{s},& \text{if}\phantom{\rule{2.77626pt}{0ex}}{n}_{2}>{n}_{1}\end{array}\right..$
Note that F=γ_{ t }, which is condition (i) in the definition of a species stratification; conditions (ii) and (iii) also clearly hold (with M=2 for condition (iii)), so it remains to establish condition (iv), namely that the stratification satisfies (S1) and (S2). Define X(1):=γ_{1} and for s∈{2,…,n_{max}}, X(s):=γ_{ s }−γ_{s−1}, and consider the size of each set X(s). Since |X(s)| does not exceed ${k}_{1}^{s}+{k}_{2}^{s}$ for any value of s, (k_{1}+k_{2})^{ s } is at least |X(s)| for all s∈{1,…,n_{max}}, so the partitioned CRS satisfies (S1) with k=k_{1}+k_{2}. To see that it also satisfies (S2), we need only note that for any molecule type x∈X(s) where s∈{t+1,…,n_{max}}, |x|=s, so there are a maximum of s−1 ways x could be constructed from shorter molecule types (i.e. molecule types in γ_{s−1}). Since ${\mathcal{R}}_{1}$ and ${\mathcal{R}}_{2}$ are both complete, every such reaction exists and there are in fact precisely s−1 forward reactions generating x from γ_{s−1}, so take ν=1. We conclude that the complete partitioned CRS has a species stratification.
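The two counting steps in this argument can be spelled out as a short worked example. The general inequality is the binomial theorem; the numerical instance uses the peptide and RNA alphabet sizes k_{1}=20 and k_{2}=4 mentioned earlier, and assumes n_{min}≥2 so that both modules contribute polymers of length 2. The monomer labels x_1,…,x_4 are placeholders.

```latex
% (S1): size of each layer, with k := k_1 + k_2
(k_1+k_2)^{s} \;=\; \sum_{j=0}^{s}\binom{s}{j}\,k_1^{\,j}\,k_2^{\,s-j}
\;\ge\; k_1^{s}+k_2^{s} \;\ge\; |X(s)|, \qquad s\ge 1,
% with equality in the first inequality only when s = 1.

% Illustration with k_1 = 20, k_2 = 4 and s = 2:
20^{2}+4^{2} \;=\; 416 \;\le\; 24^{2} \;=\; 576 .

% (S2): a polymer x = x_1 x_2 x_3 x_4 of length s = 4 is the product of
% exactly s - 1 = 3 ligations of shorter polymers from the same module:
x_1 + x_2x_3x_4, \qquad x_1x_2 + x_3x_4, \qquad x_1x_2x_3 + x_4 .
```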
clearly c_{1},c_{2} are finite. Hence taking K = max{c_{1}/c_{2},c_{2}/c_{1}} shows that (C2) holds also. We conclude that Theorem 1 applies to a partitioned CRS.
Simulations of partitioned chemical reactions systems
Previous simulations of chemical reaction systems [8, 14] have focussed on those which are complete (X contains every molecule up to some maximum length n, and $\mathcal{R}$ contains every possible cleavage/ligation reaction between the molecules of X) and those in which catalysis is assigned randomly such that every molecule has the same fixed probability of catalysing any reaction. In [13, 14], it was shown both theoretically and computationally that in a ‘classic’ CRS with only one module, the level of catalysis (expected number of reactions catalysed per molecule) necessary and sufficient to generate RAFs with a given probability (e.g. 0.5) increases linearly with n. Furthermore, simulations show that the linear relationship is not steep: when n=10, the required level of catalysis is around 1.29, and when n=20, the required level of catalysis increases only to 1.48 [14]. Based on the finding that many enzymes catalyse multiple reactions [25], and the results of a recent search for RAF sets in the metabolic network of E. coli (Sousa FL, Hordijk W, Steel M, Martin W: Autocatalytic sets in the metabolic network of E. coli, in preparation), this level of catalysis appears to be biologically feasible. Hence, the above results suggest that RAFs might be expected for real biochemical polymer networks, even under a random assignment of catalysis.
The uniqueness of the results from the intramodular model suggests that the property unique to this model – the complete absence of intermodular catalysis – has a discrete effect on the probability of RAF formation. Note that the uniform and intermodular models both have intermodular catalysis, but the latter has twice the level of the former, as well as a lack of intramodular catalysis. Despite these differences, their results appear to be identical. Taken together, these results suggest that the presence or absence of intermodular catalysis has more of an effect on the probability of RAF formation than the actual level of intermodular catalysis.
Figure 8 shows how the average size of the maxRAF (in terms of number of reactions, and number of molecules) changes as the level of catalysis is increased. Whereas for n=10 the maxRAF initially grew most quickly for the intramodular model, these plots do not show the same early growth spurt: instead, all models appear to begin the transition at around the same level of catalysis. It is possible that the resolution was not high enough to detect the phenomenon: simulating smaller increments in p and a greater number of instances around this transition zone may reveal that it still occurs at n=15. Other than this, the plots are similar to those in Figure 6. RAF sets in the uniform and intermodular models grow faster both in number of reactions and number of molecules, and not until a level of catalysis around 2.0 does the intramodular model catch up. This is much later than in the n=10 case, which is particularly interesting given that at this level of catalysis the intramodular model is developing RAFs with higher probability than the other models (Figure 7).
Discussion
We chose here to investigate only the cases when n=10 and n=15, since computational constraints limit the feasibility of repeating the experiments for more and/or larger values of n. However, inferences can be made about other values of n, especially in the light of Theorem 1, which shows that a linear increase (with n) in the level of catalysis is necessary and sufficient to maintain RAFs with a given probability in a partitioned CRS. After producing similar results to the above for further values of n, it would be interesting to use least squares regression to explicitly express the linear dependence (on n) of the level of catalysis required to give 50% probability of RAF formation for various patterns of catalysis, and compare these with the linear formulae produced in [14] for the original model. Based on Figures 5 and 7, we expect to see a steeper relationship for the uniform and intermodular models than for the intramodular model.
While all three models begin to develop RAFs with high probability above the threshold level of catalysis, it is clear that the intramodular model develops RAFs somewhat more reliably (with higher probability at lower catalysis levels) than the other models. Furthermore, this difference is more apparent at n=15 than n=10, and in the light of the result of Theorem 1, the difference looks likely to become more marked as n increases. On the other hand, as pointed out by philosopher Roger White [26], the probability of a mechanism proposed to play a role in the origin of life may not be a sound metric by which to judge the validity of that mechanism (Elliott Sober makes a related argument in response to Richard Dawkins in [27], pp. 50–51). In terms of RAF theory, this means that the probability of RAF formation might not be the best way to decide which models have the most potential to shed light on the origin of life question.
However, the results show another difference between the models that is worth noting. Figures 6 and 8 both suggest that the size of RAF sets is significantly lower in the intramodular model than in the uniform and intermodular models (excluding the brief window immediately around the threshold level of catalysis in which RAFs in the intramodular model grow faster at n=10). This larger size of RAF sets in the uniform and intermodular models is interesting: since RAF sets can often be decomposed into constituent RAFs (sub-RAFs), larger RAFs are likely to contain more of these autocatalytic subsets. It was suggested in [28, 29] that this modular structure might be important for the potential evolvability of RAF sets. Specifically, the ability of large RAF sets to gain and lose smaller sub-RAFs might be a mechanism by which RAF sets can evolve and compete with each other, a process which might favour characteristic combinations of sub-RAFs, in a primitive form of selection. This transition from a purely self-replicating set of molecules to a complex autocatalytic set which replicates imperfectly while remaining robust to changes in the environment is essential, if RAF sets are to give rise to a replicator capable of gradual, open-ended Darwinian evolution.
We have investigated three different patterns of catalysis. Due to the inherent flexibility of the partitioned model, there are various other qualitatively different patterns that could be explored. In each of the above systems, the catalysis matrix $\mathbf{P}$ is symmetric. Even with this restriction in place, there is a continuum between exclusively intermodular and exclusively intramodular catalysis, and we examined only the middle point and the two extremes of that continuum here. We expect to observe a similar pattern of RAF emergence in other systems, where both intra- and intermodular catalysis occur, but not in equal amounts. Based on Figures 5 and 7, if we were to begin with an exclusively intramodular system $\left(\mathbf{P}=p\left(\begin{array}{cc}1& 0\\ 0& 1\end{array}\right)\right)$ and gradually increase the level of intermodular catalysis (the off-diagonal entries of $\mathbf{P}$), while holding the overall level of catalysis constant, we should expect to see a shift in the pattern of RAF development, becoming more like the uniform and intermodular models examined here. This change should be complete by the time the catalysis becomes uniform, so must occur somewhere between ‘intramodular’ $\left(\mathbf{P}=p\left(\begin{array}{cc}1& 0\\ 0& 1\end{array}\right)\right)$ and ‘uniform’ $\left(\mathbf{P}=\frac{p}{2}\left(\begin{array}{cc}1& 1\\ 1& 1\end{array}\right)\right)$. It would be interesting to determine at what point this transition occurs, and how sharp it is. A further extension would be to investigate systems in which $\mathbf{P}$ is not symmetric: for example, where one module dominates as a source of catalysts for the system $\left(\text{e.g. }\mathbf{P}=\frac{p}{10}\left(\begin{array}{cc}9& 9\\ 1& 1\end{array}\right)\right)$. Given the main motivation behind this investigation, and the observation that peptides appear to be far more catalytically active than nucleic acids [25], this particular extension seems highly relevant.
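The family of catalysis matrices discussed above can be parameterised in a few lines. This is an illustrative sketch only: the interpolation parameter `t`, the function names, and the reading of "overall level of catalysis" as the column sum of $\mathbf{P}$ (which equals p in each matrix written out above) are assumptions, as is the form taken for the exclusively intermodular matrix at `t = 1`.

```python
def catalysis_matrix(p, t):
    """Symmetric 2x2 catalysis matrix with column sums equal to p.

    t = 0   : exclusively intramodular, p * [[1, 0], [0, 1]]
    t = 0.5 : uniform, (p / 2) * [[1, 1], [1, 1]]
    t = 1   : exclusively intermodular (assumed form), p * [[0, 1], [1, 0]]
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [[p * (1 - t), p * t],
            [p * t, p * (1 - t)]]


def asymmetric_matrix(p, share_first):
    """Non-symmetric matrix in which the first module supplies a fraction
    share_first of the catalysis; share_first = 0.9 corresponds to
    (p / 10) * [[9, 9], [1, 1]]."""
    a = p * share_first
    b = p * (1 - share_first)
    return [[a, a], [b, b]]
```

Sweeping `t` from 0 to 0.5 in small steps is one way to look for the point at which the pattern of RAF development changes.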
Based on structural complementarity between polypeptide and RNA helices [30] and more recent experimental work demonstrating high catalytic proficiency of ancestrally related primitive forms of enzymes involved in translation [5, 31, 32], Carter and colleagues have suggested that the interactions between polypeptides and RNA may have played a key role in early chemical evolution in a “peptide-RNA world”. Our theoretical results show that a system with two different types of polymer, with reciprocity of function similar to that of proteins and RNA, produces autocatalytic sets at similarly realistic levels of catalysis to a simpler system composed of a single type of polymer (such as an RNA world or a system of peptides). Therefore, the results presented here suggest that the alternative scenario proposed by Carter and colleagues is feasible.
Extensions: closure, inhibition and reaction rates
The current definition of an RAF is limited because it ignores inhibition and reaction rates. The latter is problematic because those reactions generating required reactants which proceed too slowly, or those which use up required reactants and proceed too fast, may prevent an RAF set from persisting in a dynamic environment. While the lack of inhibition and kinetics may be seen as a severe restriction, it is useful because it allows us to compute RAFs in polynomial time. These RAFs could then be examined to test if they are viable given known inhibition or reaction rate data.
Alternatively, we could build this into the definition of a stronger type of RAF and ask if there is an efficient algorithm to find them. In this section we explore the latter approach. We consider RAFs that are viable under reaction rates and show that determining whether or not they exist in an arbitrary catalytic reaction system turns out to be NP-complete.
Consideration of these factors (inhibition and reaction rates) requires distinguishing between RAFs that are ‘closed’ and those that are not (this distinction is not important in the absence of inhibition and dynamics). Thus we first introduce and discuss this property, before considering the definition and properties of RAFs that allow inhibition or reaction rates.
Closed RAFs
Given a CRS $\mathcal{Q}=(X,\mathcal{R},C,F)$, a subset ${\mathcal{R}}^{\prime}$ of $\mathcal{R}$ is a closed RAF if and only if the following conditions hold:

${\mathcal{R}}^{\prime}$ is an RAF;

for every $r\in \mathcal{R}$ for which there is a pair (x,r)∈C such that $\left\{x\right\}\cup \rho \left(r\right)\subseteq {\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$, $r\in {\mathcal{R}}^{\prime}$.
Informally, a closed RAF captures the idea that “any reaction that can occur, will occur”. If all the reactants and at least one catalyst of a reaction $r\in \mathcal{R}$ are generated by the reactions in ${\mathcal{R}}^{\prime}$, then it seems reasonable to expect that the reaction $r$ will occur, and so we should expect that $r$ is included in ${\mathcal{R}}^{\prime}$. If $r$ is not included, then it is natural to consider adding it to ${\mathcal{R}}^{\prime}$, in order that the extended set ${\mathcal{R}}^{\prime}\cup \left\{r\right\}$ comes closer to containing all the reactions for which it generates all the necessary molecules. In order to formalise this notion, we introduce the idea of the closure of an RAF, defined as the smallest closed RAF which contains the RAF. Given an RAF ${\mathcal{R}}^{\prime}$, we can construct its closure $\overline{{\mathcal{R}}^{\prime}}$ as follows: let ${\mathcal{R}}^{\prime}={K}_{0}$, and let ${K}_{i+1}={K}_{i}\cup {L}_{i}$, where ${L}_{i}$ is the set of all $r\in \mathcal{R}\setminus {K}_{i}$ such that there exists a pair $(x,r)\in C$ and $\left\{x\right\}\cup \rho \left(r\right)\subseteq {\text{cl}}_{{K}_{i}}\left(F\right)$. Then, $\overline{{\mathcal{R}}^{\prime}}$ is the final set ${K}_{n}$ in the sequence of nested sets ${\mathcal{R}}^{\prime}={K}_{0}\subset {K}_{1}\subset {K}_{2}\subset \cdots \subset {K}_{n}$, where $n$ is the first value of $i$ for which ${K}_{i}={K}_{i+1}$.
Note that an RAF ${\mathcal{R}}^{\prime}$ is a closed RAF if and only if ${\mathcal{R}}^{\prime}=\overline{{\mathcal{R}}^{\prime}}$. Note also that while the union of two RAFs is also an RAF, the union of two closed RAFs is not necessarily a closed RAF (though it is an RAF).
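The iterative construction of the closure can be sketched in code. This is a minimal illustration under stated assumptions: reactions are stored as a dictionary mapping a name to a pair (reactants, products), the catalysis assignment C is a set of (molecule, reaction) pairs, and the toy system and all identifiers are hypothetical rather than taken from the paper.

```python
def cl(reactions, subset, food):
    """Molecules obtainable from the food set using only reactions in subset."""
    molecules = set(food)
    changed = True
    while changed:
        changed = False
        for name in subset:
            reactants, products = reactions[name]
            if reactants <= molecules and not products <= molecules:
                molecules |= products
                changed = True
    return molecules


def raf_closure(reactions, catalysis, food, raf):
    """Iterate K_{i+1} = K_i ∪ L_i until no further reaction can be added."""
    k = set(raf)
    while True:
        available = cl(reactions, k, food)
        # L_i: reactions outside K_i whose reactants and at least one
        # catalyst all lie in cl_{K_i}(F).
        l_i = {
            name for name, (reactants, _) in reactions.items()
            if name not in k
            and reactants <= available
            and any((x, name) in catalysis for x in available)
        }
        if not l_i:
            return k
        k |= l_i


# Hypothetical toy system (not from the paper).
food = {"a", "b"}
reactions = {
    "r1": (frozenset({"a", "b"}), frozenset({"c"})),
    "r2": (frozenset({"a", "c"}), frozenset({"d"})),
    "r3": (frozenset({"d", "e"}), frozenset({"g"})),
}
catalysis = {("c", "r1"), ("c", "r2"), ("a", "r3")}

print(sorted(raf_closure(reactions, catalysis, food, {"r1"})))  # prints ['r1', 'r2']
```

Here {r1} is an RAF but is not closed, because r1 produces c, which is both a reactant and a catalyst of r2; r3 is never absorbed because its reactant e is never available.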
One notable property of closed RAFs is that, unlike RAFs that are not closed, we can reconstruct the network of reactions given only a “list” of the molecules involved in the network, as follows.
Lemma 1
A closed RAF ${\mathcal{R}}^{\prime}\subseteq \mathcal{R}$ is determined entirely by the subset of molecules $F\cup \text{supp}\left({\mathcal{R}}^{\prime}\right)$ and the CRS $\mathcal{Q}=(X,\mathcal{R},C,F)$.
Proof
Consider the set ${\mathcal{R}}^{\ast}$ reconstructed from $F\cup \text{supp}\left({\mathcal{R}}^{\prime}\right)$ as follows:

Add to ${\mathcal{R}}^{\ast}$ every reaction $r\in \mathcal{R}$ for which $\text{supp}\left(r\right)\subseteq F\cup \text{supp}\left({\mathcal{R}}^{\prime}\right)$.

Remove from ${\mathcal{R}}^{\ast}$ any reaction r for which there does not exist an $x\in F\cup \text{supp}\left({\mathcal{R}}^{\prime}\right)$ such that (x,r)∈C.
By definition of reflexively autocatalytic, it follows from (1) that for all $r\in {\mathcal{R}}^{\prime}$, there exists $x\in F\cup \text{supp}\left({\mathcal{R}}^{\prime}\right)$ such that (x,r)∈C. Therefore every reaction in ${\mathcal{R}}^{\prime}$ fits the criteria for inclusion in ${\mathcal{R}}^{\ast}$, and we conclude that ${\mathcal{R}}^{\prime}\subseteq {\mathcal{R}}^{\ast}$.
Next consider some $r\in {\mathcal{R}}^{\ast}$. Then by the rules of construction of ${\mathcal{R}}^{\ast}$, $\text{supp}\left(r\right)\subseteq F\cup \text{supp}\left({\mathcal{R}}^{\prime}\right)$ and there exists an $x\in F\cup \text{supp}\left({\mathcal{R}}^{\prime}\right)$ such that (x,r)∈C. By (1), such an x is in ${\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$, and also by (1), $\text{supp}\left(r\right)\subseteq {\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$ so certainly $\rho \left(r\right)\subseteq {\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$. Then since ${\mathcal{R}}^{\prime}$ is a closed RAF, $r\in {\mathcal{R}}^{\prime}$ by definition. We conclude that ${\mathcal{R}}^{\ast}\subseteq {\mathcal{R}}^{\prime}$, which together with the previous result proves that ${\mathcal{R}}^{\ast}={\mathcal{R}}^{\prime}$.
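The two-step reconstruction used in the proof can be written out directly. The sketch below assumes the same hypothetical data layout as a plain dictionary of reactions and a set of (molecule, reaction) catalysis pairs; the function name `reconstruct` and the toy system are illustrative only.

```python
def reconstruct(reactions, catalysis, molecules):
    """Rebuild the reaction set R* from the molecule list F ∪ supp(R')."""
    # Step 1: keep every reaction whose reactants and products all
    # appear in the molecule list.
    candidates = {
        name for name, (reactants, products) in reactions.items()
        if (reactants | products) <= molecules
    }
    # Step 2: remove any reaction that has no catalyst in the list.
    return {
        name for name in candidates
        if any((x, name) in catalysis for x in molecules)
    }


# Hypothetical toy system in which {r1, r2} is a closed RAF.
food = {"a", "b"}
reactions = {
    "r1": (frozenset({"a", "b"}), frozenset({"c"})),
    "r2": (frozenset({"a", "c"}), frozenset({"d"})),
    "r3": (frozenset({"d", "e"}), frozenset({"g"})),
}
catalysis = {("c", "r1"), ("c", "r2"), ("a", "r3")}

molecule_list = food | {"c", "d"}  # F ∪ supp({r1, r2})
print(sorted(reconstruct(reactions, catalysis, molecule_list)))  # prints ['r1', 'r2']
```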
Corollary 1
If ${\mathcal{R}}^{\prime}$ is an RAF, then given only $F\cup \text{supp}\left({\mathcal{R}}^{\prime}\right)$ and the CRS $\mathcal{Q}=(X,\mathcal{R},C,F)$, we can construct its closure $\overline{{\mathcal{R}}^{\prime}}$.
Proof.
If ${\mathcal{R}}^{\prime}$ is a closed RAF, then ${\mathcal{R}}^{\prime}=\overline{{\mathcal{R}}^{\prime}}$ so the assertion holds trivially by the previous lemma.
Hence suppose ${\mathcal{R}}^{\prime}$ is not closed. Then there is at least one reaction ${r}^{\ast}\in \mathcal{R}\setminus {\mathcal{R}}^{\prime}$ such that there exists a pair $(x,{r}^{\ast})\in C$ and $\left\{x\right\}\cup \rho \left({r}^{\ast}\right)\subseteq {\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$. Construct the set of reactions ${\mathcal{R}}^{\ast}$ from $F\cup \text{supp}\left({\mathcal{R}}^{\prime}\right)$ (as in Lemma 1). Since we did not use the fact that the RAF was closed in the first part of the proof of the lemma, we can apply the same argument to see that ${\mathcal{R}}^{\prime}\subseteq {\mathcal{R}}^{\ast}$.
Now consider some $r\in {\mathcal{R}}^{\ast}$. Then by the rules of construction of ${\mathcal{R}}^{\ast}$, $\text{supp}\left(r\right)\subseteq F\cup \text{supp}\left({\mathcal{R}}^{\prime}\right)$, and there exists some $x\in F\cup \text{supp}\left({\mathcal{R}}^{\prime}\right)$ such that $(x,r)\in C$. Then by Equation (1) in the proof of Lemma 1 (again, this applies since we did not assume the RAF was closed in that part of the proof), ${\mathcal{R}}^{\ast}$ contains every ${r}^{\ast}\in \mathcal{R}\setminus {\mathcal{R}}^{\prime}$ such that there exists a pair $(x,{r}^{\ast})\in C$ and $\left\{x\right\}\cup \rho \left({r}^{\ast}\right)\subseteq {\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$. At this point, we identify ${\mathcal{R}}^{\prime}$ with the set ${K}_{0}$ and ${\mathcal{R}}^{\ast}$ with the set ${K}_{1}={K}_{0}\cup {L}_{0}$ described in the preamble to Lemma 1. We can then follow the same process described in the preamble, constructing a sequence of nested sets ${\mathcal{R}}^{\prime}={K}_{0}\subset \cdots \subset {K}_{n}$, where ${K}_{n}$ is by definition equal to $\overline{{\mathcal{R}}^{\prime}}$.
Inhibition
In order to discuss the impact of molecules inhibiting reactions, we begin with the following definitions.
Given a CRS $\mathcal{Q}=(X,\mathcal{R},F,C)$, an inhibition assignment is a subset $I$ of $X\times \mathcal{R}$ where $(x,r)\in I$ means that molecular species $x$ inhibits reaction $r$. We say that a subset ${\mathcal{R}}^{\prime}$ of $\mathcal{R}$ is an $I$-viable RAF for $\mathcal{Q}$ if and only if all of the following hold:
The motivation for insisting that ${\mathcal{R}}^{\prime}$ be closed is as follows. Suppose that ${\mathcal{R}}^{\prime}$ involves a reaction that is inhibited by some product ${x}^{\prime}$ of a reaction ${r}^{\prime}$ that is not in ${\mathcal{R}}^{\prime}$. Now if the reactants and at least one catalyst of ${r}^{\prime}$ are present as products of reactions in ${\mathcal{R}}^{\prime}$ (or elements of $F$) then there is no reason for ${r}^{\prime}$ not to proceed and for ${x}^{\prime}$ not to be produced. In that case ${\mathcal{R}}^{\prime}\cup \left\{{r}^{\prime}\right\}$, and any set containing it, would no longer be an RAF.
The concept of an RAF subject to inhibition was formalized and studied briefly in [13], but there condition (b) was not imposed. That paper established that the problem of determining whether or not a CRS $\mathcal{Q}$ contains an RAF that is $I$-viable for $\mathcal{Q}$ is NP-complete. It is pertinent therefore to ask whether the addition of condition (b) alters this result, or affects the proof. In fact, it can be shown that it does not, since the reduction in [13] involves the construction of an RAF that is automatically closed.
It is also of interest to know how inhibition affects the probability of forming a viable RAF, when $I$ is a random assignment. Notice that inhibition is a much stronger notion than catalysation, since if a reaction is inhibited by just one molecule, then no matter how many molecules might catalyse that reaction, it is prevented from taking place. Thus we might expect that even low rates of inhibition could be a major obstruction to the formation of a viable RAF. However, we show here that provided the inhibition rate is sufficiently small, Theorem 2 still holds. To state this we first formalize the model by extending (C1) and (C2) to the following three conditions (which reduce to (C1) and (C2) upon setting ε=0).

(C1) The events $\mathcal{E}(x,r)$ that $x$ catalyses reaction $r$, and the events $\mathcal{F}(x,r)$ that $x$ inhibits reaction $r$, are independent across all pairs $(x,r)$ in $X\times {\mathcal{R}}^{+}$.

(C2) As stated previously near the start of section “The probability of RAFs in general catalytic reaction systems”.

(C3) For some constant ε≥0, the expected number of molecular species that inhibit any given reaction is at most ε.
Notice that part (a) of Theorem 1 applies automatically to the more restrictive notion of an inhibition viable RAF. However part (b) does not, and here we present a stronger result, which implies Theorem 1(b) (upon taking ε=0). The proof of this theorem is presented in the Appendix.
Theorem 2.
Consider a CRS $\mathcal{Q}$ that satisfies the extended conditions (C1)–(C3), and has a species stratification. Suppose further that the inhibition rate ε in (C3) satisfies $0\le \epsilon \le \exp(-K\overline{c})$, where $\overline{c}$ is the average (over all reactions) expected number of molecular species that catalyse each reaction.

If $\overline{\mu}\ge \lambda \cdot \frac{|{\mathcal{R}}^{+}|}{|X|}$ then the probability that there exists an RAF for $\mathcal{Q}$ is at least $1-\psi(\lambda)$, where $\psi \left(\lambda \right)=\frac{k{\left(2k{e}^{-\nu \lambda /K}\right)}^{t}}{1-2k{e}^{-\nu \lambda /K}}\to 0$ exponentially fast as $\lambda \to \infty$.

When ε=0 (no inhibition) the factor of 2 in the numerator and denominator of ψ(λ) can be removed.
Kinetic RAF framework
Here we extend previous work by introducing the concept of a kinetic CRS, in which every reaction has an associated rate, and all molecules diffuse away into the environment at constant rate. We then define a kinetic RAF, which, informally, is an RAF in which every molecule is produced at least as fast as it is lost (to diffusion, or by consumption in other reactions). This represents the idea that being able to build up a sufficient local concentration of molecules is a necessary condition for RAFs to form.
Definition: A kinetic CRS is a tuple $Q=(X,\mathcal{R},F,C,v)$ where $X$, $\mathcal{R}$, $F$ and $C$ are defined in the same way as for a simple CRS, and $v:\mathcal{R}\to {\mathbb{R}}_{\ge 0}$ is a rate function, where for each $r\in \mathcal{R}$, $v(r)$ is the rate of $r$.
For any subset ${\mathcal{R}}^{\prime}\subseteq \mathcal{R}$, the stoichiometric matrix ${\mathbf{S}}_{{\mathcal{R}}^{\prime}}$ is the $|\text{supp}({\mathcal{R}}^{\prime})\setminus F|\times |{\mathcal{R}}^{\prime}|$ matrix with rows indexed by the non-food molecule types involved in ${\mathcal{R}}^{\prime}$ and columns indexed by the reactions in ${\mathcal{R}}^{\prime}$, where ${\mathbf{S}}_{ij}\in \mathbb{Z}$ is the net number of molecules of type $i$ produced by reaction $j$. The rate vector ${\mathbf{v}}_{{\mathcal{R}}^{\prime}}={\left[v({r}_{1}),v({r}_{2}),\dots ,v({r}_{|{\mathcal{R}}^{\prime}|})\right]}^{T}$ lists the rates of the reactions in ${\mathcal{R}}^{\prime}$. Then, ${\mathbf{S}}_{{\mathcal{R}}^{\prime}}{\mathbf{v}}_{{\mathcal{R}}^{\prime}}$ is a vector of the net rates of production of each molecule type in $\text{supp}({\mathcal{R}}^{\prime})\setminus F$. Let $\delta \ge 0$ be the diffusion rate.
A subset ${\mathcal{R}}^{\prime}\subseteq \mathcal{R}$ is a kinetic RAF (kRAF) if and only if the following properties hold (where $\mathbf{1}$ is a $|\text{supp}({\mathcal{R}}^{\prime})\setminus F|\times 1$ column vector of 1s):
Note that we do not include food molecules in the rows of ${\mathbf{\text{S}}}_{{\mathcal{R}}^{\prime}}$. An RAF ${\mathcal{R}}^{\prime}$ is not guaranteed to contain any reactions which generate food molecules, but will necessarily contain at least one reaction with at least one food reactant. In that case, if we were to include the rows corresponding to those food molecules, they would have only negative entries, causing the RAF ${\mathcal{R}}^{\prime}$ (which might otherwise satisfy the properties of a kRAF) to formally fail to be a kRAF.
The diffusion rate δ represents the rate at which molecules diffuse away into the environment. Diffusion is unavoidable in chemical systems, and as molecules diffuse away, their concentrations drop until they are no longer available to sustain local reactions. A CRS occurring in the ocean or a “pond” might have a larger δ than one occurring in a hydrothermal vent, which may in turn have a larger δ than a CRS confined within a lipid membrane [33].
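The rate bookkeeping described above can be illustrated with a short sketch. It assumes that the rate property amounts to every entry of ${\mathbf{S}}_{{\mathcal{R}}^{\prime}}{\mathbf{v}}_{{\mathcal{R}}^{\prime}}$ being at least δ, which is one reading of the informal description (each non-food molecule is produced at least as fast as it is lost); the formal condition is not restated here, so this reading is an assumption. The data layout, function names and toy system are hypothetical, and the sketch checks only the rate property, not whether the subset is an RAF or closed.

```python
from collections import defaultdict


def net_production(reactions, rates, subset, food):
    """Entries of S v: net production rate of each non-food molecule."""
    net = defaultdict(float)
    for name in subset:
        reactants, products = reactions[name]
        for molecule, count in reactants.items():
            if molecule not in food:  # food rows are excluded from S
                net[molecule] -= count * rates[name]
        for molecule, count in products.items():
            if molecule not in food:
                net[molecule] += count * rates[name]
    return dict(net)


def satisfies_rate_condition(reactions, rates, subset, food, delta=0.0):
    """Assumed rate property: every entry of S v is at least delta."""
    net = net_production(reactions, rates, subset, food)
    return all(value >= delta for value in net.values())


# Hypothetical toy system: f -> y at rate 2, y -> z at rate 1.
food = {"f"}
reactions = {
    "r1": ({"f": 1}, {"y": 1}),
    "r2": ({"y": 1}, {"z": 1}),
}
rates = {"r1": 2.0, "r2": 1.0}

print(net_production(reactions, rates, {"r1", "r2"}, food))
print(satisfies_rate_condition(reactions, rates, {"r1", "r2"}, food, delta=0.0))
```

In this toy system y is produced at rate 2 and consumed at rate 1, so both non-food molecules have net rate 1; the condition holds for δ up to 1 and fails beyond it.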
The idea of searching for kRAFs within a kinetic CRS is related to the idea in chemical organisation theory (COT) of searching for self-sustaining chemical organisations within an algebraic chemistry [9]. The definitions of the stoichiometric matrix coincide, and the qualifying condition (2) for a kRAF is similar to the qualifying condition for an organisation to be self-sustaining [9] (however in COT there is no diffusion term; note that ${\mathbf{S}}_{{\mathcal{R}}^{\prime}}{\mathbf{v}}_{{\mathcal{R}}^{\prime}}>0$ is necessary but not sufficient for a subset ${\mathcal{R}}^{\prime}\subseteq \mathcal{R}$ to be a kRAF). Furthermore, in COT the entries of the vector $\mathbf{v}$ are not fixed: we are free to choose a set of values that makes the system self-sustaining, and indeed the definition of self-sustaining is simply that such a set of values can be found. In contrast, the reaction rates in a kinetic CRS are predetermined constraints within which we can (in principle) go looking for a subset ${\mathcal{R}}^{\prime}$ of reactions that satisfies (2). While we propose that this setup is more relevant to the origin of life, the following theorem shows that such a search is unlikely to be useful in general. We show that determining whether or not $Q$ contains a kRAF is NP-complete when δ=0 (we expect a similar result applies when δ>0 but our proof, presented in the Appendix, applies to the zero diffusion case).
Theorem 3.
Given a kinetic CRS $Q=(X,\mathcal{R},C,F,v)$ with diffusion rate δ=0, the problem of determining whether or not $Q$ contains a kRAF is NP-complete.
The closely related problem in COT of deciding whether or not an algebraic chemistry contains an organisation is also NP-complete [34]. Although Theorem 3 shows that we cannot hope to efficiently find kRAFs within a kinetic CRS, it is easy to check (in polynomial time) whether or not a given RAF is a kRAF, and since RAFs can be found in polynomial time [8], it may be feasible to discover kRAFs in a kinetic CRS by first ignoring the rate function $v$ and finding a sample of RAFs, then deciding whether or not any are viable under $v$.
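The first stage of this two-stage strategy, finding an RAF while ignoring the rate function, can be sketched as a simple reduction loop. The code below is written in the spirit of the polynomial-time search cited as [8], but it is an illustrative sketch rather than a reproduction of that algorithm; the data layout, names and toy system are assumptions.

```python
def closure(reactions, subset, food):
    """Molecules obtainable from the food set using only reactions in subset."""
    molecules = set(food)
    changed = True
    while changed:
        changed = False
        for name in subset:
            reactants, products = reactions[name]
            if reactants <= molecules and not products <= molecules:
                molecules |= products
                changed = True
    return molecules


def max_raf(reactions, catalysis, food):
    """Repeatedly discard reactions whose reactants or catalysts cannot be
    generated from the food set; what remains (possibly nothing) is an RAF."""
    current = set(reactions)
    while True:
        available = closure(reactions, current, food)
        kept = {
            name for name in current
            if reactions[name][0] <= available
            and any((x, name) in catalysis for x in available)
        }
        if kept == current:
            return current
        current = kept


# Hypothetical toy system (not from the paper).
food = {"a", "b"}
reactions = {
    "r1": (frozenset({"a", "b"}), frozenset({"c"})),
    "r2": (frozenset({"a", "c"}), frozenset({"d"})),
    "r3": (frozenset({"d", "e"}), frozenset({"g"})),
    "r4": (frozenset({"a"}), frozenset({"h"})),
}
catalysis = {("c", "r1"), ("c", "r2"), ("a", "r3"), ("z", "r4")}

print(sorted(max_raf(reactions, catalysis, food)))  # prints ['r1', 'r2']
```

Each RAF obtained in this way could then be tested against the rate function $v$, which only requires computing the net production rate of every non-food molecule.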
One weakness of the kRAF concept is that reaction rates are fixed, whereas in real systems the rate of a reaction is a function of the concentrations of its reactants, catalysts and inhibitors. Although the concept of concentration currently has no direct meaning in the RAF framework, previous work has used dynamical simulations to study the changes in concentrations of molecules in small RAF sets [15, 18].
Conclusions
Due to the utility of polymers in modern life, much of the theoretical and experimental work on the origin of life problem has focussed on systems of polymers, and in [13] it was shown that the level of catalysis need only increase linearly as the number of molecules increases, in order to maintain a high probability of RAFs occurring. We have presented a generalisation of this result, showing that under mild assumptions, the same linear bound applies to a system in which the molecules are not necessarily polymers. Furthermore, partitioned systems were shown to support the development of RAFs similarly to typical systems containing only one type of polymer, and the effect of the pattern of catalysis on the emergence of RAF sets was explored. Previous research into template-based catalysis [14, 20] and recent work incorporating more realistic patterns of catalysis [35] have indicated that the emergence of RAFs is quite robust to the structure of the underlying reaction system, a conclusion which this paper supports.
This research was performed in an effort to better understand the “symbiotic coexistence” of peptides and nucleic acids in living organisms, as well as the potential role of this reciprocity in early chemical evolution (as highlighted recently by [5]). While the results presented here are a far cry from deep insights revealing fundamental truths about the origin of life, this extension of previous work on chemical reaction systems represents an incremental gain in understanding, which can hopefully contribute to an eventual bigger picture. In particular, this paper supports the experimental work of Li et al. [5] and encourages further experimental work on the topic.
We have also introduced and studied two new concepts in RAF theory: closed RAF sets, and kinetic chemical reaction systems. A closed RAF set is an RAF set in the standard sense, with the additional property that “every reaction that can occur, does occur”. More specifically, this means that if the existing subset of reactions is able to produce all the reactants and at least one catalyst of a reaction outside of the subset, then that reaction should be included in the subset. A closed RAF is a subset of reactions that has “absorbed” every such reaction.
The kinetic RAF framework was developed in response to criticism levelled at RAF theory for not accounting for the fact that reactions progress at different rates. Kinetics is a fundamental part of real chemistry, so while the strength of RAF theory perhaps lies in its simplicity, the development of a kinetic extension is appropriate. A centerpiece of previous RAF theory investigations has been the search algorithm from [8], which runs in polynomial time and which has allowed chemical reaction systems of various sizes and properties to be investigated computationally [8, 14]. Therefore, a similar algorithm for detecting kinetically viable RAFs inside a kinetic CRS would be a promising start for the development of a theory of kinetic RAFs. Unfortunately, a reduction from the NP-complete problem 3-SAT showed that detecting a kinetic RAF within a kinetic CRS is unlikely to be productive in general. However, it is possible to construct RAFs efficiently, and for each RAF found one can readily test whether it is also a kRAF and therefore potentially capable of true autocatalytic growth.
Endnotes
^{1} Peptide nucleic acid (PNA) does exist; however, this polymer has a backbone of N-(2-aminoethyl)glycine (AEG) monomers linked by peptide bonds, with nucleobases attached to each monomer, rather than being composed of both nucleotide and amino acid monomers. Interestingly, the recent discovery of AEG production in diverse taxa of cyanobacteria may suggest an information-carrying role for PNA in early life [36].
^{2} tRNA aminoacylation or “charging” involves the esterification of an amino acid monomer to the relevant tRNA, prior to translation at the ribosome. This is of course an example of a reaction which combines molecules from both “independent” sets.
Appendix
Proof of Theorem 1 and Theorem 2
Next we establish the following variation on a lemma from [13].
Lemma 2.
Consider a random CRS $\mathcal{Q}=(X,\mathcal{R},C,I,F)$, satisfying (C1)–(C3). For a reaction $r\in \mathcal{R}$ let ${q}_{r}$ be the probability that either no species in $X$ catalyses $r$ or at least one species in $X$ inhibits reaction $r$.

(i) ${q}_{r}\ge 1-{c}_{u}$,

(ii) ${q}_{r}\le \exp(-{c}_{l})+\epsilon$.
and $\sum _{x\in X}p(x,r)$ is the expected number of species that catalyse $r$, which by (C2) is at most ${c}_{u}$. Thus, ${q}_{r}\ge 1-{c}_{u}$, which establishes part (i).
(by (C2) and (C3)) we obtain the claimed inequality in part (ii).
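The two bounds of Lemma 2 can be checked numerically in a simple special case. The sketch below assumes, purely for illustration, that each of the |X| species independently catalyses the reaction with probability c/|X| and inhibits it with probability ε/|X|, so that the expected numbers of catalysts and inhibitors are exactly c and ε (taking ${c}_{l}={c}_{u}=c$); the function name and parameters are hypothetical and this special case is not part of the original proof.

```python
import math


def q_r(c, eps, n_species):
    """Probability that a reaction has no catalyst or at least one inhibitor,
    assuming independent events with per-species probabilities c / n_species
    (catalysis) and eps / n_species (inhibition)."""
    if not (0 <= c <= n_species and 0 <= eps <= n_species):
        raise ValueError("c and eps must lie between 0 and n_species")
    no_catalyst = (1 - c / n_species) ** n_species
    no_inhibitor = (1 - eps / n_species) ** n_species
    # Complement of "at least one catalyst and no inhibitor".
    return 1 - (1 - no_catalyst) * no_inhibitor


for c, eps in [(0.5, 0.0), (1.0, 0.05), (2.0, 0.1)]:
    q = q_r(c, eps, n_species=1000)
    print(c, eps, 1 - c <= q <= math.exp(-c) + eps)
```

In this special case the value of ${q}_{r}$ sits between $1-c$ and $\exp(-c)+\epsilon$ for each parameter pair tried, as parts (i) and (ii) require.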
Consequently, if $\overline{\mu}\le \lambda \cdot \frac{|{\mathcal{R}}^{+}|}{|X|}$ then $\lambda \le \overline{c}$, and so we arrive at Theorem 1(a), with $\phi(\lambda)=1-{(1-\lambda /K)}^{\tau}$.
By Lemma 2(ii), for any $s\ge t$ (recalling that $F={\alpha}_{t}$), the probability that a species $x\in X(s+1)$ cannot be produced from reactants in ${\alpha}_{s}$ is at most ${({q}_{-})}^{cs}$ (since by (S2) we know that there exist at least $cs$ reactions producing $x$ from reactants in ${\alpha}_{s}$, so the only way for $x$ to fail to be produced is if each such reaction has either no catalyst in $X$ or an inhibitor in $X$).
and noting that $\psi(\lambda)\to 0$ as $\lambda \to \infty$, part (b) follows (observe that this RAF is closed, since it involves all molecules in $X$). Finally, note that when ε=0 the inequality in (6) can be improved to ${q}_{-}=\exp(-\overline{{c}_{l}})\le \exp(-\overline{c}K)$, which eliminates the factor of 2. This completes the proof.
Proof of Theorem 3
For example, $P=({y}_{1}\vee {\overline{y}}_{2}\vee {y}_{3})\wedge ({y}_{2}\vee {\overline{y}}_{3}\vee {\overline{y}}_{4})\wedge ({\overline{y}}_{1}\vee {y}_{2}\vee {\overline{y}}_{4})$ would be an instance of 3-SAT for $Y=\{{y}_{1},{y}_{2},{y}_{3},{y}_{4}\}$.
Here $T(i)$ and $F(i)$ are subsets of $\{1,\dots ,n\}$ that describe which elements of $Y$ are in ${C}_{i}$ as a literal or a negated literal (respectively). Since each clause has at most three variables, $|T(i)|+|F(i)|\le 3$. We say that $P$ has a satisfying assignment if there is a function $S:Y\to \{\mathtt{true},\mathtt{false}\}$ so that for each clause ${C}_{i}$ in $P$, there exists $j\in T(i)$ for which $S({y}_{j})=\mathtt{true}$ or a $j\in F(i)$ for which $S({y}_{j})=\mathtt{false}$. In the example above, setting $S({y}_{1})=\mathtt{true}$, $S({y}_{2})=S({y}_{4})=\mathtt{false}$, and $S({y}_{3})$ to be either true or false provides a satisfying assignment for $P$.
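The definition of a satisfying assignment in terms of the index sets $T(i)$ and $F(i)$ can be checked mechanically on the example formula. This is a small illustrative sketch; the representation of each clause as a pair of index sets and the function name `satisfies` are assumptions made here.

```python
def satisfies(clauses, assignment):
    """Return True if every clause has some j in T(i) assigned True
    or some j in F(i) assigned False.

    clauses    : list of (T_i, F_i) pairs of index sets
    assignment : dict mapping variable index j to a bool, playing the role of S
    """
    return all(
        any(assignment[j] for j in t_i) or any(not assignment[j] for j in f_i)
        for t_i, f_i in clauses
    )


# The example formula P over y_1, ..., y_4, written as (T(i), F(i)) pairs.
clauses = [
    ({1, 3}, {2}),      # y1 or not y2 or y3
    ({2}, {3, 4}),      # y2 or not y3 or not y4
    ({2}, {1, 4}),      # not y1 or y2 or not y4
]

for y3 in (True, False):
    s = {1: True, 2: False, 3: y3, 4: False}
    print(y3, satisfies(clauses, s))  # True for both choices of y3
```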
Given P we will construct a catalytic reaction system $(X,\mathcal{R},C)$, food set F, and rate function v so that ${\mathcal{Q}}_{P}=(X,\mathcal{R},C,F,v)$ has a kRAF if and only if P has a satisfying assignment.
First suppose that ${\mathcal{Q}}_{P}$ contains a kRAF ${\mathcal{R}}^{\prime}$; we will show that $P$ has a satisfying assignment. Since ${\mathcal{R}}^{\prime}$ is an RAF, it is nonempty. Therefore, the molecule $T$ must be produced, since every reaction in $\mathcal{R}$ is catalysed by either $T$ or some molecule that is produced from $T$. This in turn requires that for each $1\le i\le k$, ${\theta}_{i}$ is produced, and therefore, for each $1\le i\le k$ there exists $j\in T(i)$ such that ${y}_{j}$ is produced or $j\in F(i)$ such that ${\overline{y}}_{j}$ is produced. Furthermore, for each value of $1\le j\le n$, at most one of the molecules ${y}_{j},{\overline{y}}_{j}$ is produced, since otherwise by the closure property the $j$th reaction described by (15) would be contained in ${\mathcal{R}}^{\prime}$, which would destroy both ${y}_{j}$ and ${\overline{y}}_{j}$ faster than either is produced and violate the rate property of the kRAF ${\mathcal{R}}^{\prime}$. A satisfying assignment $S$ for $P$ is now provided by setting $S({y}_{j})$ to be true (respectively false) if ${y}_{j}$ is produced by some reaction in ${\mathcal{R}}^{\prime}$ (respectively not produced by any reaction in ${\mathcal{R}}^{\prime}$). Note that $S$ is a satisfying assignment even in the case where neither of ${y}_{j},{\overline{y}}_{j}$ is produced for some $j\in \{1,\dots ,n\}$, since in that case $S({y}_{j})$ can be chosen arbitrarily.
Conversely suppose that P has a satisfying assignment S; we will show that ${\mathcal{Q}}_{P}$ contains a kRAF. Let ${\mathcal{R}}^{\prime}$ consist of reaction (14) together with the following reactions:

for each $j\in \{1,\dots ,n\}$ such that $S({y}_{j})=\mathtt{true}$, include the $j$th reaction from (8), the $j$th reaction from (10), and every reaction from (12) such that $j\in T(i)$;

for each $j\in \{1,\dots ,n\}$ such that $S({y}_{j})=\mathtt{false}$, include the $j$th reaction from (9), the $j$th reaction from (11), and every reaction from (13) such that $j\in F(i)$.
so ${\mathcal{R}}^{\prime}$ is $F$-generated. Moreover, every reaction is catalysed by exactly one molecule from the set $\left\{T\right\}\cup \{{y}_{j}T:S({y}_{j})=\mathtt{true}\}\cup \{{\overline{y}}_{j}T:S({y}_{j})=\mathtt{false}\}$, and since this union is a subset of ${\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$, ${\mathcal{R}}^{\prime}$ is also reflexively autocatalytic and is therefore an RAF set.
${\mathcal{R}}^{\prime}$ is closed if there are no reactions $r\in \mathcal{R}\setminus {\mathcal{R}}^{\prime}$ such that there exists (x,r)∈C with $\left\{x\right\}\cup \rho \left(r\right)\subseteq {\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$. By the construction of ${\mathcal{R}}^{\prime}$, $\mathcal{R}\setminus {\mathcal{R}}^{\prime}$ contains the following reactions:

${f}_{j}\to {y}_{j}$ for each $j$ such that $S({y}_{j})=\mathtt{false}$ (catalysed by ${y}_{j}T$);

${f}_{j}\to {\overline{y}}_{j}$ for each $j$ such that $S({y}_{j})=\mathtt{true}$ (catalysed by ${\overline{y}}_{j}T$);
(the catalysts of these reactions are not contained in ${\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$)

${y}_{j}+T\to {y}_{j}T$ for each $j$ such that $S({y}_{j})=\mathtt{false}$;

${\overline{y}}_{j}+T\to {\overline{y}}_{j}T$ for each $j$ such that $S({y}_{j})=\mathtt{true}$;

${y}_{j}\to {\theta}_{i}$ for each pair $(i,j)$ with $j\in T(i)$ and $S({y}_{j})=\mathtt{false}$;

${\overline{y}}_{j}\to {\theta}_{i}$ for each pair $(i,j)$ with $j\in F(i)$ and $S({y}_{j})=\mathtt{true}$;
(Other than T, the reactants of these reactions are not contained in ${\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$)

${y}_{j}+{\overline{y}}_{j}\to \omega$ for each $j\in \{1,\dots ,n\}$.
For each value of j, exactly one of the two reactants of this last reaction is contained in ${\text{cl}}_{{\mathcal{R}}^{\prime}}\left(F\right)$. Hence, ${\mathcal{R}}^{\prime}$ is closed.
the elements of which are given by (16).
The molecules $\{{y}_{j}:S({y}_{j})=\mathtt{true}\}$ are each produced at rate $k+1$ from ${f}_{j}$, used up at rate $\epsilon >0$ to produce ${y}_{j}T$, and used up at rate 1 by each of the reactions $\{{y}_{j}\to {\theta}_{i}:j\in T(i)\}$. Since there are $k$ clauses, there are at most $k$ values of $i$ for which $j\in T(i)$. Hence the overall rate of production of each molecule ${y}_{j}$ is at least $k+1-(k+\epsilon)=1-\epsilon >0$, which satisfies the rate condition. A similar argument can be made to show that the molecules $\{{\overline{y}}_{j}:S({y}_{j})=\mathtt{false}\}$ also satisfy the condition.
The molecule T is produced at rate 1 by the reaction θ_{1}+⋯+θ_{ k }→T, and used up at rate 0<ε<1/n by each of the n reactions forming y_{ j }T or ${\overline{y}}_{j}T$. Hence the overall rate of production of T is guaranteed to be positive.
Consider the molecules θ_{1},…,θ_{ k }. θ_{ i } is produced at rate 1 by each reaction from (12) or (13) that is included in ${\mathcal{R}}^{\prime}$, of which there are at least one (since P has a satisfying assignment). θ_{ i } is also used up at rate 1 by reaction (14), hence the overall rate of formation of θ_{ i } is nonnegative.
Finally, noting that the molecules y_{ j }T and ${\overline{y}}_{j}T$ are all produced at rate ε>0 and are not used by any reaction, we see that every molecule in $\text{supp}\left({\mathcal{R}}^{\prime}\right)\setminus F$ is produced at least as fast as it is used up. This shows that ${\mathcal{R}}^{\prime}$ is a kRAF, and so completes the reduction.
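The rate bookkeeping in this argument can be checked with exact arithmetic. The sketch below only restates the lower bounds derived above; it does not build or simulate the reaction system. The function name `net_rate_lower_bounds` and the dictionary keys are hypothetical, k is the number of clauses, and n is the number of reactions forming y_{ j }T or ${\overline{y}}_{j}T$, as in the text.

```python
from fractions import Fraction


def net_rate_lower_bounds(k: int, n: int, eps: Fraction) -> dict:
    """Lower bounds on net production rates, following the argument in the text.

    Assumes 0 < eps < 1/n, the condition stated for the rate at which T is used.
    """
    if not (0 < eps < Fraction(1, n)):
        raise ValueError("the argument needs 0 < eps < 1/n")
    return {
        # produced at rate k+1; used at rate eps, plus at most k unit-rate reactions
        "y_j": Fraction(k + 1) - (k + eps),
        # produced at rate 1; used at rate eps by each of n reactions
        "T": 1 - n * eps,
        # produced at rate 1 by at least one reaction; used at rate 1 by one reaction
        "theta_i": Fraction(1 - 1),
        # produced at rate eps; never used
        "y_jT": eps,
    }


bounds = net_rate_lower_bounds(k=3, n=3, eps=Fraction(1, 10))
for molecule, bound in bounds.items():
    print(molecule, bound)
```

For k=3, n=3 and ε=1/10, the bounds are 9/10 for y_{ j }, 7/10 for T, 0 for θ_{ i } and 1/10 for y_{ j }T. All are nonnegative, which is what the rate condition requires.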
Declarations
Acknowledgements
We thank the University of Canterbury’s BlueFern supercomputing unit for technical support and the use of the IBM Power755 cluster on which the partitioned model simulations were performed. We also thank the Allan Wilson Centre for supporting this research.
References
 Smith JM, Szathmáry E: The Major Transitions in Evolution. Oxford, UK: Freeman; 1995.
 Miller SL: A production of amino acids under possible primitive earth conditions. Science 1953, 117(3046):528–529.
 Hanczyc MM, Fujikawa SM, Szostak JW: Experimental models of primitive cellular compartments: Encapsulation, growth and division. Science 2003, 302(5645):618–622.
 Vaidya N, Manapat ML, Chen IA, Xulvi-Brunet R, Hayden EJ, Lehman N: Spontaneous network formation among cooperative RNA replicators. Nature 2012, 491(7422):72–77.
 Li L, Francklyn C, Carter CW Jr: Aminoacylating urzymes challenge the RNA world hypothesis. J Biol Chem 2013, 288(37):26856–26863.
 Eigen M, Schuster P: The hypercycle: a principle of natural self-organisation. Part A: Emergence of the hypercycle. Naturwissenschaften 1977, 64(11):541–565.
 Kauffman S: Autocatalytic sets of proteins. J Theor Biol 1986, 119(1):1–24.
 Hordijk W, Steel M: Detecting autocatalytic, self-sustaining sets in chemical reaction systems. J Theor Biol 2004, 227(4):451–461.
 Dittrich P, di Fenizio PS: Chemical organisation theory. Bull Math Biol 2007, 69(4):1199–1231.
 Forster AC, Church GM: Towards synthesis of a minimal cell. Mol Syst Biol 2006, 2:45.
 Bollobás B, Rasmussen S: First cycles in random directed graph processes. Discrete Math 1989, 75:55–68.
 Steel M: The emergence of a self-catalysing structure in abstract origin-of-life models. Appl Math Lett 2000, 3:91–95.
 Mossel E, Steel M: Random biochemical networks and the probability of self-sustaining autocatalysis. J Theor Biol 2005, 223(3):327–336.
 Hordijk W, Kauffman SA, Steel M: Required levels of catalysis for emergence of autocatalytic sets in models of chemical reaction systems. Int J Mol Sci 2011, 12:3085–3101.
 Hordijk W, Steel M: Autocatalytic sets extended: Dynamics, inhibition, and a generalization. J Syst Chem 2012, 3:5.
 Kauffman S: The Origins of Order (Self-Organization and Selection in Evolution). USA: Oxford University Press; 1993.
 Murata T: Petri nets: Properties, analysis and applications. Proc IEEE 1989, 77:541–580.
 Hordijk W, Steel M: A formal model of autocatalytic sets emerging in an RNA replicator system. J Syst Chem 2013, 4:3.
 Steel M, Hordijk W, Smith J: Minimal autocatalytic networks. J Theor Biol 2013, 332:96–107.
 Hordijk W, Steel M: Predicting template-based catalysis rates in a simple catalytic reaction model. J Theor Biol 2012, 295:132–138.
 Gilbert W: Origin of life: The RNA world. Nature 1986, 319:618.
 Wolfenden R, Snider MJ: The depth of chemical time and the power of enzymes as catalysts. Acc Chem Res 2001, 34:938–945.
 Gillespie DT: A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys 1976, 22:403–434.
 Bollobás B, Thomason AG: Threshold functions. Combinatorica 1987, 7:35–38.
 Jeffery CJ: Moonlighting proteins. Trends Biochem Sci 1999, 24:8–11.
 White R: Does origins of life research rest on a mistake? Noûs 2007, 41:453–477.
 Sober E: Evidence and Evolution: the Logic Behind the Science. Cambridge, UK: Cambridge University Press; 2008.
 Hordijk W, Steel MA, Kauffman S: The structure of autocatalytic sets: Evolvability, enablement and emergence. Acta Biotheoretica 2012, 60:379–392.
 Vasas V, Fernando C, Santos M, Kauffman S, Szathmáry E: Evolution before genes. Biol Direct 2012, 7:1.
 Carter CW Jr, Kraut J: A proposed model for interaction of polypeptides with RNA. Proc Nat Acad Sci USA 1974, 71(2):283–287.
 Pham Y, Li L, Kim A, Erdogan O, Weinreb V, Butterfoss GL, Kuhlman B, Carter CW Jr: A minimal TrpRS catalytic domain supports sense/antisense ancestry of class I and class II aminoacyl tRNA synthetases. Mol Cell 2007, 25:851–862.
 Li L, Weinreb V, Francklyn C, Carter CW Jr: Histidyl-tRNA synthetase Urzymes: class I and II aminoacyl-tRNA synthetase Urzymes have comparable catalytic activities for cognate amino acid activation. J Biol Chem 2011, 286:10387–10395.
 Lodish H, Berk A, Zipursky SL, Matsudaira P, Baltimore D, Darnell J: Molecular Cell Biology. New York, USA: W. H. Freeman; 2000.
 Centler F, Kaleta C, di Fenizio PS, Dittrich P: Computing chemical organisations in biological networks. Bioinformatics 2008, 24(14):1611–1618.
 Hordijk W, Wills PR, Steel MA: Autocatalytic sets and biological specificity. Bull Math Biol 2014, 76(1):201–224.
 Banack SA, Metcalf JS, Jiang L, Craighead D, Ilag LL, Cox PA: Cyanobacteria produce N-(2-aminoethyl)glycine, a backbone for peptide nucleic acids which may have been the first genetic molecules for life on earth. PLoS ONE 2012, 7(11):e49043.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.