Open Access

Autocatalytic sets extended: Dynamics, inhibition, and a generalization

Journal of Systems Chemistry 2012, 3:5

DOI: 10.1186/1759-2208-3-5

Received: 25 May 2012

Accepted: 10 August 2012

Published: 14 August 2012



Autocatalytic sets are often considered a necessary (but not sufficient) condition for the origin and early evolution of life. Although the idea of autocatalytic sets was already conceived of many years ago, only recently have they gained more interest, following advances in creating them experimentally in the laboratory. In our own work, we have studied autocatalytic sets extensively from a computational and theoretical point of view.


We present results from an initial study of the dynamics of self-sustaining autocatalytic sets (RAFs). In particular, simulations of molecular flow on autocatalytic sets are performed, to illustrate the kinds of dynamics that can occur. Next, we present an extension of our (previously introduced) algorithm for finding autocatalytic sets in general reaction networks, which can also handle inhibition. We show that in this case detecting autocatalytic sets is fixed parameter tractable. Finally, we formulate a generalized version of the algorithm that can also be applied outside the context of chemistry and origin of life, which we illustrate with a toy example from economics.


Having shown theoretically (in previous work) that autocatalytic sets are highly likely to exist, we conclude here that also in terms of dynamics such sets are viable and outcompete non-autocatalytic sets. Furthermore, our dynamical results confirm arguments made earlier about how autocatalytic subsets can enable their own growth or give rise to other such subsets coming into existence. Finally, our algorithmic extension and generalization show that more realistic scenarios (e.g., including inhibition) can also be dealt with within our framework, and that it can even be applied to areas outside of chemistry, such as economics.


The idea of collectively autocatalytic sets has been introduced more or less independently several times [1–3], and was subsequently used in a number of origin of life models [4–7]. Recent experimental advances in creating such sets in the laboratory [8–11] have generated a renewed interest in autocatalytic sets. Moreover, there is growing evidence that simple autocatalytic cycles may indeed have been at the core of the origin of life [12].

In our own work, we have studied autocatalytic sets extensively from a computational and theoretical point of view [13–19]. We briefly review some of the main definitions and results here. First, we define a chemical reaction system (CRS) as a tuple $Q = \{X, \mathcal{R}, C\}$ consisting of a set of molecule types $X$, a set of reactions $\mathcal{R}$ (transforming reactants to products), and a catalysis set $C$ indicating which molecule types catalyze which reactions. We also include the notion of a food set $F \subseteq X$ of molecule types assumed to be freely available from the environment. In a particular model of a CRS, known as the binary polymer model [1, 20, 21], molecule types are represented as bit strings up to a certain length $n$, reactions are simply ligation and cleavage, and catalysis is assigned at random according to some parameter $p$ (the probability that a given molecule type catalyzes a given reaction). The food set consists of all molecule types up to a certain length $t < n$.

Informally, an autocatalytic set that is self-sustaining (or an RAF set, in our terminology) is now defined as a subset $\mathcal{R}' \subseteq \mathcal{R}$ of reactions (and associated molecule types) in which:
  1. each reaction $r \in \mathcal{R}'$ is catalyzed by at least one molecule type involved in $\mathcal{R}'$;
  2. all reactants in $\mathcal{R}'$ can be produced from the food set $F$ by using a series of reactions only from $\mathcal{R}'$ itself.


A formal definition is provided in [14, 17], where we also introduced a polynomial-time (in the size of the reaction set $\mathcal{R}$) algorithm for finding RAF sets in a general CRS. Note that our framework is somewhat different from that of [22], for which it was shown that maximizing the output flow and recognizing autocatalysis is NP-complete.
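The two informal conditions above suggest a simple reduction procedure, sketched here in Python. This is a minimal illustration and not the implementation from [14, 17]: the `Reaction` class, the function names, and the four-reaction toy system (two mutually catalyzed ligations and two uncatalyzed ones) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """Hypothetical container for one reaction of a CRS."""
    name: str
    reactants: frozenset
    products: frozenset
    catalysts: frozenset


def closure(food, reactions):
    """Molecules producible from the food set using only `reactions`
    (catalysts are ignored while computing the closure)."""
    known = set(food)
    changed = True
    while changed:
        changed = False
        for r in reactions:
            if r.reactants <= known and not r.products <= known:
                known |= r.products
                changed = True
    return known


def max_raf(food, reactions):
    """Repeatedly drop reactions that lack a reachable reactant or a
    reachable catalyst; what survives is an RAF, or the empty set."""
    current = set(reactions)
    while True:
        reachable = closure(food, current)
        kept = {r for r in current
                if r.reactants <= reachable and r.catalysts & reachable}
        if kept == current:
            return current
        current = kept


# Toy system (assumed for illustration): r1 and r2 catalyze each other,
# r3 and r4 have no catalyst at all.
food = {"00", "01", "10", "11"}
reactions = [
    Reaction("r1", frozenset({"00", "01"}), frozenset({"0001"}), frozenset({"1011"})),
    Reaction("r2", frozenset({"10", "11"}), frozenset({"1011"}), frozenset({"0001"})),
    Reaction("r3", frozenset({"00", "11"}), frozenset({"0011"}), frozenset()),
    Reaction("r4", frozenset({"01", "10"}), frozenset({"0110"}), frozenset()),
]
raf_names = sorted(r.name for r in max_raf(food, reactions))
```

Each pass either removes at least one reaction or stops, so the loop runs at most as many times as there are reactions, which is consistent with a running time polynomial in the size of the reaction set.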

One of our main results is that autocatalytic sets are highly likely to exist, even at very moderate levels of catalysis. For example, in the binary polymer model, each molecule needs to catalyze only between one and two reactions, on average, for RAF sets to emerge with high probability [14, 15]. Also, more realistic assumptions, such as template-based catalysis (as opposed to merely random catalysis), can be built into the framework easily. In this case, a molecule can only act as a catalyst if it matches (somewhere along its length) a template made up of several bits around the reaction site (which actually prevents the smallest molecules from being catalysts). However, this restriction does not significantly change the main results [17]. In fact, the required levels of catalysis for RAF sets to form in the template-based model can be predicted analytically from the (known) required levels in the base (random) model [18]. Finally, RAF sets can often be decomposed into smaller RAF subsets (possibly even exponentially many), which can provide a mechanism for the evolvability of autocatalytic sets [19, 23].

Here, we continue our studies of autocatalytic sets with various extensions of our framework. First, we investigate the actual dynamics of autocatalytic sets, presenting initial results from simulating molecular flow on RAF sets. Next, we present an extension of our algorithm for detecting autocatalytic sets when inhibition is also considered, i.e., when some molecules can prevent a reaction from happening. In an earlier paper we proved that the general problem of detecting autocatalytic sets in the presence of inhibition is NP-complete [15]. However, here we show that the problem is actually fixed parameter tractable, i.e., if the number of inhibiting molecules is not too large, autocatalytic sets (or their absence) can still be determined in polynomial time. Finally, in a recent paper we speculated about a generalized theory of autocatalytic sets beyond the context of chemistry and origin of life [19]. Here, we make a first concrete step in this direction by formulating a generalized version of our RAF algorithm which does not depend on the specifics of chemistry (i.e., molecules and reactions), and can be applied in a more general setting. These results are presented, in three parts, in the following section.

Results and discussion

Part I: Dynamics

In our work so far, we have mostly looked at autocatalytic sets in terms of their graph-theoretical properties. However, this has ignored dynamics, i.e., actual molecular flow on autocatalytic sets. Here, we fill this gap by presenting initial results on the dynamics of RAF sets. In particular, we provide two examples, a constructed one and a realistic one, to show several aspects of the molecular flow that can occur. To a large degree, these dynamical results confirm what had already been analyzed, concluded, and speculated in our earlier (structural) studies, but they also shed some new light on autocatalytic sets and their behavior. Note that a related dynamical study was reported recently [24], although here we focus more directly on the actual molecular flow on RAF sets themselves.

A constructed example

Consider the simple chemical reaction system (CRS) $Q = \{X, \mathcal{R}, C\}$ within the binary polymer model, of which the reaction graph is shown in Figure 1, and with a food set $F = \{00, 01, 10, 11\}$. This CRS consists of four reactions, each one being a bi-directional ligation/cleavage reaction, either combining two food molecules into a unique molecule of length four (in the “forward”, or ligation reaction), or splitting up a molecule of length four into two food molecules (in the “backward”, or cleavage reaction). The two reactions at the top are mutually catalyzed by each other’s ligation product, and form a 2-reaction autocatalytic (RAF) set. The two bottom reactions are not catalyzed, and are thus not part of any RAF set. However, these two sets of reactions (the top RAF one and the bottom non-RAF one) compete with each other for the food molecules.
Figure 1

A constructed example reaction graph. The reaction graph of the constructed CRS with two sets of reactions: an RAF set (top two reactions) and a non-RAF set (bottom two reactions).

Using the Gillespie algorithm [25, 26], we simulate the flow of molecules on this constructed reaction graph. Food molecules are assumed to be always available, and are kept at a minimum concentration of five molecules each (i.e., if after one of the ligation reactions the concentration of a food molecule has dropped below five, it is immediately replenished). One rationale for this is that the reaction system can be assumed to be “contained” inside some compartment, for example a lipid layer [27] or simply naturally occurring cavities in the soil [28, 29]. So, even though the food molecules are in “unlimited” supply in the environment, they still need to be taken up and brought inside the compartment to be used as reactants.

The presence of a catalyst increases the probability that a reaction will happen in direct proportion to the catalyst’s current concentration. However, with this constructed example we are specifically interested in the effects of autocatalysis, and we ignore the fact that a catalyst normally also increases the basic reaction rate. So, for this example, the reaction rates of catalyzed and uncatalyzed reactions are kept equal (at k = 1, in arbitrary units) for all reactions (we relax this assumption again in the more realistic example in the next subsection). The volume is also set to V = 1 (arbitrary units).

To confirm that the simulation produces correct results, we first consider the reactions as uni-directional ligation reactions only. In this case, we expect a linear growth rate over time in the concentrations of the products 0011 and 0110 of the bottom two (non-RAF) reactions, but an exponential growth rate in the concentrations of the products of the top two (RAF) reactions, given that they form an autocatalytic set. Figure 2 shows the results, and indeed confirms this expectation (note that the y-axis is on a log-scale, so the exponential growth shows as a straight line). Since this is a simple model setting, the time units (x-axis) are arbitrary.
Figure 2

Dynamics on the ligation-only reaction graph. The molecular concentrations over time for the products of the four reactions in the constructed example CRS when only ligation reactions are considered.
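The ligation-only check described above can be sketched with a few lines of Gillespie-style simulation. This is an illustration under stated assumptions, not the simulation code behind Figure 2: the propensity form `k * n_a * n_b * (1 + n_catalyst) / V` is an assumed way of combining equal base rates with a catalytic boost proportional to the catalyst count, and the names `REACTIONS` and `simulate` are hypothetical.

```python
import random

FOOD = ("00", "01", "10", "11")
# (reactant, reactant, ligation product, catalyst or None)
REACTIONS = [
    ("00", "01", "0001", "1011"),  # RAF reaction, catalyzed by 1011
    ("10", "11", "1011", "0001"),  # RAF reaction, catalyzed by 0001
    ("00", "11", "0011", None),    # uncatalyzed
    ("01", "10", "0110", None),    # uncatalyzed
]


def simulate(n_events, k=1.0, volume=1.0, food_min=5, seed=0):
    """Ligation-only stochastic simulation; returns [(time, counts), ...]."""
    rng = random.Random(seed)
    counts = {m: food_min for m in FOOD}
    counts.update({product: 0 for _, _, product, _ in REACTIONS})
    t = 0.0
    trajectory = [(t, dict(counts))]
    for _ in range(n_events):
        # Assumed propensity: base mass-action term times (1 + catalyst count).
        propensities = [
            k * counts[a] * counts[b] * (1 + (counts[cat] if cat else 0)) / volume
            for a, b, _, cat in REACTIONS
        ]
        total = sum(propensities)
        if total == 0:
            break
        t += rng.expovariate(total)  # exponential waiting time
        a, b, product, _ = rng.choices(REACTIONS, weights=propensities)[0]
        counts[a] -= 1
        counts[b] -= 1
        counts[product] += 1
        for m in FOOD:  # replenish food up to the minimum level
            counts[m] = max(counts[m], food_min)
        trajectory.append((t, dict(counts)))
    return trajectory


trajectory = simulate(2000, seed=1)
final_counts = trajectory[-1][1]
```

With food pinned at five molecules, the uncatalyzed propensities stay constant (linear growth), while the catalyzed ones grow with the product counts, which is the qualitative contrast the paragraph above describes.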

Next, we consider the full system, including the “backward” (cleavage) reactions. In this case, the molecule concentrations cannot grow without limit, as they start breaking down at a rate proportional to their concentration. So, one would expect them to reach some equilibrium distribution. Figure 3 shows the result (simulating 10,000 reaction events). As expected, the molecular concentrations do indeed seem to reach an equilibrium distribution (instead of unlimited growth as with the uni-directional reactions in Figure 2). However, the two reactions forming an RAF set still have a large advantage over the two non-RAF reactions. The growth rate in concentrations of the molecules 0001 and 1011 (red and green lines) is much higher (until it levels off) than that of the molecules 0011 and 0110 (blue and purple lines). Also, the RAF set is able to maintain a much higher concentration of its ligation products than the non-RAF set. The light blue line shows the concentration of one of the food molecules over time (for reference). The concentrations of the other food molecules are similar due to the symmetry in the system.
Figure 3

Dynamics on the ligation and cleavage reaction graph. The molecular concentrations over time for the products of the four reactions when both ligation and cleavage reactions are considered. The RAF set clearly has an advantage over the non-RAF set.

This result clearly shows that the advantage of RAF sets over non-RAF sets is due to the particular, catalytically closed, structure of an RAF set. Even if uncatalyzed reactions have the same (basic) reaction rate as catalyzed reactions, as in this simulation, RAF sets still outcompete non-RAF sets due to the self-reinforcing autocatalytic feedback. However, the equilibrium distribution that is reached does depend largely on the ratio of the reaction rates between the ligation and the cleavage reactions. If this ratio is large enough, the concentrations of the product molecules can be maintained at a high level, as in Figure 3. Reducing this ratio, however, causes the level of the equilibrium concentrations to drop, until at some point the RAF set no longer has any advantage over the non-RAF set. Figure 4 shows such a situation (again simulating 10,000 reaction events, but setting V = 5, which effectively reduces the mentioned reaction rate ratio by a factor of 5).
Figure 4

Dynamics with low ligation to cleavage reaction ratio. The molecular concentrations over time for the products of the four reactions with a lower ligation to cleavage reaction rate ratio. The rate at which product molecules are broken down is too high for the RAF set to maintain an advantage over the non-RAF set.
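The effect of $V$ on the ligation-to-cleavage ratio can be made explicit with a short worked equation. It assumes the standard mass-action form of Gillespie propensities (the text does not spell these out), with $n_a$ and $n_b$ the reactant counts and $n_p$ the product count of one reaction, and it leaves out the catalytic factor:

$$a_{\mathrm{lig}} = \frac{k}{V}\, n_a n_b, \qquad a_{\mathrm{clv}} = k\, n_p, \qquad \frac{a_{\mathrm{lig}}}{a_{\mathrm{clv}}} = \frac{1}{V} \cdot \frac{n_a n_b}{n_p}.$$

Under this assumption the bimolecular ligation propensity scales as $1/V$ while the unimolecular cleavage propensity does not depend on $V$, so moving from $V = 1$ to $V = 5$ divides the ratio by 5, in line with the factor quoted for Figure 4.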

A realistic example

Next, we consider an example of an actual autocatalytic (RAF) set that was found by our RAF algorithm in an instance of the binary polymer model with n = 5, t = 2, and p = 0.0045 (with these parameter values, there is a probability of $P_n = 0.5$ that a model instance contains an RAF set). Figure 5 shows this RAF set, which consists of eight bi-directional (ligation/cleavage) reactions. The food set is F = {0,1,00,01,10,11}.
Figure 5

A realistic RAF set. A maximal RAF set as found by our RAF algorithm in an instance of the binary polymer model for n = 5. The different subRAFs are indicated by colored boxes.

This maximal RAF set actually consists of several RAF subsets (in [19] we show formally how RAF sets can be decomposed into, possibly exponentially many, RAF subsets). First, there are two simple (1-reaction) irreducible RAF sets (irrRAFs) contained inside the yellow and purple boxes, respectively. Given that their reactants and catalysts are all food molecules, these subRAFs will always be present. Then there is the 3-reaction subRAF contained inside the red box. This subRAF actually includes the purple (1-reaction) irrRAF, but can only “grow” into the full 3-reaction red subRAF once molecule type 1010 is present. This molecule type catalyzes its own ligation from two instances of the food molecule 10, so this reaction will have to happen spontaneously (uncatalyzed) first, before the red subRAF can come into full self-sustaining existence (in fact, this reaction is actually an irrRAF in itself, but for the purposes of the dynamical analysis here, we do not consider it separately as such, as it immediately gives rise to the full red subRAF as soon as it comes into existence). Next, there is the 3-reaction irreducible RAF set contained inside the blue box. This blue subRAF also needs to be seeded, by one of the three reactions happening uncatalyzed (or one of the required molecules coming from elsewhere). Finally, there is the reaction contained in the green box, which strictly speaking is not an RAF by itself, but once molecule type 111 (produced by the blue subRAF) is available, it can become an “extension” of the blue subRAF. However, since the green reaction is catalyzed by its own product, it also needs to happen spontaneously at least once, before it can maintain its own existence autocatalytically.

Using the Gillespie algorithm again, we now study the molecular flow on this maximal RAF of Figure 5. In this simulation, we do distinguish between the reaction rates of catalyzed and uncatalyzed reactions, to show the effect of some of the subRAFs needing to be seeded by spontaneous reactions. In particular, if for a given reaction the reactants are present but not the catalyst, the reaction can still go ahead uncatalyzed, but at a reduced rate. For the sake of the simulation, we used a small reduction factor of 20 (i.e., k = 0.05 for uncatalyzed reactions and k = 1, as before, for catalyzed reactions). A higher, more realistic, factor is of course possible, but does not change the qualitative results, and simply means we need somewhat larger time-scales to observe similar behavior. Figure 6 shows the concentrations over time (simulating 25,000 reaction events this time) for the (ligation) products of the eight reactions making up the maximal RAF set.
Figure 6

Dynamics on the RAF set. The molecular concentrations of the ligation products over time for the 8-reaction maximal RAF set.
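One way to read the catalyzed versus uncatalyzed rates used for this simulation is as a single propensity function, sketched below. The exact functional form is an assumption (the text gives only the two rate constants and the proportionality to catalyst concentration), and `ligation_propensity` is a hypothetical name.

```python
def ligation_propensity(n_a, n_b, n_catalyst, k_cat=1.0, k_uncat=0.05, volume=1.0):
    """Assumed propensity of one ligation reaction.

    With a catalyst present, the rate is k_cat times the catalyst count;
    without one, the reaction still proceeds at the reduced rate k_uncat.
    For a reaction joining two copies of the same molecule, the pair count
    would be n * (n - 1) / 2 instead of n_a * n_b (not handled here).
    """
    if n_a <= 0 or n_b <= 0:
        return 0.0
    pairs = n_a * n_b
    if n_catalyst > 0:
        return k_cat * pairs * n_catalyst / volume
    return k_uncat * pairs / volume


# With five molecules of each reactant: a rare spontaneous event versus
# the same reaction once a single catalyst molecule exists.
spontaneous = ligation_propensity(5, 5, 0)
catalyzed = ligation_propensity(5, 5, 1)
```

Under this reading, the first catalyst molecule raises the propensity by the reduction factor of 20, which is why a subRAF takes off quickly once it has been seeded by one spontaneous event.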

The dynamics of the molecular concentrations are a direct reflection of the particular structure of the maximal RAF set in terms of its subRAFs. First of all, the concentrations of the products of the two 1-reaction irrRAFs (indicated, as in Figure 5, with yellow and purple lines, respectively), immediately start growing at a steady rate (although not exponentially, as they are catalyzed by food molecules, which remain in relatively low concentrations). However, the other subRAFs all need to be seeded by a spontaneous reaction. The first such event happens around time 0.3, when one of the reactions in the blue subRAF happens uncatalyzed. But once this has happened, the blue subRAF as a whole can come into existence and grow in concentration. Note that the two product types 010 (solid blue line) and 11100 (dashed blue line) immediately grow rapidly in concentration, but 111 (dotted blue line) has a damped growth, as it is also used again as a reactant.

The next spontaneous event happens around time 0.5. Recall that around time 0.3 the molecule type 111 came into existence, but for the green reaction to become an extension of the blue subRAF, it will still need to happen uncatalyzed at least once (given that it is catalyzed by its own product). However, when this happens (around time 0.5), the concentration of its product type 01111 (green line), supported by a product of the blue subRAF, immediately starts to grow rapidly. Finally, a last required spontaneous event happens around time 0.55, when molecule type 1010 is created, which then gives rise to the red subRAF coming into full existence (given that the purple irrRAF it contains was already present).

Some additional observations can be made about these dynamics. First, molecule type 00100 (dashed red line), a product of the red subRAF, was actually already present before the full red subRAF came into existence, as a result of spontaneous (uncatalyzed) reactions. However, its concentration only really starts growing once molecule type 1010 (its catalyst; solid red line) is present. Next, the concentration of the product of the purple irrRAF (100, purple line) starts decreasing again as soon as the red subRAF comes into existence, as this molecule type is used as a reactant within the red subRAF. And finally, note that the three molecule types that seem to grow in concentration without limit (00100, 11100, and 01111) are the ones that actually have a non-food molecule as one of their building blocks (reactants). Food molecules remain present in relatively low concentrations (although they are replenished when they fall below a concentration of five), but non-food molecules reach higher concentrations, and thus increase, in direct proportion to their concentration, the rate at which reactions that use them as reactants will happen. However, at some point the growth of these three molecule types also levels off, because of the backward (cleavage) reactions happening more and more often as well (similar to what happens in Figure 3); for readability of the graph, though, concentrations above 100 molecules are not shown in Figure 6.

The reason we have used a stochastic dynamical simulation here (instead of solving a set of ODEs) is that we are specifically interested in the transient behavior of the system, i.e., how subRAFs come into existence and (sometimes) depend on each other. Looking at the equilibrium distribution resulting from the corresponding ODEs does not provide this information. Furthermore, we have shown only one particular instance (realization) of the simulation model in Figure 6. Other realizations show very similar behaviors overall, except that the waiting times and order in which the various subRAFs come into existence may differ between simulation runs (due to their stochastic nature). However, averaging the concentrations over many runs would not show these specific behaviors of interest, so we have chosen to show one particular instance as a representative for a whole set of simulations.

These initial results are, of course, only a first step towards a more complete study of the dynamics of autocatalytic sets. However, they already provide useful insights into the kinds of dynamics one can observe in RAF sets, and also confirm some of the claims made recently on how subRAFs can enable their own growth and each other’s coming into existence [19]. Moreover, there are many directions in which such a dynamical analysis can be extended. For example, one can consider having autocatalytic (sub)sets enclosed in different compartments, able to grow and reproduce (once a threshold concentration of certain molecule types is reached). Variation can then be introduced by only passing on a (perhaps random) subset of the molecules from the parent to the offspring, i.e., offspring compartments can possibly have different combinations of existent subRAFs, enabling an evolutionary process to happen [23]. As another example, one can ask what will happen if there are inhibitors present in the system, i.e., molecules that can actually prevent a reaction from happening. In the next section, we describe an extension of our RAF algorithm for dealing with such a situation.

Part II: Inhibition

Given a chemical reaction system $Q = (X, \mathcal{R}, C)$ with food set $F$, suppose we have a collection $(X_1, \mathcal{R}_1), (X_2, \mathcal{R}_2), \ldots, (X_k, \mathcal{R}_k)$ where $X_i \subseteq X$ and $\mathcal{R}_i \subseteq \mathcal{R}$. The interpretation of the pair $(X_i, \mathcal{R}_i)$ is that every molecule $x \in X_i$ inhibits every reaction $r \in \mathcal{R}_i$. Notice that any pattern of inhibition can be represented this way, for example by numbering the reactions, and taking $\mathcal{R}_i = \{r_i\}$ and $X_i$ to be the set of molecules that inhibit $r_i$ (or we may number the molecules, and take $X_i = \{x_i\}$ and $\mathcal{R}_i$ to be the set of reactions inhibited by $x_i$). We wish, however, to consider ‘types’ of molecules that will inhibit ‘types’ of reactions so that $k$ can be chosen to be not too large.

We say that a subset $\mathcal{R}' \subseteq \mathcal{R}$ forms an uninhibited RAF, or more briefly a u-RAF, if $\mathcal{R}'$ is an RAF (in the usual sense) and $\mathcal{R}'$ contains no reaction that is inhibited by any molecule that is involved in $\mathcal{R}'$. For a more formal definition, let $\mathrm{supp}(\mathcal{R}')$ denote the support of $\mathcal{R}'$: this is the set of molecules that are either reactants or products of reactions in $\mathcal{R}'$ (this is the same as the union of the set of molecules in $F$ that are reactants of reactions in $\mathcal{R}'$, and the set of products of reactions in $\mathcal{R}'$). Uninhibited RAFs are now defined more formally as follows.


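Given a chemical reaction system $Q = (X, \mathcal{R}, C)$ with food set $F$, a subset $\mathcal{R}'$ of $\mathcal{R}$ is a u-RAF if

(u-1) $\mathcal{R}'$ is an RAF.

(u-2) $\mathcal{R}' \cap \mathcal{R}_i \neq \emptyset \Rightarrow \mathrm{supp}(\mathcal{R}') \cap X_i = \emptyset$, for every $i$.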

Note that if a set of reactions $\mathcal{R}'$ satisfies (u-2), then so does every subset of $\mathcal{R}'$; this implies that any subset of a u-RAF that is an RAF is also a u-RAF. □
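Condition (u-2) is straightforward to test directly, as the following sketch suggests. The data layout (a dictionary from reaction names to reactant and product sets, and inhibition pairs given as `(molecules, reaction names)`) and the toy system are assumptions made for illustration; condition (u-1) would still need a separate RAF check.

```python
def support(reactions, chosen):
    """supp(R'): every reactant or product of the chosen reactions."""
    molecules = set()
    for name in chosen:
        reactants, products = reactions[name]
        molecules |= reactants | products
    return molecules


def satisfies_u2(reactions, chosen, inhibition_pairs):
    """Condition (u-2): whenever R' contains a reaction from R_i,
    the support of R' contains no molecule from X_i."""
    supp = support(reactions, chosen)
    for inhibitors, inhibited in inhibition_pairs:
        if chosen & inhibited and supp & inhibitors:
            return False
    return True


# Hypothetical toy data: reaction name -> (reactants, products).
reactions = {
    "r1": ({"a"}, {"x"}),
    "r2": ({"b"}, {"y"}),
}
# One inhibition pair (X_1, R_1): molecule x inhibits reaction r2.
inhibition_pairs = [({"x"}, {"r2"})]

alone_r1 = satisfies_u2(reactions, {"r1"}, inhibition_pairs)
alone_r2 = satisfies_u2(reactions, {"r2"}, inhibition_pairs)
together = satisfies_u2(reactions, {"r1", "r2"}, inhibition_pairs)
```

In this toy system each reaction passes on its own, but the pair fails because r1 produces the inhibitor of r2; removing reactions from a set that passes can never make it fail, which is the subset property noted above.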

Determining whether a CRS contains a u-RAF was shown to be an NP-complete problem in [15]. However, here we show that the problem is fixed parameter tractable in the parameter $k$. So, provided $k$ is not too large, we can still find u-RAFs in a CRS efficiently (or determine that a u-RAF does not exist).

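We first require some additional definitions. Let $[k] := \{1, \ldots, k\}$, and for any subset $J$ of $[k]$, let
$$\mathcal{R}_J := \{ r \in \mathcal{R} : \mathrm{supp}(r) \cap X_j = \emptyset \text{ for all } j \notin J \}, \tag{1}$$
and let
$$\mathcal{R}^J := \{ r \in \mathcal{R} : r \notin \mathcal{R}_j \text{ for all } j \in J \}. \tag{2}$$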

In the following theorem, the set $\mathcal{R}_J \cap \mathcal{R}^J$ plays a prominent role (where $J$ is a subset of $[k]$); this is precisely the set of reactions $r$ in $\mathcal{R}$ for which (i) $r$ does not belong to $\mathcal{R}_j$ for any $j \in J$ and (ii) for every $j \notin J$, none of the molecules in the support of $r$ lie in $X_j$. Recall from [19] that for any subset $\mathcal{R}'$ of reactions in $\mathcal{R}$, $s(\mathcal{R}')$ is the maximal subRAF contained within $\mathcal{R}'$ (as computed by our RAF algorithm) or the empty set if no such subRAF of $\mathcal{R}'$ exists. We can now state our first theorem.

Theorem 1

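Given a chemical reaction system $Q = (X, \mathcal{R}, C)$ with food set $F$, the following assertions hold:

(i) For any subset $J$ of $[k]$, if $s(\mathcal{R}_J \cap \mathcal{R}^J)$ is non-empty, then it is a u-RAF.

(ii) If $\mathcal{R}' \subseteq \mathcal{R}$ is a u-RAF, then $\mathcal{R}' \subseteq \mathcal{R}_J \cap \mathcal{R}^J$ where
$$J = \{ j \in [k] : \mathrm{supp}(\mathcal{R}') \cap X_j \neq \emptyset \}. \tag{3}$$

(iii) The set of maximal u-RAFs is precisely the collection of all non-empty subsets of $\mathcal{R}$ of the form $s(\mathcal{R}_J \cap \mathcal{R}^J)$ as $J$ ranges over subsets of $[k]$.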


Proof

For part (i) we know that if $s(\mathcal{R}_J \cap \mathcal{R}^J)$ is non-empty, then it is an RAF (from [14]); thus it suffices to verify property (u-2) in the definition of a u-RAF above for the set $\mathcal{R}' := \mathcal{R}_J \cap \mathcal{R}^J$, which implies that $s(\mathcal{R}')$ will also satisfy property (u-2), since it is a subset of $\mathcal{R}'$.

Suppose, to the contrary, that property (u-2) in the definition of a u-RAF is violated by $\mathcal{R}'$; then we can derive a contradiction as follows. For some $i \in [k]$ we must have:
$$\mathcal{R}' \cap \mathcal{R}_i \neq \emptyset \quad \text{and} \quad \mathrm{supp}(\mathcal{R}') \cap X_i \neq \emptyset. \tag{4}$$

In particular, there exists a reaction, say $r_1$, in $\mathcal{R}' \cap \mathcal{R}_i$. Moreover, since $\mathrm{supp}(\mathcal{R}') = \bigcup_{r \in \mathcal{R}'} \mathrm{supp}(r)$, the second part of Eqn. (4) implies that there also exists a reaction, say $r_2$, in $\mathcal{R}'$ for which $\mathrm{supp}(r_2) \cap X_i \neq \emptyset$. Now, since $r_1 \in \mathcal{R}^J$ and $r_1 \in \mathcal{R}_i$ it follows, by the definition of $\mathcal{R}^J$, that $i$ cannot be in $J$. Now consider $r_2$. This reaction is in $\mathcal{R}_J$ and so, since $i$ does not lie in $J$, we must have $\mathrm{supp}(r_2) \cap X_i = \emptyset$. But this contradicts the choice of $r_2$. This establishes part (i).

For part (ii), suppose that $\mathcal{R}' \subseteq \mathcal{R}$ is a u-RAF. It suffices to show that $\mathcal{R}' \subseteq \mathcal{R}^J$ and that $\mathcal{R}' \subseteq \mathcal{R}_J$ for the set $J$ described in Eqn. (3); it follows that $\mathcal{R}'$ will be contained in the intersection of these two sets.

Observe that, for the set $J$ as described in Eqn. (3), $\mathcal{R}^J$ is the set of reactions in $\mathcal{R}$ which do not lie in $\mathcal{R}_j$ for any $j$ for which $\mathrm{supp}(\mathcal{R}') \cap X_j \neq \emptyset$. Now, if $\mathcal{R}'$ is a u-RAF then by condition (u-2) in its definition, any reaction $r \in \mathcal{R}'$ must belong to $\mathcal{R}^J$. Similarly, for the choice of $J$ as described, $\mathcal{R}_J$ is the set of reactions $r$ in $\mathcal{R}$ for which $\mathrm{supp}(r) \cap X_i$ is empty for all $i$ for which $\mathrm{supp}(\mathcal{R}') \cap X_i = \emptyset$, and so any reaction $r \in \mathcal{R}'$ must also lie in $\mathcal{R}_J$. This establishes the required two containments, and so part (ii).

For part (iii), we have shown by part (i) that non-empty sets of the form $s(\mathcal{R}_J \cap \mathcal{R}^J)$ are u-RAFs, so we need to check that all maximal u-RAFs are of this form. Suppose that $\mathcal{R}'$ is a maximal u-RAF. Then by part (ii) we know that $\mathcal{R}' \subseteq \mathcal{R}_J \cap \mathcal{R}^J$ for the choice of $J$ given by Eqn. (3). Now, $s(\mathcal{R}') = \mathcal{R}'$ and so, by parts (i) and (ii), $s(\mathcal{R}_J \cap \mathcal{R}^J)$ is a u-RAF containing $\mathcal{R}'$, and, since $\mathcal{R}'$ is assumed maximal, these two u-RAFs must coincide. Part (iii) now follows. □

Corollary 1

Given a chemical reaction system $Q = (X, \mathcal{R}, C)$ with food set $F$, together with a family $\{(X_i, \mathcal{R}_i) : i \in [k]\}$ of inhibition pairs, there is an algorithm for constructing one (or all) maximal u-RAFs (or determining that no u-RAF exists) in time $2^k p(n)$ where $p$ is a polynomial in the size $n$ of $Q$.


Proof

Simply apply the RAF algorithm to compute $s(\mathcal{R}_J \cap \mathcal{R}^J)$ for all $2^k$ subsets $J$ of $[k]$. □
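The enumeration in this proof can be sketched as follows. This is an illustrative reading rather than the authors' code: the data layout, the helper names, and the toy system are assumptions. For each subset J it keeps the reactions that lie in no inhibited set indexed by J and whose support avoids every inhibitor set indexed outside J, then applies a plain RAF reduction. Since a particular J can yield a u-RAF that sits inside a larger one, the sketch keeps only the inclusion-maximal candidates.

```python
from itertools import combinations


def closure(food, rxns):
    """Molecules producible from food using the given reactions."""
    known = set(food)
    changed = True
    while changed:
        changed = False
        for reactants, products, _ in rxns.values():
            if reactants <= known and not products <= known:
                known |= products
                changed = True
    return known


def max_raf(food, rxns):
    """Names of the reactions in the maximal RAF of rxns (may be empty)."""
    current = dict(rxns)
    while True:
        reach = closure(food, current)
        kept = {n: r for n, r in current.items()
                if r[0] <= reach and r[2] & reach}
        if len(kept) == len(current):
            return set(kept)
        current = kept


def maximal_u_rafs(food, rxns, pairs):
    """Try all 2^k subsets J of the k inhibition pairs (X_i, R_i)."""
    k = len(pairs)
    found = set()
    for size in range(k + 1):
        for J in combinations(range(k), size):
            allowed = {}
            for name, r in rxns.items():
                supp = r[0] | r[1]
                # r lies in no inhibited set R_j with j in J ...
                not_inhibited = all(name not in pairs[j][1] for j in J)
                # ... and its support avoids X_j for every j outside J.
                no_inhibitor = all(not (supp & pairs[j][0])
                                   for j in range(k) if j not in J)
                if not_inhibited and no_inhibitor:
                    allowed[name] = r
            raf = max_raf(food, allowed)
            if raf:
                found.add(frozenset(raf))
    return {s for s in found if not any(s < t for t in found)}


# Hypothetical toy system: name -> (reactants, products, catalysts).
food = {"a", "b"}
rxns = {
    "r1": (frozenset({"a"}), frozenset({"x"}), frozenset({"a"})),
    "r2": (frozenset({"b"}), frozenset({"y"}), frozenset({"b"})),
}
pairs = [({"x"}, {"r2"})]  # molecule x inhibits reaction r2
result = maximal_u_rafs(food, rxns, pairs)
```

In this toy system the unique maximal RAF is {r1, r2}, but it is not a u-RAF; the search returns two separate maximal u-RAFs, {r1} and {r2}, whose union is not a u-RAF.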


In contrast to ordinary RAFs, u-RAFs need not be closed under union, i.e., if $\mathcal{R}'$ and $\mathcal{R}''$ are two u-RAFs then $\mathcal{R}' \cup \mathcal{R}''$ may fail to be a u-RAF. Thus, in general, a CRS may have several maximal u-RAFs, while there is always a unique maximal RAF.

So, this extension of our algorithm shows that, even though the general problem of finding RAF sets under inhibition is NP-complete, we can still deal with specific situations (such as when the number of inhibitors is limited) in a relatively efficient way. In the next section, we formulate another extension, or rather a generalization, of our RAF algorithm, which indicates that it can also be applied to problems outside of the context of chemistry and origin of life.

Part III: A generalization

The original RAF algorithm is specifically formulated in the context of chemical reaction systems. However, it is also possible to state the algorithm in a more generalized form. This may be useful for (i) understanding its relationship to other algorithms, and (ii) extending it in further directions, both within the context of chemical reaction systems as well as for other applications (e.g., in economics, as already speculated in [19]).

Suppose we have arbitrary (finite or infinite) sets $Y$ and $W$, where $W$ has a partial order $\le$ (for example, take $W$ to be the set of subsets of some set, partially ordered by set inclusion; as discussed later, this applies in the RAF setting), and functions
$$f : 2^Y \to W \quad \text{and} \quad g : Y \to W$$
(here $2^Y$ refers to the set of all subsets of $Y$). Consider the function $\psi : 2^Y \to 2^Y$ which is determined by $f$ and $g$ according to the following rule:
$$\psi(A) = \{ y \in A : g(y) \le f(A) \},$$

for each subset $A$ of $Y$. Note that $\psi(A) \subseteq A$ for all $A \in 2^Y$.


We say that a subset $A$ of $Y$ is gf-compatible if it is non-empty and satisfies the property that $g(y) \le f(A)$ for all $y \in A$.

For a subset $A$ of $Y$ and $k \ge 1$, define $\psi^{(k)}(A)$ to be the result of applying the function $\psi$ iteratively $k$ times starting with $A$. Thus, $\psi^{(1)}(A) = \psi(A)$ and, for $k \ge 1$, $\psi^{(k+1)}(A) = \psi(\psi^{(k)}(A))$. Notice that the sequence $(\psi^{(k)}(A), k \ge 1)$ is a nested, decreasing sequence of subsets of $Y$, and so we may define
$$\bar{\psi}(A) := \lim_{k \to \infty} \psi^{(k)}(A) = \bigcap_{k \ge 1} \psi^{(k)}(A),$$

which is a (possibly empty) subset of $Y$. Moreover, if $Y$ is finite, then $\bar{\psi}(Y) = \psi^{(k)}(Y)$ for some $k \le |Y|$.

To state the main result of this section, we recall two more standard definitions. A set $A \in 2^Y$ is a fixed point of $\psi$ if $\psi(A) = A$; and $f$ is monotone if it satisfies the property $A_1 \subseteq A_2 \Rightarrow f(A_1) \le f(A_2)$.

Theorem 2

Given sets $Y$ and $W$, where $W$ is partially ordered, together with functions $f : 2^Y \to W$ and $g : Y \to W$, the following hold:

(i) The gf-compatible subsets of $Y$ are precisely the non-empty subsets of $Y$ that are fixed points of $\psi$;

(ii) $\bar{\psi}(Y)$ is gf-compatible, provided it is non-empty; moreover, it contains all gf-compatible subsets of $Y$ provided that $f$ is monotone. In particular, when $f$ is monotone, there exists a gf-compatible subset of $Y$ if and only if $\bar{\psi}(Y)$ is non-empty.


Proof

If a subset $A$ of $Y$ is non-empty and $A = \psi(A)$ then $A = \{ y \in A : g(y) \le f(A) \}$ and so $A$ is gf-compatible. Conversely, if $A \neq \psi(A)$ then, since $\psi(A) \subseteq A$, there exists $y \in A$ such that $g(y)$ is not dominated by $f(A)$ in the partial order. Thus $A$ is not gf-compatible. This establishes part (i).

For part (ii), let $B = \bar{\psi}(Y)$. Then $\psi(B) = \psi(\bar{\psi}(Y)) = \bar{\psi}(Y) = B$, so $\bar{\psi}(Y)$ is a fixed point of $\psi$, and so, by part (i), is gf-compatible provided $B$ is non-empty. Also, if $f$ is monotone and $A_1 \subseteq A_2$, then $\psi(A_1)$ equals
$$\{ y \in A_1 : g(y) \le f(A_1) \} \subseteq \{ y \in A_1 : g(y) \le f(A_2) \} \subseteq \{ y \in A_2 : g(y) \le f(A_2) \},$$

and this last set is $\psi(A_2)$, so $\psi$ is monotone as a function from $2^Y$ to the set $2^Y$ partially ordered under set inclusion. Thus, if $B'$ is any gf-compatible set then, by part (i), $B'$ is a fixed point of $\psi$ and so, since $B' \subseteq Y$, we have $B' = \psi(B') \subseteq \psi(Y)$ and, by iteration of $\psi$, $B' \subseteq \bar{\psi}(Y)$, as claimed. The remaining claim in part (ii) now follows directly. □

An algorithm

Theorem 2 has the following immediate consequence when $Y$ is finite and $f$ is monotone. In this case, consider the following ‘gf-algorithm’. Starting with $Y$, compute the sequence $\psi^{(k)}(Y)$ until it stabilizes. If this set is empty, then report that no gf-compatible subset of $Y$ exists; otherwise output the stable set $\bar{\psi}(Y)$, which is the unique maximal gf-compatible subset of $Y$. Provided that for each subset $A$ of $Y$ and element $y \in Y$, the values $f(A)$ and $g(y)$ can be calculated in polynomial time in $|Y|$, this algorithm runs in polynomial time in $|Y|$. Notice that the algorithm begins with the set $Y$ and iteratively removes subsets of elements, until eventually arriving at a non-empty set $\bar{\psi}(Y)$ from which nothing further can be removed, or until all the elements of $Y$ are eliminated.
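The iteration just described is short enough to write out in full. The following is a minimal sketch for finite Y; the function name `gf_algorithm`, the explicit `leq` argument for the partial order, and the integer toy example are assumptions made for illustration.

```python
from operator import le


def gf_algorithm(Y, f, g, leq=le):
    """Iterate psi(A) = {y in A : g(y) <= f(A)} from A = Y until stable.

    Returns the stable set as a frozenset; an empty result means that no
    gf-compatible subset exists (assuming f is monotone).
    """
    current = frozenset(Y)
    while current:
        bound = f(current)
        survivors = frozenset(y for y in current if leq(g(y), bound))
        if survivors == current:
            return current
        current = survivors
    return frozenset()


# Toy example (assumed): W is the integers with the usual order,
# g(y) = y is what element y "requires", and f(A) = |A| is what the
# set A "provides". f is monotone under set inclusion.
stable = gf_algorithm({1, 2, 4, 5, 6}, f=len, g=lambda y: y)
```

Starting from five elements, the run removes 6, then 5, then 4, and stops at {1, 2}, where every element's requirement is met by the size of the set. Every pass that does not stop removes at least one element, so there are at most |Y| passes.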

Relationship to the original RAF algorithm

First a simple observation: If a reaction r is catalyzed by k ≥ 1 molecules, then we can replace it (formally) by k copies of this reaction, each of which is catalyzed by just one of the k molecules. This way we get a set of reactions, each of which is catalyzed by exactly one molecule. We can thus think of this catalyst as an additional reactant and so the reaction proceeds precisely if all the ‘reactants’ are present – formally this is cleaner than saying “all the reactants and at least one catalyst are present”. In fact, the implementation of our RAF algorithm is actually based on this idea. We call this ‘cleaner’ version the expanded CRS, and the catalyst chosen for any given reaction the nominated catalyst. In this expanded CRS, given a reaction r, let ρ(r) denote the set of reactants plus the nominated catalyst of this reaction. We now describe how Theorem 2 and the gf−algorithm apply.
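
The expansion step can be illustrated as follows. This is only a sketch: the class names `Reaction` and `ExpandedReaction`, the helper `expand`, and the example molecules are hypothetical and chosen for illustration.

```python
from typing import FrozenSet, List, NamedTuple


class Reaction(NamedTuple):
    name: str
    reactants: FrozenSet[str]
    products: FrozenSet[str]
    catalysts: FrozenSet[str]  # k >= 1 catalysts in the original CRS


class ExpandedReaction(NamedTuple):
    name: str
    reactants: FrozenSet[str]
    products: FrozenSet[str]
    catalyst: str  # the single nominated catalyst

    def rho(self) -> FrozenSet[str]:
        """rho(r): the reactants together with the nominated catalyst."""
        return self.reactants | {self.catalyst}


def expand(reactions: List[Reaction]) -> List[ExpandedReaction]:
    """Replace each reaction with k catalysts by k single-catalyst copies."""
    return [
        ExpandedReaction(r.name, r.reactants, r.products, c)
        for r in reactions
        for c in sorted(r.catalysts)  # sorted only to make the order reproducible
    ]


r = Reaction("r1", frozenset({"a", "b"}), frozenset({"ab"}), frozenset({"x", "y"}))
copies = expand([r])
print([(e.name, e.catalyst, sorted(e.rho())) for e in copies])
```

Here the reaction with two catalysts yields two copies, one nominating `x` and one nominating `y`, and ρ of each copy is the reactant set with that one catalyst added.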

Given a CRS (X, R, C) and food set F ⊆ X, take Y to be the set of all reactions in the expanded CRS, take M = X, the set of all molecules, and take W = 2^M, partially ordered under set inclusion. For our choice of the function f we set f(A) = cl_A(F), where cl_A(F) is the closure of the food set F under a subset A of reactions in the expanded CRS; this is the set of all molecules in X that can be constructed from F by repeatedly applying just those reactions that lie in A (and allowing any reaction in A to proceed even if the nominated catalyst is not present). Finally, we set g(r) = ρ(r) (in the expanded model, so ρ(r) includes the nominated catalyst). Then the gf−compatible subsets of Y correspond exactly to the RAFs in the expanded CRS under the recent modified definition of RAF[17], and ψ̄(Y) is just what we call s(Y) (the maxRAF for the expansion Y of R). Theorem 2(ii) asserts that this maxRAF can be found by the gf−algorithm, which is just the modified RAF algorithm[17] applied in the expanded CRS, and the fact that this RAF is the unique maximal RAF follows from the fact that the function A ↦ cl_A(F) is monotone in A.
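
To make this instantiation concrete, the following sketch computes cl_A(F) and iterates ψ on a small expanded CRS. It is an illustration of the definitions in this section, not the authors' implementation; the names `Rxn`, `closure` and `max_raf` and the three example reactions are assumptions made for the example.

```python
from typing import FrozenSet, Iterable, NamedTuple


class Rxn(NamedTuple):
    reactants: FrozenSet[str]
    products: FrozenSet[str]
    catalyst: str  # nominated catalyst (expanded CRS)


def closure(food: FrozenSet[str], A: FrozenSet[Rxn]) -> FrozenSet[str]:
    """f(A) = cl_A(F): molecules constructible from F using reactions in A,
    ignoring catalysts, as in the definition of f in the text."""
    molecules = set(food)
    changed = True
    while changed:
        changed = False
        for r in A:
            if r.reactants <= molecules and not r.products <= molecules:
                molecules |= r.products
                changed = True
    return frozenset(molecules)


def max_raf(food: Iterable[str], Y: Iterable[Rxn]) -> FrozenSet[Rxn]:
    """Iterate psi(A) = {r in A : rho(r) is a subset of cl_A(F)} from A = Y."""
    food = frozenset(food)
    current = frozenset(Y)
    while current:
        reachable = closure(food, current)
        nxt = frozenset(
            r for r in current if (r.reactants | {r.catalyst}) <= reachable
        )
        if nxt == current:  # stable set reached
            break
        current = nxt
    return current  # empty if no RAF exists


# Hypothetical example: food set {a, b}; r3 needs the unavailable molecule c.
r1 = Rxn(frozenset({"a", "b"}), frozenset({"ab"}), "ba")
r2 = Rxn(frozenset({"a", "b"}), frozenset({"ba"}), "ab")
r3 = Rxn(frozenset({"ab", "c"}), frozenset({"abc"}), "a")
result = max_raf({"a", "b"}, [r1, r2, r3])
print(result == frozenset({r1, r2}))
```

In this example r1 and r2 catalyze each other and use only food molecules as reactants, so they survive, while r3 is removed in the first round because c is not in the closure.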

The connection described assumes that we are working within the expanded CRS setting. However, we can easily relate this back to the original CRS setting by noting that if A is a set of reactions, and A′ is the expanded version (replacing each reaction by k copies, each with a unique nominated catalyst) then cl_A(F) (in the original setting) coincides with cl_{A′}(F) (in the expanded setting). Moreover, (i) for any RAF A in the original setting, in the expansion A′ of A there is a subset (selecting an appropriate nominated catalyst for each reaction) that is an RAF in the expanded CRS, and (ii) for any RAF A′ in the expanded CRS, replacing the nominated catalyst of each reaction by its full complement of catalysts returns an RAF A in the original CRS.

Notice that, apart from the monotonicity of the function f(A) = cl_A(F), a major factor that helps in guaranteeing a polynomial-time algorithm in the RAF setting is that f(A) can be computed efficiently.

Novel and alternative applications

We now present a simple application of Theorem 2 in a toy economic setting. Suppose Y is a collection of individuals, each of whom produces or consumes different types of “goods”, labeled 1, 2, …, k. For an individual y ∈ Y, let g_i(y) be the maximum price individual y is able to pay for good i and let f_i(y) be the minimal price for which individual y is willing to produce good i. To allow greater generality, if individual y does not need good i we can just set g_i(y) = 0 and if individual y does not produce good i we can just set f_i(y) = ∞. We assume that individuals can produce and sell as many goods as they wish (i.e. the individuals who are buying are not competing for a fixed number of items from any one seller).

We define a subset A of Y as viable if (i) it is non-empty, and (ii) every individual in A can afford to buy each good they need from at least one individual in A. We can formalize this as a gf−compatibility condition as follows.

Let W = (ℝ ∪ {∞})^k (i.e., k-dimensional Euclidean space with infinity added to each co-ordinate) partially ordered in the usual way: (x_1, …, x_k) ≤ (y_1, …, y_k) if and only if x_i ≤ y_i for all i. Note that in this example W is not a collection of subsets of a set (as in the RAF setting). Further, let g(y) := (g_1(y), …, g_k(y)) ∈ W, and for a set A of individuals (i.e. A ∈ 2^Y) let
$$f(A) := \left( \max_{y \in A} f_1(y), \ldots, \max_{y \in A} f_k(y) \right) \in W.$$

Then a subset A of Y is viable precisely if for each i and each y ∈ A, g_i(y) ≤ max{f_i(y′) : y′ ∈ A}, which is equivalent to g(y) ≤ f(A) for all y ∈ A. In other words, A is viable if and only if A is gf−compatible. Moreover, notice that f is monotone, and so Theorem 2(ii) applies, so if there is a viable set, then there is a unique maximal one, and it can be found in polynomial time in the size of the population, by using the gf−algorithm.

This provides a (simple) example of how the gf−algorithm can be applied in other contexts, such as economics. This is a first concrete step towards a generalized theory of autocatalytic sets, as we recently proposed[19].

As a further, and rather different, application we point out that the gf−algorithm also provides a polynomial-time solution to HORN-SAT, which is a basic problem in propositional logic, of deciding whether a given conjunction of Horn clauses is satisfiable[30]. Recall that a Horn clause is a clause with at most one positive literal, and any number of negative literals (a literal being a boolean variable which can be either ‘true’ or ‘false’). HORN-SAT is of interest as it is ‘P-complete’ (i.e. not only is it in the complexity class P of problems having polynomial-time solutions, but every problem in the complexity class P can be reduced to HORN-SAT).

Suppose then, that we have an instance of HORN-SAT consisting of a conjunction of a set H of n Horn clauses. Without loss of generality we will assume that not all the clauses in H contain a positive literal: the excluded case, in which every clause contains a positive literal, is equivalent to the condition that assigning each literal the truth value ‘true’ satisfies every clause in H, and this can be easily checked. We indicate this restriction by saying that H is a proper instance of HORN-SAT. Now we define the sets and functions we will use in the generalized RAF set-up. We take W = 2^H with the usual partial order on subsets. Let Y denote the set of all literals appearing in at least one clause in H (as a positive or negative literal). For a subset A of Y let f(A) be the set of clauses in H that contain at least one element of A as a negative literal. For y ∈ Y, let g(y) be the set of clauses in H which either contain y as a positive literal or else do not contain any positive literals. The following connection with gf−compatibility is established in the Appendix.

Lemma 1

For a proper instance H of HORN-SAT, a subset A of Y is gf−compatible if and only if the following truth assignment satisfies every clause in H :

$$y = \text{false} \iff y \in A \tag{5}$$

By Lemma 1, and the fact that f is monotone, we can invoke Theorem 2(ii) and deduce that the gf−algorithm determines whether or not a proper instance of HORN-SAT has a satisfying assignment, and if it does, it will construct the truth assignment that has a minimal set of literals set to ‘true’. This may all seem rather technical and irrelevant to chemistry, but it actually shows that a very specific algorithm that was inspired by and constructed for solving a chemical problem in the context of the origin of life (finding autocatalytic sets in chemical reaction systems), turns out to be capable (in its generalized form) of solving any problem that is within the problem class P. This is a surprising and interesting result from an algorithmic point of view, and could perhaps lead to another application of molecular computation[31].
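
The HORN-SAT construction above can be sketched directly in code. This is a minimal illustration under assumptions of our own: the `Clause` representation (one optional positive literal `head` plus a set `body` of negated variables), the function name `horn_sat`, and the example clauses are hypothetical, and the loop is simply the gf−algorithm with the f and g defined for this setting.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional


class Clause(NamedTuple):
    head: Optional[str]   # the positive literal, or None if there is none
    body: FrozenSet[str]  # variables that occur negated in the clause


def horn_sat(clauses: List[Clause]) -> Optional[Dict[str, bool]]:
    """Return a satisfying assignment of a proper Horn instance, or None."""
    H = frozenset(clauses)
    if all(c.head is not None for c in H):
        raise ValueError("not a proper instance: set every literal to true")
    Y = {v for c in H for v in c.body} | {c.head for c in H if c.head is not None}

    def f(A: FrozenSet[str]) -> FrozenSet[Clause]:
        # clauses containing at least one element of A as a negative literal
        return frozenset(c for c in H if c.body & A)

    def g(y: str) -> FrozenSet[Clause]:
        # clauses with y as positive literal, or with no positive literal
        return frozenset(c for c in H if c.head == y or c.head is None)

    current = frozenset(Y)
    while current:
        bound = f(current)
        nxt = frozenset(y for y in current if g(y) <= bound)
        if nxt == current:
            break
        current = nxt
    if not current:
        return None  # no gf-compatible subset, hence unsatisfiable
    return {y: (y not in current) for y in Y}  # (5): y is false iff y is in A


# Hypothetical instance: a, (not a or b), (not b or not c)
sat = horn_sat([
    Clause("a", frozenset()),
    Clause("b", frozenset({"a"})),
    Clause(None, frozenset({"b", "c"})),
])
print(sat)
```

For this instance the stable set is A = {c}, so c is set to ‘false’ and a and b to ‘true’. Adding the clause (not a or c) makes the instance unsatisfiable, and the same loop then ends with the empty set.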


In our previous work, we already showed (both computationally and theoretically) that autocatalytic (RAF) sets are highly likely to exist. However, most of these results were based on graph theoretical properties of RAF sets. Here, we have shown that also in terms of dynamics such sets are indeed self-sustainable and can outcompete non-autocatalytic sets. Furthermore, these dynamical results confirm arguments made previously[19] about how RAF subsets can enable their own growth or give rise to other such subsets coming into existence.

Next, the extension of our RAF algorithm described here shows that more realistic scenarios (such as including inhibition) can also be dealt with within our framework. Although the general problem of finding RAF sets when inhibition is present is NP-complete, in specific cases (such as when the number of inhibitors is not too large) it is still possible to detect RAF sets efficiently, because, as we have proved, the problem is fixed-parameter tractable.

Finally, the generalization of our RAF algorithm shows that it can even be applied to areas outside of chemistry and origin of life, such as economics. This is an important first step towards a generalized theory of autocatalytic sets, as proposed in[19]. And, perhaps, it could lead to another application of molecular computation.

Of course, many further extensions are still possible. In terms of dynamics, a next step could be to consider multiple, possibly competing, compartments, each containing some (different) combination of subRAFs. This could then give rise to an evolutionary process along the lines of[23]. Also, it would be interesting to find further applications of the gf−algorithm outside of chemistry. We hope to work on some of these further extensions and generalizations in the future.

Appendix: Proof of Lemma 1

First suppose that A is gf−compatible, and the truth assignment is as specified in (5). Consider a clause c ∈ H. There are three possibilities:
  1. If c contains a positive literal that is not in A then c is satisfied, since that positive literal is assigned the value ‘true’ under (5).
  2. If c contains a positive literal y in A then c ∈ f(A) (since c ∈ g(y) ⊆ f(A), as y ∈ A), so c contains the negation of some literal in A, and so c is satisfied under (5).
  3. If c contains no positive literal, then c is contained in g(y) for any y ∈ A (and there exists at least one such y since A is non-empty), and so the condition g(y) ⊆ f(A) (for y ∈ A) implies, once again, that c lies in f(A), and so c is satisfied under (5).


Thus all clauses in H are satisfied.

Conversely, suppose the truth assignment determined by some set A according to (5) satisfies every clause in H. Then A cannot be the empty set, since otherwise every literal would be set to ‘true’, so every clause in H would contain a positive literal and H would not be proper. We wish to show that g(y) ⊆ f(A) for all y ∈ A. Consider a clause c ∈ g(y). Then, by definition of g, either (i) c has no positive literal, or (ii) c has a positive literal and it is y, which lies in A. In case (i), the assumption that c is satisfied implies that at least one of the literals that appear negated in c is set to ‘false’, which means one of these literals must be in the set A. Consequently c ∈ f(A). Similarly, in case (ii), since the positive literal y ∈ A is set to ‘false’, at least one of the literals that appear negated in c must be set to ‘false’, which again requires this literal to lie in A, and hence c ∈ f(A). Thus g(y) ⊆ f(A) for all y ∈ A, as required.



MS thanks the Royal Society of New Zealand for funding support. We thank Stuart Kauffman for helpful and stimulating discussions.

Authors’ Affiliations

Biomathematics Research Centre, University of Canterbury


  1. Kauffman SA: Cellular homeostasis, epigenesis and replication in randomly aggregated macromolecular systems. J Cybernetics 1971, 1: 71–96. 10.1080/01969727108545830
  2. Eigen M, Schuster P: The hypercycle: a principle of natural self-organization. Part A: Emergence of the hypercycle. Naturwissenschaften 1977, 64: 541–565. 10.1007/BF00450633
  3. Dyson FJ: A model for the origin of life. J Mol Evolution 1982, 18: 344–350. 10.1007/BF01733901
  4. Wächtershäuser G: Evolution of the first metabolic cycles. PNAS 1990, 87: 200–204. 10.1073/pnas.87.1.200
  5. Gánti T: Biogenesis itself. J Theor Biol 1997, 187: 583–593. 10.1006/jtbi.1996.0391
  6. Rosen R: Life Itself. Columbia University Press, New York; 1991.
  7. Letelier JC, Soto-Andrade J, Abarzúa FG, Cornish-Bowden A, Cárdenas ML: Organizational invariance and metabolic closure: Analysis in terms of (M;R) systems. J Theor Biol 2006, 238: 949–961. 10.1016/j.jtbi.2005.07.007
  8. Sievers D, von Kiedrowski G: Self-replication of complementary nucleotide-based oligomers. Nature 1994, 369: 221–224. 10.1038/369221a0
  9. Ashkenasy G, Jagasia R, Yadav M, Ghadiri MR: Design of a directed molecular network. PNAS 2004, 101(30): 10872–10877. 10.1073/pnas.0402674101
  10. Hayden EJ, von Kiedrowski G, Lehman N: Systems chemistry on ribozyme self-construction: Evidence for anabolic autocatalysis in a recombination network. Angew Chem Int Ed 2008, 120: 8552–8556. 10.1002/ange.200802177
  11. Taran O, Thoennessen O, Achilles K, von Kiedrowski G: Synthesis of information-carrying polymers of mixed sequences from double stranded short deoxynucleotides. J Syst Chem 2010, 1: 9. 10.1186/1759-2208-1-9
  12. Braakman R, Smith E: The emergence and early evolution of biological carbon-fixation. PLoS Comput Biol 2012, 8(4): e1002455. 10.1371/journal.pcbi.1002455
  13. Steel M: The emergence of a self-catalysing structure in abstract origin-of-life models. Appl Mathematics Lett 2000, 3: 91–95.
  14. Hordijk W, Steel M: Detecting autocatalytic, self-sustaining sets in chemical reaction systems. J Theor Biol 2004, 227(4): 451–461. 10.1016/j.jtbi.2003.11.020
  15. Mossel E, Steel M: Random biochemical networks: The probability of self-sustaining autocatalysis. J Theor Biol 2005, 233(3): 327–336. 10.1016/j.jtbi.2004.10.011
  16. Hordijk W, Hein J, Steel M: Autocatalytic sets and the origin of life. Entropy 2010, 12(7): 1733–1742. 10.3390/e12071733
  17. Hordijk W, Kauffman SA, Steel M: Required levels of catalysis for emergence of autocatalytic sets in models of chemical reaction systems. Int J Molecular Sciences 2011, 12(5): 3085–3101. 10.3390/ijms12053085
  18. Hordijk W, Steel M: Predicting template-based catalysis rates in a simple catalytic reaction model. J Theor Biol 2012, 295: 132–138.
  19. Hordijk W, Steel M, Kauffman S: The structure of autocatalytic sets: Evolvability, enablement, and emergence. Acta Biotheoretica 2012, 60(4): 379–392. 10.1007/s10441-012-9165-1
  20. Kauffman SA: Autocatalytic sets of proteins. J Theor Biol 1986, 119: 1–24. 10.1016/S0022-5193(86)80047-9
  21. Kauffman SA: The Origins of Order. Oxford University Press, New York; 1993.
  22. Andersen JL, Flamm C, Merkle D, Stadler PF: Maximizing output and recognizing autocatalysis in chemical reaction networks is NP-complete. J Syst Chem 2012, 3: 1. 10.1186/1759-2208-3-1
  23. Vasas V, Fernando C, Santos M, Kauffman S, Szathmáry E: Evolution before genes. Biol Direct 2012, 7: 1. 10.1186/1745-6150-7-1
  24. Filisetti A, Graudenzi A, Serra R, Villani M, Füchslin RM, Kauffman SA, Packard N, Poli I, De Lucrezia D: A stochastic model of the emergence of autocatalytic cycles. J Syst Chem 2011, 2: 2. 10.1186/1759-2208-2-2
  25. Gillespie DT: A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys 1976, 22: 403–434. 10.1016/0021-9991(76)90041-3
  26. Gillespie DT: Exact stochastic simulation of coupled chemical reactions. J Physical Chem 1977, 81(25): 2340–2361. 10.1021/j100540a008
  27. Segré D, Ben-Eli D, Deamer DW, Lancet D: The lipid world. Origin of Life and Evol Biospheres 2001, 31(1–2): 119–145.
  28. Martin W, Russell MJ: On the origins of cells: A hypothesis for the evolutionary transition from abiotic geochemistry to chemoautotrophic prokaryotes, and from prokaryotes to nucleated cells. Philos Trans R Soc B 2003, 358: 59–85. 10.1098/rstb.2002.1183
  29. Martin W, Russell MJ: On the origin of biochemistry at an alkaline hydrothermal vent. Philos Trans R Soc B 2007, 362: 1887–1925. 10.1098/rstb.2006.1881
  30. Papadimitriou CH: Computational complexity. Addison Wesley, Boston; 1994.
  31. Adleman LM: Molecular computation of solutions to combinatorial problems. Science 1994, 266(5187): 1021–1024. 10.1126/science.7973651


© Hordijk and Steel; licensee Chemistry Central Ltd. 2012

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.