# Autocatalytic sets extended: Dynamics, inhibition, and a generalization

- Wim Hordijk^{1} (Email author)
- Mike Steel^{2}

**3**:5

**DOI:** 10.1186/1759-2208-3-5

© Hordijk and Steel; licensee Chemistry Central Ltd. 2012

**Received:** 25 May 2012

**Accepted:** 10 August 2012

**Published:** 14 August 2012

## Abstract

### Background

Autocatalytic sets are often considered a necessary (but not sufficient) condition for the origin and early evolution of life. Although the idea of autocatalytic sets was already conceived of many years ago, only recently have they gained more interest, following advances in creating them experimentally in the laboratory. In our own work, we have studied autocatalytic sets extensively from a computational and theoretical point of view.

### Results

We present results from an initial study of the dynamics of self-sustaining autocatalytic sets (RAFs). In particular, simulations of molecular flow on autocatalytic sets are performed, to illustrate the kinds of dynamics that can occur. Next, we present an extension of our (previously introduced) algorithm for finding autocatalytic sets in general reaction networks, which can also handle inhibition. We show that in this case detecting autocatalytic sets is fixed parameter tractable. Finally, we formulate a generalized version of the algorithm that can also be applied outside the context of chemistry and origin of life, which we illustrate with a toy example from economics.

### Conclusions

Having shown theoretically (in previous work) that autocatalytic sets are highly likely to exist, we conclude here that also in terms of dynamics such sets are viable and outcompete non-autocatalytic sets. Furthermore, our dynamical results confirm arguments made earlier about how autocatalytic subsets can enable their own growth or give rise to other such subsets coming into existence. Finally, our algorithmic extension and generalization show that more realistic scenarios (e.g., including inhibition) can also be dealt with within our framework, and that it can even be applied to areas outside of chemistry, such as economics.

## Background

The idea of *collectively autocatalytic sets* has been introduced more or less independently several times[1–3], and was subsequently used in a number of origin of life models[4–7]. Recent experimental advances in creating such sets in the laboratory[8–11] have generated a renewed interest in autocatalytic sets. Moreover, there is growing evidence that simple autocatalytic cycles may indeed have been at the core of the origin of life[12].

In our own work, we have studied autocatalytic sets extensively from a computational and theoretical point of view[13–19]. We briefly review some of the main definitions and results here. First, we define a *chemical reaction system* (CRS) as a tuple $\mathcal{Q}=(X,\mathcal{R},C)$ consisting of a set of molecule types *X*, a set of reactions $\mathcal{R}$ (transforming reactants to products), and a catalysis set *C* indicating which molecule types catalyze which reactions. We also include the notion of a food set *F* ⊂ *X* of molecule types assumed to be freely available from the environment. In a particular model of a CRS, known as the binary polymer model[1, 20, 21], molecule types are represented as bit strings up to a certain length *n*, reactions are simply ligation and cleavage, and catalysis is assigned at random according to some parameter *p* (the probability that a given molecule type catalyzes a given reaction). The food set consists of all molecule types up to a certain length *t* ≪ *n*.
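To make the binary polymer model concrete, the following is a minimal sketch of how one random instance could be generated. The function names, the dictionary layout, the `seed` parameter, and the choice to store each ligation/cleavage pair as a single entry are all illustrative assumptions, not the representation used in the cited work.

```python
import random
from itertools import product


def bit_strings(max_len):
    """All non-empty bit strings of length at most max_len."""
    return ["".join(bits)
            for length in range(1, max_len + 1)
            for bits in product("01", repeat=length)]


def binary_polymer_model(n, t, p, seed=0):
    """Return (molecules, food, reactions) for one random model instance.

    Each reaction entry stands for the bi-directional pair a + b <-> ab
    (an assumed encoding) with a randomly assigned set of catalysts.
    """
    rng = random.Random(seed)
    molecules = bit_strings(n)
    food = {m for m in molecules if len(m) <= t}
    reactions = []
    for a in molecules:
        for b in molecules:
            if len(a) + len(b) <= n:
                # Every molecule type catalyzes this reaction with probability p.
                catalysts = {m for m in molecules if rng.random() < p}
                reactions.append(
                    {"reactants": (a, b), "product": a + b, "catalysts": catalysts}
                )
    return molecules, food, reactions


molecules, food, reactions = binary_polymer_model(n=5, t=2, p=0.0045)
print(len(molecules), len(food), len(reactions))
```

With *n* = 5 and *t* = 2, this encoding yields 62 molecule types, 6 food molecules, and 196 ligation/cleavage pairs; only the catalyst assignments depend on the random seed.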

An *autocatalytic set* that is self-sustaining (or an RAF set, in our terminology) is now defined as a subset ${\mathcal{R}}^{\prime}\subseteq \mathcal{R}$ of reactions (and associated molecule types) in which:

1. each reaction $r\in {\mathcal{R}}^{\prime}$ is catalyzed by at least one molecule type involved in ${\mathcal{R}}^{\prime}$;
2. all reactants in ${\mathcal{R}}^{\prime}$ can be produced from the food set *F* by using a series of reactions only from ${\mathcal{R}}^{\prime}$ itself.

A formal definition is provided in[14, 17], where we also introduced a polynomial-time (in the size of the reaction set$\mathcal{R}$) algorithm for finding RAF sets in a general CRS. Note that our framework is somewhat different from that of[22], for which it was shown that maximizing the output flow and recognizing autocatalysis is NP-complete.
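The two conditions above suggest a simple pruning procedure: compute what the food set can reach, discard reactions that lack a reachable reactant or catalyst, and repeat. The sketch below illustrates that idea under stated assumptions; the `Reaction` type, the function names, and the four-reaction example (including its molecule names) are hypothetical, and this is not the authors' implementation.

```python
from typing import Dict, FrozenSet, NamedTuple, Set


class Reaction(NamedTuple):
    reactants: FrozenSet[str]
    products: FrozenSet[str]
    catalysts: FrozenSet[str]


def closure(food: Set[str], reactions: Dict[str, Reaction]) -> Set[str]:
    """Molecules producible from the food set using only `reactions` (catalysis ignored)."""
    available = set(food)
    changed = True
    while changed:
        changed = False
        for rxn in reactions.values():
            if rxn.reactants <= available and not rxn.products <= available:
                available |= rxn.products
                changed = True
    return available


def max_raf(food: Set[str], reactions: Dict[str, Reaction]) -> Dict[str, Reaction]:
    """Prune until every remaining reaction has its reactants and a catalyst reachable."""
    current = dict(reactions)
    while True:
        reachable = closure(food, current)
        kept = {
            name: rxn
            for name, rxn in current.items()
            if rxn.reactants <= reachable and rxn.catalysts & reachable
        }
        if len(kept) == len(current):
            return kept  # an empty dict means no RAF set exists
        current = kept


def rxn(reactants, products, catalysts=()):
    return Reaction(frozenset(reactants), frozenset(products), frozenset(catalysts))


# Hypothetical example: two mutually catalyzed ligations and two uncatalyzed ones.
food = {"00", "01", "10", "11"}
reactions = {
    "r1": rxn({"00", "11"}, {"0011"}, {"0110"}),
    "r2": rxn({"01", "10"}, {"0110"}, {"0011"}),
    "r3": rxn({"00", "01"}, {"0001"}),
    "r4": rxn({"10", "11"}, {"1011"}),
}
print(sorted(max_raf(food, reactions)))
```

In this example the uncatalyzed reactions `r3` and `r4` are pruned in the first pass, and the mutually catalyzed pair `r1`, `r2` survives. Each pass removes at least one reaction or stops, so the number of passes is bounded by the number of reactions.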

Some of our main results are that autocatalytic sets are highly likely to exist, even at very moderate levels of catalysis. For example, in the binary polymer model, each molecule needs only catalyze between one and two reactions, on average, to have a high probability of RAF sets emerging[14, 15]. Also, more realistic assumptions, such as template-based catalysis (as opposed to merely random catalysis) can be built into the framework easily. In this case, a molecule can only act as a catalyst if it matches (somewhere along its length) a template made up of several bits around the reaction site (which actually prevents the smallest molecules from being catalysts). However, this restriction does not significantly change the main results[17]. In fact, required levels of catalysis for RAF sets to form in the template-based model can be predicted analytically from the (known) required levels in the base (random) model[18]. And finally, RAF sets can often be decomposed into smaller RAF subsets (possibly even exponentially many), which can provide a mechanism for the evolvability of autocatalytic sets[19, 23].

Here, we continue our studies of autocatalytic sets with various extensions of our framework. First, we investigate actual dynamics of autocatalytic sets. We present some initial but insightful results from simulating molecular flow on RAF sets. Next, we present an extension of our algorithm for detecting autocatalytic sets when inhibition is also considered, i.e., molecules that can potentially prevent a reaction from happening. In an earlier paper we proved that the general problem of detecting autocatalytic sets when inhibition is present, is NP-complete[15]. However, here we show that the problem is actually fixed parameter tractable, i.e., if the number of inhibiting molecules is not too large, autocatalytic sets (or their absence) can still be determined in polynomial time. Finally, in a recent paper we speculated about a generalized theory of autocatalytic sets beyond the context of chemistry and origin of life[19]. Here, we make a first concrete step in this direction by formulating a generalized version of our RAF algorithm which does not depend on the specifics of chemistry (i.e., molecules and reactions), and can be applied in a more general setting. These results are presented, in three parts, in the following section.

## Results and discussion

### Part I: Dynamics

In our work so far, we have mostly looked at autocatalytic sets in terms of their graph theoretical properties. However, this has ignored dynamics, i.e., actual molecular flow on autocatalytic sets. Here, we fill this gap by presenting initial results on studying the dynamics of RAF sets. In particular, we provide two examples, a constructed one and a realistic one, to show several aspects of the molecular flow that (can) occur. To a large degree, these dynamical results confirm what had already been analyzed, concluded, and speculated in our earlier (structural) studies, but they also shed some new light on autocatalytic sets and their behavior. Note that a related dynamical study was reported recently[24], although here we focus more directly on the actual molecular flow on RAF sets themselves.

#### A constructed example

The constructed example is a CRS with food set *F* = {00, 01, 10, 11}. This CRS consists of four reactions, each one being a bi-directional ligation/cleavage reaction, either combining two food molecules into a unique molecule of length four (in the “forward”, or ligation reaction), or splitting up a molecule of length four into two food molecules (in the “backward”, or cleavage reaction). The two reactions at the top are mutually catalyzed by each other's ligation product, and form a 2-reaction autocatalytic (RAF) set. The two bottom reactions are not catalyzed, and are thus not part of any RAF set. However, these two sets of reactions (the top RAF one and the bottom non-RAF one) compete with each other for the food molecules.

Using the Gillespie algorithm[25, 26], we simulate the flow of molecules on this constructed reaction graph. Food molecules are assumed to be always available, and are kept at a minimum concentration of five molecules each (i.e., if after one of the ligation reactions the concentration of a food molecule has dropped below five, it is immediately replenished). One rationale for this is that the reaction system can be assumed to be “contained” inside some compartment, for example a lipid layer[27] or simply naturally occurring cavities in the soil[28, 29]. So, even though the food molecules are in “unlimited” supply in the environment, they still need to be taken up and brought inside the compartment to be used as reactants.

The presence of a catalyst increases the probability that a reaction will happen in direct proportion to the catalyst’s current concentration. However, with this constructed example we are specifically interested in the effects of *auto* catalysis, and we ignore the fact that a catalyst normally also increases the basic reaction rate. So, for this example, the reaction rates of catalyzed and uncatalyzed reactions are kept equal (at *k* = 1, in arbitrary units) for all reactions (we relax this assumption again in the more realistic example in the next subsection). The volume is also set to *V* = 1 (arbitrary units).
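The simulation setup described above can be sketched with the direct-method Gillespie algorithm as follows. Several details are assumptions made only for illustration: the specific length-four molecule names, the rule that a reaction's propensity is multiplied by (1 + catalyst count), the application of that factor to both the ligation and cleavage directions, and the fixed random seed.

```python
import random

FOOD = ("00", "01", "10", "11")
MIN_FOOD = 5        # food is topped up to this count, as described in the text
K, V = 1.0, 1.0     # rate constant and volume, in arbitrary units

# (reactant_a, reactant_b, product, catalyst or None); molecule names are hypothetical.
# The two reactants of each ligation are distinct, which the update rule relies on.
LIGATIONS = [
    ("00", "11", "0011", "0110"),
    ("01", "10", "0110", "0011"),
    ("00", "01", "0001", None),
    ("10", "11", "1011", None),
]


def events(counts):
    """Return (propensity, state change) pairs for every ligation and cleavage."""
    out = []
    for a, b, prod, cat in LIGATIONS:
        # Assumed catalysis rule: the rate scales with (1 + catalyst count).
        boost = 1 + (counts[cat] if cat else 0)
        out.append((K * counts[a] * counts[b] / V * boost, {a: -1, b: -1, prod: +1}))
        out.append((K * counts[prod] * boost, {a: +1, b: +1, prod: -1}))
    return out


def simulate(n_events, seed=0):
    rng = random.Random(seed)
    counts = {m: MIN_FOOD for m in FOOD}
    counts.update({prod: 0 for _, _, prod, _ in LIGATIONS})
    t = 0.0
    for _ in range(n_events):
        # Food never drops below MIN_FOOD, so some ligation is always possible.
        active = [(p, change) for p, change in events(counts) if p > 0]
        total = sum(p for p, _ in active)
        t += rng.expovariate(total)  # exponential waiting time to the next event
        _, change = rng.choices(active, weights=[p for p, _ in active])[0]
        for mol, delta in change.items():
            counts[mol] += delta
        for m in FOOD:  # replenish any food molecule that fell below the minimum
            counts[m] = max(counts[m], MIN_FOOD)
    return t, counts


t, counts = simulate(2000)
print(round(t, 3), counts)
```

Each run is one stochastic realization, so the final counts vary with the seed; only the invariants (non-negative counts, food at or above the minimum) hold for every run.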

*V*= 5, which effectively reduces the mentioned reaction rate ratio by a factor of 5).

#### A realistic example

The realistic example is a maximal RAF set found in an instance of the binary polymer model with *n* = 5, *t* = 2, and *p* = 0.0045 (with these parameter values, there is a probability of *P*_{n} = 0.5 that a model instance contains an RAF set). Figure 5 shows this RAF set, which consists of eight bi-directional (ligation/cleavage) reactions. The food set is *F* = {0, 1, 00, 01, 10, 11}.

This maximal RAF set actually consists of several RAF subsets (in[19] we show formally how RAF sets can be decomposed into, possibly exponentially many, RAF subsets). First there are two simple (1-reaction) irreducible RAF sets contained inside the yellow and purple boxes, respectively. Given that their reactants and catalysts are all food molecules, these subRAFs will always be present. Then there is the 3-reaction subRAF contained inside the red box. This subRAF actually includes the purple (1-reaction) irrRAF, but can only “grow” into the full 3-reaction red subRAF once molecule type 1010 is present. This molecule type catalyzes its own ligation from two instances of the food molecule 10, so this reaction will have to happen spontaneously (uncatalyzed) first, before the red subRAF can come into full self-sustaining existence (in fact, this reaction is actually an irrRAF in itself, but for the purposes of the dynamical analysis here, we do not consider it separately as such, as it immediately gives rise to the full red subRAF as soon as it comes into existence). Next, there is the 3-reaction irreducible RAF set contained inside the blue box. This blue subRAF also needs to be seeded, by one of the three reactions happening uncatalyzed (or one of the required molecules coming from elsewhere). Finally, there is the reaction contained in the green box, which strictly speaking is not an RAF by itself, but once molecule type 111 (produced by the blue subRAF) is available, it can become an “extension” of the blue subRAF. However, since the green reaction is catalyzed by its own product, it also needs to happen spontaneously at least once, before it can maintain its own existence autocatalytically.

In this example, catalyzed reactions are given a rate 20 times that of uncatalyzed reactions (*k* = 0.05 for uncatalyzed reactions and *k* = 1, as before, for catalyzed reactions). A higher, more realistic, factor is of course possible, but does not change the qualitative results, and simply means we need somewhat larger time-scales to observe similar behavior. Figure 6 shows the concentrations over time (simulating 25,000 reaction events this time) for the (ligation) products of the eight reactions making up the maximal RAF set.

The dynamics of the molecular concentrations are a direct reflection of the particular structure of the maximal RAF set in terms of its subRAFs. First of all, the concentrations of the products of the two 1-reaction irrRAFs (indicated, as in Figure 5, with yellow and purple lines, respectively), immediately start growing at a steady rate (although not exponentially, as they are catalyzed by food molecules, which remain in relatively low concentrations). However, the other subRAFs all need to be seeded by a spontaneous reaction. The first such event happens around time 0.3, when one of the reactions in the blue subRAF happens uncatalyzed. But once this has happened, the blue subRAF as a whole can come into existence and grow in concentration. Note that the two product types 010 (solid blue line) and 11100 (dashed blue line) immediately grow rapidly in concentration, but 111 (dotted blue line) has a damped growth, as it is also used again as a reactant.

The next spontaneous event happens around time 0.5. Recall that around time 0.3 the molecule type 111 came into existence, but for the green reaction to become an extension of the blue subRAF, it will still need to happen uncatalyzed at least once (given that it is catalyzed by its own product). However, when this happens (around time 0.5), the concentration of its product type 01111 (green line), supported by a product of the blue subRAF, immediately starts to grow rapidly. Finally, a last required spontaneous event happens around time 0.55, when molecule type 1010 is created, which then gives rise to the red subRAF coming into full existence (given that the purple irrRAF it contains was already present).

Some additional observations can be made about these dynamics. First, molecule type 00100 (dashed red line), a product of the red subRAF, was actually already present before the full red subRAF came into existence, as a result of spontaneous (uncatalyzed) reactions. However, its concentration only really starts growing once molecule type 1010 (its catalyst; solid red line) is present. Next, the concentration of the product of the purple irrRAF (100, purple line) starts decreasing again as soon as the red subRAF comes into existence, as this molecule type is used as a reactant within the red subRAF. And finally, note that the three molecule types that seem to grow in concentration without limit (00100, 11100, and 01111) are the ones that actually have a non-food molecule as one of their building blocks (reactants). Food molecules remain present in relatively low concentrations (although they are replenished when they fall below a concentration of five), but non-food molecules reach higher concentrations, and thus increase, in direct proportion to their concentration, the rate at which reactions that use them as reactants will happen. However, at some point the growth of these three molecule types also levels off, because of the backward (cleavage) reactions happening more and more often as well (similar to what happens in Figure 3); for readability of the graph, though, concentrations above 100 molecules are not shown in Figure 6.

The reason we have used a stochastic dynamical simulation here (instead of solving a set of ODEs), is that we are specifically interested in the *transient* behavior of the system, i.e., how subRAFs come into existence and (sometimes) depend on each other. Looking at the equilibrium distribution resulting from the corresponding ODEs does not provide this information. Furthermore, we have shown only one particular instance (realization) of the simulation model in Figure 6. Other realizations show very similar behaviors overall, except that the waiting times and order in which the various subRAFs come into existence may differ between simulation runs (due to their stochastic nature). However, averaging the concentrations over many runs would not show these specific behaviors of interest, so we have chosen to show one particular instance as a representative for a whole set of simulations.

These initial results are, of course, only a first step towards a more complete study of the dynamics of autocatalytic sets. However, they already provide some very useful and interesting insights into the kinds of dynamics one can observe in RAF sets, and also confirm some of the claims made recently on how subRAFs can enable their own growth and each other's coming into existence[19]. Moreover, there are many directions in which such a dynamical analysis can be extended. For example, one can consider having autocatalytic (sub)sets enclosed in different compartments, able to grow and reproduce (once a threshold concentration of certain molecule types is reached). Variation can then be introduced by only passing on a (perhaps random) subset of the molecules from the parent to the offspring, i.e., offspring compartments can possibly have different combinations of existent subRAFs, enabling an evolutionary process to happen[23]. As another example, one can ask what will happen if there are *inhibitors* present in the system, i.e., molecules that can actually *prevent* a reaction from happening. In the next section, we describe an extension of our RAF algorithm for dealing with such a situation.

### Part II: Inhibition

Given a chemical reaction system, $\mathcal{Q}=(X,\mathcal{R},C)$, with food set *F*, suppose we have a collection $({X}_{1},{\mathcal{R}}_{1}),({X}_{2},{\mathcal{R}}_{2}),\dots ,({X}_{k},{\mathcal{R}}_{k})$ where ${X}_{i}\subset X$, and ${\mathcal{R}}_{i}\subset \mathcal{R}$. The interpretation of the pair $({X}_{i},{\mathcal{R}}_{i})$ is that every molecule $x\in {X}_{i}$ *inhibits* every reaction $r\in {\mathcal{R}}_{i}$. Notice that any pattern of inhibition can be represented this way, for example by numbering the reactions, and taking ${\mathcal{R}}_{i}=\{{r}_{i}\}$ and ${X}_{i}$ to be the set of molecules that inhibit ${r}_{i}$ (or we may number the molecules, and take ${X}_{i}=\{{x}_{i}\}$ and ${\mathcal{R}}_{i}$ to be the set of reactions inhibited by ${x}_{i}$). We wish, however, to consider ‘types’ of molecules that will inhibit ‘types’ of reactions so that *k* can be chosen to be not too large.

We say that a subset${\mathcal{R}}^{\prime}\subseteq \mathcal{R}$ forms an *uninhibited RAF*, or more briefly a *u*-RAF, if${\mathcal{R}}^{\prime}$ is an RAF (in the usual sense) and${\mathcal{R}}^{\prime}$ contains no reaction that is inhibited by any molecule that is involved in${\mathcal{R}}^{\prime}$. For a more formal definition, let$\text{supp}\left({\mathcal{R}}^{\prime}\right)$ denote the *support* of${\mathcal{R}}^{\prime}$ – this is the set of molecules that are either reactants or products of reactions in${\mathcal{R}}^{\prime}$ (this is the same as the union of the set of molecules in *F* that are reactants of reactions in${\mathcal{R}}^{\prime}$, and the set of products of reactions in${\mathcal{R}}^{\prime}$). Uninhibited RAFs are now defined more formally as follows.

#### Definition

Given a chemical reaction system,$\mathcal{Q}=(X,\mathcal{R},C)$, with food set *F*, a subset${\mathcal{R}}^{\prime}$ of$\mathcal{R}$ is a *u*-RAF if

(u-1)${\mathcal{R}}^{\prime}$ is an RAF.

(u-2)${\mathcal{R}}^{\prime}\cap {\mathcal{R}}_{i}\ne \varnothing \Rightarrow \text{supp}\left({\mathcal{R}}^{\prime}\right)\cap {X}_{i}=\varnothing .$

Note that if a set of reactions${\mathcal{R}}^{\prime}$ satisfies (u-2), and if we let${\mathcal{R}}^{\prime}$ now refer to any subset of that set, then this subset also satisfies (u-2); this implies that any subset of a *u*-RAF that is an RAF is also a *u*-RAF. □

Determining whether a CRS contains a *u*-RAF was shown to be an NP-complete problem in[15]. However, here we show that the problem is *fixed parameter tractable* in the parameter *k*. So, provided *k* is not too large, we can still find *u*-RAFs in a CRS efficiently (or determine that a *u*-RAF does not exist).

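Let [*k*] := {1, …, *k*}, and for any subset *J* of [*k*], let

$${\mathcal{R}}_{J}:=\mathcal{R}\setminus \bigcup_{j\in J}{\mathcal{R}}_{j} \qquad (1)$$

and

$${\mathcal{R}}^{J}:=\{r\in \mathcal{R}:\text{supp}(r)\cap {X}_{i}=\varnothing \text{ for all } i\in [k]\setminus J\}. \qquad (2)$$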

In the following theorem, the set ${\mathcal{R}}^{J}\cap {\mathcal{R}}_{J}$ plays a prominent role (where *J* is a subset of [*k*]); this is precisely the set of reactions *r* in $\mathcal{R}$ for which (i) *r* does not belong to ${\mathcal{R}}_{j}$ for any $j\in J$ and (ii) if $r\in {\mathcal{R}}_{{j}^{\prime}}$ (for some ${j}^{\prime}\notin J$) then none of the molecules in the support of *r* lie in ${X}_{{j}^{\prime}}$. Recall from[19] that for any subset ${\mathcal{R}}^{\ast}$ of reactions in $\mathcal{R}$, $s\left({\mathcal{R}}^{\ast}\right)$ is the maximal subRAF contained within ${\mathcal{R}}^{\ast}$ (as computed by our RAF algorithm) or the empty set if no such subRAF of ${\mathcal{R}}^{\ast}$ exists. We can now state our first theorem.

#### Theorem 1

Given a chemical reaction system,$\mathcal{Q}=(X,\mathcal{R},C)$, with food set *F*, the following assertions hold:

(i) For any subset *J* of [*k*], if $s({\mathcal{R}}^{J}\cap {\mathcal{R}}_{J})$ is non-empty, then it is a *u*-RAF.

(ii) If ${\mathcal{R}}^{\prime}$ is a *u*-RAF, then ${\mathcal{R}}^{\prime}\subseteq {\mathcal{R}}^{J}\cap {\mathcal{R}}_{J}$ where

$$J:=\{j\in [k]:\text{supp}({\mathcal{R}}^{\prime})\cap {X}_{j}\ne \varnothing \}. \qquad (3)$$

(iii) The set of maximal *u*-RAFs is precisely the collection of all non-empty subsets of $\mathcal{R}$ of the form $s({\mathcal{R}}^{J}\cap {\mathcal{R}}_{J})$ as *J* ranges over subsets of [*k*].

#### Proof

For part (i) we know that if$s({\mathcal{R}}^{J}\cap {\mathcal{R}}_{J})$ is non-empty, then it is an RAF (from[14]), thus it suffices to verify property (u-2) in the definition of a *u*-RAF above for the set${\mathcal{R}}^{\ast}:={\mathcal{R}}^{J}\cap {\mathcal{R}}_{J}$, which implies that$s\left({\mathcal{R}}^{\ast}\right)$ will also satisfy property (u-2), since it is a subset of${\mathcal{R}}^{\ast}$.

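If property (u-2) in the definition of a *u*-RAF is violated by ${\mathcal{R}}^{\ast}$, then we can derive a contradiction as follows. For some $i\in [k]$ we must have:

$${\mathcal{R}}^{\ast}\cap {\mathcal{R}}_{i}\ne \varnothing \quad \text{and}\quad \text{supp}({\mathcal{R}}^{\ast})\cap {X}_{i}\ne \varnothing . \qquad (4)$$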

In particular, there exists a reaction, say ${r}_{1}$, in ${\mathcal{R}}^{\ast}\cap {\mathcal{R}}_{i}$. Moreover, since $\text{supp}\left({\mathcal{R}}^{\ast}\right)={\cup}_{r\in {\mathcal{R}}^{\ast}}\text{supp}\left(r\right)$, the second part of Eqn. (4) implies that there also exists a reaction, say ${r}_{2}$, in ${\mathcal{R}}^{\ast}$ for which $\text{supp}({r}_{2})\cap {X}_{i}\ne \varnothing$. Now, since ${r}_{1}\in {\mathcal{R}}_{J}$ and ${r}_{1}\in {\mathcal{R}}_{i}$ it follows, by the definition of ${\mathcal{R}}_{J}$, that *i* cannot be in *J*. Now consider ${r}_{2}$. This reaction is in ${\mathcal{R}}^{J}$ and so, since *i* does not lie in *J*, we must have $\text{supp}({r}_{2})\cap {X}_{i}=\varnothing$. But this contradicts the choice of ${r}_{2}$. This establishes part (i).

For part (ii), suppose that ${\mathcal{R}}^{\prime}\subseteq \mathcal{R}$ is a *u*-RAF. It suffices to show that ${\mathcal{R}}^{\prime}\subseteq {\mathcal{R}}^{J}$ and that ${\mathcal{R}}^{\prime}\subseteq {\mathcal{R}}_{J}$ for the set *J* described in Eqn. (3); it follows that ${\mathcal{R}}^{\prime}$ will be contained in the intersection of these two sets.

Observe that, for the set *J* as described in Eqn. (3), ${\mathcal{R}}_{J}$ is the set of reactions in $\mathcal{R}$ which do not lie in ${\mathcal{R}}_{j}$ for any *j* for which $\text{supp}\left({\mathcal{R}}^{\prime}\right)\cap {X}_{j}\ne \varnothing $. Now, if ${\mathcal{R}}^{\prime}$ is a *u*-RAF then by condition (u-2) in its definition, any reaction $r\in {\mathcal{R}}^{\prime}$ must belong to ${\mathcal{R}}_{J}$. Similarly, for the choice of *J* as described, ${\mathcal{R}}^{J}$ is the set of reactions *r* in $\mathcal{R}$ for which $\text{supp}(r)\cap {X}_{i}$ is empty for all *i* for which $\text{supp}\left({\mathcal{R}}^{\prime}\right)\cap {X}_{i}=\varnothing $, and so any reaction $r\in {\mathcal{R}}^{\prime}$ must also lie in ${\mathcal{R}}^{J}$. This establishes the required two containments, and so part (ii).

For part (iii), we have shown by part (i) that non-empty sets of the form $s({\mathcal{R}}^{J}\cap {\mathcal{R}}_{J})$ are *u*-RAFs, so we need to check that all maximal *u*-RAFs are of this form. Suppose that ${\mathcal{R}}^{\prime}$ is a maximal *u*-RAF. Then by part (ii) we know that ${\mathcal{R}}^{\prime}\subseteq {\mathcal{R}}^{J}\cap {\mathcal{R}}_{J}$ for the choice of *J* given by Eqn. (3). Now, $s\left({\mathcal{R}}^{\prime}\right)={\mathcal{R}}^{\prime}$ and so, by part (i), $s({\mathcal{R}}^{J}\cap {\mathcal{R}}_{J})$ is a *u*-RAF containing ${\mathcal{R}}^{\prime}$, and, since ${\mathcal{R}}^{\prime}$ is assumed maximal, these two *u*-RAFs must coincide. Part (iii) now follows. □

#### Corollary 1

Given a chemical reaction system, $\mathcal{Q}=(X,\mathcal{R},C)$, with food set *F*, together with a family $\{({X}_{i},{\mathcal{R}}_{i}):i\in [k]\}$ of inhibition pairs, there is an algorithm for constructing one (or all) maximal *u*-RAFs (or determining that no *u*-RAF exists) in time ${2}^{k}p(n)$ where *p* is a polynomial in the size *n* of $\mathcal{Q}$.

#### Proof

Simply apply the RAF algorithm to compute $s({\mathcal{R}}^{J}\cap {\mathcal{R}}_{J})$ for all ${2}^{k}$ subsets *J* of [*k*]. □
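The enumeration in this proof can be sketched as below. This is an illustration under assumptions rather than the authors' code: the dictionary encoding of reactions, the helper names, and the two-reaction example are hypothetical, and the RAF routine is a simple pruning loop standing in for the RAF algorithm. The final filter keeps only inclusion-maximal candidates, as a precaution in case different choices of *J* yield nested sets.

```python
from itertools import combinations


def closure(food, reactions):
    """Molecules producible from the food set using only the given reactions."""
    available = set(food)
    changed = True
    while changed:
        changed = False
        for rxn in reactions.values():
            if rxn["reactants"] <= available and not rxn["products"] <= available:
                available |= rxn["products"]
                changed = True
    return available


def max_raf(food, reactions):
    """s(R*): the maximal RAF inside `reactions`, or an empty dict."""
    current = dict(reactions)
    while True:
        reachable = closure(food, current)
        kept = {n: r for n, r in current.items()
                if r["reactants"] <= reachable and r["catalysts"] & reachable}
        if len(kept) == len(current):
            return kept
        current = kept


def maximal_u_rafs(food, reactions, inhibition):
    """inhibition is a list of (X_i, R_i) pairs: a molecule set and a reaction-name set."""
    k = len(inhibition)
    found = set()
    for size in range(k + 1):
        for J in combinations(range(k), size):
            outside = [i for i in range(k) if i not in J]
            allowed = {
                name: rxn for name, rxn in reactions.items()
                # r lies in R_J: it is not in R_j for any j in J
                if all(name not in inhibition[j][1] for j in J)
                # r lies in R^J: its support avoids X_i for every i outside J
                and all(not ((rxn["reactants"] | rxn["products"]) & inhibition[i][0])
                        for i in outside)
            }
            raf = frozenset(max_raf(food, allowed))
            if raf:
                found.add(raf)
    return {a for a in found if not any(a < b for b in found)}


# Hypothetical example: molecule c (made by r1) inhibits reaction r2.
food = {"a", "b"}
reactions = {
    "r1": {"reactants": {"a"}, "products": {"c"}, "catalysts": {"a"}},
    "r2": {"reactants": {"b"}, "products": {"d"}, "catalysts": {"b"}},
}
inhibition = [({"c"}, {"r2"})]
result = maximal_u_rafs(food, reactions, inhibition)
print(sorted(sorted(raf) for raf in result))
```

Here the two singletons {`r1`} and {`r2`} are each returned as a maximal *u*-RAF, while their union is not a *u*-RAF because `r1` produces the inhibitor of `r2`. The loop visits all 2^k subsets, so the sketch is only practical when *k* is small.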

#### Remark

In contrast to ordinary RAFs, *u*-RAFs need not be closed under union, i.e., if${\mathcal{R}}^{\prime}$ and${\mathcal{R}}^{\mathrm{\prime \prime}}$ are two *u*-RAFs then${\mathcal{R}}^{\prime}\cup {\mathcal{R}}^{\mathrm{\prime \prime}}$ may fail to be a *u*-RAF. Thus, in general, a CRS may have several maximal *u*-RAFs, while there is always a unique maximal RAF.

So, this extension of our algorithm shows that, even though the general problem of finding RAF sets under inhibition is NP-complete, we can still deal with specific situations (such as when the number of inhibitors is limited) in a relatively efficient way. In the next section, we formulate another extension, or rather a generalization, of our RAF algorithm, which indicates that it can also be applied to problems outside of the context of chemistry and origin of life.

### Part III: A generalization

The original RAF algorithm is specifically formulated in the context of chemical reaction systems. However, it is also possible to state the algorithm in a more generalized form. This may be useful for (i) understanding its relationship to other algorithms, and (ii) extending it in further directions, both within the context of chemical reaction systems as well as for other applications (e.g., in economics, as already speculated in[19]).

Suppose we have sets *Y*, *W* where *W* has a partial order (≤; for example, take *W* to be the set of subsets of some set partially ordered by set inclusion; as discussed later, this applies in the RAF setting), and functions $f:{2}^{Y}\to W$ and $g:Y\to W$ (where ${2}^{Y}$ refers to the set of all subsets of *Y*). Consider the function $\psi :{2}^{Y}\to {2}^{Y}$ which is determined by *f* and *g* according to the following rule:

$$\psi (A):=\{y\in A:g(y)\le f(A)\}$$

for each subset *A* of *Y*. Note that $\psi (A)\subseteq A$, for all $A\in {2}^{Y}$.

#### Definition

We say that a subset *A* of *Y* is *gf-compatible* if it is non-empty and satisfies the property that *g*(*y*) ≤ *f*(*A*) for all *y*∈*A*.

Given a subset *A* of *Y*, and $k\ge 1$, define ${\psi}^{(k)}(A)$ to be the result of applying function *ψ* iteratively *k* times starting with *A*. Thus, ${\psi}^{(1)}(A)=\psi (A)$ and for $k\ge 1$, ${\psi}^{(k+1)}(A)=\psi ({\psi}^{(k)}(A))$. Notice that the sequence $({\psi}^{(k)}(A),k\ge 1)$ is a nested, decreasing sequence of subsets of *Y*, and so we may define

$$\overline{\psi}(A):=\bigcap_{k\ge 1}{\psi}^{(k)}(A),$$

which is a (possibly empty) subset of *Y*. Moreover, if *Y* is finite, then $\overline{\psi}\left(Y\right)={\psi}^{\left(k\right)}\left(Y\right)$ for some *k* ≤ |*Y*|.

To state the main result of this section, we recall two more standard definitions. A set $A\in {2}^{Y}$ is a *fixed point* of *ψ* if *ψ*(*A*) = *A*; and *f* is *monotone* if it satisfies the property *A*_{1} ⊆ *A*_{2} ⇒ *f*(*A*_{1}) ≤ *f*(*A*_{2}).

#### Theorem 2

Given sets *Y* and *W*, where *W* is partially ordered, together with functions $f:{2}^{Y}\to W$ and $g:Y\to W$, the following hold:

(i) The *gf*-compatible subsets of *Y* are precisely the non-empty subsets of *Y* that are fixed points of *ψ*;

(ii) $\overline{\psi}\left(Y\right)$ is *gf*-compatible, provided it is non-empty; moreover, it contains all *gf*-compatible subsets of *Y* provided that *f* is monotone. In particular, when *f* is monotone, there exists a *gf*-compatible subset of *Y* if and only if $\overline{\psi}\left(Y\right)$ is nonempty.

#### Proof

If a subset *A* of *Y* is non-empty and *A* = *ψ*(*A*) then *A* = {*y*∈*A*:*g*(*y*) ≤ *f*(*A*)} and so *A* is *gf*-compatible. Conversely if *A* ≠ *ψ*(*A*) then since *ψ*(*A*)⊂*A*, there exists *y*∈*A* so that *g*(*y*) is not dominated by *f*(*A*) in the partial order. Thus *A* is not *gf*-compatible. This establishes Part (i).

For part (ii), the set $B=\overline{\psi}(Y)$ is a fixed point of *ψ*, and so, by part (i), is *gf*-compatible provided *B* is non-empty. Also, if *f* is monotone, and ${A}_{1}\subseteq {A}_{2}$, then $\psi ({A}_{1})$ equals

$$\{y\in {A}_{1}:g(y)\le f({A}_{1})\}\subseteq \{y\in {A}_{2}:g(y)\le f({A}_{2})\},$$

and this last set is $\psi ({A}_{2})$, so *ψ* is monotone as a function from ${2}^{Y}$ to the set ${2}^{Y}$ partially ordered under set inclusion. Thus, if ${B}^{\prime}$ is any *gf*-compatible set then, by part (i), ${B}^{\prime}$ is a fixed point of *ψ* and so, since ${B}^{\prime}\subseteq Y$, we have ${B}^{\prime}=\psi ({B}^{\prime})\subseteq \psi (Y)$ and, by iteration of *ψ*, ${B}^{\prime}\subseteq \overline{\psi}\left(Y\right)$, as claimed. The remaining claim in part (ii) now follows directly. □

#### An algorithm

Theorem 2 has the following immediate consequence when *Y* is finite, and *f* is monotone. In this case, consider the following ‘*gf*−algorithm’. Starting with *Y* , compute the sequence *ψ*^{(k)}(*Y*) until it stabilizes. If this set is empty, then report that no *gf*−compatible subset of *Y* exists, otherwise output the stable set$\overline{\psi}\left(Y\right)$, which is the unique maximal *gf*−compatible subset of *Y* . Provided that for each subset *A* of *Y* , and element *y*∈*Y*, the values *f*(*A*) and *g*(*y*) can be calculated in polynomial time in |*Y*|, this algorithm runs in polynomial time in |*Y*|. Notice that the algorithm begins with the set *Y* and iteratively removes subsets of elements, until eventually arriving at a non-empty set$\overline{\psi}\left(Y\right)$ from which nothing further can be removed, or until all the elements of *Y* are eliminated.
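The *gf*-algorithm described above is short enough to sketch directly. The function name `gf_algorithm`, the explicit `leq` argument for the partial order on *W*, and the toy instance (where *f*(*A*) = |*A*| and *g*(*y*) = *y* on integers) are illustrative assumptions chosen only because that *f* is monotone.

```python
def gf_algorithm(Y, f, g, leq):
    """Iterate psi(A) = {y in A : g(y) <= f(A)} from A = Y until it stabilizes.

    Returns the stable set, which is empty when no gf-compatible subset exists
    (assuming f is monotone and Y is finite).
    """
    current = frozenset(Y)
    while True:
        bound = f(current)
        nxt = frozenset(y for y in current if leq(g(y), bound))
        if nxt == current:
            return current
        current = nxt


# Toy instance: W is the integers with the usual order, f(A) = |A|, g(y) = y.
# A set is then gf-compatible when each of its elements is at most its size.
stable = gf_algorithm(
    {1, 2, 3, 5, 7},
    f=len,
    g=lambda y: y,
    leq=lambda a, b: a <= b,
)
print(sorted(stable))
```

In the toy instance the iteration removes 7, then 5, and stabilizes at {1, 2, 3}. Because each pass either removes an element or stops, at most |*Y*| + 1 evaluations of *f* are needed.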

#### Relationship to the original RAF algorithm

First a simple observation: If a reaction *r* is catalyzed by *k* ≥ 1 molecules, then we can replace it (formally) by *k* copies of this reaction, each of which is catalyzed by just one of the *k* molecules. This way we get a set of reactions, each of which is catalyzed by exactly one molecule. We can thus think of this catalyst as an additional reactant and so the reaction proceeds precisely if all the ‘reactants’ are present; formally this is cleaner than saying “all the reactants and at least one catalyst are present”. In fact, the implementation of our RAF algorithm is actually based on this idea. We call this ‘cleaner’ version the *expanded CRS*, and the catalyst chosen for any given reaction the *nominated catalyst*. In this expanded CRS, given a reaction *r*, let *ρ*(*r*) denote the set of reactants plus the nominated catalyst of this reaction. We now describe how Theorem 2 and the *gf*-algorithm apply.

Given a CRS $(X,\mathcal{R},C)$ and food set *F* ⊆ *X*, take *Y* to be the set of all reactions in the expanded CRS, and take *M* = *X*, the set of all molecules, take $W={2}^{M}$, partially ordered under set inclusion. For our choice of the function *f* we set $f(A)={\text{cl}}_{A}(F)$, where ${\text{cl}}_{A}(F)$ is the closure of the food set *F* under a subset *A* of reactions in the expanded CRS; this is the set of all molecules in *X* that can be constructed from *F* by repeatedly applying just those reactions that lie in *A* (and allowing any reaction in *A* to proceed even if the nominated catalyst is not present). Finally, we set *g*(*r*) = *ρ*(*r*) (in the expanded model, so *ρ*(*r*) includes the nominated catalyst). Then the *gf*-compatible subsets of *Y* correspond exactly to the RAFs in the expanded CRS under the recent modified definition of RAF[17], and $\overline{\psi}\left(Y\right)$ is just what we call *s*(*Y*) (the maxRAF for the expansion *Y* of $\mathcal{R}$). Theorem 2(ii) asserts this maxRAF can be found by the *gf*-algorithm, which is just the modified RAF algorithm[17] applied in the expanded CRS, and the fact this RAF is the unique maximal RAF follows from the fact that the function $A\mapsto {\text{cl}}_{A}(F)$ is monotone in *A*.

The connection described assumes that we are working within the expanded CRS setting. However, we can easily relate this back to the original CRS setting by noting that if *A* is a set of reactions, and ${A}^{\prime}$ is the expanded version (replacing each reaction by *k* copies each with a unique nominated catalyst) then ${\text{cl}}_{A}(F)$ (in the original setting) coincides with ${\text{cl}}_{{A}^{\prime}}(F)$ (in the expanded setting). Moreover, (i) for any RAF *A* in the original setting, in the expansion of *A* there is a subset (selecting an appropriate nominated catalyst for each reaction) that is an RAF in the expanded CRS, and (ii) for any RAF ${A}^{\prime}$ in the expanded CRS, replacing the nominated catalyst of each reaction by its full complement of catalysts returns an RAF *A* in the original CRS.

Notice that, apart from the monotonicity of the function $f(A)={\text{cl}}_{A}(F)$, a major factor that helps in guaranteeing a polynomial-time algorithm in the RAF setting is that *f*(*A*) can be computed efficiently.
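As an illustration of the RAF instance of the *gf*−algorithm, the following Python sketch computes the closure of the food set under a set of expanded reactions (ignoring catalysts) and then repeatedly discards every reaction *r* whose *ρ*(*r*) is not contained in that closure. The type `Rxn` and the function names `closure` and `max_raf` are assumptions made for this sketch; it is not the authors' implementation, and it favours readability over efficiency.

```python
from typing import FrozenSet, Iterable, NamedTuple, Set


class Rxn(NamedTuple):
    """One reaction of the expanded CRS (exactly one nominated catalyst)."""
    name: str
    reactants: FrozenSet[str]
    products: FrozenSet[str]
    catalyst: str


def closure(food: Iterable[str], reactions: Iterable[Rxn]) -> Set[str]:
    """f(A) = cl_A(F): molecules constructible from the food set using only
    the given reactions, letting them proceed even without their catalyst."""
    available = set(food)
    reactions = list(reactions)
    changed = True
    while changed:
        changed = False
        for r in reactions:
            if r.reactants <= available and not r.products <= available:
                available |= r.products
                changed = True
    return available


def max_raf(food: Iterable[str], reactions: Iterable[Rxn]) -> Set[Rxn]:
    """Prune until g(r) = rho(r) is a subset of f(A) for every remaining r.
    An empty result means that no RAF exists."""
    food = set(food)
    current = set(reactions)
    while current:
        reachable = closure(food, current)
        keep = {r for r in current
                if (r.reactants | {r.catalyst}) <= reachable}
        if keep == current:  # fixed point reached: current is gf-compatible
            break
        current = keep
    return current


food = {"a", "b"}
r1 = Rxn("r1", frozenset({"a", "b"}), frozenset({"ab"}), "ab")   # autocatalytic
r2 = Rxn("r2", frozenset({"ab", "c"}), frozenset({"abc"}), "a")  # reactant c unavailable
r3 = Rxn("r3", frozenset({"a"}), frozenset({"aa"}), "abc")       # catalyst never produced
print(sorted(r.name for r in max_raf(food, [r1, r2, r3])))
```

In this toy system only `r1` survives the pruning. Each pass removes at least one reaction or stops, so the number of closure computations is bounded by the number of reactions, which is where the efficient computability of *f*(*A*) matters.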

#### Novel and alternative applications

We now present a simple application of Theorem 2 in a toy economic setting. Suppose *Y* is a collection of individuals, each of whom produces or consumes different types of “goods”, labeled 1, 2, …, *k*. For an individual *y* ∈ *Y*, let ${g}_{i}(y)$ be the maximum price individual *y* is able to pay for good *i* and let ${f}_{i}^{\prime}(y)$ be the minimal price for which individual *y* is willing to produce good *i*. To allow greater generality, if individual *y* does not need good *i* we can just set ${g}_{i}(y)=0$ and if individual *y* does not produce good *i* we can just set ${f}_{i}^{\prime}(y)=\infty$. We assume that individuals can produce and sell as many goods as they wish (i.e. the individuals who are buying are not competing for a fixed number of items from any one seller).

We define a subset *A* of *Y* as *viable* if (i) it is non-empty, and (ii) every individual in *A* can afford to buy each good they need from at least one individual in *A*. We can formalize this as a *gf*−compatibility condition as follows.

Take $W={\left(\mathbb{R}\cup \{\infty \}\right)}^{k}$ (*k*-dimensional Euclidean space with infinity added to each co-ordinate) partially ordered in the usual way: $({x}_{1},\dots ,{x}_{k})\le ({y}_{1},\dots ,{y}_{k})$ if and only if ${x}_{i}\le {y}_{i}$ for all *i*. Note that in this example *W* is not a collection of subsets of a set (as in the RAF setting). Further, let $g(y):=({g}_{1}(y),\dots ,{g}_{k}(y))\in W$, and for a set *A* of individuals (i.e. $A\in {2}^{Y}$) let

Then a subset *A* of *Y* is viable precisely if for each *i* and each *y* ∈ *A*, ${g}_{i}(y)\le \max \{{f}_{j}^{\prime}(A):j=1,\dots ,k\}$, which is equivalent to *g*(*y*) ≤ *f*(*A*) for all *y* ∈ *A*. In other words, *A* is viable if and only if *A* is *gf*−compatible. Moreover, notice that *f* is monotone, and so Theorem 2(ii) applies: if there is a viable set, then there is a unique maximal one, and it can be found in polynomial time in the size of the population by using the *gf*−algorithm.

This provides a (simple) example of how the *gf*−algorithm can be applied in other contexts, such as economics. This is a first concrete step towards a generalized theory of autocatalytic sets, as we recently proposed[19].
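The pruning idea behind the *gf*−algorithm can be sketched for this toy economy in Python. The sketch works directly from the verbal definition of a viable set and rests on assumptions that are ours: an individual “needs” good *i* exactly when their maximum price for it is positive, a buyer “can afford” a good from a seller when the buyer's maximum price is at least the seller's minimal price, and the names `max_viable`, `max_pay` and `min_ask` are hypothetical.

```python
import math
from typing import Dict, List, Set


def max_viable(max_pay: Dict[str, List[float]],
               min_ask: Dict[str, List[float]]) -> Set[str]:
    """Largest set A in which everyone can afford each good they need
    from at least one member of A (empty set if there is none)."""
    if not max_pay:
        return set()
    n_goods = len(next(iter(max_pay.values())))
    current = set(max_pay)
    while current:
        # Cheapest price at which each good is offered inside the current set.
        cheapest = [min(min_ask[s][i] for s in current) for i in range(n_goods)]
        keep = {
            y for y in current
            if all(max_pay[y][i] == 0 or max_pay[y][i] >= cheapest[i]
                   for i in range(n_goods))
        }
        if keep == current:  # nobody else has to leave: current is viable
            break
        current = keep
    return current


inf = math.inf
# Two goods (indices 0 and 1); a max_pay entry of 0 means "not needed",
# a min_ask entry of inf means "not produced".
max_pay = {"alice": [0, 5], "bob": [3, 0], "carol": [1, 0], "dave": [0, 2]}
min_ask = {"alice": [2, inf], "bob": [inf, 4], "carol": [inf, 1], "dave": [inf, inf]}
print(sorted(max_viable(max_pay, min_ask)))
```

In this example carol cannot afford good 0 and is removed first; dave, who could only afford good 1 from carol, is removed next, leaving alice and bob. Removing an individual can only make goods harder to obtain for the others, which is the monotonicity that makes the iterative pruning reach the unique maximal viable set.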

As a further, and rather different, application we point out that the *gf*−algorithm also provides a polynomial-time solution to HORN-SAT, a basic problem in propositional logic: deciding whether a given conjunction of Horn clauses is satisfiable[30]. Recall that a *Horn clause* is a clause with at most one positive literal, and any number of negative literals (a literal being a boolean variable which can be either ‘true’ or ‘false’). HORN-SAT is of interest as it is ‘P-complete’ (i.e. not only is it in the complexity class P of problems having polynomial-time solutions, but *every* problem in the complexity class P can be reduced to HORN-SAT).

Suppose then, that we have an instance of HORN-SAT consisting of a conjunction of a set $\mathcal{H}$ of *n* Horn clauses. Without loss of generality we will assume that not all the clauses in $\mathcal{H}$ contain a positive literal: the excluded case, in which every clause contains a positive literal, is equivalent to the condition that assigning each literal the truth value ‘true’ satisfies every clause in $\mathcal{H}$, and this can be easily checked. We indicate this restriction by saying that $\mathcal{H}$ is a *proper* instance of HORN-SAT. Now we define the sets and functions we will use in the generalized RAF set-up. We take $W={2}^{\mathcal{H}}$ with the usual partial order on subsets. Let *Y* denote the set of all literals appearing in at least one clause in $\mathcal{H}$ (as a positive or negative literal). For a subset *A* of *Y* let *f*(*A*) be the set of clauses in $\mathcal{H}$ that contain at least one element of *A* as a negative literal. For *y* ∈ *Y*, let *g*(*y*) be the set of clauses in $\mathcal{H}$ which either contain *y* as a positive literal or else do not contain any positive literals. The following connection with *gf*−compatibility is established in the Appendix.

##### Lemma 1

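For a proper instance $\mathcal{H}$ of HORN-SAT, a subset *A* of *Y* is *gf*−compatible if and only if the following truth assignment satisfies every clause in $\mathcal{H}$:

$$y=\begin{cases}\text{‘false’}, & \text{if } y\in A;\\ \text{‘true’}, & \text{if } y\in Y\setminus A.\end{cases}\qquad (5)$$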

By Lemma 1, and the fact that *f* is monotone, we can invoke Theorem 2(ii) and deduce that the *gf*−algorithm determines whether or not a proper instance of HORN-SAT has a satisfying assignment, and if it does, it will construct the truth assignment that has a minimal set of literals set to ‘true’. This may all seem rather technical and irrelevant to chemistry, but it actually shows that a very specific algorithm that was inspired by and constructed for solving a chemical problem in the context of the origin of life (finding autocatalytic sets in chemical reaction systems), turns out to be capable (in its generalized form) of solving *any* problem that is within the problem class P. This is a surprising and interesting result from an algorithmic point of view, and could perhaps lead to another application of *molecular computation*[31].
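The HORN-SAT construction can be made concrete with a short Python sketch that uses the sets *f*(*A*) and *g*(*y*) defined above and the truth assignment used in the proof in the Appendix (elements of *A* are set to ‘false’, all other literals to ‘true’). The `HornClause` type, the function name `horn_sat`, and the representation of clauses by their index in a list are assumptions of this sketch.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional


class HornClause(NamedTuple):
    pos: Optional[str]    # the single positive literal, or None
    neg: FrozenSet[str]   # variables that occur negated


def horn_sat(clauses: List[HornClause]) -> Optional[Dict[str, bool]]:
    """Return a satisfying assignment, or None if the instance is unsatisfiable."""
    variables = set()
    for c in clauses:
        variables |= c.neg
        if c.pos is not None:
            variables.add(c.pos)

    # Improper instance: every clause has a positive literal, so all-'true' works.
    if all(c.pos is not None for c in clauses):
        return {v: True for v in variables}

    current = set(variables)  # candidate set A, initially all of Y
    while current:
        # f(A): clauses containing some element of A as a negative literal.
        f_A = {i for i, c in enumerate(clauses) if c.neg & current}
        # Keep y only if g(y) is a subset of f(A), where g(y) holds the clauses
        # with y as positive literal or with no positive literal at all.
        keep = {
            y for y in current
            if all(i in f_A for i, c in enumerate(clauses)
                   if c.pos is None or c.pos == y)
        }
        if keep == current:
            break
        current = keep

    if not current:  # no gf-compatible subset exists
        return None
    return {v: v not in current for v in variables}


# (not x or not y) and (x or not z) and (z)
H = [HornClause(None, frozenset({"x", "y"})),
     HornClause("x", frozenset({"z"})),
     HornClause("z", frozenset())]
print(horn_sat(H))
```

For this instance the pruning first removes `z`, then `x`, and stops at *A* = {`y`}, so `x` and `z` are set to ‘true’ and `y` to ‘false’. The contradictory instance consisting of the clauses (x) and (¬x) prunes down to the empty set, and the function returns `None`.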

## Conclusions

In our previous work, we already showed (both computationally and theoretically) that autocatalytic (RAF) sets are highly likely to exist. However, most of these results were based on graph theoretical properties of RAF sets. Here, we have shown that also in terms of dynamics such sets are indeed self-sustainable and can outcompete non-autocatalytic sets. Furthermore, these dynamical results confirm arguments made previously[19] about how RAF subsets can enable their own growth or give rise to other such subsets coming into existence.

Next, the extension of our RAF algorithm described here shows that more realistic scenarios (such as including inhibition) can also be dealt with within our framework. Although the general problem of finding RAF sets when inhibition is present is NP-complete, in specific cases (such as when the number of inhibitors is not too large) it is still possible to detect RAF sets efficiently, because the problem is fixed parameter tractable, as we have proved.

Finally, the generalization of our RAF algorithm shows that it can even be applied to areas outside of chemistry and origin of life, such as economics. This is an important first step towards a generalized theory of autocatalytic sets, as proposed in[19]. And, perhaps, it could lead to another application of molecular computation.

Of course there are still many further extensions possible. In terms of dynamics, a next step could be to consider multiple, possibly competing, compartments each having some (different) combination of subRAFs existent within them. This could then give rise to an evolutionary process along the lines of[23]. Also, it would be interesting to find further applications of the *gf*−algorithm outside of chemistry. We hope to work on some of these further extensions and generalizations in the future.

## Appendix: Proof of Lemma 1

Suppose *A* is *gf*−compatible, and the truth assignment is as specified. Consider clause $c\in \mathcal{H}$. There are three possibilities:

1. If *c* contains a positive literal that is not in *A* then *c* is satisfied, since that positive literal is assigned the value ‘true’ under (5).
2. If *c* contains a positive literal *y* in *A* then *c* ∈ *f*(*A*) (since *g*(*y*) ⊆ *f*(*A*), as *y* ∈ *A*), and so *c* is satisfied under (5).
3. If *c* contains no positive literal, then *c* is contained in *g*(*y*) for *any* *y* ∈ *A* (and there exists at least one such *y* since *A* is non-empty), and so the condition *g*(*y*) ⊆ *f*(*A*) (for *y* ∈ *A*) implies, once again, that *c* lies in *f*(*A*), and so *c* is satisfied under (5).

Thus all clauses in$\mathcal{H}$ are satisfied.

Conversely, suppose the truth assignment determined by some set *A* according to (5) satisfies every clause in $\mathcal{H}$. Then *A* cannot be the empty set, since otherwise every literal would be set to ‘true’, so every clause in $\mathcal{H}$ would have to contain a positive literal, and $\mathcal{H}$ would not be proper. We wish to show that *g*(*y*) ⊆ *f*(*A*) for all *y* ∈ *A*. Consider clause *c* ∈ *g*(*y*). Then, by definition of *g*, either (i) *c* has no positive literal, or (ii) *c* has a positive literal and it is *y*, which lies in *A*. In case (i), the assumption that *c* is satisfied implies that at least one of the literals that appear negated in *c* is set to ‘false’, which means one of these literals must be in the set *A*. Consequently *c* ∈ *f*(*A*). Similarly, in case (ii), since the positive literal *y* ∈ *A* is set to ‘false’, at least one of the literals that appear negated in *c* must be set to ‘false’, which again requires this literal to lie in *A*, and hence *c* ∈ *f*(*A*). Thus *g*(*y*) ⊆ *f*(*A*) for all *y* ∈ *A*, as required.

## Declarations

### Acknowledgements

MS thanks the Royal Society of New Zealand for funding support. We thank Stuart Kauffman for helpful and stimulating discussions.

## References

1. Kauffman SA: **Cellular homeostasis, epigenesis and replication in randomly aggregated macromolecular systems.** *J Cybernetics* 1971, **1:** 71–96. 10.1080/01969727108545830
2. Eigen M, Schuster P: **The hypercycle: a principle of natural self-organization. Part A: Emergence of the hypercycle.** *Naturwissenschaften* 1977, **64:** 541–565. 10.1007/BF00450633
3. Dyson FJ: **A model for the origin of life.** *J Mol Evolution* 1982, **18:** 344–350. 10.1007/BF01733901
4. Wächtershäuser G: **Evolution of the first metabolic cycles.** *PNAS* 1990, **87:** 200–204. 10.1073/pnas.87.1.200
5. Gánti T: **Biogenesis itself.** *J Theor Biol* 1997, **187:** 583–593. 10.1006/jtbi.1996.0391
6. Rosen R: *Life Itself*. Columbia University Press, New York; 1991.
7. Letelier JC, Soto-Andrade J, Abarzúa FG, Cornish-Bowden A, Cárdenas ML: **Organizational invariance and metabolic closure: Analysis in terms of (M;R) systems.** *J Theor Biol* 2006, **238:** 949–961. 10.1016/j.jtbi.2005.07.007
8. Sievers D, von Kiedrowski G: **Self-replication of complementary nucleotide-based oligomers.** *Nature* 1994, **369:** 221–224. 10.1038/369221a0
9. Ashkenasy G, Jegasia R, Yadav M, Ghadiri MR: **Design of a directed molecular network.** *PNAS* 2004, **101**(30):10872–10877. 10.1073/pnas.0402674101
10. Hayden EJ, von Kiedrowski G, Lehman N: **Systems chemistry on ribozyme self-construction: Evidence for anabolic autocatalysis in a recombination network.** *Angew Chem Int Ed* 2008, **120:** 8552–8556. 10.1002/ange.200802177
11. Taran O, Thoennessen O, Achilles K, von Kiedrowski G: **Synthesis of information-carrying polymers of mixed sequences from double stranded short deoxynucleotides.** *J Syst Chem* 2010, **1:** 9. 10.1186/1759-2208-1-9
12. Braakman R, Smith E: **The emergence and early evolution of biological carbon-fixation.** *PLoS Comput Biol* 2012, **8**(4):e1002455. 10.1371/journal.pcbi.1002455
13. Steel M: **The emergence of a self-catalysing structure in abstract origin-of-life models.** *Appl Mathematics Lett* 2000, **3:** 91–95.
14. Hordijk W, Steel M: **Detecting autocatalytic, self-sustaining sets in chemical reaction systems.** *J Theor Biol* 2004, **227**(4):451–461. 10.1016/j.jtbi.2003.11.020
15. Mossel E, Steel M: **Random biochemical networks: The probability of self-sustaining autocatalysis.** *J Theor Biol* 2005, **233**(3):327–336. 10.1016/j.jtbi.2004.10.011
16. Hordijk W, Hein J, Steel M: **Autocatalytic sets and the origin of life.** *Entropy* 2010, **12**(7):1733–1742. 10.3390/e12071733
17. Hordijk W, Kauffman SA, Steel M: **Required levels of catalysis for emergence of autocatalytic sets in models of chemical reaction systems.** *Int J Molecular Sciences* 2011, **12**(5):3085–3101. 10.3390/ijms12053085
18. Hordijk W, Steel M: **Predicting template-based catalysis rates in a simple catalytic reaction model.** *J Theor Biol* 2012, **295:** 132–138.
19. Hordijk W, Steel M, Kauffman S: **The structure of autocatalytic sets: Evolvability, enablement, and emergence.** *Acta Biotheoretica* 2012, **60**(4):379–392. 10.1007/s10441-012-9165-1
20. Kauffman SA: **Autocatalytic sets of proteins.** *J Theor Biol* 1986, **119:** 1–24. 10.1016/S0022-5193(86)80047-9
21. Kauffman SA: *The Origins of Order*. Oxford University Press, New York; 1993.
22. Andersen JL, Flamm C, Merkle D, Stadler PF: **Maximizing output and recognizing autocatalysis in chemical reaction networks is NP-complete.** *J Syst Chem* 2012, **3:** 1. 10.1186/1759-2208-3-1
23. Vasas V, Fernando C, Santos M, Kauffman S, Szathmáry E: **Evolution before genes.** *Biol Direct* 2012, **7:** 1. 10.1186/1745-6150-7-1
24. Filisetti A, Graudenzi A, Serra R, Villani M, Füchslin RM, Kauffman SA, Packard N, Poli I, De Lucrezia D: **A stochastic model of the emergence of autocatalytic cycles.** *J Syst Chem* 2011, **2:** 2. 10.1186/1759-2208-2-2
25. Gillespie DT: **A general method for numerically simulating the stochastic time evolution of coupled chemical reactions.** *J Comput Phys* 1976, **22:** 403–434. 10.1016/0021-9991(76)90041-3
26. Gillespie DT: **Exact stochastic simulation of coupled chemical reactions.** *J Physical Chem* 1977, **81**(25):2340–2361. 10.1021/j100540a008
27. Segré D, Ben-Eli D, Deamer DW, Lancet D: **The lipid world.** *Origin of Life and Evol Biospheres* 2001, **31**(1–2):119–145.
28. Martin W, Russell MJ: **On the origins of cells: A hypothesis for the evolutionary transition from abiotic geochemistry to chemoautotrophic prokaryotes, and from prokaryotes to nucleated cells.** *Philos Trans R Soc B* 2003, **358:** 59–85. 10.1098/rstb.2002.1183
29. Martin W, Russell MJ: **On the origin of biochemistry at an alkaline hydrothermal vent.** *Philos Trans R Soc B* 2007, **362:** 1887–1925. 10.1098/rstb.2006.1881
30. Papadimitriou CH: *Computational complexity*. Addison Wesley, Boston; 1994.
31. Adleman LM: **Molecular computation of solutions to combinatorial problems.** *Science* 1994, **266**(5187):1021–1024. 10.1126/science.7973651

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.