Definitions
In the following paragraphs we formally introduce chemical reaction networks. We emphasize that our setup is the same as in the literature on flux analysis; we have opted, however, for a somewhat different notation that is closer to the conventions commonly used in graph theory, as this makes the subsequent discussion more concise.
A chemical reaction network (CRN) is represented as a directed multi-hypergraph G(V, E) consisting of a vertex set V, the compounds, and a set E of directed hyperedges encoding the reactions [2]. Each reaction e ∈ E is a pair (e^-, e^+) of multisets e^-, e^+ ⊆ V of compounds, denoting the educts and products of the reaction e. The stoichiometric coefficients s_{x,e^-} and s_{x,e^+} are represented by the multiplicities of the compound x in the multisets e^- and e^+, respectively. For instance, the hyperedge encoding
\mathrm{C_2H_2} + 2\,\mathrm{H_2O} \to \mathrm{(CH_2OH)_2}
reads
\bigl(\{\mathrm{C_2H_2}, \mathrm{H_2O}, \mathrm{H_2O}\}, \{\mathrm{(CH_2OH)_2}\}\bigr)
Reversible reactions are encoded by a pair of forward and backward reactions. The entries of the stoichiometric matrix are recovered as S_{x,e} = s_{x,e^+} − s_{x,e^-}.
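As a minimal sketch of this encoding, assuming Python's `collections.Counter` as the multiset type and plain strings as compound labels (both our own choices, not part of the formal setup), the example reaction and the column of S it induces can be computed as follows:

```python
from collections import Counter
from typing import Dict, Tuple

# A reaction e is a pair (e^-, e^+) of multisets of compounds.
Reaction = Tuple[Counter, Counter]

def stoichiometric_column(reaction: Reaction) -> Dict[str, int]:
    """Return S_{x,e} = s_{x,e^+} - s_{x,e^-} for every compound x occurring in e."""
    educts, products = reaction
    compounds = set(educts) | set(products)
    # Counter returns 0 for missing keys, i.e., multiplicity 0 in that multiset.
    return {x: products[x] - educts[x] for x in sorted(compounds)}

# C2H2 + 2 H2O -> (CH2OH)2, with H2O listed twice in the educt multiset
glycol = (Counter(["C2H2", "H2O", "H2O"]), Counter(["(CH2OH)2"]))
print(stoichiometric_column(glycol))
# {'(CH2OH)2': 1, 'C2H2': -1, 'H2O': -2}
```

A reversible reaction would simply appear as two such pairs with educts and products swapped.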
In addition to ordinary reactions like the one above, CRNs also contain pseudo-reactions E' representing influx and outflux of compounds, of the form e_{in(x)} = ({x_{in}}, {x}) and e_{out(x)} = ({x}, {x_{out}}), where x_{in} and x_{out} refer to external reservoirs. These are additional vertices V' distinct from V. These pseudo-reactions feed the CRN, remove "waste products", and extract a desired output. In particular, the reservoirs x_{in}, y_{out} ∈ V' do not take part in any other reaction.
A flow on the directed hypergraph G is a function f : E ∪ E'→ ℕ_{0} such that, for each compound x ∈ V, the condition
\sum_{e \in E \cup E'} f(e)\,\bigl(s_{x,e^-} - s_{x,e^+}\bigr) = 0 \qquad (1)
is satisfied. This condition enforces that the total production and the total consumption of x are balanced, i.e., the CRN is in a stationary state. The total consumption of an input material x is therefore
f(e_{in(x)}) = \sum_{e \in E} f(e)\,\bigl(s_{x,e^-} - s_{x,e^+}\bigr) \qquad (2)
and the total outflux of a product is
f(e_{out(x)}) = \sum_{e \in E} f(e)\,\bigl(s_{x,e^+} - s_{x,e^-}\bigr) \qquad (3)
We say that a species x is produced in a network if f(e_{out(x)}) > 0.
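Condition (1) can be checked mechanically. The following Python sketch is an illustration under our own assumptions: the toy network (the ordinary reaction 2A → B plus the pseudo-reactions e_{in(A)} and e_{out(B)}), the reaction labels, and the function name `is_stationary` are hypothetical. Only compounds in V are balanced; the reservoirs in V' are not.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Reaction = Tuple[Counter, Counter]  # (educts e^-, products e^+)

def is_stationary(reactions: Dict[str, Reaction], flow: Dict[str, int],
                  compounds: Iterable[str]) -> bool:
    """Check condition (1): sum_e f(e) (s_{x,e^-} - s_{x,e^+}) = 0 for all x in V."""
    for x in compounds:
        balance = sum(flow[name] * (educts[x] - products[x])
                      for name, (educts, products) in reactions.items())
        if balance != 0:
            return False
    return True

# Hypothetical toy network: ordinary reaction 2A -> B plus pseudo-reactions.
reactions = {
    "r1": (Counter({"A": 2}), Counter({"B": 1})),
    "in(A)": (Counter({"A_in": 1}), Counter({"A": 1})),    # e_in(A) = ({A_in}, {A})
    "out(B)": (Counter({"B": 1}), Counter({"B_out": 1})),  # e_out(B) = ({B}, {B_out})
}
flow = {"r1": 3, "in(A)": 6, "out(B)": 3}
V = ["A", "B"]  # the reservoirs A_in and B_out belong to V' and are not balanced

print(is_stationary(reactions, flow, V))  # True
print(flow["out(B)"] > 0)                 # True: B is produced
```

Here f(e_{in(A)}) = 6 = 3 · 2 matches equ.(2), and f(e_{out(B)}) = 3 matches equ.(3).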
Note that this definition of f naturally generalizes the definition of an (integer) flow on a directed graph with source x_{in} and target y_{out}, see e.g., [23]. In [26], a generalization of equ.(1), although restricted to hypergraphs with |e^+| = 1, is considered, where the flows add up to a vertex-dependent demand term rather than to zero. In contrast to the usual setting of flow problems, we have a non-trivial restriction on the capacity only for the input edge(s), while the values of f are unrestricted for all other hyperedges.
Formulation of the problems
MAX-CRN-Output
Given a chemical reaction network with n nodes, of which any subset may have influx or outflux, find a flow f that maximizes the outflow f(e_{out(y)}) to a specified output node y_{out}.
MAX-CRN(d)-Output
Given a chemical reaction network with n nodes and reactions (hyperedges) with in-degree and out-degree at most d, where any subset of vertices may have influx or outflux, find a flow f that maximizes the outflow f(e_{out(y)}) to a specified output node y_{out}.
MAX-CRN(d)-Output-1
Given a chemical reaction network with n nodes, reactions (hyperedges) with in-degree and out-degree at most d, and a single vertex with influx, where any subset of vertices may have outflux, find a flow f that maximizes the outflow f(e_{out(y)}) to a specified output node y_{out}.
Autocata
Given a chemical reaction network with n nodes and one or more input sources, determine whether there is a source node x such that:
1. x cannot be produced from the other source molecules alone, i.e., for all flows f, f(e_{in(x)}) = 0 implies f(e_{out(x)}) = 0; and
2. x can be produced in a quantity that is larger than its inflow, i.e., there is a flow f such that f(e_{out(x)}) > f(e_{in(x)}) > 0.
Outline
Formally, NP-completeness is defined for decision problems [27]. Optimization problems can be converted into decision problems by asking whether they admit a solution that is at least as good as some given value. By abuse of language, it therefore makes sense to speak of an "NP-complete optimization problem" instead of using the phrase "the decision problem corresponding to our optimization problem is NP-complete".
The basic idea of proving that a problem 𝔛 is NP-complete is to find a so-called reduction ρ from another problem 𝔅 that is already known to be NP-complete (membership of 𝔛 in NP has to be shown separately). The reduction ρ is an algorithm with polynomial runtime that converts any given instance of 𝔅 into an instance of 𝔛. An efficient (i.e., polynomial time) algorithm to solve (all instances of) 𝔛 would therefore also provide an efficient solution for every instance P ∈ 𝔅, by simply reducing P to ρ(P) ∈ 𝔛 and then solving ρ(P). Hence we can conclude that 𝔛 is a hard problem when a known hard problem 𝔅 can be reduced to it.
In this section we devise a procedure that reduces every instance of the so-called 3-partition problem to a CRN with a single output pseudo-reaction in such a way that solving the output maximization problem for the CRN also solves the 3-partition problem. Thus optimizing output in CRNs is at least as hard as solving 3-partition. The same basic construction is then modified to show that the CRN can be built in such a way that all reactions are monomolecular or bimolecular. We then employ the same construction to show that the problem remains hard even if only a single source is provided. A simple modification finally establishes the hardness result for finding autocatalytic compounds.
3-Partition
The 3-partition problem (3PART) consists of deciding whether a given multiset of n = 3m integers s_i, i = 1, ..., 3m, can be partitioned into m triples that all have the same sum. This problem is one of the most famous strongly NP-complete problems, i.e., it stays NP-complete even when the numbers in the input instance are given in unary encoding [28], i.e., when their values grow no faster than a polynomial in the problem size n. This remains true when the s_i are distinct [29]. If B denotes the desired sum of each triple, then 3PART remains strongly NP-complete even if B/4 < s_i < B/2 holds for every i. The latter fact will be employed in our reduction proof in order to show that an optimal mass flow through the network must have certain properties.
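For readers who want to experiment with small instances, the decision problem can be stated as an exhaustive search. This Python sketch is only an illustration of 3PART itself, not part of the reduction; the function names and the toy instance (m = 2, B = 100) are our own, and the search takes exponential time in general.

```python
from itertools import combinations
from typing import List, Optional, Tuple

def three_partition(values: List[int]) -> Optional[List[Tuple[int, int, int]]]:
    """Exhaustive search for a 3PART solution; exponential time in general."""
    if not values or len(values) % 3 != 0:
        return None
    m = len(values) // 3
    if sum(values) % m != 0:
        return None
    B = sum(values) // m  # required sum of every triple

    def solve(rest: List[int]) -> Optional[List[Tuple[int, int, int]]]:
        if not rest:
            return []
        first, others = rest[0], rest[1:]
        # The first remaining value must lie in some triple; try all partners.
        for a, b in combinations(range(len(others)), 2):
            if first + others[a] + others[b] == B:
                remaining = [v for k, v in enumerate(others) if k not in (a, b)]
                tail = solve(remaining)
                if tail is not None:
                    return [(first, others[a], others[b])] + tail
        return None

    return solve(list(values))

def within_bounds(values: List[int], m: int) -> bool:
    """Check the restriction B/4 < s_i < B/2 with B = sum(values) / m."""
    B = sum(values) / m
    return all(B / 4 < v < B / 2 for v in values)

s = [30, 33, 37, 26, 34, 40]                      # m = 2, B = 100
print(within_bounds(s, 2))                        # True
print(three_partition(s))                         # [(30, 33, 37), (26, 34, 40)]
print(three_partition([26, 27, 49, 30, 31, 37]))  # None
```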
Basic Construction
Given an instance of 3PART we construct the associated CRN in a stepwise fashion. The first step is a lattice-like labeled graph, Figure 2(A), that consists of one input node for each s_i; m auxiliary nodes Z_j, each of which has an influx of (1/m) Σ_i s_i = s/m, where s = Σ_i s_i; an output sink node; 3m × m switch nodes; 3m waste nodes at the right; and m waste nodes at the bottom. These switch nodes have two inputs, l from the left and u from above, and three outputs, r towards the right, d downwards, and o into the output channel. Each of the switch nodes can be in one of two distinct states: either it is
off: the node transmits all its input from the left to the right and all its input from above downwards; no flow is diverted towards the output, i.e., r = l, d = u, o = 0; or
on: the node consumes its entire input from the left (and thus transmits nothing to the right), at the same time uses up a corresponding amount of the input from above, and diverts a flow of the same amount l towards the output. Note that switch nodes are designed such that the flow downwards is reduced by the same quantity as the flow to the right. As the flow to the right is completely consumed, i.e., reduced by l, we have r = 0, d = u − l, o = l.
All flux along the output channel is collected in the output node, i.e., given a particular state of the switch nodes, the flux into the output node is the sum of the fluxes consumed from the left.
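The high-level behavior of the lattice can be mimicked by a short simulation. The following Python sketch is an illustration only: the function name `simulate_switch_grid`, the encoding of an assignment as one "on" column per row (which builds in that at most one switch per row can be "on"), the top-to-bottom processing order, and the assumption that m divides s are our own choices.

```python
from typing import List, Optional, Tuple

def simulate_switch_grid(s: List[int], on_column: List[Optional[int]]) -> Tuple[int, int, int]:
    """Propagate flow through the 3m x m switch lattice of Figure 2(A).

    s[i] is the influx of row i (node Q_i); on_column[i] is the column j of the
    switch that is "on" in row i, or None if all switches of row i are "off".
    Returns (output flux into O, horizontal waste w_H, vertical waste w_V).
    """
    m = len(s) // 3
    vertical = [sum(s) // m] * m  # influx s/m into each column Z_j (assumes m divides s)
    output, w_H = 0, 0
    for i, l in enumerate(s):  # rows are processed from top to bottom
        j = on_column[i]
        if j is not None:
            # "on": r = 0, d = u - l, o = l; flows must stay non-negative
            if vertical[j] < l:
                raise ValueError(f"switch ({i}, {j}) lacks input from above")
            vertical[j] -= l
            output += l
        else:
            w_H += l  # all switches "off": the row's flux reaches the right-hand waste node
    w_V = sum(vertical)  # what remains of each column reaches the bottom waste node
    return output, w_H, w_V

s = [30, 33, 37, 26, 34, 40]
print(simulate_switch_grid(s, [0, 0, 0, 1, 1, 1]))        # (200, 0, 0)
print(simulate_switch_grid(s, [0, 1, 0, 1, None, None]))  # (126, 74, 74)
```

In both runs the output plus either waste equals s = 200, and only the assignment corresponding to the partition {30, 33, 37}, {26, 34, 40} reaches the maximum.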
Lemma 1. An assignment of "on" and "off" to the 3m × m switch nodes is a solution of the original 3PART problem if and only if the total flow in the output node O equals the maximally possible value s = ∑_i s_i.
Proof. Consider the CRN in Figure 2 with 3m × m switch nodes. Each column corresponds to one of the m desired subsets of the underlying instance of 3PART, and each row corresponds to one of the 3m integer values s_i. Note that any assignment of "on" and "off" to switch nodes splits the overall horizontal as well as the overall vertical inflow into two parts: a part directed to waste material and an output part directed to node O. Let w_H (resp. w_V) be the overall horizontally (resp. vertically) produced waste. Since the total horizontal inflow ∑_i s_i and the total vertical inflow m · s/m both equal s, and every "on" switch removes the same amount from the horizontal and the vertical flow, the identity s = f(e_{out(O)}) + w_H = f(e_{out(O)}) + w_V holds for any assignment of "on" and "off" states to switch nodes. Obviously, if w_H = w_V = 0, then the outflow f(e_{out(O)}) to node O is maximal. Furthermore, note that at most one switch can be in "on" state in each row.
Consider an assignment of "on" and "off" to the switch nodes that corresponds to a solution of the original 3PART problem. Thus exactly 3m switch nodes are in mode "on" (three per column and one per row). As one switch node per row i is in mode "on", the outflux s_i of node Q_i flows to output node O and the waste produced horizontally in row i is 0. As this is true for all rows, w_H = w_V = 0 holds, and the total flow in the output node O is s, which is maximal.
Assume that the flow in the output node is the maximal possible value s, and therefore w_H = w_V = 0 holds. Since w_H = 0, exactly one switch node per row needs to be in mode "on". Since w_V = 0, the values s_i of the "on" switches in each column add up to exactly the vertical influx s/m of that column. As we can assume s/(4m) < s_i < s/(2m), exactly 3 switch nodes per column need to be in state "on" (two such values sum to less than s/m, four to more). The overall assignment is therefore a solution to the original 3PART problem. □
Of course, the intermediate network in Figure 2(A) is not (yet) a proper CRN. To achieve this goal, we have to replace the switch nodes by hypergraphs that implement the high-level rule governing their behavior.
Implementing Switch-nodes
Suppose the molecules emitted from the 3m input nodes are all of different types Q_i, and distinguish the m types of inputs from above as Z_j. Then the switch node (i, j) must implement a net reaction of the form
s_i\,Q_i + s_i\,Z_j \to s_i\,O \qquad (4)
where O is the type of the output molecule. This net reaction can be split into four subsequent reactions:
\begin{aligned}
s_i\,Q_i &\to W_{ij}\\
s_i\,Z_j &\to V_{ij}\\
V_{ij} + W_{ij} &\to X_{ij}\\
X_{ij} &\to s_i\,O
\end{aligned} \qquad (5)
We see that the switch node (i, j) can be in the "on" state only if it receives at least s_i copies of the input from the left and a matching number of input molecules from above. A graphical description of this partial network is shown in Figure 2(B). Since the input from the left is limited to s_i copies of Q_i, either none or a single molecule of the intermediate X_{ij} is produced, depending on whether (i, j) is "on" or not. Clearly, for each i, only a single one of the switches (i, j) can be "on".
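That the four reactions of equ.(5) indeed add up to the net reaction (4) can be verified by summing f(e)(e^+ − e^-) over them with unit flow. In this Python sketch the helper names and the string labels such as `W_1_2` are hypothetical; the intermediates W_{ij}, V_{ij}, X_{ij} should cancel.

```python
from collections import Counter
from typing import List, Tuple

Reaction = Tuple[Counter, Counter]  # (educts, products) as multisets

def switch_reactions(i: int, j: int, s_i: int) -> List[Reaction]:
    """The four reactions of equ.(5) for switch node (i, j)."""
    Q, Z, W, V, X = f"Q_{i}", f"Z_{j}", f"W_{i}_{j}", f"V_{i}_{j}", f"X_{i}_{j}"
    return [
        (Counter({Q: s_i}), Counter({W: 1})),
        (Counter({Z: s_i}), Counter({V: 1})),
        (Counter({V: 1, W: 1}), Counter({X: 1})),
        (Counter({X: 1}), Counter({"O": s_i})),
    ]

def net_reaction(reactions: List[Reaction], flow: int = 1) -> dict:
    """Sum flow * (e^+ - e^-) over all reactions; intermediates should cancel."""
    net: Counter = Counter()
    for educts, products in reactions:
        for x, c in products.items():
            net[x] += flow * c
        for x, c in educts.items():
            net[x] -= flow * c
    return {x: v for x, v in net.items() if v != 0}

print(net_reaction(switch_reactions(1, 2, 5)))
# {'Q_1': -5, 'Z_2': -5, 'O': 5}
```

A flow of 1 on each reaction corresponds to the switch being "on"; a flow of 0 corresponds to "off".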
Note that equ.(5) already provides the necessary device to complete the proof. If we insist that the CRN may use at most bimolecular reactions, we have to find a way to implement the reactions s_iQ_i → W_{ij}, s_iZ_j → V_{ij}, and X_{ij} → s_iO by more restricted elementary reactions. This will be the topic of the following section. According to equ.(5) each diamond node is replaced by 3(s_i + 1) vertices, so that the entire network has 6m + 2m + 1 + m Σ_{i=1}^{3m} 3(s_i + 1) = 8m + 3sm + 9m^2 + 1 nodes. Thus, all instances of 3PART for which s = s(m) is polynomially bounded in m can be reduced to a maximum output problem on an equivalent CRN. We explicitly use the fact that 3PART is strongly NP-complete: we need s to be polynomially bounded in m to ensure that the network size n, and thus the reduction from 3PART, remains polynomial. We know the maximal outflux of the CRN and can therefore use a simple guess-and-check argument to show that MAX-CRN-Output is in NP. Our discussion thus establishes
Theorem 1. MAX-CRN-Output is strongly NP-complete when the number of inputs into the CRN and the number of educts in a chemical reaction are unrestricted.
We remark that our CRNs need to have at least two output nodes, one for the desired product and one to collect all waste products.
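As a worked check of the node count given before Theorem 1, the sum can be expanded term by term, using s = Σ_i s_i and the fact that i runs over 3m indices:

```latex
\begin{aligned}
6m + 2m + 1 + m\sum_{i=1}^{3m} 3(s_i + 1)
  &= 8m + 1 + 3m\Bigl(\sum_{i=1}^{3m} s_i + \sum_{i=1}^{3m} 1\Bigr)\\
  &= 8m + 1 + 3m\,(s + 3m)\\
  &= 8m + 3sm + 9m^2 + 1 .
\end{aligned}
```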
Restriction to Bimolecular Reactions
In this section we show that the problem does not become easier when all reactions are bimolecular. To this end we further refine the reactions s_iQ_i → W_{ij}, s_iZ_j → V_{ij}, and X_{ij} → s_iO. We will make use of two specialized types of edges that can be implemented by bimolecular reactions.
The first type of edge merges exactly k identical molecules into 1 molecule (the corresponding edges will be referred to as merge-edges). The second type of edge expands one molecule into exactly k identical molecules (expansion-edges). We first focus on a specific type of merge- and expansion-edges: merge-edges of type (2^u → 1) can easily be implemented by u subsequent reactions f^i, i = 1, ..., u, that iteratively create (double-sized) molecules out of 2 identical molecules. Formally, let I = X_1 and O = X_{u+1}; then f^i is defined by
2X_i \to X_{i+1}, \qquad (6)
and the corresponding flow on reaction f^i is chosen to be 2^{u−i}; in particular, f^1 consumes 2^u molecules of I and f^u produces exactly one molecule of O. Symmetrically, expansion-edges of type (1 → 2^u) can be implemented by u subsequent reactions that split molecules repeatedly into two equal molecules. These (2^u → 1)-merge-edges (resp. (1 → 2^u)-expansion-edges) will in the following be used to implement the generalized merge- and expansion-edges.
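The flow assignment for the chain (6) can be checked numerically. This Python sketch (the function name and the integer indexing of X_i are our own) computes the flow on each f^i and the resulting net production of every X_i:

```python
from typing import Dict, Tuple

def power_of_two_merge(u: int) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Flows and net balance of the (2^u -> 1) merge-edge 2 X_i -> X_{i+1}, i = 1..u.

    Returns (flow per reaction f^i, net production of each X_i).
    """
    flows = {i: 2 ** (u - i) for i in range(1, u + 1)}  # flow on f^i is 2^{u-i}
    balance: Dict[int, int] = {}
    for i, f in flows.items():
        balance[i] = balance.get(i, 0) - 2 * f          # f^i consumes 2 X_i per firing
        balance[i + 1] = balance.get(i + 1, 0) + f      # and produces one X_{i+1}
    return flows, balance

flows, balance = power_of_two_merge(3)
print(flows)    # {1: 4, 2: 2, 3: 1}
print(balance)  # {1: -8, 2: 0, 3: 0, 4: 1}
```

All intermediates X_2, ..., X_u are balanced, while 2^u molecules of I = X_1 are consumed and one molecule of O = X_{u+1} is produced.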
Let b_{m−1}b_{m−2}...b_0 be the binary representation of k > 0 with m = ⌊log_2 k⌋ + 1, and let B = {i_1, i_2, ..., i_r} be the indices of all nonzero bits, i.e., i ∈ B if and only if b_i = 1. The underlying idea for merging k molecules of type I into 1 molecule of type O is to split the outflow k of I into r individual flows, i.e., k = Σ_{j=1}^{r} 2^{i_j}. We remark that this representation is unique. These flows of quantity 2^{i_j}, j = 1, ..., r, are then individually reduced to flows of size 1. The resulting r flows of quantity 1 are then all merged into a flow of a single molecule of quantity 1. The implementation of generalized merge-edges is depicted in Figure 3(A). Expansion-edges that expand the flow of one molecule of quantity 1 to a flow of one molecule of quantity k can be implemented analogously. First, a flow of quantity 1 of one molecule is changed into r flows of quantity 1, then these r flows are expanded to r flows of quantity 2^{i_j}, j = 1, ..., r, and then these flows are iteratively summed up. The details are depicted in Figure 3(B). Clearly, merge- and expansion-edges can be employed for the refinement of the reactions s_iQ_i → W_{ij}, s_iZ_j → V_{ij}, and X_{ij} → s_iO in equ.(5). The number of additional edges and nodes to implement a (k → 1) merge-edge is O(log^2 k), as there are O(log k) flows after the split into individual flows, and each individual flow employs O(log k) edges for its (2^{i_j} → 1) merge. Symmetrically, a (1 → k) expansion-edge uses O(log^2 k) bimolecular edges and additional compounds. Based on this polynomial extension, and as all merge and expansion reactions are bimolecular, we have the following
Corollary 1. MAX-CRN(2)-Output is strongly NP-complete.
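A generalized (k → 1) merge-edge can be sketched as a list of (reaction, flow) pairs. This Python sketch is our own reconstruction from the description, not the exact wiring of Figure 3(A): the compound names `A_j_t` and `C_j`, the pairwise combination of the r unit flows, and the final step to O are hypothetical choices. It checks that the net effect consumes k molecules of I, produces one O, and uses at most two educts and two products per reaction.

```python
from collections import Counter
from typing import Dict, List, Tuple

Reaction = Tuple[Counter, Counter]  # (educts, products)

def merge_edge(k: int) -> List[Tuple[Reaction, int]]:
    """(k -> 1) merge-edge from I to O built from reactions with <= 2 educts/products.

    Assumes k > 0. Returns (reaction, flow) pairs; names A_j_t and C_j are hypothetical.
    """
    bits = [i for i in range(k.bit_length()) if (k >> i) & 1]  # i_1, ..., i_r
    net: List[Tuple[Reaction, int]] = []
    ends = []
    for j, i_j in enumerate(bits):
        # split off an individual flow of quantity 2^{i_j} from I
        net.append(((Counter({"I": 1}), Counter({f"A_{j}_0": 1})), 2 ** i_j))
        # (2^{i_j} -> 1) chain: 2 A_j_t -> A_j_{t+1} with flow 2^{i_j - t - 1}
        for t in range(i_j):
            net.append(((Counter({f"A_{j}_{t}": 2}), Counter({f"A_{j}_{t + 1}": 1})),
                        2 ** (i_j - t - 1)))
        ends.append(f"A_{j}_{i_j}")
    # merge the r unit flows pairwise into a single molecule, then emit O
    current = ends[0]
    for j, other in enumerate(ends[1:], start=1):
        net.append(((Counter({current: 1, other: 1}), Counter({f"C_{j}": 1})), 1))
        current = f"C_{j}"
    net.append(((Counter({current: 1}), Counter({"O": 1})), 1))
    return net

def net_balance(net: List[Tuple[Reaction, int]]) -> Dict[str, int]:
    """Net production sum_e f(e) (e^+ - e^-); only I and O should remain."""
    total: Counter = Counter()
    for (educts, products), flow in net:
        for x, c in products.items():
            total[x] += flow * c
        for x, c in educts.items():
            total[x] -= flow * c
    return {x: v for x, v in total.items() if v != 0}

edge = merge_edge(13)  # 13 = 0b1101, nonzero bits at positions 0, 2, 3
print(net_balance(edge))  # {'I': -13, 'O': 1}
print(all(sum(e.values()) <= 2 and sum(p.values()) <= 2 for (e, p), _ in edge))  # True
```

The number of reactions grows with the number of set bits times the chain lengths, in line with the O(log^2 k) estimate.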
Restriction to a single input
To show that MAX-CRN-Output is NP-complete even if we have only a single input, we require an additional edge type that is implemented by connecting a (k → 1)-merge-edge and a (1 → k)-expansion-edge in series. Such an edge ensures that exactly k (or exactly a multiple of k) input molecules react to the same number of output molecules. We will refer to these edges as (k)-force-flow-edges. Note that such edges do not change the quantity of a flow. The number of additional edges and nodes required to implement a (k)-force-flow-edge is O(log^2 k).
So far we assumed input nodes Q_i with corresponding influx s_i, i = 1, ..., 3m, plus the m additional input nodes Z_1, ..., Z_m with influx s/m = (1/m) Σ_i s_i each. In the following we describe how to extend the construction of the CRN based on an instance of the 3PART problem (cf. Figure 2) such that there is only a single input node. Note that all s_i, m, and the influx to nodes Z_i are defined by the given 3PART instance.
Influx to nodes Q_i
In the extended CRN the nodes Q_i will be internal nodes with influx s_i. In order to achieve this we add a single input node Q with influx s', where s' is the integer represented by the concatenation of the r-bit binary representations of all s_i, i.e.,
s' = \sum_{i=1}^{3m} s_i \times 2^{r(i-1)}, \quad \text{with } r = \max_i \bigl\{\lfloor \log_2 s_i \rfloor\bigr\} + 1 \qquad (7)
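Equation (7) amounts to packing the s_i into consecutive r-bit fields. In this Python sketch the function name and the toy values [5, 3, 6] are our own; Python's `int.bit_length()` equals ⌊log_2 v⌋ + 1 for positive v, which is exactly the bit count r used in (7).

```python
from typing import List, Tuple

def encode_inputs(s: List[int]) -> Tuple[int, int]:
    """Return (s', r) according to equ.(7) for positive integers s_1, ..., s_{3m}."""
    r = max(v.bit_length() for v in s)  # bit_length() = floor(log2 v) + 1 for v > 0
    # s_i occupies bits r(i-1), ..., r*i - 1 of s' (i counted from 1)
    s_prime = sum(v << (r * idx) for idx, v in enumerate(s))
    return s_prime, r

print(encode_inputs([5, 3, 6]))  # (413, 3): 6*2^6 + 3*2^3 + 5 = 413
```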
Attached to node Q is a subnetwork that splits the flux s' into the fluxes s_1, ..., s_{3m} by iteratively using the last r bits of the remaining flux as influx to a node Q_i and then dividing the remaining flux by 2^r. The hypergraph structure that implements this with bimolecular reactions only is depicted in Figure 4. All dashed lines with red rectangles indicate force-flow-edges (the number in the rectangle indicates the enforced flow), and all red edges with open arrowheads indicate merge- or expansion-edges. To enforce that exactly s_i molecules (and not a multiple of s_i) flow towards node Q_i, the flow downwards needs to be maximized. This is done by introducing an additional outflux node: the flux of quantity s_{3m} ≥ 1 towards O' is multiplied by a factor c, such that the additional overall non-waste outflux to O' dominates any other non-waste outflux. This can be ensured by choosing the factor c as the maximal possible influx to Q, i.e., c = 2^{r×3m} − 1 (the binary representation of c has r × 3m bits, all set to 1). The number of additional edges and nodes is polynomially bounded, and the overall outflux of the extended network is then s_{3m} × c + ∑_i s_i. As all outflux can easily be merged in a binary fashion, as applied in the definition of expansion-edges, the resulting CRN has only a single input node and a single non-waste output node.
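The arithmetic behind the splitting subnetwork and the weighting factor c can be traced with integers. This Python sketch only mirrors the numbers, not the hypergraph of Figure 4; the function names and the toy values (s' = 413 obtained from [5, 3, 6] with r = 3 under equ.(7)) are our own assumptions.

```python
from typing import List, Tuple

def split_inputs(s_prime: int, r: int, count: int) -> List[int]:
    """Recover s_1, ..., s_count by repeatedly taking the last r bits and dividing by 2^r."""
    fluxes = []
    remaining = s_prime
    for _ in range(count):
        fluxes.append(remaining % (1 << r))  # last r bits become the influx of Q_i
        remaining //= 1 << r                 # divide the remaining flux by 2^r
    return fluxes

def outflux_target(s: List[int], r: int) -> Tuple[int, int]:
    """Return (c, s_{3m} * c + sum_i s_i) with c = 2^{r * 3m} - 1."""
    c = (1 << (r * len(s))) - 1  # r * 3m bits, all set to 1
    return c, s[-1] * c + sum(s)

print(split_inputs(413, 3, 3))       # [5, 3, 6]
print(outflux_target([5, 3, 6], 3))  # (511, 3080)
```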
Influx to nodes Z_i
In order to have nodes Z_i (cf. Figure 2) as internal nodes, we split the outflux of quantity s' from node Q into two fluxes of quantity s' − 1 and 1 (by employing force-flow-edges), which are directly merged again and used as influx of quantity s' to node Q'. Moreover, this simple splitting procedure yields a flux of quantity 1. This flux is easily transformed into m fluxes of quantity 1, which are then multiplied by s/m using expansion-edges and used as the input towards the internal nodes Z_i.
Recall that the number of nodes and edges needed for a force-flow-edge of quantity k is O(log^2 k). The number of bits of the maximal flux on any force-flow-edge is O(r × 3m). As 3PART is strongly NP-complete, we can assume that all s_i are polynomially bounded in m, and therefore r ∈ O(log m). Therefore the maximal flux on any edge is 2^{O(m log m)}. The number of additional nodes and edges is therefore O(m^2 log^2 m) per force-flow-edge. As the construction needs O(m) additional force-flow-edges, the overall number of additional nodes and edges is O(m^3 log^2 m). Therefore the following corollary easily follows:
Corollary 2. MAX-CRN(2)-Output-1 is NP-complete.
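The size estimate preceding Corollary 2 can be spelled out as follows, assuming, as in the text, that every flux k on a force-flow-edge has at most 3mr bits and that r ∈ O(log m):

```latex
\begin{aligned}
k \le 2^{3mr} - 1 &\;\Longrightarrow\; \log_2 k < 3mr,\\
O(\log^2 k) &= O\bigl((3mr)^2\bigr) = O\bigl(m^2 \log^2 m\bigr),\\
O(m) \cdot O\bigl(m^2 \log^2 m\bigr) &= O\bigl(m^3 \log^2 m\bigr).
\end{aligned}
```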
Autocatalysis
The NP-completeness of detecting an autocatalytic species can be shown by expanding the CRN used for showing the NP-completeness of MAX-CRN(2)-Output-1. Let O be the output node, where an outflux of s_{3m} × c + ∑_i s_i can be detected if and only if the underlying instance of 3PART has a solution. We add a merge-edge from O towards an additional node A' to create an outflux of exactly 1 from A'. The CRN is furthermore extended by the following two additional reactions, where compound A is both an input and an output node of the CRN.
\begin{aligned}
A' + A &\to 2B\\
B &\to A
\end{aligned}
The outflux of A' is 1 if and only if
1. compound A cannot be produced from the other source molecules alone, i.e., for all flows f, f(e_{in(A)}) = 0 implies f(e_{out(A)}) = 0; and
2. two A can be produced if there is an inflow of one A, i.e., there is a flow f such that f(e_{out(A)}) > f(e_{in(A)}) > 0.
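The net effect of the two added reactions can be checked by summing f(e)(e^+ − e^-) with flow 1 on A' + A → 2B and flow 2 on B → A. This Python sketch assumes that A' is available with flux 1, which per the construction happens only if the 3PART instance is solvable; the helper name `net_balance` is our own.

```python
from collections import Counter
from typing import Dict, List, Tuple

Reaction = Tuple[Counter, Counter]  # (educts, products)

def net_balance(net: List[Tuple[Reaction, int]]) -> Dict[str, int]:
    """Net production sum_e f(e) (e^+ - e^-) over the given (reaction, flow) pairs."""
    total: Counter = Counter()
    for (educts, products), flow in net:
        for x, c in products.items():
            total[x] += flow * c
        for x, c in educts.items():
            total[x] -= flow * c
    return {x: v for x, v in total.items() if v != 0}

# A' + A -> 2B fired once, B -> A fired twice
cycle = [
    ((Counter({"A'": 1, "A": 1}), Counter({"B": 2})), 1),
    ((Counter({"B": 1}), Counter({"A": 1})), 2),
]
print(net_balance(cycle))  # {"A'": -1, 'A': 1}
```

One A enters, two A leave, and the single A' is consumed, so f(e_{out(A)}) = 2 > f(e_{in(A)}) = 1 exactly when A' can be supplied.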
The construction of our reduction highlights the difficult part in determining autocatalysts. The difficulty lies not so much in finding the autocatalytic cycle itself as in ensuring that the building blocks are provided from the "food source" through an, in principle, arbitrarily complicated subnetwork.