An insertion-deletion system, or insdel system for short, is an interesting system for describing languages that, in certain incarnations, actually has enough power to describe recursively enumerable languages. Because insertion and deletion operations are basic in RNA and DNA processes, this formalism may have future applications in DNA computing and more general molecular computing.
Formally, an insdel system is a 5-tuple:
G = (&Sigma, T, A, I D)
where Σ is an alphabet, T is a subset of Σ* known as its terminal alphabet, A is a finite subset of Σ*, known as the set of axioms, and I and D are finite subsets of Σ* × Σ* × Σ* (i.e. sets of triples of words over Σ), which are the insertion and deletion rules respectively.
An insertion or deletion rule is a triple (u, z, v), where u, z, and v are words from Σ*. Here, u and v are the context where an insertion or deletion can occur and z is the word that is to be inserted or deleted from the input. For any x and y in Σ*, we say x => y if one of the following is true:
- x = x1uvx2 and y = x1uzvx2, for some x1, x2 in Σ* and (u, z, v) in I (an insertion step), or
- x = x1uzvx2 and y = x1uvx2, for some x1, x2 in Σ* and (u, z, v) in D (a deletion step)
So, to execute an insdel grammar, one would scan the input and nondeterministically find an insertion or deletion rule that would match a portion of the input, and write an output string following the rule.
If =>* is the reflexive and transitive closure of =>, the language generated by G is defined as:
L(G) = { w in T*| x =>* w, for some x in A}
To further characterize an insdel system G = (&Sigma, T, A, I D), we say it has weight (n, m, p, q) if
max{|z| | (u, z, v) in I} = n (i.e. the longest string being inserted has length n)
max{|u| | (u, z, v) in I or (v, z, u) in I} = m (i.e. the longest context where an insertion can occur is of length m)
max{|z| | (u, z, v) in D} = p (i.e. the longest string being deleted has length p)
max{|u| | (u, z, v) in D or (v, z, u) in D} = q (i.e. the longest context where a deletion can occur is of length m)
It is interesting to note that it can be proved that insdel systems with surprisingly low values for the weights can actually generate recursively enumerable languages, and can thus form the basis for a universal model of computation (i.e. you can make an insdel system with those weights simulate a universal Turing machine of some kind). In particular, it has been shown that insdel systems with weights (1, 2, 1, 1), (2, 1, 2, 0), and (1, 2, 2, 0) suffice to simulate any semi-Thue grammar.
Even restricting insdel systems in such a way that its insertion or deletion rules (u, z, v) are such that z consists only of words that consist of a single symbol c doesn't remove its ability to generate recursively enumerable languages. Even restricting the contexts to include only one additional symbol a does not change this property. This is important because the U (uracil) nitrogenous base in RNA is much easier to insert or remove from a given RNA sequence than any other type, so a universal molecular computer based on RNA is mathematically feasible.
For more information, see the paper by Lila Kari, Gheorghe Paun, Gabriel Thierrin, and Sheng Yu: "At the Crossroads of Linguistics, DNA Computing, and Formal Language Theory: Characterizing RE Using Insertion-Deletion Systems".