Generating Fractal-Like Sequences using Convolution Grammars
Update (14.11.2023): Not very surprisingly, I finally found relevant existing literature. I knew that I was lacking the right search terms! While glancing at this paper, I discovered that what I was exploring and playing with for the most part goes under the name Toeplitz words, and researching this term led me almost straight to this paper. It reads almost as if someone had carefully read this post and decided to answer most of my questions (back in 1997) – it felt like someone got into my head and, one after another, showed examples and theorems confirming multiple conjectures and intuitions I could not put into proper math. To be honest, I am a bit sad about the wasted time and effort, but also quite happy to have closure and proper answers – now I can let this go and move on. On the bright side, I learned some bits about classes of infinite words beyond the (ultimately) periodic ones, and some of the concepts might be useful to me in the future. Also, I have not seen the generalization of such patterns to full grammars being done before, so parts of the following post are still not completely redundant.
In this post I will define and explore a non-standard version of formal grammars I have come up with and not yet encountered before in literature. I call them convolution grammars. These grammars can generate languages of infinite words which are aperiodic, but self-similar, and thus are somewhat fractal-like. If this sounds interesting to you, read on.
Disclaimer: This is not fleshed out and fully rigorous yet, and I do not have the time to do it properly (this is just one of my free-time hobbies). But I built up some formal machinery I think is useful, and provide some observations and facts I believe to be true based on empirical evidence and intuition. For most statements I provide only some proof ideas or sketches and I am not sure how to best tackle a formal proof, but I think (or hope) it is all quite manageable, given some motivation and time.
Motivational Backstory
I rather randomly stumbled onto this paper while
reading about aperiodic tilings, and while most of it was over my head, I was curious
about the infinite “Sudoku-like Puzzle” (see sections 7-9 of the paper) that was used to
derive the main results.
The key ingredient for the whole construction is an interesting function
$f_q: \mathbb{Z} \to \Sigma$, for some $q>0$ and $\Sigma$ defined as
$(\mathbb{Z}/q\mathbb{Z}) \setminus \{0\}$:
Basic $q$-adically structured function:
$f_q(q^{k}m) := m \bmod q$
whenever $k\geq 0$ and $m$ is an integer not divisible by $q$,
meaning that $f_q(n)$ is the last non-zero digit in the base $q$ expansion of $n$:
```python
# Reconstructed sketch of f_q: strip all factors of q, then
# return the last (now non-zero) digit in base q
def f_q(n, q):
    while n % q == 0:
        n //= q
    return n % q
```
While it is very simple, it is a clever way to generate an infinite aperiodic sequence that I have not seen before. It is easy to see that the sequence cannot be periodic, because it forms an intertwined infinite cascade of cyclic “digit counters” - whenever we hit a 0, we delegate the actual value to the higher-order cycle, recursively. Yet, the generated sequence is also self-similar in a very regular way. For all $k\geq 0$, $f_q(n) = f_q(q^k n)$, so restricting the domain from $\mathbb{Z}$ to $\mathbb{N}$ and treating it as a sequence, we can say that $f_q$ contains infinitely many stretched-out, nested copies of itself.
Example for $q=4$: 123112321233123112311232123312321231123212331233 …
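This sequence is easy to compute directly; the following small Python sketch (the function name is mine) reproduces the $q=4$ example above:

```python
def last_nonzero_digit(n, q):
    # f_q(n): strip all factors of q, then take the last base-q digit
    while n % q == 0:
        n //= q
    return n % q

seq = "".join(str(last_nonzero_digit(n, 4)) for n in range(1, 49))
print(seq)  # 123112321233123112311232123312321231123212331233
```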
I took this sequence and started to look at it from a formal language perspective - how can one generate more sequences with similar properties, considering them as infinite strings from $\Sigma^\omega$? The rest of the post is about what I came up with in my exploration of these questions and decided to call convolution grammars.
Preliminaries
Convolution Grammars
A convolution grammar is a 4-tuple $G := (N, \Sigma, P, S)$ where
- $|\Sigma|\geq 2$, $S \in N$, $N\cap\Sigma = \emptyset$, $P: N\to\Sigma(\Sigma\cup N)^*$
- $N$ is a finite set of non-terminal symbols,
- $\Sigma$ is a finite set of terminal symbols,
- the function $P$ defines the seed patterns, and
- the non-terminal symbol $S$ defines the initial seed pattern.
We assume w.l.o.g. that all symbols are useful, i.e., informally, that every (non-)terminal is “reachable” using a sequence of derivations (which will be explained below).
Note that in the definition above all seed patterns must begin with a terminal symbol, and that for each non-terminal there is exactly one seed pattern (i.e. “production rule”). In principle, both restrictions could be lifted, but this variant is expressive enough to already allow for very interesting behaviours. In fact, we will later further restrict the kind of grammars to an especially well-behaved subclass.
When there is no ambiguity, we will identify the patterns by their non-terminal symbols, i.e. we may say pattern $X$ and mean $P(X)$. Furthermore, we will follow the usual convention of denoting non-terminals with capital letters, while terminal symbols can be digits or lower-case letters $a, b, c, d$, whereas lowercase letters $p, …, z$ will usually denote finite or infinite sequences.
Note that each pattern $w = P(X)$ for some $X \in N$ represents an infinite periodic sequence $w^\omega \in (\Sigma\cup N)^\omega$ that cyclically repeats the finite pattern. Convolution grammars actually describe actions on infinite words, so it is best to think of the finite patterns as an auxiliary representation. We will freely switch between the two views, whichever is more convenient.
While syntactically such a grammar looks very much like a classical regular or context-free grammar, the notion of derivation we will use is very different from the standard setting. Each derivation step consists of performing a single convolution of the current pattern with one of the seed patterns (i.e. one of those defined in the grammar), producing another, longer (non-seed) pattern.
Index Sequences
To make the required notion of convolution definable in a convenient way, it is useful to introduce a notation to pick out or replace subsequences of a longer sequence.
An index sequence is a strictly monotonically increasing sequence $I: D \to D'$, with $D' \subseteq D \subseteq \mathbb{N}$. The index set $D$ is either equal to $\mathbb{N}$ (for infinite subsequences of infinite sequences), or equal to $\{1, 2, …, n\}$ for some $n \in \mathbb{N}$ (for finite subsequences). The correct $D$ and $D'$ will be assumed to be clear from context.
Given a sequence $w = a_1 a_2 …$, $w(i) := a_i$ for $i \in \mathbb{N}$ with $1\leq i \leq |w|$ and a suitable index sequence $I$, $w(I)$ denotes the subsequence $a_{I(1)} a_{I(2)} …$. The sequence indexing all positions where $X\in N$ occurs in sequence $w$ will be denoted by $I^w_{=X}$.
We can modify an existing sequence $w$ using subsequence substitution, written as $[w(I) := v]$ for some $v \in (\Sigma \cup N)^+$, returning a new sequence $w'$ where the positions of $w$ indexed by $I$ are overwritten with symbols from $v^\omega$, i.e. we define $w'$ by $w'(I(i)) := v^\omega(i)$ for all $i\in dom(I)$ and $w'(j) := w(j)$ for all $j \not\in img(I)$.
Example: Given the pattern $w = 01XY2XX3$,
we have $w(1) = 0$, $I^w_{=X} = 3, 6, 7$ and $[w(I^w_{=X}) := ab] = 01aY2ba3$.
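These definitions translate directly into code; here is a minimal Python sketch (the helper names `occ_positions` and `substitute` are mine, and positions are 1-based as in the definition):

```python
from itertools import cycle

def occ_positions(w, X):
    # I^w_{=X}: the 1-based positions where X occurs in w
    return [i + 1 for i, c in enumerate(w) if c == X]

def substitute(w, I, v):
    # [w(I) := v]: overwrite the positions in I with symbols cycled from v^omega
    chars = list(w)
    for pos, sym in zip(I, cycle(v)):
        chars[pos - 1] = sym
    return "".join(chars)

w = "01XY2XX3"
print(occ_positions(w, "X"))                       # [3, 6, 7]
print(substitute(w, occ_positions(w, "X"), "ab"))  # 01aY2ba3
```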
The Convolution Operation
The number of occurrences of $Y\in N$ in the finite or infinite sequence $w$ is denoted by $Occ_Y(w)$ and equals 0 if it does not occur, is a natural number if it does and the passed sequence is a pattern, or is $\omega$ if it does and the passed sequence is an infinite sequence cycling a pattern. Now we are finally ready to define the convolution operation.
Finite Pattern Convolution:
Let $u, v \in (\Sigma \cup N)^+$ be two patterns, $X \in N$, and $Occ_{X}(u) > 0$.
The convolution $w = u \overset{X}{\circledast} v$ is a new pattern defined as follows.
Let $l := lcm(Occ_X(u), |v|), n := l/Occ_X(u)$ and $m=l/|v|$.
Then $w := [u^n(I^{u^n}_{=X}) := v^m]$, i.e. all occurrences of $X$ in $u^n$ are
substituted by symbols from $v^m$.
When a grammar $G$ is understood and $w, w’$ are patterns and $X\in N$,
we write $w \overset{X}{\to} w’$ to denote a convolution step where
$w’ = w \overset{X}{\circledast} P(X)$.
Example:
If $S\to u, Y \to v$ with $u:=0Y1Y$ and $v:=23Y$,
then $l=lcm(2,3)=6, n=6/2=3, m=6/3=2$,
so we have $u \overset{Y}{\to} w$ with $w =\ u \overset{Y}{\circledast} v = 0\mathbf{2} 1\mathbf{3} 0\mathbf{Y} 1\mathbf{2} 0\mathbf{3} 1\mathbf{Y}$
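A direct implementation of finite pattern convolution is short; the sketch below (naming mine) reproduces the example:

```python
from math import lcm

def convolve(u, v, X):
    # u conv_X v: unroll u and v to a common number of X-slots, then
    # fill the occurrences of X in u^n with the symbols of v^m in order
    occ = u.count(X)
    assert occ > 0, "convolution is only defined if X occurs in u"
    l = lcm(occ, len(v))
    unrolled, filler = u * (l // occ), iter(v * (l // len(v)))
    return "".join(next(filler) if c == X else c for c in unrolled)

print(convolve("0Y1Y", "23Y", "Y"))  # 02130Y12031Y
```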
Note how this definition makes sure to extend the pattern to a sufficient length until the convolved sequence will start to repeat itself. This is relevant for practical realizations of the procedure. But more naturally, one can view convolution as immediately taking place on infinite sequences that are represented by the patterns.
Infinite Sequence Convolution:
Let $u, v \in (\Sigma\cup N)^\omega$ and $X\in N$.
Then the convolution $w = u \overset{X}{\circledast}v$ is defined as
$w := [u(I^{u}_{=X}) := v]$.
Example: Same as above, but working directly with the infinite periodic sequences.
The expression $V \overset{W}{\circledast} X \overset{Y}{\circledast} Z$ is to be understood as $(V \overset{W}{\circledast} X) \overset{Y}{\circledast} Z$, i.e. the operator $\overset{X}{\circledast}$ is defined to be left-associative.
Derivations, Words and Languages
A derivation of $G$ is a countable sequence of pattern convolutions starting with the initial
seed pattern $p_0 := P(S)$[1], i.e. a sequence
$p_0 \overset{N_0}{\to} p_1 \overset{N_1}{\to} p_2 \overset{N_2}{\to}…$ with seed patterns
selected by $N_0, N_1, … ∈ N$ and resulting derived patterns $p_1, p_2, …$.
Written in the more general notation, this is the same as a sequence of convolutions
$p_0 \overset{N_0}{\circledast} P(N_0)\overset{N_1}{\circledast} P(N_1)\overset{N_2}{\circledast} …$.
with $p_i = p_{i-1} \overset{N_{i-1}}{\circledast} P(N_{i-1})$ for $i>0$.
The result of a derivation after finitely or infinitely many steps is a word
$w\in\Sigma^\omega$ derivable in $G$. The language $L(G)$ is the set of all
such words.
Some Properties of Patterns
Convolution Equivalence of Unrolled Patterns
As with classical grammars, different convolution patterns can result in the same derivable words. Two patterns $u,v$ are called convolution-equivalent (written $u \approx_\circledast v$) iff, in every infinite derivation sequence, replacing all occurrences of $u$ with $v$ (or vice versa) does not affect the resulting word.
For example, take any pattern $w$. It follows from the definition of convolution (on infinite sequences) that $w \approx_\circledast w^k$ for $k\geq 1$, i.e. all $k$-fold unrollings of $w$ are convolution-equivalent. Therefore, we can restrict our attention to primitive patterns, i.e. patterns $w$ for which there is no $w'$ with $w = w'^{k}$ for some $k>1$.
Example: $0S1$ is a primitive pattern, whereas $0S10S1$ is not.
Note: A simple criterion for showing that two patterns $u,v$ satisfy $u \not\approx_\circledast v$ is to construct a finite derivation sequence where interchanging $u$ and $v$ results in two patterns that have distinct terminal symbols at the same position.
Note: Syntactic periodicity of the pattern is unrelated to periodicity of derivable words (this will become clearer in the following sections).
Balanced Patterns
A pattern $w$ is balanced if for all $X\in N$ with $Occ_X(w) > 0$ we have that $Occ_X(w)$ is divisible by $|P(X)|$. Balanced patterns are useful because they allow for “remainder-free” convolution.
Proposition: For each pattern $w$ there is some $k$ such that $w^k$ is balanced.
Proof: The definition of finite convolution balances the pattern for the
non-terminal that is being substituted. It is easy to see how this can be generalized.
$\blacksquare$
Example: Consider the grammar with $S \to 0AB, A \to 1B, B \to 2A3A$. Then $P(S)^4 = 0AB0AB0AB0AB$ is equivalent to $P(S)$, and also is balanced.
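Balancedness is a simple divisibility check; a sketch (the helper name is mine):

```python
def is_balanced(w, P):
    # P maps each non-terminal to its seed pattern; w is balanced if each
    # occurring non-terminal X appears a multiple of |P(X)| times
    return all(w.count(X) % len(p) == 0 for X, p in P.items() if X in w)

P = {"A": "1B", "B": "2A3A"}
print(is_balanced("0AB", P))      # False: one A, but |P(A)| = 2
print(is_balanced("0AB" * 4, P))  # True: four A's and four B's
```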
Some Properties of Convolution
Increasing Stable Prefix
We will call the maximal prefix of a pattern $w$ that consists only of terminal symbols its stable prefix $stp(w) \in \Sigma^+$. Note that by definition of convolution grammars we always have $|stp(w\overset{X}{\circledast}v)| \geq |stp(w)|$, and strictly greater whenever $X$ is the first non-terminal occurring in $w$, because patterns must start with a symbol in $\Sigma$. This is what allows an infinite derivation to converge to a well-defined infinite word in the limit.[2]
Convolution is Non-Commutative in General
The order of convolution steps in general matters, and allows for derivation of distinct words even if the grammars, as defined above, do not allow alternative choices.
Example: Take the grammar $S \to 0AB, A \to 1B, B \to 2A$. Then
- $0AB \overset{A}{\to} 01B0BB\overset{B}{\to} 0120A\mathbf{2} 01A02A$
- $0AB \overset{B}{\to} 0A20AA\overset{A}{\to} 0120B\mathbf{1} 0B201B$
Note that the resulting patterns have distinct terminal symbols at the same position and thus cannot be derived into the same words.
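This can be replayed mechanically with the convolution routine from before (repeated here so the snippet is self-contained):

```python
from math import lcm

def convolve(u, v, X):
    # finite pattern convolution of u with v at non-terminal X
    occ = u.count(X)
    l = lcm(occ, len(v))
    filler = iter(v * (l // len(v)))
    return "".join(next(filler) if c == X else c for c in u * (l // occ))

P = {"S": "0AB", "A": "1B", "B": "2A"}
ab = convolve(convolve(P["S"], P["A"], "A"), P["B"], "B")  # A first, then B
ba = convolve(convolve(P["S"], P["B"], "B"), P["A"], "A")  # B first, then A
print(ab)  # 0120A201A02A
print(ba)  # 0120B10B201B
```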
In a sense, one could say that convolution is non-local - a previous derivation step can influence subsequent steps by introducing additional non-terminal symbols of a certain kind into the pattern, which in turn changes the result of a later convolution for the pattern corresponding to that symbol. Therefore, in classical terms, convolution grammars are inherently context-sensitive, because the result in general depends on the complete pattern.
Convolution is Non-Associative in General
Even worse – for the reasons outlined above, pattern convolution is in general not even associative. While this does not matter for normal derivation, which is usually done in left-to-right order, it is important to be aware of this fact for the purposes of more detailed analysis.
Example: Consider the grammar $S \to 0AB, A \to 1BB, B \to 2A$. Then we have:
- $u := P(S) \overset{A}{\circledast} P(A) = 01B0BB0BB$
- $v := P(A) \overset{B}{\circledast} P(B) = 12A$
- $x := u \overset{B}{\circledast} P(B) = 0120A20A201A02A02A$
- $y := P(S) \overset{A}{\circledast} v = 01B02B0AB$
- $x \overset{A}{\circledast} P(A) = 0120\mathbf{1}20B2… \not\approx 01B0\mathbf{2}B01B… = y \overset{A}{\circledast} P(A)$ (showing the first 9 symbols)
- $\Rightarrow (P(S) \overset{A}{\circledast} P(A))\overset{B}{\circledast} P(B) \not\approx_\circledast P(S) \overset{A}{\circledast} (P(A)\overset{B}{\circledast} P(B))$
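The counterexample can likewise be checked mechanically (same convolution routine, repeated for self-containment; only the first 9 symbols are compared):

```python
from math import lcm

def convolve(u, v, X):
    # finite pattern convolution of u with v at non-terminal X
    occ = u.count(X)
    l = lcm(occ, len(v))
    filler = iter(v * (l // len(v)))
    return "".join(next(filler) if c == X else c for c in u * (l // occ))

P = {"S": "0AB", "A": "1BB", "B": "2A"}
left = convolve(convolve(P["S"], P["A"], "A"), P["B"], "B")   # (P(S) * P(A)) * P(B)
right = convolve(P["S"], convolve(P["A"], P["B"], "B"), "A")  # P(S) * (P(A) * P(B))
print(convolve(left, P["A"], "A")[:9])   # 0120120B2
print(convolve(right, P["A"], "A")[:9])  # 01B02B01B
```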
Conditions for Recovering Associativity
While the order dependence is expected and even in a sense desirable as a degree of freedom that can be used in a grammar, a lack of associativity poses a challenge. However, it is possible to describe circumstances under which we can recover it.
Claim: Let $u,v,w$ be patterns and $X,Y \in N$ such that $u \overset{X}{\circledast} v \overset{Y}{\circledast} w$ is well-defined. Then $(u \overset{X}{\circledast} v) \overset{Y}{\circledast} w = u \overset{X}{\circledast} (v \overset{Y}{\circledast} w)$ holds if $Occ_Y(v)>0$ and we have that either $X = Y$, or $Occ_Y(u) = 0$.
Proof (idea): The requirement $Occ_Y(v) > 0$ is needed to make the convolution
$v \overset{Y}{\circledast} w$ well-defined in the first place. Note that this is not
trivially true, because $u$ could also provide the needed $Y$ positions, which, if
untouched by the first convolution, would still make the second one well-defined.
For the rest, check that the only problematic case (i.e., potentially yielding distinct
results) is one where $u$ has occurrences of $Y$ and the convolution with $v$
does not substitute those, while adding more occurrences of $Y$ (we assumed
$Occ_Y(v)>0$).
If the convolution $x := u \overset{X}{\circledast} v$ is evaluated first, $w$ is substituted into the
positions $I^x_{=Y}$, which in general is a different index sequence from $I^u_{=Y}$, the one
used to substitute the result of the right-hand expression $v\overset{Y}{\circledast} w$ when the
second convolution is evaluated first. $\blacksquare$
Claim: A convolution sequence of any length $n$, such as $w_1 \overset{X_1}{\circledast} w_2 \overset{X_2}{\circledast} … \overset{X_{n-1}}{\circledast} w_n$, is associative (i.e. allows for arbitrary parenthesization) if each pattern $w_i$ contains just one kind of non-terminal, i.e. $w_i \cap N = \{X\}$ for some $X \in N$.
Proof (idea): In this case, each convolution can only fully eliminate the unique non-terminal, replacing it (partially) with the non-terminal that will be replaced in the next convolution, and it is not possible to construct a situation such as discussed above to force distinct results using any evaluation order (i.e. placement of parenthesis) of the convolution sequence.[3]$\blacksquare$
Some Properties of Convolution Languages
Finitary Convolution Grammars are Trivially Regular
Let us call grammars that do not allow infinite sequences of derivations finitary. Finitary convolution grammars, as can be expected, are not very interesting.
Proposition: The language of every finitary convolution grammar is ω-regular.
Proof: The impossibility of infinite derivation sequences implies that the grammar has no cycles. Thus, we can compute all possible distinct derivation paths, collecting the resulting patterns $w$ (i.e., finite words), each of which induces a periodic infinite word $w^\omega$. $\blacksquare$
In the following, if not stated otherwise, statements about convolution grammars are meant to be about infinitary grammars.
Some Infinitary Convolution Grammars Are Also Regular
A grammar with infinite derivation steps does not automatically imply that the resulting languages cannot be regular.
Proposition: For any $w\in\Sigma^+$ there exist infinitely many convolution grammars $G$ generating the regular language $L(G) = \{w^\omega\}$.
Proof: (sketch) Pick a finite word $w$. Pick some large enough $k$ and let $w’ = w^k$. Pick a suitable index subsequence $I$ for $w’$ such that $w’(I) = w^l$ for some $l \lneq k$. Then the seed pattern $S\to p$ is given by $p := [w’(I) := S]$.
As the stable prefix is monotonically increasing, an inductive argument shows that we can obtain an infinite sequence of derivations with stable prefixes $w^{k_i}$ where $k_{i+1} > k_i$ for all $i$, yielding the word $w^\omega$ in the limit. $\blacksquare$
Examples:
- $S \to 01SS$ yields the word $(01)^\omega$
- $S \to 01S10S$ also yields the word $(01)^\omega$
- $S \to 0S10S100S$ yields the word $(001)^\omega$
- $S \to 011SS0S1SS$ yields the word $(01101)^\omega$
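These examples can be verified empirically by iterating the convolution and inspecting the stable prefix (a sketch with an arbitrarily chosen step count; helper names mine):

```python
from math import lcm

def convolve(u, v, X):
    # finite pattern convolution of u with v at non-terminal X
    occ = u.count(X)
    l = lcm(occ, len(v))
    filler = iter(v * (l // len(v)))
    return "".join(next(filler) if c == X else c for c in u * (l // occ))

def stable_prefix(seed, steps=6):
    # iterate p -> p conv_S P(S) and return the terminal prefix stabilized so far
    p = seed
    for _ in range(steps):
        p = convolve(p, seed, "S")
    return p[: p.index("S")]

print(stable_prefix("01SS")[:12])       # 010101010101
print(stable_prefix("0S10S100S")[:12])  # 001001001001
```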
Proposition: For a seed pattern $S \to P(S) =: u$ with $u \cap N = \{S\}$ one can construct a longer pattern $w = uv$, so that a grammar $G$ with seed pattern $S \to w$ generates a regular language $L(G) = \{p^\omega\}$ for some $p\in\Sigma^+$.
Proof: (sketch) Let $S\to P(S)$ be a seed pattern of length $n$ that w.l.o.g. has only one type of non-terminal $S\in N$. Derive the seed pattern until a prefix $p \in\Sigma^+$ of length $\geq n$ stabilizes. Let $w' := p^k$ for some sufficiently large $k$. Pick an index sequence $I$ such that $I(i) = I^u_{=S}(i)$ for $1 \leq i \leq Occ_S(u)$ and $w'(I) = p^l$ for some $l < k$. Finally, let $w := [w'(I) := S]$. $\blacksquare$
Example:
Consider the seed pattern $S \to 0S1$.
We have $0S1 \overset{S}{\to} 0010S1011$, and in particular,
the stable prefix of length 3 is given by $p := 001$.
Now let $w':=p^2 = 001001$ and $I := 2,4,6$, yielding $w'(I) = 001 = p^1$ and
thus after the substitution we obtain $w := uv = 0S1S0S$.
Taking $S \to w$ as the new seed pattern, we get:
$0S1S0S \overset{S}{\to} 001S010S100S \overset{S}{\to} 0010010S1001001S0100100S …$
Note that this is very similar to the previous construction, only that we are slightly more constrained by having a pre-defined pattern and some prefix it induces.[4]
Claim: If the language $L(G)$ of a convolution grammar $G$ is $\omega$-regular, then all words in $L(G)$ must be of the form $w^\omega$ for some $w\in\Sigma^+$.
Corollary: The set of languages that can be generated by convolution grammars does not fully contain $\omega$-regular languages.
Proof: In general, $\omega$-regular languages contain words of the shape $uv^\omega$ with a non-empty pre-periodic prefix $u\in\Sigma^+$ and $v\in\Sigma^+$, but by the previous claim convolution grammars cannot generate words with a non-empty pre-periodic prefix. $\blacksquare$
Convolution Grammars Can Generate Non-Regular Languages
There are convolution grammars that only generate infinite words which are not lasso-shaped, i.e. words not of the form $uv^\omega$. This implies the following statement.
Proposition: There exist convolution grammars $G$ such that $L(G) \not\in \omega\text{-}REG$, and $L(G)$ has no non-trivial $\omega$-regular subset.
Proof: (sketch) Consider the grammar with the single pattern $S \to 0S1$ and its unique derivation sequence $P(S) =: p_1 \overset{S}{\to} p_2 \overset{S}{\to}…$ yielding some word $w$ in the limit. Note that each $p_i$ is balanced, has a length of $3^i$, consists of 3 copies of $p_{i-1}$ and has a single $S$ in the middle. Observe that if $w$ was periodic, it would need to have a period length of $3^i$ for some $i$.[5]
By definition of convolution, the prefix of $w$ of length $3^i$ stabilized from the pattern $p_i$ in the derivation chain. But then the next derived pattern $p_{i+1}$ by construction has one copy of $p_i$ where $S$ is replaced with 0 and one where $S$ is replaced with 1, implying that $w$ cannot have a period of length $3^i$ for any $i$. Therefore $w$ is aperiodic, and because the derivation is unique, we have $L(G) =\{w\}$. $\blacksquare$
Strictly Cyclic Convolution Grammars
We have seen that the convolution operation in general appears to have no nice algebraic properties, and the grammars yield languages quite unlike ones generated by classical formalisms, so it seems hard to transfer and adapt standard results from the classical setting to this syntactically similar, but semantically very different formalism. However, there is a somewhat better-behaved restriction of convolution grammars that still is an interesting object of study.
A cyclic convolution grammar has the restriction $|P(X) \cap N| = 1$ for all $X\in N$, i.e. each seed pattern contains exactly one kind of non-terminal symbol, possibly appearing multiple times in the pattern. The grammar is strictly cyclic if there exists some $X$ such that $S$ occurs in $P(X)$, i.e. the unique derivation cycle loops back to the initial seed pattern. The cycle length of a cyclic grammar is the length of the convolution cycle, i.e. the number of convolutions needed to obtain a pattern with the same non-terminal. Note that each cyclic convolution grammar generates a unique infinite word in the limit, because there are no degrees of freedom for the pattern expansion[6], i.e. $L(G) = \{w\}$ for some $w \in \Sigma^\omega$.
Note: Each cyclic grammar is infinitary, as non-terminals only vanish in the limit.
Note: Each singleton grammar, i.e. grammar with just one pattern, is trivially a strictly cyclic convolution grammar with cycle length 1.
Examples:
- $S \to 0A1, A \to 2BB, B \to 3A$ is a cyclic grammar with cycle length 2. Its language is a unique word obtained as the limit of the infinite derivation $ P(S) =: p_0 \overset{A}{\to} p_1 \overset{B}{\to} p_2 \overset{A}{\to} p_3 \overset{B}{\to} …$ converging to a word with the prefix $021 031 021 021 031 031 021 031 021 021 031 021 021 031 031 021 031 031…$
- The following are some simple singleton grammars:
- $G_1$ with $S \to 0S1$ yields $001 001 011 001 001 011 001 011 011…$
- $G_2$ with $S \to 01S$ yields $010 011 010 010 011 011 010 011 010…$
- $G_3$ with $S \to 0S1SS$ yields $00101 00110 00111 00100 01111…$
- $G_4$ with $S \to 1S0S$ yields $1101 1001 1100 1001 1101 1000 1100 1001…$
which is known as the regular paper-folding sequence
- A singleton grammar of the form $S \to 123…(q-1) S$ generates the word equivalent to the basic $q$-adically structured function $f_q$. For $q=4$ we have the grammar $G_{f_4}$ with $S \to 123S$ and the generated limit word starts with $1231 1232 1233 1231 1231 1232 1233 1232 1231 1232 1233 1233…$
In the following, we will focus on strictly cyclic grammars. It is easy to see that many, if not most, of the following observations and statements generalize to non-strictly cyclic grammars, where the sequence of derivations is lasso-shaped: a finite sequence of patterns that are each convolved exactly once, followed by the remaining patterns, which form a strictly cyclic grammar.
Cyclic Grammars Can be Flattened
An important property strictly cyclic grammars have is that their patterns always satisfy the requirements for associativity for finite convolution sub-sequences along the unique and infinite derivation path. From this we can get the following.
Theorem: (Cyclic Grammar Flattening)
Each strictly cyclic grammar with a cycle length greater than 1
can be transformed into an equivalent singleton grammar consisting of a single pattern.
Proof: Let $G$ be some strictly cyclic grammar with cycle length $n$. The unique derivation sequence is cyclic and can be written as
$$ S \overset{N_1}{\to} p_1 \overset{N_2}{\to} p_2 \overset{…}{\to} … \overset{N_n}{\to} p_n \overset{N_1}{\to} p_{n+1} \overset{N_2}{\to} p_{n+2} \overset{…}{\to} … $$
which is equal to
$$ S \overset{N_1}{\circledast} P(N_1) \overset{N_2}{\circledast} P(N_2) \overset{…}{\circledast} … \overset{N_n}{\circledast} P(N_n) \overset{N_1}{\circledast} P(N_1) \overset{N_2}{\circledast} P(N_2) \overset{…}{\circledast} … $$
which by associativity of finite convolution expressions for cyclic grammars equals
$$ S \overset{N_1}{\circledast} (P(N_1) \overset{N_2}{\circledast} P(N_2) \overset{…}{\circledast} … \overset{N_n}{\circledast} P(N_n)) \overset{N_1}{\circledast} (P(N_1) \overset{N_2}{\circledast} P(N_2) \overset{…}{\circledast} …) … $$
which means that we can compute the new pattern
$$ p’ := P(N_1) \overset{N_2}{\circledast} P(N_2) \overset{…}{\circledast} … \overset{N_n}{\circledast} P(N_n) $$
noting that $p’\cap N = \{N_1\}$ and $N_1 = S$.
If we now let $S \to p’$ be the singleton pattern of a grammar $G’$, the new sequence $S \overset{S}{\to} p’ \overset{S}{\to} p’_2 \overset{S}{\to} …$ must by construction yield the same word in the limit, i.e. $L(G’) = L(G)$. $\blacksquare$
Example:
The cyclic grammar $S \to 0A, A \to 1B, B \to 2C, C \to 3S$
can be flattened to the equivalent grammar $S’ \to 010201030102010S’$
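Flattening is a straightforward left-to-right fold of the convolution over one full cycle (sketch; `flatten` is my own helper):

```python
from math import lcm

def convolve(u, v, X):
    # finite pattern convolution of u with v at non-terminal X
    occ = u.count(X)
    l = lcm(occ, len(v))
    filler = iter(v * (l // len(v)))
    return "".join(next(filler) if c == X else c for c in u * (l // occ))

def flatten(P, order):
    # order: the non-terminals N_1, ..., N_n in derivation order, starting at S
    p = P[order[0]]
    for X in order[1:]:
        p = convolve(p, P[X], X)
    return p

P = {"S": "0A", "A": "1B", "B": "2C", "C": "3S"}
print(flatten(P, ["S", "A", "B", "C"]))  # 010201030102010S
```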
Proposition: The flattened grammar $G'$ can in general have a size exponential in the number of patterns of $G$.
Proof: It is easy to see how the flattening example above generalizes to a family of grammars where the sum of the pattern lengths is $2n$, while the flattened single pattern $S'$ has length $2^n$. $\blacksquare$
Open Question(s): Is there a systematic procedure for the inverse direction, i.e. breaking down a complex convolution pattern into a sequence of convolutions based on simpler patterns? Is it possible to effectively decide whether a given grammar only consists of prime patterns, i.e. patterns that cannot be further factorized into a sequence of convolutions based on smaller patterns?
Cyclic Grammars Generate Self-Similar Words
Cyclic grammars have the interesting property of generating infinite words which contain themselves, recursively nested, infinitely many times. In more precise terms, this can be stated as follows.
Theorem: (Self-Similarity in Cyclic Grammars)
Each strictly cyclic convolution grammar of cycle length $n$ induces $n$ infinite words
$w_i$ for $1\leq i \leq n$ and for each word $w_i$ there is an infinite chain of nested index
sequences $I^i_j: \mathbb{N} \to D_j$ with $D_0 = \mathbb{N}$ and $D_{j+1} \subset D_{j}$
such that $w_i(I^i_j) = w_i$ for all $j\geq 0$.
Proof: (sketch) Recall the cyclic grammar flattening and note that the derivation can be “left-shifted”, i.e. one can start the derivation at some other seed than $S$, yielding $n$ sequences that can be flattened into
$$ \begin{aligned} p_1 &:= P(N_1) \overset{N_2}{\circledast} P(N_2) \overset{…}{\circledast} … \overset{N_n}{\circledast} P(N_n) \\ p_2 &:= P(N_2) \overset{N_3}{\circledast} P(N_3) \overset{…}{\circledast} … \overset{N_1}{\circledast} P(N_1) \\ & … \\ p_n &:= P(N_n) \overset{N_1}{\circledast} P(N_1) \overset{…}{\circledast} … \overset{N_{n-1}}{\circledast} P(N_{n-1}) \\ \end{aligned} $$
such that the following derivation sequences are equal:
$$ \begin{aligned} N_1 & \overset{p_1}{\to} … \overset{p_1}{\to} … \\ N_1 \overset{N_1}{\to} q_1 & \overset{p_2}{\to} … \overset{p_2}{\to} … \\ … \\ N_1 \overset{N_1}{\to} q_1 \overset{N_2}{\to} … \overset{N_{n-1}}{\to} q_{n-1} & \overset{p_n}{\to} … \overset{p_n}{\to} … \\ N_1 \overset{N_1}{\to} q_1 \overset{N_2}{\to} … \overset{N_{n}}{\to} q_{n} & \overset{p_1}{\to} … \overset{p_1}{\to} … \\ \end{aligned} $$
where $\overset{p_i}{\to}$ indicates convolution with the respective constructed flattened pattern and each $p_i$ can be interpreted as arising from a shifted grouping of the convolution cycle that makes up the derivation, i.e. one convolution with a pattern $p_i$ is equivalent to $n$ subsequent convolution steps in the original derivation sequence. In other words, those $n$ sequences (in general containing possibly distinct intermediate patterns) are related by single-step derivations with the original patterns $P(N_i)$.
Let the words $w_i$ be defined as the limit of infinitely repeated convolution of $N_i$ with the pattern $p_i$. Note that $w_1$ is simply the word generated by the original grammar. Now w.l.o.g. let $w := w_i$ for some fixed $i$.
Let $r_0, r_1, r_2, …$ be the sequence of intermediate patterns in the derivation generating $w$ and let $I^\to_j$ denote the index sequence of positions in $w$ that were substituted in the $j$-th derivation step, i.e. $I^\to_j := I^{r_{j-1}}_{=X}$ for the $X\in N$ appearing in the pattern used.
By basic properties of cyclic grammars and pattern convolution, it is easy to see that we have $I^\to_{j+1} \subset I^\to_j$, and because terminal symbols are invariant and do not affect convolution, the infinite cyclic sequence of remaining convolutions starting with step $j$ is equal to the original sequence. In other words, we have for all $j$ that $w= w(I^\to_j)= [{r_{j-1}}^\omega(I^\to_j) := w]$. $\blacksquare$
Note: Whether the resulting words are all distinct depends on the grammar. One can easily construct a cyclic grammar consisting of equivalent but syntactically distinct patterns, in which case of course all $w_i$ will be equal.
Examples:
- $S \to 0A1, A \to 0S10S1$ is a cyclic grammar with cycle length 2. Note that it is equivalent to example $G_1$, i.e. $L(G) = \{w\}$ such that $w = 001 001 011 001…$, and we have $w_1 = w_2 = w$ in the construction above.
- $S \to 0AA, A\to 1S$ is a cyclic grammar with cycle length 2 such that we have $L(G) = \{w\}$, and in the construction above we have $w_1=w, w_1 \neq w_2$:
- $w_1 = 010011010010011011…$
- $w_2 = 101110101111101110…$
- $S \to 0A1, A \to 0B1, B \to 01S$ is a cyclic grammar with cycle length 3 using two distinct patterns (modulo renaming of non-terminals), but yields three distinct words in the construction:
- $w_1 =$ 001001011001011011001001011…
- $w_2 =$ 001011001001011001001011011…
- $w_3 =$ 010010011010010011010011011…
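For the singleton grammar $G_1$ ($S \to 0S1$) the self-similarity is easy to observe directly: the $S$-positions of $(0S1)^\omega$ are exactly the positions $\equiv 2 \pmod 3$, and restricting the limit word to them yields the word again. A sketch (convolution routine repeated for self-containment):

```python
from math import lcm

def convolve(u, v, X):
    # finite pattern convolution of u with v at non-terminal X
    occ = u.count(X)
    l = lcm(occ, len(v))
    filler = iter(v * (l // len(v)))
    return "".join(next(filler) if c == X else c for c in u * (l // occ))

p = "0S1"
for _ in range(7):
    p = convolve(p, "0S1", "S")
prefix = p[: p.index("S")]   # long stable prefix of the limit word w
nested = prefix[1::3]        # w restricted to the S-positions of (0S1)^omega
print(nested == prefix[: len(nested)])  # True: the word contains itself
```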
Most Convolution Grammars Generate Aperiodic Words
The self-similarity is given in a rather trivial way if the resulting word is periodic, so aperiodicity of the limit words, combined with the self-similarity, is what makes these grammars interesting in the first place.[7]
However, while we have seen a few ways of constructing grammars that yield periodic words, I have found no simple criteria for deciding whether a grammar will yield an aperiodic word or not, and no better procedure for checking periodicity of the result than computing a reasonably long stable prefix and trying to detect a period.
In an informal sense, it surely feels like “most” non-trivial convolution grammars end up yielding aperiodic words - i.e., most periodic ones I came up with were constructed on purpose (using the ideas above), while most “random” singleton grammars that I defined resulted in (apparently) aperiodic words.
Claim: The presented singleton grammars $G_1, G_2, G_3, G_4$ and $G_{f_q}$ for $q>2$ all generate an aperiodic word.
Trying to get a better understanding of periodicity in convolution grammars, I generated all singleton grammar patterns with $T=\{0,1\}$ and $N = \{S\}$ up to length 12 and ran a naive period detection for period lengths up to 60 on a sample of size $60^2=3600$. Based on the results, only 2508 out of 515084 patterns resulted in periodic words, confirming empirically the feeling that periodic convolution grammars are in fact the exceptional case. Furthermore, based on inspection of the patterns, I believe the following statement.
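The pattern enumeration behind this experiment can be sketched as follows (illustrative code, assuming the validity and non-triviality constraints as described: patterns start with a terminal and contain `0`, `1` and `S` at least once); it reproduces the count of 515084 patterns up to length 12:

```python
from itertools import product

def singleton_patterns(max_len: int):
    """Yield all non-trivial singleton patterns over T={0,1}, N={S}:
    they must start with a terminal and contain 0, 1 and S at least once."""
    for n in range(2, max_len + 1):
        for tup in product("01S", repeat=n):
            pat = "".join(tup)
            if pat[0] != "S" and "0" in pat and "1" in pat and "S" in pat:
                yield pat

print(sum(1 for _ in singleton_patterns(12)))  # -> 515084
```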
Conjecture: (Period lengths of singleton grammars)
Let $G$ with $S \to p$ be a singleton grammar pattern with $L(G) = w$.
If $w$ is periodic, i.e. $w=v^\omega$ for some $v$,
then $|v|$ is a non-trivial divisor of $|p|$.
Assuming this is true, the only missing piece for an effective procedure for detecting periodicity of cyclic convolution grammars would be a bound on the required sample length, i.e., a statement of the form:
Given a singleton grammar pattern of length $n$ that yields a word $w$, $w$ has period $|u|$ iff there exists a stable finite prefix $v$ of $w$ with $|v| \geq f(n)$ and a primitive word $u$ such that $v=u^k$ for some $k>1$.
With such a bound, we would have an effective procedure to decide the periodicity of the word generated by any strictly cyclic convolution grammar:
- Flatten the grammar to a singleton grammar pattern of length $n$
- Compute a sufficiently large stable prefix of a suitable size $s \geq f(n)$
- Check this prefix of $w$ for periodicity with period sizes up to $n/2$
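The check in the last step is plain naive period detection on a stable prefix, which can be sketched as follows (illustrative code; the actual experiment above used period lengths up to 60 on a sample of length 3600):

```python
def find_period(prefix: str, max_period: int):
    """Return the smallest period length p <= max_period of `prefix`
    (in the sense prefix[i] == prefix[i - p] for all i >= p), else None."""
    for p in range(1, max_period + 1):
        if all(prefix[i] == prefix[i - p] for i in range(p, len(prefix))):
            return p  # looks like a word with period p
    return None  # no period detected

print(find_period("010101010101", 6))  # -> 2
print(find_period("001001011001", 6))  # -> None (prefix of an aperiodic word)
```

Of course, on a finite prefix this can only ever give evidence, not proof, of (a)periodicity – hence the need for a bound like $f(n)$ above.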
Relationship to Morphic Words
A (pure) morphic word is an infinite word that can be generated as the limit of a mapping $f: \Sigma \to \Sigma^+$ (that is applied to each symbol of a sequence) for some alphabet $\Sigma$, starting from an initial symbol $a\in\Sigma$[8]. A (k-)uniform morphic word is one that can be generated by a morphic map where the strings in the image all have the same fixed length $k$. Uniform morphic words are also automatic, meaning that they can be generated by a certain kind of DFA with output.
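As a concrete illustration (not from this post's appendix), a prefix of a pure morphic word can be computed by simply iterating the map until the desired length is reached; for example, the well-known Thue–Morse word is the fixed point of the 2-uniform map $0 \to 01$, $1 \to 10$:

```python
def morphic_prefix(f: dict, a: str, length: int) -> str:
    """Iterate the morphism f on start letter a (f[a] must start with a, so
    each step prolongs the previous word) up to the given prefix length."""
    assert f[a].startswith(a), "f must be prolongable on a"
    w = a
    while len(w) < length:
        w = "".join(f[c] for c in w)
    return w[:length]

print(morphic_prefix({"0": "01", "1": "10"}, "0", 16))  # -> 0110100110010110
```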
To give the restricted class of words we focused on earlier a nice name, let us call infinite words that can be generated by singleton convolution grammars self-convolutional. Now the regular paperfolding sequence (grammar $G_4$ above) is a word that is both self-convolutional and (uniformly) morphic, and it was the first example I stumbled upon that can be generated with both formalisms.
This connection is interesting for multiple reasons. A more philosophical reason is the following. Notice that convolution grammars feel “intrinsically sequential” – it looks like one has to traverse the word from left to right and put the correct symbols in place, and there is no obvious way to retrieve a symbol without computing sufficiently many predecessors. In contrast, morphic mappings are in a sense “intrinsically parallel” – to compute the next word in the derivation sequence, one can apply the same mapping to all symbols at the same time, no information about the preceding symbols or positions in the sequences is needed.
A practical reason to explore this relationship is the fact that every periodic word is also uniformly morphic[9], so understanding the relationship better could help to resolve the questions discussed above. For example, techniques used to show a word is not automatic (or not even morphic) also imply that it is not periodic. Furthermore, given an automatic sequence, it is possible to decide whether it is periodic in polynomial time (see slide 11ff.).
Note: In the following I use a slightly relaxed definition of morphic words[10] – instead of allowing to apply a letter-to-letter coding as the final step, I allow to apply a general mapping from each symbol to some finite sequence (possibly over a different alphabet).
Let us call a word $w$ to be $(k,l)$-uniform morphic if there is a $k$-uniform morphic mapping $f$ and an $l$-uniform final mapping $g$ that generate the word, i.e. for some letter $a$ we have $g(f^\infty(a)) = w$. Note that $k$-uniform morphic words are exactly the $(k, 1)$-uniform morphic words, as $l=1$ means that $g$ is a coding.
I experimented with words generated by simple singleton grammars, and for about half a dozen I succeeded in finding a morphic mapping just by playing around and looking for a “pattern” in the repetition of fixed-length segments of the word. Based on my initial experiments I had a growing suspicion.
Conjecture:
Let $G$ with $S \to p$ be a singleton grammar pattern with $L(G) = w$.
If $Occ_S(p)$ divides $|p|$, then $w$ is $(k,l)$-uniform morphic for some $k,l \leq |p|$.
Testing the Conjecture
While I have no idea how to prove this, I decided to test and explore it empirically in a more rigorous way. To do that, I implemented a (slightly optimized) brute-force routine to automate my initial exploratory manual pattern-recognition.
The assumed uniformity makes the job pretty easy – we can just break up a word into segments of fixed length and replace each segment with a new symbol. W.l.o.g. we can use a normalized alphabet that consists of all encountered segments, numbered in the order of their first occurrence in the word.
Let us assume that $w$ is $(k,l)$-uniform. Then there is a $k$-uniform morphic map $f$ (w.l.o.g. we have $f(0) = 0s$ for some finite string $s$ over the normalized alphabet) and final map $g$ such that $g(f^\infty(0)) = w$. Now we can use the fact that if we have $v = f^\infty(0)$ and $w = g(v) = g(f^\infty(0))$, then $f^{-1}(g^{-1}(w)) = v$, because $v$ is by definition the fixed point of applying $f$ to the first symbol $0$.
But this means that we can simply:
- guess the segment lengths $k,l$ (for the mappings $f$ and $g$, respectively)
- translate the word into the corresponding pre-image(s) (as described above)
- check whether $g^{-1}(w) = f^{-1}(g^{-1}(w))$
There remains one problem – as the words have infinite length, we cannot really check them completely. However, we can check some reasonably long finite prefix to become pretty confident (similarly to the periodicity check I used).
Example:
Consider the sequence generated by $G_4$ (i.e. pattern $1S0S$):
$w = 11011001110010011101100011001001…$
Now let us guess that it is $(2,2)$-uniform, i.e. $k=l=2$.
We can compute the normalized sequence $g^{-1}(w)$, getting
$v = 01210321012303210121032301230321…$ and the 2-uniform mapping $g$
defined by $g(0) = 11, g(1) = 01, g(2) = 10, g(3) = 00$.
Using the same procedure once more to infer $f$, we observe that, in fact,
$f^{-1}(v) = v (= f(v))$,
with mapping $f$ defined by $f(0) = 01, f(1)=21, f(2)=03, f(3)=23$.
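This example can be reproduced mechanically with a small helper implementing the segment-collection step described above (an illustrative reimplementation; `collect_segments` and the variable names are mine, and the input is the stable prefix of $G_4$'s word quoted in the example):

```python
def collect_segments(w, n: int):
    """Split w into length-n segments (dropping an incomplete last one) and
    number distinct segments in order of first occurrence. Returns the
    normalized sequence and the inferred n-uniform mapping."""
    seg_map, out = {}, []
    for i in range(0, len(w) - len(w) % n, n):
        seg = tuple(w[i:i + n])
        out.append(seg_map.setdefault(seg, len(seg_map)))
    return out, {num: seg for seg, num in seg_map.items()}

w = "11011001110010011101100011001001"  # stable prefix of the word of G_4 (1S0S)
v, g = collect_segments(w, 2)  # guess l = 2: v = g^-1(w), g maps symbols to segments
u, f = collect_segments(v, 2)  # guess k = 2: u = f^-1(v)
print(v)                # -> [0, 1, 2, 1, 0, 3, 2, 1, 0, 1, 2, 3, 0, 3, 2, 1]
print(u == v[:len(u)])  # -> True: v looks like a fixed point of f
print(f)                # -> {0: (0, 1), 1: (2, 1), 2: (0, 3), 3: (2, 3)}
```

The recovered mappings agree with the $g$ and $f$ stated above; on the infinite word, the check would of course have to be repeated on ever longer stable prefixes.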
As all periodic words are also morphic and the periodicity check is more efficient, I actually combined both checks (i.e., if a word is found to be periodic, we can skip the uniform morphic mapping inference).
Some Suggestive Empirical Results
I initially ran the outlined algorithm to detect $(k,l)$-uniform words on all singleton grammar patterns $p$ over $\Sigma=\{0,1\}$ with $|p| \leq 8$, and the conjecture was not violated. By inspecting the computed morphic mapping(s) I could actually refine the conjecture further:
Conjecture (Improved):
Let $G$ with $S \to p$ be a singleton grammar pattern with $L(G) = w$.
If $ l:= Occ_S(p)$ is a divisor of $|p|$, then $w$ is $(k,l)$-uniform morphic
for $k := |p| / l$.
Note: The values $k$ and $l$ in the conjecture are not guaranteed to be minimal, and in general there are multiple equivalent choices, as a $k$-automatic (i.e. uniform morphic) sequence is known to be also $k^r$-automatic $\forall r\geq 1$.
I re-ran my tests for the refined conjecture, testing now explicitly for the $k$ and $l$ chosen as in the refined conjecture, and again the conjecture seems to hold up. This time I tested with pattern lengths up to 12, because I could significantly speed up the computations by fixing the respective $k$ and $l$ values instead of trying lots of combinations for each word.
To me, the stronger conjecture is quite surprising. I see no obvious connection between the fraction of non-terminals in the pattern and the corresponding segment lengths for the mappings $f$ and $g$, so it looks like some non-trivial combinatorics is going on under the hood.
For patterns of prime length, a uniform morphic map could not be found, except for the expected case where the non-terminal appears in the pattern exactly once, which I take as further evidence in support of the conjecture. For non-prime pattern lengths, there are examples where uniform morphic mappings can be found for grammars not covered by the conjecture (i.e. not satisfying the required divisor property), so the exact relationship remains unclear (and, of course, formally unproven).
In any case I am quite certain now that there is a deep relationship between uniform morphic maps and self-convolutional words and a neat characterization of their intersection is waiting to be found. As mentioned, this would also possibly resolve most questions concerning (a)periodicity of the generated words, using suitable results about automatic sequences.
I do not know whether there are (strictly[11]) non-uniform morphic words corresponding to certain convolutional grammars. My feeling is that this is not the case, but at least it could be disproved easily by finding a suitable counter-example. However, I have no ideas for brute-forcing general morphic words in an acceptably efficient way, and I strongly relied on uniformity (and thus fixed-length segments) in the search algorithm described above.
Update: I finally found what self-convolutional words are called in the literature, along with what looks to me like proofs of the conjectures above – such words are actually called Toeplitz words.
Summary and Further Directions
While all we have seen so far has not been really rigorous, I hope at least it was somewhat interesting and clear enough to enable somebody with interest in this topic to pick it up, formalize it better, and take it further. To summarize the main takeaways:
- Convolution grammars are an exotic interpretation of formal grammars
- They are not easy to work with, but cyclic grammars are a simpler subclass
- They can generate interlinked cascades of self-similar and aperiodic words
- There seems to be an interesting connection to (uniform) morphic words
I don’t know whether any of this is new or I am reinventing the wheel here (I hope not), but at least I have not been able to find anything like this in the literature. In any case, I think this formalism might be worth some further exploration. To list some rather obvious directions:
- What can be said about non-cyclic convolution grammars?
- What other interesting properties do resulting languages / words possess?
- Specifically, are there more connections to known fractal sequences?
- What is the precise relationship of convolutional and morphic words?
- What additional constraints would characterize the intersection with $\omega$-REG?
- What about closure of the generated languages under language operations (e.g. union, intersection), and can such operations be effectively computed on convolution grammars?
- Is there a nice $\omega$-automata model that is equivalent to convolution grammars?
The last point could be especially interesting if an automata-based representation were more compact or clearer than patterns. Due to the pattern “blow-up” caused by convolution, the sequences quickly become quite unwieldy. Of course, automata are also in general the typical approach for trying to answer many of the questions listed above. It seems natural to consider some universal or alternating automata model with e.g. a Büchi condition to guess and verify a correct derivation for a given word, but it is not clear whether this would actually be useful as an analysis tool, or whether mapping the grammars into some completely different formalism would in fact be more illuminating.
Unfortunately, I am not sure how to make the proof sketches and ideas more rigorous, but it feels to me like some arguments, at least the way I approached them, are actually co-inductive in nature. Of course it could simply be that my way of thinking about the infinite convolution sequences is not the best one.
If you have already seen or worked with something like that before, or you managed to figure out something interesting in this context, I’d be happy to know about it!
Appendix: Some Code to Play with Convolution Grammars
While I was not able to provide satisfactory proofs, I can at least provide some empirical evidence and helpful utilities for inspecting words generated by convolution grammars.
"""Exploration of convolution grammars."""
# Copyright (C) 2023 Anton Pirogov, licensed under the MIT license
# ----
# Example grammars:
=
=
=
=
=
=
=
# ----
=
=
= 0
+= 1
return
"""Given a pattern `pat`, return the first non-terminal that can be found.
If `sym` is passed, just returns sym (i.e. skip search).
"""
return
"""Given a strictly cyclic grammar, infer the periodic convolution sequence."""
=
=
break
return
# ----
"""Convolve two patterns `p1` and `p2` to a new pattern.
If no non-terminal `sym` is passed, will try to infer it.
"""
=
=
assert != 0, f
=
= // , //
, =
=
return
"""Convolve pattern `pat` with itself `n` times.
If no non-terminal `sym` is passed, will try to infer it.
"""
= =
=
return
# ----
"""Convolve a singleton pattern or strictly cyclic grammar into an infinite word.
Returns generator returning tuples (sym, depth), where sym is the final terminal
symbol and depth is the number of convolution steps until this symbol stabilized.
"""
=
=
=
= # recursive lazy generator
=
=
yield from
"""Sample a finite prefix of a word generated by a convolution grammar."""
return
"""Sample stabilization depths for symbols of a word generated by a convolution grammar.
Depth 0 means the symbol was terminal already in the first pattern,
depth 1 means the symbol was substituted by a terminal by the first convolution, etc.
"""
return
# ----
"""Return a subsequence of given word at depth d."""
return
return
= or +1
=
return
"""Given two words, return sequence to pretty-print positions at which they differ."""
return
"""Print a number of samples, showing differences between adjacent words."""
# ----
"""Generate all non-trivial singleton patterns over up to some length.
Returns patterns that start with a terminal (i.e. valid)
and which have >=2 distinct terminals and >=1 non-terminal (i.e. non-trivial).
"""
assert > 0
=
assert not
yield from
= ** 2
=
=
= False
= True # counterexample for period i found
continue
return # looks like a word with period of i
return None # no period detected
# ----
"""Collect segments of length n, in first appearance order.
If n is not passed, uses n=1, which normalizes the alphabet.
Keyword argument seg_map can be a dict to be filled
with new symbols representing the segments.
If not passed, the mapping will not be externally available.
"""
= # ensure it is an iterator
# use provided dict for mapping or create new one
=
# get next segment
=
break # reached the end
= # concat if input is seq. of strings
= # try to retrieve assigned symbol
# must assign new symbol
=
# unknown full segment -> give it next free number
=
# last segment, incomplete length -> mark with -1
= -1
=
# return symbol for current segment according to segment map
yield
=
=
=
=
=
=
=
# force lazy generators by retrieving a sample (will fill the maps)
=
# word generated by cg appears not result from g(f^infty(0))
return None
=
=
return
"""Given a singleton convolution grammar, try inferring (k,l)-uniform mapping pair."""
assert ==1
=
= or
= or 12
# ----
= 0, 0, 0
, ,
continue # optimization: w.l.o.g. test only patterns starting with 0
=
=
+= 1
=
+= 1
+= 1
continue
# non-term. count is divisor of pat. length
= // , # expected parameters st. morphic map exists
,
assert == ,
=
+= 1
Usage example:
>>>
0010S1011
>>>
0010010110010010110010110110010010110010010110010110110010010110010110110010110110010010110010010110
>>>
>>>
0100100110100100110100110110100100110100110110100110110100100110100100110100110110100100110100100110
>>>
0010010110010110110010010110010010110010110110010010110010010110010110110010110110010010110010110110
0010110010010110010010110110010110010010110010010110110010110010010110110010110110010110010010110010
0100100110100100110100110110100100110100110110100110110100100110100100110100110110100100110100100110
0010010110010110110010010110010010110010110110010010110010010110010110110010110110010010110010110110
0010110010010110010010110110010110010010110010010110110010110010010110110010110110010110010010110010
>>>
0111SS 6 2
-
0111S0 6 1
-
0111S1 6 1
-
01110S 6 1
-
01111S 6 1
-
----
:
11/244 .
218/244 .
-
One could say that the derivation starts with the pattern $S$, but it is always trivially expanded to $P(S)$ in the first step (as there is a unique pattern associated with $S$), so we can also just skip it. ↩
-
One could of course also lift the syntactic restriction and instead require that derivations are “productive”, i.e. each finite prefix is stable after finitely many steps. ↩
-
If the condition is not satisfied, care must be taken – the case with 3 patterns does not automatically and easily extend to constraints ensuring full general associativity of a convolution chain, because technically, pattern convolution is not really a binary (or even algebraic) operation. While certain convolution sequences may be regrouped safely and more precise criteria could be defined, arbitrary re-groupings will in general not yield the same result. ↩
-
The general trick here, of course, is simply strategic placement of non-terminal symbols, in anticipation of the (terminal) symbol that should be inserted at the corresponding position during convolution to force the desired periodic word. ↩
-
Yes, this is the sloppy part of this argument, purely relying on intuition. Among other things, I lack a rigorous and simple technical argument for formally proving that a convolution chain yields aperiodic words without much pain and ad-hoc reasoning. ↩
-
If we extended convolution grammars to support alternative choices for each non-terminal, as in the classical case, then all of the following would still apply to the analysis of suitably restricted “sub-grammars”. ↩
-
I am mostly interested in these grammars as a means to systematically produce what could be called “1D Fractals”, and it feels like convolution grammars are on the right track, given that the presented example grammar $G_4$ (the regular paper-folding sequence) can be interpreted as instructions to obtain the famous Dragon Curve. ↩
-
It is also required that for some $a\in \Sigma$ we have $f(a) = as$ for some $s \in \Sigma^+$, which is enough to ensure that in each step the resulting word is a prolongation of the previous one, i.e. the previous word is a prefix. This ensures that the limit can exist. ↩
-
For an infinite periodic word $w^\omega$ over alphabet $\Sigma$, simply use the mapping $f: \Sigma \to \Sigma^+$ with $f(x) = w$ for all $x\in\Sigma$ and notice that it is trivially morphic. ↩
-
I think it is quite justifiable to treat a finite collection of finite sequences as effectively nothing else but slightly more complicated symbols, and I hope (without knowing about a corresponding proven result) the notions are still equivalent. Should this not be the case, just substitute each occurrence of morphic with essentially morphic, which should correspond to the relaxed definition. ↩
-
For each uniform (i.e. automatic) sequence one can find also some non-uniform morphic mapping, so by non-uniform morphic I mean the case where no uniform mapping exists for the word, but non-uniform mappings are possible. ↩