1 Introduction
Paperfolding sequences are sequences over the alphabet
$\{ -1, 1\}$
that arise from the iterated folding of a piece of paper, introducing a hill (
$+1$
) or valley (
$-1$
) at each fold. They are admirably discussed, for example, in [Reference Davis and Knuth8, Reference Dekking, Mendès France and van der Poorten9].
The formal definition of a paperfolding sequence is based on a (finite or infinite) sequence of unfolding instructions
$\textbf {f}$
. For finite sequences
$\textbf {f}$
, we define a map
${P:\{-1, 1\}^{*} \rightarrow \{-1, 1\}^{*}}$
as follows:

for
$a \in \{ -1, 1\}$
and
${\textbf {f}} \in \{-1, 1\}^{*}$
. Here,
$\epsilon $
denotes the empty sequence of length
$0$
,
$-x$
changes the sign of each element of a sequence x and
$x^R$
reverses the order of symbols in a sequence x. An easy induction now shows that
$|P_{\textbf {f}}| = 2^{|{\textbf {f}}|} - 1$
, where
$|x|$
means the length, or number of symbols, of a sequence x.
Now, let
${\textbf {f}} = f_0 f_1 f_2 \cdots $
be an infinite sequence in
$\{-1, 1\}^{\omega }$
. It is easy to see that
$P_{f_0 f_1 \cdots f_n}$
is a prefix of
$P_{f_0 f_1 \cdots f_{n+1}}$
for all
$n \geq 0$
, so there is a unique infinite sequence of which all the
$P_{f_0 f_1 \cdots f_n}$
are prefixes; we call this infinite sequence
$P_{\textbf {f}}$
.
As in the previous paragraph, we always index the unfolding instructions starting at
$0$
:
${\textbf {f}} = f_0 f_1 f_2 \cdots $
. Also by convention, the paperfolding sequence itself is indexed starting at
$1$
:
$P_{\textbf {f}} = p_1 p_2 p_3 \cdots $
. With these conventions, we immediately see that
$P_{\textbf {f}} [2^n] = p_{2^n} = f_n$
for
$n \geq 0$
. Since there are a countable infinity of choices between
$-1$
and
$1$
for the unfolding instructions, there are uncountably many infinite paperfolding sequences.
As an example, let us consider the most famous such sequence, the regular paperfolding sequence, where the sequence of unfolding instructions is
$1^{\omega } = 111\cdots $
. Here, we have, for example,

The first few values of the limiting infinite paperfolding sequence
$P_{1^{\omega }} [n]$
are given in Table 1. The sequence
$P_{1^{\omega }}[n]$
is sequence
A034947
in the On-Line Encyclopedia of Integer Sequences (OEIS) [Reference Sloane20].
Table 1 The regular paperfolding sequence.

The paperfolding sequences have many interesting properties that have been explored in a number of papers. In addition to the papers [Reference Davis and Knuth8, Reference Dekking, Mendès France and van der Poorten9] already cited, the reader can also see those of Allouche [Reference Allouche2], Allouche and Bousquet-Mélou [Reference Allouche and Bousquet-Mélou4, Reference Allouche and Bousquet-Mélou5], and Goč et al. [Reference Goč, Mousavi, Schaeffer, Shallit, Beckmann, Mitrana and Soskova10], to name just a few.
Recently, Bunder et al. [Reference Bunder, Bates and Arnold6] explored the sequence of lengths of runs of the regular paperfolding sequence and proved some theorems about them. Here, by a ‘run’, we mean a maximal block of consecutive identical values. For example, the sequence of runs of the regular paperfolding sequence begins
$2,1,2,2,3,2,1,2,\ldots $
.
Runs and run-length encodings are a long-studied feature of sequences (see, for example, [Reference Golomb11]). The run lengths
$R_{1111}$
for the finite paperfolding sequence
$P_{1111}$
, as well as the starting positions
$S_{1111}$
and ending positions
$E_{1111}$
of the nth run, are given in Table 2.
Table 2 Run lengths of the regular paperfolding sequence.

As it turns out, however, much more general results, applicable to all paperfolding sequences, can be proven rather simply, in some cases making use of the Walnut theorem-prover [Reference Mousavi14]. As shown in [Reference Shallit17], to use Walnut, it suffices to state a claim in first-order logic and then the prover can rigorously determine its truth or falsity.
To use Walnut to study the run-length sequences, these sequences must be computable by a finite automaton (‘automatic’). Although the paperfolding sequences themselves have this property (as shown, for example, in [Reference Goč, Mousavi, Schaeffer, Shallit, Beckmann, Mitrana and Soskova10]), there is no reason, a priori, to expect that the sequence of run lengths will also have the property. For example, the sequence of runs of the Thue–Morse sequence
${\textbf t} = 0110100110010110\cdots $
is
$12112221121\cdots $
, a fixed point of the morphism
$1 \rightarrow 121$
,
$2 \rightarrow 12221$
[Reference Allouche, Arnold, Berstel, Brlek, Jockusch, Plouffe and Sagan3], and is known to not be automatic [Reference Allouche, Allouche and Shallit1].
The starting and ending positions of the nth run are integer sequences. To use Walnut to study these, we would need these sequences to be synchronised (see [Reference Shallit, Lecroq and Puzynina16]); that is, there would need to be an automaton that reads the integers n and x in parallel and accepts if x is the starting (respectively, ending) position of the nth run. However, there is no reason, a priori, that the starting and ending positions of the nth run of an arbitrary automatic sequence should be synchronised. Indeed, if this were the case and the length of runs were bounded, then the length of these runs would always be automatic, which, as we have just seen, is not the case for the Thue–Morse sequence.
However, as we will see, there is a single finite automaton that can compute the run sequence
$R_{\textbf {f}}$
for all paperfolding sequences simultaneously, and the same thing applies to the sequences
$S_{\textbf {f}}$
and
$E_{\textbf {f}}$
of starting and ending positions, respectively.
In this paper, we use these ideas to study the run-length sequences of paperfolding sequences, explore their critical exponent and subword complexity, and generalise the results of Bunder et al. [Reference Bunder, Bates and Arnold6] on the continued fraction of a specific real number to uncountably many real numbers.
2 Automata for the starting and ending positions of runs
We start with a basic result with a simple induction proof.
Proposition 2.1. Let
$\mathbf {f}$
be a finite sequence of unfolding instructions of length n. Then, the corresponding run-length sequence
$R_{\mathbf {f}}$
, as well as
$S_{\mathbf {f}}$
and
$E_{\mathbf {f}}$
, has length
$2^{n-1}$
.
Proof. The result is clearly true for
$n=1$
. Now suppose
${\textbf {f}}$
has length
$n+1$
and write
${\textbf {f}} = {\textbf {g}} a$
for
$a \in \{ -1,1 \}$
. For the induction step, we use (1.1). From it, we see that there are
$2^{n-1}$
runs in
$P_{\textbf {g}}$
and in
$-P_{\textbf {g}}^R$
. Since the last symbol of
$P_{\textbf {g}}$
is the negative of the first symbol of
$-P_{\textbf {g}}^R$
, introducing a between them extends the length of one run, and does not affect the other. Thus, we do not introduce a new run nor combine two existing runs into one. Hence, the number of runs in
$P_{\textbf {f}} $
is
$2^n$
, as desired.
Remark 2.2. Bunder et al. [Reference Bunder, Bates and Arnold6] proved the same result for the specific case of the regular paperfolding sequence. It also follows as a special case of [Reference Allouche and Bousquet-Mélou5, Corollaire 5.3] (as corrected by [Reference Kao, Rampersad, Shallit and Silva12]).
Next, we find automata for the starting and ending positions of the runs. Let us start with the starting positions.
The desired automaton
$\texttt {sp}$
takes three inputs in parallel. The first input is a finite sequence
$\textbf {f}$
of unfolding instructions, the second is the number n written in base
$2$
and the third is some number x, also expressed in base
$2$
. The automaton accepts if and only if
$x = S_{\textbf {f}} [n]$
.
Normally, we think of the unfolding instructions as over the alphabet
$\{ -1, 1 \}$
, but it is useful to be more flexible and also allow
$0$
s, but only at the end; these
$0$
s are essentially disregarded. We need this because the parallel reading of inputs requires that all three inputs be of the same length. Thus, for example, the sequences
$-1, 1, 1, 0$
and
$-1, 1, 1$
are considered to specify the same paperfolding sequence, while
$-1, 0, 1, 1$
is not considered a valid specification.
Because we choose to let
$f_0$
be the first symbol of the unfolding instructions, it is also useful to require that the inputs n and x mentioned above be represented with the least-significant digit first. In this representation, we allow an unlimited number of trailing zeros.
Finally, although we assume that
$S_{\textbf {f}}$
is indexed starting at position
$1$
, it is useful to define
$S_{\textbf {f}}[0] = 0$
for all finite unfolding instruction sequences
$\textbf {f}$
.
To find the automaton computing the starting positions of runs, we use a guessing procedure described in [Reference Shallit17], based on a variant of the Myhill–Nerode theorem. Once a candidate automaton is guessed, we can rigorously verify its correctness with Walnut.
We will need one Walnut automaton, FOLD, already introduced in [Reference Shallit17], and another one that we can define via a regular expression.
-
• FOLD takes two inputs,
$\textbf {f}$ and n. If n is in the range
$1 \leq n < 2^{|{\textbf {f}}|}$ , then it returns the nth term of the paperfolding sequence specified by f.
-
• lnk takes two inputs, f and x. It accepts if f is the valid code of a paperfolding sequence (that is, no
$0$ s except at the end) and x is
$2^t-1$ , where t is the length of f (not counting
$0$ s at the end). It can be created using the Walnut command
Our guessed automaton sp has
$17$
states. We must now verify that it is correct. To do so, we need to verify the following things.
-
(1) The candidate automaton sp computes a partial function. More precisely, for a given
$\textbf {f}$ and n, at most one input of the form
$({\textbf {f}},n,x)$ is accepted.
-
(2) sp accepts
$({\textbf {f}},0,0)$ .
-
(3) sp accepts
$({\textbf {f}},1,1)$ provided
$|{\textbf {f}}| \geq 1$ .
-
(4) There is an x such that sp accepts
$({\textbf {f}},2^{|{\textbf {f}}|-1},x)$ .
-
(5) sp accepts no input of the form
$({\textbf {f}},n,x)$ if
$n> 2^{|{\textbf {f}}|-1}$ .
-
(6) If sp accepts
$({\textbf {f}},2^{|{\textbf {f}}|-1},x)$ , then the symbols
$P_{\textbf {f}}[t]$ for
$x \leq t < 2^{|{\textbf {f}}|}$ are all the same.
-
(7) Runs are nonempty: if sp accepts
$({\textbf {f}},n-1,y)$ and
$({\textbf {f}},n,z)$ , then
$y<z$ .
-
(8) Finally, we check that if
$\texttt {sp}$ accepts
$({\textbf {f}},n,x)$ , then x is truly the starting position of the nth run. This means that all the symbols from the starting position of the
$(n-1)$ th run to
$x-1$ are the same, and different from
$P_{\textbf {f}}[x]$ .
We use the following Walnut code to check each of these. A brief review of Walnut syntax may be useful:
-
• ?lsd_2 specifies that all numbers are represented with the least-significant digit first and in base
$2$ ;
-
• A is the universal quantifier
$\forall $ and E is the existential quantifier
$\exists $ ;
-
• & is logical AND, | is logical OR, W is logical NOT, => is logical implication, <=> is logical IFF, and != is inequality;
-
• eval expects a quoted string representing a first-order assertion with no free (unbound) variables, and returns TRUE or FALSE;
-
• def expects a quoted string representing a first-order assertion
$\varphi $ that may have free (unbound) variables, and computes an automaton accepting the representations of those tuples of variables that make
$\varphi $ true, which can be used later.

Walnut returns TRUE for all of these, which gives us a proof by induction on n that indeed
$x_n = S_{\textbf {f}}[n]$
.
From the automaton for starting positions of runs, we can obtain the automaton for ending positions of runs, ep, using the following Walnut code:

Thus, we have proved the following result.
Theorem 2.3. There is a synchronised automaton of
$17$
states sp computing
$S_{\mathbf {f}}[n]$
and one of
$13$
states ep computing
$E_{\mathbf {f}}[n]$
for all paperfolding sequences simultaneously.
Using the automaton ep, we are now able to prove the following new theorem. Roughly speaking, it says that the ending position of the nth run for the unfolding instructions
$\textbf {f}$
is
$2n - \epsilon _n$
, where
$\epsilon _n \in \{0, 1 \}$
, and we can compute
$\epsilon _n$
by looking at a sequence of unfolding instructions closely related to
$\textbf {f}$
.
Theorem 2.4. Let
$\mathbf {f}$
be a finite sequence of unfolding instructions of length at least
$2$
. Define a new sequence
$\mathbf {g}$
of unfolding instructions as follows:

Then,

for
$1 \leq n < 2^{n-1}$
, where

Furthermore, if
$\mathbf {f}$
is an infinite set of unfolding instructions, then (2.2) holds for all
$n \geq 1$
.
Proof. We prove this using Walnut. First, we need an automaton assoc that takes two inputs
$\textbf {f}$
and
$\textbf {g}$
in parallel, and accepts if
$\textbf {g}$
is defined as in (2.1). This automaton is depicted in Figure 1 and correctness is left to the reader. Now, we use the following Walnut code:

and Walnut returns TRUE.
3 Automaton for the sequence of run lengths
Next, we turn to the sequence of run lengths itself. We can compute these from the automata for ep and sp:

Figure 1 The automaton assoc.

Proposition 3.1. For all finite and infinite sequences of paperfolding instructions, the only run lengths are
$1,2$
or
$3$
.
Proof. It suffices to prove this for finite paperfolding sequences:

and Walnut returns TRUE.
Remark 3.2. Proposition 3.1 was proved by Bunder et al. [Reference Bunder, Bates and Arnold6] for the specific case of the regular paperfolding sequence.
We now use another feature of Walnut, which is that we can turn a synchronised automaton computing a function of finite range into an automaton returning the value of the function. The following code:

computes an automaton RL of two inputs
$\textbf {f}$
and n, and returns the value of the run-length sequence at index n (either
$1$
,
$2$
or
$3$
) for the unfolding instructions
$\textbf {f}$
. This automaton has
$31$
states.
We now turn to examining the factors of the run-length sequences of paperfolding sequences. Recall that a factor is a contiguous block sitting inside a large sequence. We start with overlaps.
Recall that an overlap is a string of the form
$axaxa$
, where a is a single letter and x is a possibly empty string. For example, the word entente is an overlap from French. We now prove that the sequence of run lengths in a paperfolding sequence contains no overlaps.
Theorem 3.3. The sequence of run lengths corresponding to every finite or infinite paperfolding sequence is overlap-free.
Proof. It suffices to prove the result for every finite paperfolding sequence. We can do this as follows:

and Walnut returns TRUE.
We now consider squares, that is, blocks of the form
$zz$
, where z is a nonempty sequence.
Theorem 3.4. The only possible squares occurring in the run lengths of a paperfolding sequence are
$22$
,
$123123$
and
$321321$
.
Proof. We start by showing that the only squares are of order
$1$
or
$3$
:

Next, we check that the only square of order
$1$
is
$22$
:

Finally, we check that the only squares of order
$3$
are
$123123$
and
$321321$
:

Proposition 3.5. In every finite paperfolding sequence formed by
$7$
or more unfolding instructions, the squares
$22$
,
$123123$
and
$321321$
are all present in the run-length sequence.
We now turn to palindromes.
Theorem 3.6. The only palindromes that can occur in the run-length sequence of a paperfolding sequence are
$1,2,3, 22, 212, 232, 12321$
and
$32123$
.
Proof. It suffices to check the factors of the run-length sequences of length at most
$7$
. These correspond to factors of length at most
$2+3\times 7 = 23$
, and by the bounds on the ‘appearance’ function given in [Reference Shallit17, Theorem 12.2.2]; to guarantee we have seen all of these factors, it suffices to look at prefixes of paperfolding sequences of length at most
$13 \times 23 = 299$
. (Also see [Reference Burns7].) Hence, it suffices to look at all
$2^9$
finite paperfolding sequences of length
$2^9 - 1 = 511$
specified by instructions of length
$9$
. When we do this, the only palindromes we find are those in the statement of the theorem.
Recall that the subword complexity of an infinite sequence is the function that counts, for each
$n \geq 0$
, the number of distinct factors of length n appearing in it. The subword complexity of the paperfolding sequences was determined by Allouche [Reference Allouche2].
Theorem 3.7. The subword complexity of the run-length sequence of an infinite paperfolding sequence is
$4n+4$
for
$n \geq 6$
.
Proof. First we prove that if x is a factor of a run-length sequence and
$|x| \geq 2$
, then
$xa$
is a factor of the same sequence for at most two different a:

Next, we prove that if
$|x| \geq 5$
, then exactly four factors of a run-length sequence are right-special (have a right extension by two different letters):

Here, nofive shows that no length 5 or larger has five or more right-special factors of that length, and every length
$6$
or larger has exactly four such right-special factors. Here, we have used [Reference Shallit17, Theorem 12.2.2], which guarantees that every factor of length n of a paperfolding sequence can be found in a prefix of length
$13n$
. Thus, we see that if there are t factors of length
$n \geq 6$
, then there are
$t+4$
factors of length
$n+1$
: the t arising from those that can be extended in exactly one way to the right, and the
$4$
additional from those that have two extensions.
Since there are
$28$
factors of every run-length sequence of length
$6$
(which we can check just by enumerating them, again using [Reference Shallit17, Theorem 12.2.2]), the result now follows by a trivial induction.
4 The regular paperfolding sequence
In this section, we specialise everything we have done so far to the case of a single infinite paperfolding sequence, the so-called regular paperfolding sequence, where the folding instructions are
$1^{\omega } = 111\cdots $
. In [Reference Bunder, Bates and Arnold6], the sequence
$2122321231232212\cdots $
of run lengths for the regular paperfolding sequence was called
$g(n)$
, and the sequence
$2, 3, 5, 7, 10, 12, 13, 15, 18, 19, 21,\ldots $
of ending positions of runs was called
$h(n)$
. We adopt their notation. Note that
$g(n)$
forms sequence
A088431
in the OEIS, while the sequence of starting positions of runs
$1, 3, 4, 6, 8, 11, 13, 14, 16,\ldots $
is
A371594
. The sequence
$h(n)$
of ending positions of runs is
A381929
.
In this case, we can construct an automaton computing the nth term of the run length sequence
$g(n)$
as follows:

The resulting automaton is depicted in Figure 2.

Figure 2 The lsd-first automaton RLR.
Casual inspection of this automaton immediately proves many of the results of [Reference Bunder, Bates and Arnold6], such as the multi-part Theorems 2.1 and 2.2. To name just one example, the sequence
$g(n)$
takes the value
$1$
if and only if
$n \equiv 2, 7$
(mod
$8$
). We can also use Walnut to prove the other results from [Reference Bunder, Bates and Arnold6].
We can also specialise sp and ep to the case of the regular paperfolding sequence, as follows:

These automata are depicted in Figures 3 and 4.

Figure 3 Synchronised automaton sp_reg for starting positions of runs of the regular paperfolding sequence.

Figure 4 Synchronised automaton ep_reg for ending positions of runs of the regular paperfolding sequence.
Once we have these automata, we can easily recover many of the results of [Reference Bunder, Bates and Arnold6], such as Theorem 3.2. For example, if
$n \equiv 1$
(mod
$4$
), then
$h(n) = 2n$
. We can prove this as follows with Walnut:

The reader may enjoy constructing Walnut expressions to check the other results of [Reference Bunder, Bates and Arnold6].
Slightly more challenging to prove is the sum property, conjectured by Hendriks, and given in [Reference Bunder, Bates and Arnold6, Theorem 4.1]. We state it as follows.
Theorem 4.1. Arrange the set of positive integers not in
$H := \{ h(n)+1 \, : \, n \geq 0 \}$
in increasing order and let
$t(n)$
be the nth such integer for
$n \geq 1$
. Then:
-
(a)
$g(h(i)+1) = 2$ for
$i \geq 0$ ;
-
(b)
$g(t(2i)) = 3$ for
$i \geq 1$ ;
-
(c)
$g(t(2i-1)) = 1$ for
$i \geq 1$ .
Proof. The first step is to create an automaton tt computing
$t(n)$
. Once again, we guess the automaton from data and then verify its correctness. It is depicted in Figure 5.

Figure 5 The automaton tt computing
$t(n)$
.
To verify its correctness, we need to verify that tt indeed computes an increasing function
$t(n)$
and further that the set
$\{ t(n) \, : \, n \geq 1 \} = \{ 1,2, \ldots , \} \setminus H$
. We can do this as follows:

Now, we can verify parts (a)–(c) of Theorem 4.1 as follows:

and Walnut returns TRUE for all of these. This completes the proof.
5 Connection with continued fractions
Dimitri Hendriks observed, and Bunder et al. [Reference Bunder, Bates and Arnold6] proved, a relationship between the sequence of runs for the regular paperfolding sequence and the continued fraction for the real number
$\sum _{i \geq 0} 2^{-2^i}$
.
As it turns out, however, a much more general result holds; it links the continued fractions for uncountably many irrational numbers to runs in the paperfolding sequences.
Theorem 5.1. Let
$n\geq 2$
and
$\epsilon _i \in \{ -1, 1\}$
for
$2 \leq i \leq n$
. Define

Then, the continued fraction for
$\alpha (\epsilon _2, \epsilon _3, \ldots , \epsilon _n)$
is given by
$[0, 1, (2R_{1, \epsilon _2, \epsilon _3, \ldots , \epsilon _n})']$
, where the prime indicates that the last term is increased by
$1$
.
As a consequence, the numbers
$\alpha (\epsilon _2, \epsilon _3,\ldots )$
have continued fraction given by
$[0, 1, 2R_{1, \epsilon _2, \epsilon _3, \ldots }]$
.
Remark 5.2. The numbers
$\alpha (\epsilon _2, \epsilon _3,\ldots )$
were proved transcendental by Kempner [Reference Kempner13]. They are sometimes erroneously called Fredholm numbers, even though Fredholm never studied them [Reference Shallit15, page 162].
As an example, suppose
$(\epsilon _2,\epsilon _3,\epsilon _4,\epsilon _5) = (1,-1,-1,1)$
. Then,

while
$R_{1,1,-1,-1, 1} = 2213212232123122$
.
To prove Theorem 5.1, we need the ‘folding lemma’.
Lemma 5.3. Suppose
$p/q = [0, a_1, a_2,\ldots , a_t]$
, t is odd and
$\epsilon \in \{-1, 1\}$
. Then,

Proof. See [Reference Dekking, Mendès France and van der Poorten9, page 177], although the general ideas can also be found in [Reference Shallit18, Reference Shallit19].
We can now prove Theorem 5.1 by induction.
Proof of Theorem 5.1.
From Lemma 5.3, we see that if
$\alpha (\epsilon _2, \epsilon _3, \ldots , \epsilon _n) = [0, 1, a_2, \ldots , a_t]$
, then

Now, for the paperfolding sequence,
$P_{1, \epsilon _2, \epsilon _3, \ldots , \epsilon _n}$
always ends in
$-1$
. Write
$R_{1, \epsilon _2, \epsilon _3, \ldots , \epsilon _n} = b_1 b_2 \cdots b_t$
. Then,

if
$\epsilon _{n+1} = -1$
(because we extend the last run with one more
$-1$
), and

if
$\epsilon _{n+1} =1$
.
Suppose

and let
$R_{1, \epsilon _2, \ldots , \epsilon _n} = b_1 b_2 \cdots b_{t-1}$
. Then,

as desired.
Acknowledgments
I thank Jean-Paul Allouche, Narad Rampersad and the anonymous referee for their helpful comments.