A Prime Way to Count Edge-Labeled Ordered Trees

Lately I been thinking a bit about the various ways of encoding information and stumbled on some beautiful and natural encoding for arbitrary finite trees with edges that are labelled by natural numbers. I will first discuss a well-known related encoding that has applications in theory and is used for lists of numbers, and then show how it can be slightly generalized. It is one of these neat things that are really simple, yet for some reason I have not seen it anywhere before. I just found a few similar ideas, but not the exact variation I will present. While there are infinitely many different ways to define such an encoding, I think that the encoding you will see is very natural, and later I will also explain what I mean by that.

Intro: Saving state in numbers

Imagine that you need an unbounded number of memory cells that can store an arbitrary natural number, but somehow you need to squeeze all that into a single number. How could you do it? Easy, just use prime numbers – identify each cell with a distinct prime $p_i$ and the stored value with the multiplicity of the prime $e_i$. As every natural number has a unique prime factorization, the number $n := \prod p_i^{e_i}$ uniquely encodes the current state of your memory cells and you can decode by using any method of prime factorization of your choice. This is in fact quite a useful trick and variations of such an encoding are often used to prove Turing completeness of various formal systems – all you need is unbounded memory and the ability to do some basic arithmetic.

To formalize this encoding, let $\mathbb{N}$ be the set of natural numbers, $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$, and let $P_i$ denote the $i$-th prime number (with $P_0 := 1$). Stated more formally, the sketched encoding implies a bijection between $\mathbb{N}$ and ${\mathbb{N}_0}^{*}\times\mathbb{N}$. Note that while there is an unbounded number of memory cells, at any point in time we only use finitely many, and we assume that we only write down values for cells up to the last one that we use, and unused (or erased) cells have the value 0. Then we can write the encoding function for a memory state like this:

$$ \mathsf{encode}(M) = \prod_{1\leq i \leq |M|} {P_{i}}^{M[i]} $$

So a memory state maps to a sequence like $M=(0,1,2,0,3)$, which we treat like an array (i.e. $M[i]$ denotes the $i$-th number). In the encoding this corresponds to $n_M = 3\cdot 5^2 \cdot 11^3 = 99825$, i.e. As each prime factorization can be interpreted as such a list, we can also reverse this mapping, giving us a way to enumerate all memory configurations.

Pushing it further: From lists to trees

So we started with a list of numbers and saw a way to compress them into a single number. Let us now turn this around – given some natural number, we can factorize it down into prime numbers. We get primes, and their multiplicities as exponents. The primes are, well, prime, and the exponents – they still can be any number! Why did we stop the decomposition process?

We can apply prime factorization recursively to all non-prime numbers in the exponents. This will eventually leave us with an arithmetic term that consists of nested products of prime powers. For example, the number1 $2^{243} 7^{250}$ is fully factorized to $2^{3^5} 7^{2\cdot 5^3}$, which can also be thought of like this:

Let’s call such a tree that represents the result of the recursive factorization simply factorization tree. Because multiplication is commutative, swapping siblings does not change the underlying expression. But as no node can have two outgoing edges labelled by the same prime, we can fix a simple canonical representation where we always list subtrees in ascending order of the prime on its incoming edge (i.e., just like in the example).

Factorization trees are neat, but I promised that we can use numbers to encode trees with no restrictions on the label values or the sibling ordering. The key here is not to observe what there is, but what is not. These trees are pretty constrained, but still there are infinitely many primes and each could be labelling an edge going out from any node of a valid factorization tree. We can use this to our advantage!

Imagine the sequence of all primes and consider presence or absence of a prime in some factorization as a bit of information. Ignoring the exponents, a list of prime factors like $2,5,13,29,…$ can be interpreted as the bit string $1010010001…$, where we have a $1$ at position $i$ iff the prime $P_i$ is part of our factorization. Now to complete our encoding, we will use another old and well-known trick and interpret such strings as a unary representation of a list of numbers. The number $n$ now corresponds to the bit sequence $0^{n-1}1$, so $1010010001$ represents $1,2,3,4$.

That’s all we need – putting it together, to encode arbitrary labels for our trees we have to choose primes so that the gaps between adjacent prime factors have the desired size (for the first factor, remember that we defined $P_0=1$). Formally, if we want to have a sequence of adjacent outgoing edges labelled by some numbers $n_i$, then in the encoding we have $f_1 := P_{n_1}$ as the first prime factor and then we let $f_i := P_{f_{i-1} + n_i}$. The number $\prod_i f_i$ now encodes an ordered edge-labeled tree of depth 1 with the desired labels. Inductively, given a sequence of pairs of numbers $(e_j, m_j)$, respectively encoding some tree and its incoming edge label value, we combine them into a larger tree by computing the correct base primes $p_j$ from the values $m_j$ as before, yielding a number that encodes the desired tree from the product $\prod_j {p_j}^{e_j}$. Conversely, to decode a tree, just compute the factorization tree and recover the correct labels from the gaps between primes labelling sibling edges.

Completing the example above, we have the following correspondence:

$2^{243} 7^{250}$ $\hspace{5mm}\leftrightarrow$ $\leftrightarrow$

A chaotic-looking kind of order

Not surprisingly, by construction the tree encoding inherits many properties from the arithmetic properties of prime factorizations. This can be used to create a conceptual mapping between the arbitrary labeled tree view and factorization tree view. While some are very straight-forward, I think it helps to think through various simple relationships to get an intuition for the encoding, e.g.:

  • the number $P_i$ encodes a edge labelled by $i$.
  • the number ${P_i}^j$ encodes the tree where the root has a single subtree indexed by the number $j$ in the encoding, and the incoming edge is labelled by $i$.
  • the product $P_i P_j$ encodes the tree of height 1 with two edges, where the first is labelled by $i$ and the second by $j-i$.

However, while interpreting what e.g. multiplication of two trees does (subtrees sharing common prime factors are zipped into each other, while co-prime branches are preserved) or interpreting tree operations like inserting or removing subtrees at certain positions in terms of numbers is possible, this does not seem to lead to any useful new insights. It just trivially follows from the chosen encoding.

It would also be interesting to have non-trivial upper bounds for listing all trees with a fixed or bounded parameter $k$ constraining e.g. such as tree height, sum of the edges, etc., but it appears to be pretty difficult to get something useful. Maybe this is not that surprising, given that we still do not really understand prime number occurrence patterns all that well, and this encoding just piggy-backs on the natural structure inherent in numbers.

A tree of trees… is still just a tree

We first looked at a well-known encoding of lists of numbers, and now you have seen how to even encode trees with basically the same approach. Now is that the most we can do, have we pushed prime factorization as an encoding method to its limit? Well, I would say yes.

At first you might think – the trees are labelled by natural numbers, and we know how to convert a number into a tree labelled by numbers. So why not just re-interpret each label as another tree? And why even stop there? We can do it multiple times, let’s have a tree of trees of trees!

The answer to this is that we could, but it would be quite unnatural and unnecessary. Unnatural, because it would mean that we have to call our decoding procedure multiple times to get to the inner trees – this feels like cheating. Unnecessary, because we do not even need to encode a “tree of trees” in such a convoluted way. A simpler approach would be reserving the first child of each node for its “label subtree”, and use the remaining children of a node as before. Why make it needlessly more complicated?

While the step from lists to trees to me feels consequential and worthwhile, there is no point in going meta with the trees – there is nothing to gain. So unless someone comes up with a neat way to push beyond trees and encode arbitrary graphs in a similarly neat way2, I guess that’s it. However, as trees are the best friend of computer scientists and mathematicians, I think we should happily settle for that.

Okay, but why is this interesting?

If you do not find this to be intrinsically beautiful and neat, then I won’t be able to convince you, but I will try anyway. Imagine we came to this issue from the other side – you would need to find a way to enumerate all such trees, i.e. write an algorithm that sequentially can spit out all such trees, i.e., every tree in the set is produced by your algorithm after finite time and no tree is left out.3

There are infinitely many ways to do it, but let me guess how you would approach this task, if you are a bit like me:

  1. Notice that labelled trees look a bit wild and there is no obvious way to do it
  2. Find a logical way to partition the set (e.g. trees of a fixed depth, etc.)
  3. Find a way to generate the trees in each partition (or go back to step 2.)
  4. Find a way to combine the partitions coherently to get the full construction
  5. Prove that your algorithm will in fact generate each tree

Once you are done, you will end up with some complex algorithm which takes a number, runs all the logic of your own making, and spits out the corresponding tree. I am quite sure that the result will be much more complicated than the encoding we have seen above, and your procedure will contain a bunch of arbitrary4 choices you made while designing your encoding.

In contrast, I claimed that the encoding presented above is quite natural. This word, outside of category theory, is rarely used formally and people can mean quite different things. I dare to say that up to the factorization tree there is nothing to discuss – that’s just arithmetic. But the step from factorization trees to arbitrary trees can be debated.

For example, we could have also used the structure of the trees to encode information (number of nodes, depth, etc.), but any such scheme appears to me much more convoluted, and at the same time wasteful, if we would not make use of the prime-labelled edges. But once we do choose to use the labels, we do not even need to do any acrobatics using the tree structure. Now we still could have used the actual primes on the labels to encode information for us, but then we would need to arbitrarily pick out certain “magic numbers” and make them mean something special.

I hope you can see the pattern – all alternatives I can come up with would involve arbitrarily picking out either special numbers or subtrees, and embellishing them with semantics. But in this encoding, no such choices are being made. One can maybe argue about the unary encoding, but here I would still claim that

  • among ways of using the infinite “bit sequence” this is by far the simplest,
  • natural numbers are usually defined in an equivalent of a unary encoding, and
  • choice of any other base (e.g. encoding values in binary) is again arbitrary.

Sure, one can come up with many other approaches, but to me it feels like this encoding just happened to be there! It feels discovered, not constructed – that’s what I call natural. There is effectively no deliberate design here, it all falls out from the structure of arithmetic. The encoding is extremely simple, and yet, the resulting order of the trees is nothing that you, as a human being, would intuitively come up with, or even be able to comprehend5. Isn’t that pretty awesome?

Appendix: Some Python Code

# Naive implementation of a natural edge-labeled tree encoding into numbers.
# Copyright (C) 2024 Anton Pirogov, Licensed under MIT License

def primes(n):
    """Generate all prime numbers up to some value n."""
    sieve = [True] * (n+1)
    for p in range(2, n+1):
        if (sieve[p]):
            yield p
            for i in range(p, n+1, p):
                sieve[i] = False

# Primes up to some arbitrary constant, in-order + inverse lookup table
Pi = dict(map(lambda x: (x[0]+1, x[1]), enumerate(primes(300000))))
Pi[0] = 1
Np = {v: k for k, v in Pi.items()}

def is_prime(n: int) -> bool:
    """Check whether a number is prime."""
    return n in Np

def factorize(n: int):
    """Compute prime factorization of n."""
    ret, cur, i = {}, 0, 1
    while n > 1:
        p = Pi[i]
        if n % p == 0:
            n /= p
            if i != cur:
                cur = i
                ret[p] = 0
            ret[p] += 1
            i += 1
    return ret

def fac_tree(n):
    """Compute factorization tree of n."""
    return { p: fac_tree(exp) for p, exp in factorize(n).items() }

def fac_to_el_tree(t):
    """Decode an edge-labeled tree from a factorization tree."""
    if not len(t):
        return []

    ps = list(t.keys())
    ns = [Np[ps[0]]]
    for i in range(1, len(ps)):

    return [(ns[i], fac_to_el_tree(t[ps[i]])) for i in range(len(ps))]

def to_tree(n: int):
    """Get the edge-labeled tree associated with number n.

    A tree node is represented as an ordered list of pairs (l, s)
    where l is an edge label value and s is another tree node.
    return fac_to_el_tree(fac_tree(n))

# TODO: Exercise for the reader:
# Implement reverse mapping (tree to number).

def edge_sum(t) -> int:
    """Return sum of all edges in the edge-labelled tree."""
    es = map(lambda x: x[0], t)
    ts = map(lambda x: x[1], t)
    return sum(es) + sum(map(edge_sum, ts))

def height(t) -> int:
    if not t:
        return 0
    return 1+max(map(lambda x: height(x[1]), t))

print("N Height Sum Tree")
for n in range(1, 100):
    t = to_tree(n)
    print(n, height(t), edge_sum(t), t)

The number, written out, is pretty huge: 265949360227680209415688754215185627499609075786317721599142367842677527639781313219756607505815380599829726565901576838862188537016923586066126654522383074316615182405453035245691722229411757546890040074939329334236883755402052888636845394393983293221186937661678368970447179367841792
But I never claimed that any of this is efficient or practical ;)


I doubt that this is possible, at least with the structure of natural numbers (maybe it works with other structures). General graphs seem not to have a natural decomposition like numbers and trees do, that’s why we have so many different ways to break them into pieces in a structured way and why graph problems are computationally so much more difficult compared to trees.


A simpler assignment would be proving that the set of such trees is countable, i.e., that such an algorithm exists, but this could be done without providing such an algorithm by using more abstract arguments.


Here arbitrary is meant in the sense of “there is no reason, except for personal preference, to do it this way”, i.e. still deliberately chosen, and (I certainly hope) not random, or I would be seriously concerned about your algorithm designing skills.


If you do think you fully comprehend the order, probably you should go get some prizes for unsolved problems related to prime numbers and their deeper structure.