Trees

by Herbert Fleischner
Recall that a graph G consists of a (finite) set of vertices, denoted by V(G), and a finite set of edges, denoted by E(G). We then write G=V(G)ÈE(G). A graph H=V(H)ÈE(H) is a subgraph of G if V(H) Í V(G), E(H) Í E(G). If for this subgraph H of G we actually have V(H)=V(G) then we call H a spanning subgraph of G. For the graph in Figure 1 below (where we have capitalized the vertices to help you follow, although we will not always do this), H1={U,V,X}È{e,f,g} is a simple subgraph of G, and H2={U,V,X,Y}È{f,g,h} is a simple spanning subgraph of G.

Figure 1: This was Fig.2 in Issue 6.1

A walk in a graph G connecting two vertices X,Y Î V(G), denoted as W(X,Y) for short, is an alternating sequence of the form
W(X,Y)=X,e,U,f,V,g,¼,h,Y
where `alternating' means a vertex in this sequence is followed by an edge which in turn is followed by a vertex, then an edge again, and so on; and if an edge k in this sequence follows vertex S and is followed by vertex T, then k is incident with S and T. So, in the above W(X,Y),{X,U,V,¼,Y} Í V(G) and {e,f,g,¼h} Í E(G). For example, in the graph H of Figure 1, the sequence
W(V,x)=V,f,U,k,U,e,X,h,Y,h,X
is a walk connecting V and X. On the other hand, the sequence S in the same graph H,
S=V,f,U,l,U,e,Y,h,X
is alternating in elements of V(H) and E(H), but it is not a walk because l is not incident with U, and e is not incident with Y. Observe that in a walk vertices and edges may be repeated. We can imagine a walk in a graph as the physical action of walking through vertices and edges. Recall that we distinguish between open and closed walks depending on whether the first and last vertex in the corresponding sequences are different or equal. So, the above W(V,x) is an open walk.

If in a (closed/open) walk no edges are repeated, then we call such a walk a (closed/open) trail. An open trail is called a path if no vertices are repeated in the corresponding sequence; and a closed trail is called a cycle if no vertices are being repeated (except the first being equal to the last one) and if it contains at least one edge. So again looking at Figure 1,
P(V,X)=V,f,U,e,X
is a path. The walk
P(V,V)=V
is a closed trail because no edges are repeated (simply because it does not contain any edge), but it is not a cycle precisely because it does not contain any edge. It is called a trivial path. However, the sequence V,l,V is a cycle, and so is the sequence V,f,U,e,X,g,V. A graph which has no cycles is called an acyclic graph or a forest; an acyclic graph which is also connected is called a tree.

tree.png

Figure 2: Some trees

While it is hard to imagine in day-to-day life that a forest should have no more than one or two or three trees, this is indeed possible in graph theory.

One of the central results in graph theory is the following: Every connected graph G contains a spanning tree. This means a subgraph which is a tree, and spans the original graph - that is, includes all the original vertices. To prove this result we proceed in two steps (to make the whole proof easier to understand). First we prove

Theorem 4
If G=V(G)ÈE(G) is a connected graph, and if e Î E(G) belongs to a cycle G, then G-{e}=V(G)È(E(G)-{e}) is also a connected graph.
Proof:
Let G1:=G-{e}. We want to show that G1 is connected. To achieve this we consider arbitrary vertices x,y Î V(G1)=V(G) and want to prove that a path P1(x,y) exists in G1 (for then we conclude that in G1 any two vertices can be connected by a path; i.e., G1 is connected).

In any case, a path P(x,y) containing x,y exists in G. If P(x,y) does not contain the special edge e, then P(x,y) is also a path in G1. So, setting in this case P1(x,y)=P(x,y) we obtain what we need. Whence we assume that P(x,y) contains e. Suppose e is incident to s,t Î V(G)=V(G1) (possibly s=t in which case e is a loop, or x=s, or t=y etc. - all this does not really matter). Then we can write the alternating sequence P(x,y) in the form
P(x,y)=x,¼,s,e,t,¼,y.
(we may assume s is reached before traversing e to reach t; otherwise, we just rename t as s and s as t).

Now, e belongs to a cycle C by hypothesis, so we can write the alternating sequence C in the form
C=s,¼,t,e,s;
i.e., we assume without loss of generality that a `run' through C starts at s and traverses e last (and therefore, t precedes e in this sequence). It follows that the subsequence of C containing all vertices and edges of C except e, i.e.
P(s,t)=s,¼,t,
is a path. By construction of P(s,t), we may replace in the above P(x,y) the section s,e,t with P(s,t) to obtain
W1(x,y)=x,¼,

s,¼,t
P(s,t) 
,¼,y
which is a walk connecting x and y and not containing e. That is, W1(x,y) is a walk in G1 connecting x and y. By Theorem ?? (in issue 6.1), W1(x,y) contains a subsequence which is a path P1(x,y) as required since e cannot belong to P1(x,y) because it is already not in W1(x,y). \blacksquare

Now, with Theorem 4 at hand it is easy to prove what we wanted to show originally.

Theorem 5
If G is a connected graph, then it contains a spanning tree T.
Proof (1):(direct)
  If G is acyclic, then T=G is the only spanning tree of G (note that G is connected by assumption). Whence suppose G has a cycle C; let e be an edge in C. By Theorem 4, G1:=G-{e} is connected. Now we repeat the above argument with G1 in place of G. If G1 is acyclic, then it is a spanning tree of G since it is also connected and V(G1)=V(G). In this case T=G1. So we assume G1 has a cycle C1, and let e1 be an edge in C1. G2:=G1-{e} is connected by Theorem 4. And so on.

To see that this procedure of successively deleting edges belonging to cycles, must terminate with some acyclic graph G (after having deleted a certain number of edges), we make the following observations:

(a) E(G) is a finite set;
(b) at each step we obtain a connected graph (by the choice of an edge belonging to a cycle, and by Theorem 4) which has one edge less than the graph considered in the preceding step; the vertex set, however, remains unchanged;
(c) if G has precisely one vertex, v say, then T={v} is the only spanning tree of G (i.e., we have to delete all edges of G: in this case E(G) is a set of loops);
(d) if G has two or more vertices, then the procedure must stop before deleting all edges since H=V(G) is then a disconnected graph (every vertex of H has degree 0) - however, the choice of the edges to be deleted and the application of Theorem 4 always render connected graphs.
In view of (a) - (d) we arrive at Gk after deleting k edges for some k ³ 0 (with G0=G) such that Gk is connected and acyclic. Then T=Gk is as required. \blacksquare
Proof (2): (by contradiction)
Suppose the theorem to be false for some graph on n ³ 1 vertices. That is, there is a connected graph G=V(G)ÈE(G), V(G)={v1,¼,vn}, n ³ 1, such that G has no spanning tree. Choose from among all counterexamples to the theorem, which have n vertices, one graph which has as few edges as possible; call it H. Now, H is connected; so if it is acyclic, then it is a tree and T=H is a spanning tree of H. Consequently since H is a counterexample it must contain a cycle C and thus an edge f belonging to C. Then H1:=H-{f} is a connected graph by Theorem 4, but it is not a counterexample to Theorem 5 because it has fewer edges than H. So H1 contains a spanning tree T. By definition V(T)=V(H1), and by construction V(H1)=V(H); i.e., T is a spanning tree of H as well, contradicting the supposition that H is a counterexample of Theorem 5. Whence we conclude that no counterexamples exist. \blacksquare
Proof (3): (by induction)
To avoid the use of a formula which we will prove below, we proceed by induction on the number of cycles in a connected graph G. That is we consider all connected graphs and show by induction on the number of cycles in each of these graphs that a spanning tree must exist.

Induction beginning: G is a connected graph with 0 cycles, i.e., G is also acyclic, i.e., G is a tree, so T=G is as required.

Induction hypothesis: Suppose it has been shown that every connected graph with at most k ³ 0 cycles contains a spanning tree. Let G be a connected graph having k+1 ³ 1 cycles; so it has at least one cycle C.

Induction step: Let e be an edge in C. By Theorem 4, G1:=G-{e} is connected and has k1 £ k cycles (every cycle of G1 is a cycle of G, but C is no longer a cycle of G1). By the induction hypothesis, G1 has a spanning tree T. Now V(T)=V(G1)=V(G) classifies T as a spanning tree of G as well. Whence we conclude that every connected graph having k+1 cycles, also has a spanning tree. \blacksquare

Note that deleting an edge belonging to a cycle may `destroy' many cycles. This is why in the above argument, we only obtain the inequality k1 £ k, and explains why we phrased the induction hypothesis as `... graph with at most k ³ 0 cycles...' (`having at most 0 cycles' is tantamount to saying `having no cycles').

However, more importantly, the above shows that in proving Theorem 5 we can employ each of the three principles of proof. In secondary school mathematics currently taught in Zimbabwe, there are hardly any mathematical statements where this is possible. The length of the first (direct) proof should not be considered as discouraging - for this proof illustrates how to formulate an algorithm which produces a spanning tree from a given connected graph (one says: `the proof is algorithmic by nature').

Looking at the graphs in Figure 2 we notice that they all have some vertices of degree 1 - such vertices are called endvertices. This is a general phenomenon indeed.

Lemma 6
If G is a tree with at least two vertices, then it has at least two different endvertices.
Proof:
Since G is a tree, it is also a connected graph; and since it has at least two vertices, E(G) ¹ Æ must hold (see (d) in the proof of Theorem 5 above). Start a walk at some u Î V(G) along some edge e incident with u; e cannot be a loop since G is acyclic; so e joins u and v ¹ u. If d(v)=1, i.e., e is the only edge incident with v, then v is an endvertex; otherwise continue the walk along some f ¹ e. Repeating the argument, we note:
(i) in continuing the walk we always choose the edge at a vertex of degree > 1, which is different from those edges already traversed;
(ii) because of (i) and since G is acyclic we cannot reach a vertex passed before (think this through!).
Thus, since V(G) and E(G) are finite sets, this walk (which is a path indeed) must terminate at some vertex t from which the walk cannot be continued along an edge not yet traversed. That is, d(t)=1, hence t is an endvertex. t ¹ u, but t=v is possible (see above).

Now we look at u. If d(u)=1, then u,t are two endvertices, and nothing is left to be proved. So assume d(u) > 1, and start at u another walk traversing e¢ ¹ e first (such e¢ incident with u exists because of d(u) > 1). Observing that (i) and (ii) also hold w.r.t. this second walk, and hold in this case even w.r.t. vertices passed in the first walk, we conclude that the second walk will terminate at some w ¹ t such that d(w)=1. That is, t and w are two different endvertices of G. \blacksquare

In fact, this Lemma will be used in the following criterion which characterizes trees by a simple relationship between the respective numbers of vertices and edges.

Theorem 7
A connected graph G on p vertices and q edges is a tree if and only if q=p-1.
Proof:
Since the theorem says `if and only if', we must proceed in two steps, starting from the hypothesis that G is connected.

  1. Suppose the connected graph G on p vertices and q edges is a tree. We want to show that q=p-1 (that is the `only if' part, the necessary condition).

    If G has just one vertex, then E(G)=Æ follows of necessity in this case: i.e., p=1,q=0 implying q=0=1-1=p-1. So the claimed equation holds for a tree with just one vertex.

    If G has more than one vertex, then G being a tree implies (by Lemma 6) that G has at least two endvertices. Let t Î V(G) be one of them, and let f be the only edge incident with t. Then
    G1:=G-{t,f}
    is also a tree (think this through!), and it has one vertex less than G. Since the equation has been proved for p=1, we may invoke induction: so if G1 has p1 vertices and q1 edges, then q1=p1-1. On the other hand, p=p1+1,q=q1+1 by construction of G1. Whence q=q1+1=(p1+1)-1=p-1 implying the validity of the equation for G as well. [As an exercise, prove this part of the proof of the theorem, by contradiction].

  2. Suppose for the connected graph G that its number of vertices, p, and its number of edges, q, satisfy q=p-1. We want to show that G is a tree. In any case, by Theorem 5, G contains a spanning tree T. T has p¢=p vertices and q¢ £ q edges. Now, since T is a tree we know from 1., the first part of the proof of the theorem, that q¢=p¢-1; and q=p-1 holds by assumption. Combining these two equations and using p¢=p,q¢ £ q, we obtain
    q ³ q¢=p¢-1=p-1=q
    from which q¢=q follows. That is T=G, so G is a tree indeed.
\blacksquare

Note that if we had proved part 1 of the proof of Theorem 7 before proving Theorem 5 (in this part 1 and in the proof of Lemma 6, no use of Theorem 5 has been made), then we could have proved Theorem 5 by induction on the number of edges of G, instead of proceeding by induction on the number of cycles (check this carefully!).

Trees play an important role in graph theory - but they originate from considerations in organic chemistry! And this is where we want to turn to, now.

Let us consider the alkanes: their formula is CnH2n+2. But what are their structure formulae? Don't worry, if you forgot. Let us just consider the structure formulae for n=1,2; and the corresponding abstract graphs (for this is what structure formulae of chemical compounds are!).

fig6.png
Figure 3:

So, the structure formulae of CH4 and C2H6 are trees if we represent atoms by vertices and valencies by edges. But is this true for every n? In any case, the graphs corresponding to CnH2n+2 have only two types of vertices: vertices of degree 4 and vertices of degree 1 - and this is all that we borrow from chemistry: carbon atoms are 4-valent, hydrogen atoms are 1-valent.

The graph Gn corresponding to CnH2n+2 has
p=n+(2n+2)=3n+2
vertices. The Handshaking Lemma (Theorem 3 in the preceding issue),
åi=1p d(vi)=2q, applied to this special case yields
n·4+(2n+2)·1 = 2q, i.e., q=3n+1
(observe that in this case, the first n terms in the sum are all 4, while the others are all 1). That is, the graphs Gnn=1,2,¼, with p=3n+2 and - as just shown - q=3n+1 thus satisfy q=p-1. Since we are not dealing with polymers, the molecules CnH2n+2 correspond to connected (!) graphs. So, Gn is a connected graph satisfying q=p-1. By Theorem 7, Gn is a tree for every n. Quite amazing indeed how much one can say re. the alkanes by using very little of their chemical properties!

But that's not all! If we look at the subgraphs Tn of Gn which we obtain by deleting all the endvertices of Gn (together with the corresponding edges), then these Tn are also trees - called carbon trees. Now, T1,T2,T3 are uniquely determined, but for T4 we have two possibilities (see Figure 4). And for n > 4 the number of different carbon trees Tn grows larger and larger (determine all possibilities for T5 and T6). That is, through graph theoretical considerations we are led to the possible existence of isomers. Which of the possible carbon trees Tn appear in reality, is being decided by chemical and physical properties of CnH2n+2. However, graph theory can say how many isomers of the form CnH2n+2 may possibly exist.

fig7.png
Figure 4:

We finish by remarking that trees also play an important role in areas like computer science and coding theory.
Previous Next Home

Please contact the webmaster for any comments on these pages.