Trees
Recall that a graph G consists of a (finite) set of vertices, denoted by
V(G),
and a finite set of edges, denoted by E(G). We then write G=V(G)ÈE(G). A graph H=V(H)ÈE(H) is a
subgraph of G if
V(H) Í V(G), E(H) Í E(G). If for this subgraph H of
G we actually have V(H)=V(G) then we call H a spanning
subgraph of G.
For the graph in Figure 1 below (where we have capitalized the vertices to
help you follow, although we will not always do this),
H1={U,V,X}È{e,f,g} is a
simple subgraph of G, and H2={U,V,X,Y}È{f,g,h} is a simple
spanning subgraph of G.
|
|
Figure 1: This was Fig.2 in Issue 6.1
|
A walk in a graph G
connecting two vertices X,Y Î V(G), denoted as W(X,Y) for short, is
an alternating sequence of the form
where `alternating' means a vertex in this sequence is followed by an edge
which in turn is followed by a vertex, then an edge again, and so on; and
if an edge k in this sequence follows vertex S and is followed by
vertex T, then k is incident with S and T. So, in the above
W(X,Y),{X,U,V,¼,Y} Í V(G) and {e,f,g,¼h} Í E(G). For example, in the graph H of Figure 1,
the sequence
|
W(V,x)=V,f,U,k,U,e,X,h,Y,h,X |
|
is a walk connecting V and X. On the other hand, the sequence S in
the same graph H,
is alternating in elements of V(H) and E(H), but it is not a walk
because l is not incident with U, and e is not incident with Y.
Observe that in a walk vertices and edges may be repeated. We can imagine
a walk in a graph as the physical action of walking through vertices and
edges. Recall that we distinguish between open and
closed walks depending on whether the first and last vertex in the
corresponding sequences are different or equal. So, the above W(V,x) is
an open walk.
If in a (closed/open) walk no edges are repeated, then we call such a
walk a (closed/open) trail. An open trail is called a path if
no vertices are repeated in the corresponding sequence; and a closed trail
is called a cycle if no vertices are being repeated (except the first
being equal to the last one) and if it contains at least one edge. So
again looking at Figure 1,
is a path. The walk
is a closed trail because no edges are repeated (simply because it does
not contain any edge), but it is not a cycle precisely because it does
not contain any edge. It is called a trivial path. However,
the sequence V,l,V is a cycle, and so is the sequence V,f,U,e,X,g,V.
A graph which has no cycles is called an acyclic graph or a
forest; an acyclic graph which is also connected is called a tree.
Figure 2: Some trees
While it is hard to imagine in day-to-day life that a forest should have no
more than one or two or three trees, this is indeed possible in graph
theory.
One of the central results in graph theory is the following: Every
connected graph G contains a spanning tree. This means a
subgraph which is a tree, and spans the original graph - that is,
includes all the original vertices.
To prove this result we proceed in two steps (to
make the whole proof easier to understand). First we prove
- Theorem 4
-
If G=V(G)ÈE(G) is a connected graph, and if e Î E(G) belongs to a
cycle G, then G-{e}=V(G)È(E(G)-{e}) is also a connected graph.
- Proof:
-
Let G1:=G-{e}. We want to show that G1 is connected. To achieve
this we consider arbitrary vertices x,y Î V(G1)=V(G) and want to prove
that a path P1(x,y) exists in G1 (for then we conclude that in G1
any two vertices can be connected by a path; i.e., G1 is connected).
In any case, a path P(x,y) containing x,y exists in G. If P(x,y)
does not contain the special edge e, then P(x,y) is also a path in
G1. So, setting in this case P1(x,y)=P(x,y) we obtain what we need.
Whence we assume that P(x,y) contains e. Suppose e is incident to
s,t Î V(G)=V(G1) (possibly s=t in which case e is a loop, or x=s,
or t=y etc. - all this does not really matter). Then we can write the
alternating sequence P(x,y) in the form
(we may assume s is reached before traversing e to reach t;
otherwise, we just rename t as s and s as t).
Now, e belongs to a cycle C by hypothesis, so we can write the
alternating sequence C in the form
i.e., we assume without loss of generality that a `run' through C starts
at s and traverses e last (and therefore, t precedes e in this
sequence). It follows that the subsequence of C containing all vertices
and edges of C except e, i.e.
is a path. By construction of P(s,t), we may replace in the above
P(x,y) the section s,e,t with P(s,t) to obtain
|
W1(x,y)=x,¼, |
s,¼,t P(s,t)
|
,¼,y |
|
which is a walk connecting x and y and not containing e. That is,
W1(x,y) is a walk in G1 connecting x and y. By Theorem
?? (in issue 6.1), W1(x,y) contains a subsequence which is a path
P1(x,y) as required since e cannot belong to P1(x,y) because it
is already not in W1(x,y). \blacksquare
Now, with Theorem 4 at hand it is easy to prove what we wanted
to show originally.
- Theorem 5
-
If G is a connected graph, then it contains a spanning tree T.
- Proof (1):(direct)
-
If G is acyclic, then T=G is
the
only spanning tree of G (note that G is connected by assumption).
Whence suppose G has a cycle C; let e be an edge in C. By Theorem
4, G1:=G-{e} is connected. Now we repeat the above argument
with G1 in place of G. If G1 is acyclic, then it is a spanning tree
of G since it is also connected and V(G1)=V(G). In this case T=G1.
So we assume G1 has a cycle C1, and let e1 be an edge in C1.
G2:=G1-{e} is connected by Theorem 4. And so on.
To see that this procedure of successively deleting edges belonging to
cycles, must terminate with some acyclic graph G (after having deleted
a certain number of edges), we make the following observations:
- (a)
E(G) is a finite set;
- (b)
at each step we obtain a connected graph (by the choice of an edge
belonging to a cycle, and by Theorem 4) which has one edge less
than the graph considered in the preceding step; the vertex set, however,
remains unchanged;
- (c)
if G has precisely one vertex, v say, then T={v} is the only
spanning tree of G (i.e., we have to delete all edges of G: in this
case E(G) is a set of loops);
- (d)
if G has two or more vertices, then the procedure must stop before
deleting all edges since H=V(G) is then a disconnected graph (every
vertex of H has degree 0) - however, the choice of the edges to be
deleted and the application of Theorem 4 always render connected
graphs.
In view of (a) - (d) we arrive at Gk after deleting k edges for some
k ³ 0 (with G0=G) such that Gk is connected and acyclic. Then
T=Gk is as required. \blacksquare
- Proof (2): (by contradiction)
-
Suppose the theorem to be false for some graph on n ³ 1 vertices. That
is, there is a connected graph G=V(G)ÈE(G), V(G)={v1,¼,vn},
n ³ 1, such that G has no spanning tree. Choose from among all
counterexamples to the theorem, which have n vertices, one graph which
has as few edges as possible; call it H. Now, H is connected; so if it
is acyclic, then it is a tree and T=H is a spanning tree of H.
Consequently since H is a counterexample it must contain a cycle C and
thus an edge f belonging to C. Then H1:=H-{f} is a connected graph
by Theorem 4, but it is not a counterexample to Theorem 5
because it has fewer edges than H. So H1 contains a spanning tree T.
By definition V(T)=V(H1), and by construction V(H1)=V(H); i.e., T
is a spanning tree of H as well, contradicting the supposition that H
is a counterexample of Theorem 5. Whence we conclude that no
counterexamples exist. \blacksquare
- Proof (3): (by induction)
-
To avoid the use of a formula which we will prove below, we proceed by
induction on the number of cycles in a connected graph G. That is we
consider all connected graphs and show by induction on the number of cycles
in each of these graphs that a spanning tree must exist.
Induction beginning: G is a connected graph with 0 cycles, i.e., G is
also acyclic, i.e., G is a tree, so T=G is as required.
Induction hypothesis: Suppose it has been shown that every connected graph
with at most k ³ 0 cycles contains a spanning tree. Let G be a
connected graph having k+1 ³ 1 cycles; so it has at least one cycle C.
Induction step: Let e be an edge in C. By Theorem 4,
G1:=G-{e} is connected and has k1 £ k cycles (every cycle of G1
is a cycle of G, but C is no longer a cycle of G1). By the
induction hypothesis, G1 has a spanning tree T. Now V(T)=V(G1)=V(G)
classifies T as a spanning tree of G as well. Whence we conclude that
every connected graph having k+1 cycles, also has a spanning tree.
\blacksquare
Note that deleting an edge belonging to a cycle may `destroy' many cycles.
This is why in the above argument, we only obtain the inequality k1 £ k, and explains why we phrased the induction hypothesis as `... graph
with at most k ³ 0 cycles...' (`having at most 0 cycles' is tantamount
to saying `having no cycles').
However, more importantly, the above shows that in proving Theorem 5
we can employ each of the three principles of proof. In
secondary school mathematics currently taught in Zimbabwe, there are
hardly any mathematical statements where this is possible. The length
of the first (direct) proof should not be considered as discouraging - for
this proof illustrates how to formulate an algorithm which produces a
spanning
tree from a given connected graph (one says: `the proof is algorithmic by
nature').
Looking at the graphs in Figure 2 we notice that they all have
some vertices
of degree 1 - such vertices are called endvertices. This is a
general phenomenon indeed.
- Lemma 6
-
If G is a tree with at least two vertices, then it has at least two
different endvertices.
- Proof:
-
Since G is a tree, it is also a connected graph; and since it has at
least two vertices, E(G) ¹ Æ must hold
(see (d) in the proof of Theorem 5 above). Start a walk at some u Î V(G) along some edge e
incident with u; e cannot be a loop since G is acyclic; so e joins
u and v ¹ u. If d(v)=1, i.e., e is the only edge incident with
v, then v is an endvertex; otherwise continue the walk along some f ¹ e. Repeating the argument, we note:
- (i)
in continuing the walk we always choose the edge at a vertex of degree
> 1, which is different from those edges already traversed;
- (ii)
because of (i) and since G is acyclic we cannot reach a vertex passed
before (think this through!).
Thus, since V(G) and E(G) are finite sets, this walk (which is a path
indeed) must terminate at some vertex t from which the walk cannot be
continued along an edge not yet traversed. That is, d(t)=1, hence t is
an endvertex. t ¹ u, but t=v is possible (see above).
Now we look at u. If d(u)=1, then u,t are two endvertices, and
nothing is left to be proved. So assume d(u) > 1, and start at u another
walk traversing e¢ ¹ e first (such e¢ incident with u exists because
of d(u) > 1). Observing that (i) and (ii) also hold w.r.t. this second
walk, and hold in this case even w.r.t. vertices passed in the
first walk, we conclude that the second walk
will terminate at some w ¹ t such that d(w)=1. That is, t and w
are two different endvertices of G. \blacksquare
In fact, this Lemma will be used in the following criterion
which characterizes trees by a simple relationship between the respective
numbers of vertices and edges.
- Theorem 7
-
A connected graph G on p vertices and q edges is a tree if and only
if q=p-1.
- Proof:
-
Since the theorem says `if and only if', we must proceed in two steps,
starting from the hypothesis that G is connected.
- Suppose the connected graph G on p vertices and q edges is a tree. We
want to show that q=p-1 (that is the `only if' part, the necessary
condition).
If G has just one vertex, then E(G)=Æ follows of necessity in
this case: i.e., p=1,q=0 implying q=0=1-1=p-1. So the claimed equation
holds for a tree with just one vertex.
If G has more than one vertex, then G being a tree implies (by Lemma
6) that G has at least two endvertices. Let t Î V(G) be one
of them, and let f be the only edge incident with t. Then
is also a tree (think this through!), and it has one vertex less than G.
Since the equation has been proved for p=1, we may invoke induction: so
if G1 has p1 vertices and q1 edges, then q1=p1-1. On the other
hand, p=p1+1,q=q1+1 by construction of G1. Whence
q=q1+1=(p1+1)-1=p-1 implying the validity of the equation for G as
well. [As an exercise, prove this part of the proof of the theorem, by
contradiction].
-
Suppose for the connected graph G that its number of vertices, p, and
its number of edges, q, satisfy q=p-1. We want to show that G is a
tree. In any case, by Theorem 5, G contains a spanning tree T.
T has p¢=p vertices and q¢ £ q edges. Now, since T is a tree we
know from 1., the first part of the proof of the theorem, that q¢=p¢-1;
and q=p-1 holds by assumption. Combining these two equations and using
p¢=p,q¢ £ q, we obtain
from which q¢=q follows. That is T=G, so G is a tree indeed.
\blacksquare
Note that if we had proved part 1 of the proof of Theorem 7
before proving Theorem 5 (in this part 1 and in the proof of
Lemma 6, no use of Theorem 5 has been made), then we
could have proved Theorem 5 by induction on the number of edges of
G, instead of proceeding by induction on the number of cycles (check this
carefully!).
Trees play an important role in graph theory - but they originate from
considerations in organic chemistry! And this is where we want to turn to,
now.
Let us consider the alkanes: their formula is CnH2n+2. But what are
their structure formulae? Don't worry, if you forgot. Let us just consider
the structure formulae for n=1,2; and the corresponding abstract graphs
(for this is what structure formulae of chemical compounds are!).
So, the structure formulae of CH4 and C2H6 are trees if we
represent atoms by vertices and valencies by edges. But is this true for
every n? In any case, the graphs corresponding to CnH2n+2 have
only two types of vertices: vertices of degree 4 and vertices of degree 1 -
and this is all that we borrow from chemistry: carbon atoms are 4-valent,
hydrogen atoms are 1-valent.
The graph Gn corresponding to CnH2n+2 has
vertices. The Handshaking Lemma (Theorem 3 in the preceding issue),
åi=1p d(vi)=2q, applied to this special case yields
|
n·4+(2n+2)·1 = 2q, i.e., q=3n+1 |
|
(observe that in this case, the first n terms in the sum are all 4, while
the others are all 1). That is, the graphs Gn, n=1,2,¼, with
p=3n+2 and - as just shown - q=3n+1 thus satisfy q=p-1. Since we are
not dealing with polymers, the molecules CnH2n+2 correspond to
connected (!) graphs. So, Gn is a connected graph satisfying q=p-1. By
Theorem 7, Gn is a tree for every n. Quite amazing indeed how
much one can say re. the alkanes by using very little of their chemical
properties!
But that's not all! If we look at the subgraphs Tn of Gn which we
obtain by deleting all the endvertices of Gn (together with the
corresponding edges), then these Tn are also trees - called carbon
trees. Now, T1,T2,T3 are uniquely determined, but for T4 we have
two possibilities (see Figure 4). And for n > 4 the number of
different carbon trees Tn grows larger and larger (determine all
possibilities for T5 and T6). That is, through graph theoretical
considerations we are led to the possible existence of isomers. Which of
the possible carbon trees Tn appear in reality, is being decided by
chemical and physical properties of CnH2n+2. However, graph theory
can say how many isomers of the form CnH2n+2
may possibly exist.
We finish by remarking that trees also play an important role
in areas like computer science and coding theory.
Please contact the webmaster
for any comments on these pages.