Data Structures with .NET - Part 5: From Trees to Graphs

** An Extensive Examination of Data Structures **

** Part 5: From Trees to Graphs **

Scott Mitchell
4GuysFromRolla.com

March 2004

** Summary: ** A graph, like a tree, is a collection of nodes and edges, but has no rules dictating the connection among the nodes. In this fifth part of the article series, we'll learn all about graphs, one of the most versatile data structures. (26 printed pages)

Download the Graphs.msi sample file .

** Contents **

Introduction
Examining the Different Classes of Edges
Creating a C# Graph Class
A Look at Some Common Graph Algorithms
Conclusion
Related Books

** Introduction **

Part 1 and Part 2 of this article series focused on linear data structures—the array, the ArrayList, the Queue, the Stack, and the Hashtable. In Part 3 , we began our investigation of trees. Recall that trees consist of a set of nodes , where all of the nodes share some connection to other nodes. These connections are referred to as edges . As we discussed, there are numerous rules as to how these connections can occur. For example, all nodes in a tree except for one—the root—must have precisely one parent node, while all nodes can have an arbitrary number of children. These simple rules ensure that, for any tree, the following three statements will hold true:

1. ????????????????? Starting from any node, any other node in the tree can be reached. That is, there exists no node that can't be reached through some simple path.

2. ????????????????? There are no cycles . A cycle exists when, starting from some node v , there is some path that travels through some set of nodes v 1 , v 2 , ..., v k that then arrives back at v .

3. ????????????????? The number of edges in a tree is precisely one less than the number of nodes.

In Part 3 we focused on binary trees , which are a special form of trees. Binary trees are trees whose nodes have at most two children.

In this fifth installment of the article series we're going to examine graphs . Graphs are composed of a set of nodes and edges, just like trees, but with graphs there are no rules for the connections between nodes. With graphs, there is no concept of a root node, nor is there a concept of parents and children. Rather, a graph is a collection of interconnected nodes.

** Note??? ** Realize that all trees are graphs. A tree is a special case of a graph in which all nodes are reachable from some starting node and one that has no cycles.

Figure 1 shows three examples of graphs. Notice that graphs, unlike trees, can have sets of nodes that are disconnected from other sets of nodes. For example, graph (a) has two distinct, unconnected sets of nodes. Graphs can also contain cycles. Graph (b) has several cycles. One cycle is the path from v 1 to v 2 to v 4 and back to v 1 . Another one is from v 1 to v 2 to v 3 to v 5 to v 4 and back to v 1 . (There are also cycles in graph (a).) Graph (c) does not have any cycles, as it has one less edge than it does number of nodes, and all nodes are reachable. Therefore, it is a tree.

** Figure 1. Three examples of graphs **

Many real-world problems can be modeled using graphs. For example, search engines like Google model the Internet as a graph, where Web pages are the nodes in the graph and the links among Web pages are the edges. Programs like Microsoft MapPoint that can generate driving directions from one city to another use graphs, modeling cities as nodes in a graph and the roads connecting the cities as edges.

** Examining the Different Classes of Edges **

Graphs, in their simplest terms, are a collection of nodes and edges, but there are different kinds of edges:

· ???????????????????? Directed versus undirected edges

· ???????????????????? Weighted versus unweighted edges

When talking about using graphs to model a problem, it is important to indicate the class of graph with which you are working. Is it a graph whose edges are directed and weighted, or one whose edges are undirected and weighted? In the next two sections, we'll discuss the differences between directed and undirected edges and weighted and unweighted edges.

** Directed and Undirected Edges **

The edges of a graph provide the connections between one node and another. By default, an edge is assumed to be bidirectional. That is, if there exists an edge between nodes v and u , it is assumed that one can travel from v to u and from u to v . Graphs with bidirectional edges are said to be undirected graphs because there is no implicit direction in their edges.

For some problems, though, an edge might infer a one-way connection from one node to another. For example, when modeling the Internet as a graph, a hyperlink from Web page v linking to Web page u would imply that the edge between v to u would be unidirectional. That is, that one could navigate from v to u , but not from u to v . Graphs that use unidirectional edges are said to be directed graphs .

When drawing a graph, bidirectional edges are drawn as a straight line, as shown in Figure 1. Unidirectional edges are drawn as an arrow, showing the direction of the edge. Figure 2 shows a directed graph where the nodes are Web pages for a particular Web site and a directed edge from u to v indicates that there is a hyperlink from Web page u to Web page v . Notice that both u links to v and v links to u , two arrows are used—one from v to u and another from u to v .

** Figure 2. Model of pages making up a website **

** Weighted and Unweighted Edges **

Typically graphs are used to model a collection of "things" and their relationship among these "things." For example, the graph in Figure 2 modeled the pages in a website and their hyperlinks. Sometimes, though, it is important to associate some cost with the connection from one node to another.

A map can be easily modeled as a graph, with the cities as nodes and the roads connecting the cities as edges. If we wanted to determine the shortest distance and route from one city to another, we first need to assign a cost from traveling from one city to another. The logical solution would be to give each edge a weight , such as how many miles it is from one city to another.

Figure 3 shows a graph that represents several cities in southern California. The cost of any particular path from one city to another is the sum of the costs of the edges along the path. The shortest path, then, would be the path with the least cost. In Figure 3, for example, a trip from San Diego to Santa Barbara is 210 miles if driving through Riverside, then to Barstow, and then to Santa Barbara. The shortest trip, however, is to drive 100 miles to Los Angeles, and then another 30 miles up to Santa Barbara.

** Figure 3. Graph of California cities with edges valued as miles **

Realize that the directionality and weight of edges are orthogonal . That is, a graph can have one of four arrangements of edges:

· ???????????????????? Directed, weighted edges

· ???????????????????? Directed, unweighted edges

· ???????????????????? Undirected, weighted edges

· ???????????????????? Undirected, unweighted edges

The graphs in Figure 1 had undirected, unweighted edges. Figure 2 had directed, unweighted edges, and Figure 3 used undirected, weighted edges.

** Sparse Graphs and Dense Graphs **

While a graph could have zero or a handful of edges, typically a graph will have more edges than it has nodes. What's the maximum number of edges a graph could have, given n nodes? It depends on whether the graph is directed or undirected. If the graph is directed, then each node could have an edge to every other node. That is, all n nodes could have n – 1 edges, giving a total of n * ( n – 1) edges, which is nearly n 2 .

** Note??? ** For this article, I am assuming nodes are not allowed to have edges to themselves. In general, though, graphs allow for an edge to exist from a node v back to node v . If self-edges are allowed, the total number of edges for a directed graph would be n 2 .

If the graph is undirected, then one node, call it v 1 , could have an edge to each and every other node, or n – 1 edges. The next node, call it v 2 , could have at most n – 2 edges because an edge from v 2 to already exists. The third node, v 3 , could have at most n – 3 edges, and so forth. Therefore, for n nodes, there would be at most ( n – 1) + ( n – 2) + ... + 1 edges. As you might have guessed, summed up this comes to [ n * __ ( n -1)] / 2, or exactly half as many edges as a directed graph.

If a graph has significantly less than n 2 edges, the graph is said to be sparse . For example, a graph with n nodes and n edges, or even 2 n edges would be said to be sparse. A graph with close to the maximum number of edges is said to be dense .

When using graphs in an algorithm it is important to know the ratio between nodes and edges. As we'll see later on in this article, the asymptotic running time operations performed on a graph is typically expressed in terms of the number of nodes and edges in the graph.

** Creating a C# Graph Class **

While graphs are a very common data structure used in a wide array of different problems, there is no built-in graph data structure in the .NET Framework. Part of the reason is because an efficient implementation of a Graph class depends on a number of factors specific to the problem at hand. For example, graphs are typically modeled in one of two ways:

· ???????????????????? Adjacency list

· ???????????????????? Adjacency matrix

These two techniques differ in how the nodes and edges of the graph are maintained internally by the Graph class. Let's examine both of these approaches and weigh the pros and cons of each method.

** Representing a Graph Using an Adjacency List **

In Part 3 we created a C# class for binary trees, called BinaryTree . Recall that each node in a binary tree was represented by a Node class. The Node class contained three properties:

· ???????????????????? ** Value ** , which held the value of the node, an object

· ???????????????????? ** Left ** , a reference to the Node 's left child

· ???????????????????? ** Right ** , a reference to the Node 's right child

Clearly the Node class and BinaryTree classes are not sufficient for a graph. First, the Node class for a binary tree allows for only two edges—a left and right child. For a more general graph, though, there could be an arbitrary number of edges emanating from a node. Also, the BinaryTree class contains a reference to a single node, the root. But with a graph, there is no single point of reference. Rather, the graph would need to know about all of its nodes.

One option, then, is to create a Node class that has as one of its properties an array of Node instances, which represent the Node 's neighbors. Our Graph class would also have an array of Node instances, with one element for each of the nodes in the graphs. Such a representation is called an adjacency list because each node maintains a list of adjacent nodes. Figure 4 depicts an adjacency list representation in graphical form.

** Figure 4. Adjacency list representation in graphical form **

Notice that with an undirected graph, an adjacency list representation duplicated the edge information. For example, in adjacency list representation (b) in Figure 4, the node a has b in its adjacency list, and node b also has node a in its adjacency list.

Each node has precisely as many Nodes in its adjacency list as it has neighbors. Therefore, an adjacency list is a very space-efficient representation of a graph. You never store more data than needed. Specifically, for a graph with V nodes and E edges, a graph using an adjacency list representation will require V + E Node instances for a directed graph and V + 2 E Node instances for an undirected graph.

While Figure 4 does not show it, adjacency lists can also be used to represent weighted graphs. The only addition is that for each Node n 's adjacency list, each Node instance in the adjacency list needs to store the cost of the edge from n .

The one downside of an adjacency list is that determining if there is an edge from some node u to v requires that u 's adjacency list be searched. For dense graphs, u will likely have many Node s in its adjacency list. Determining if there is an edge between two nodes, then, takes linear time for dense adjacency list graphs. Fortunately, when using graphs we'll likely not need to determine if there exists an edge between two particular nodes. More often than not, we'll want to simply enumerate all the edges of a particular node.

** Representing a Graph Using an Adjacency Matrix **

An alternative method for representing a graph is to use an adjacency matrix . For a graph with n nodes, an adjacency matrix is an n x n two-dimensional array. For weighted graphs the array element ( u , v ) would give the cost of the edge between u and v (or, perhaps -1 if no such edge existed between u and v) . For an unweighted graph, the array could be an array of Booleans, where a True at array element ( u , v ) denotes an edge from u to v and a False denotes a lack of an edge.

Figure 5 depicts how an adjacency matrix representation in graphical form.

** Figure 5. Adjacency matrix representation in graphical form **

Note that undirected graphs display symmetry along the adjacency matrix's diagonal. That is, if there is an edge from u to v in an undirected graph then there will be two corresponding array entries in the adjacency matrix: ( u , v ) and ( v , u ).

Since determining if an edge exists between two nodes is simply an array lookup, this can be determined in constant time. The downside of adjacency matrices is that they are space inefficient. An adjacency matrix requires an n 2 element array, so for sparse graphs much of the adjacency matrix will be empty. Also, for undirected graphs half of the graph is repeated information.

While either an adjacency matrix or adjacency list would suffice as an underlying representation of a graph for our Graph class, let's move forward using the adjacency list model. I chose this approach primarily because it is a logical extension from the Node and BinaryTree classes that we've already created together.

** Creating the Node Class **

The Node class represents a single node in the graph. When working with graphs, the nodes typically represent some entity. Therefore, our Node class contains a Data property of type object that can be used to store any sort of data associated with the node. Furthermore, we'll need some way to easily identify nodes, so let's add a string Key property, which serves as a unique identifier for each Node .

Since we are using the adjacency list technique to represent the graph, each Node instance needs to have a list of its neighbors. If the graph uses weighted edges, the adjacency list also needs to store the weight of each edge. To manage this adjacency list, we'll first need to create an AdjacencyList class.

** The AdjacencyList and EdgeToNeighbor classes **

A Node contains an AdjacencyList class, which is a collection of edges to the Node 's neighbors. Since an AdjacencyList stores a collection of edges, we first need to create a class that represents an edge. Let's call this class EdgeToNeighbor , since it models an edge that extends to a neighboring node. Since we might want to associate a weight with this edge, EdgeToNeighbor needs two properties:

· ???????????????????? ** Cost ** , an integer value indicating the weight of the edge

· ???????????????????? ** Neighbor ** , a Node reference

The AdjancencyList class, then, is derived from the System.Collections.CollectionBase class, and is simply a strongly-typed collection of EdgeToNeighbor instances. The code for EdgeToNeighbors and AdjacencyList is shown below:

** public class EdgeToNeighbor **

** { **

** ?? // private member variables **

** ?? private int cost; ????? **

** ?? private Node neighbor; **

** ? **

** ?? public EdgeToNeighbor(Node neighbor) : this(neighbor, 0) {} **

** ? **

** ?? public EdgeToNeighbor(Node neighbor, int cost) **

** ?? { **

** ????? this.cost = cost; **

** ????? this.neighbor = neighbor; **

** ?? } **

** ? **

<P class=MsoN