** An Extensive Examination of Data Structures **
** Part 2: The Queue, Stack, and Hashtable **
Scott Mitchell
4GuysFromRolla.com
November 2003
** Summary: ** This article, the second in a six-part series on data structures in the .NET Framework, examines three of the most commonly studied data structures: the Queue, the Stack, and the Hashtable. As we'll see, the Queue and Stack are specialized ArrayLists, providing storage for a variable number of objects, but restricting the order in which the items may be accessed. The Hashtable provides an array-like abstraction with greater indexing flexibility. Whereas an array requires that its elements be indexed by an ordinal value, Hashtables allow items to be indexed by any object. (17 printed pages)
** Contents **
Introduction
Providing First Come, First Served Job Processing
A Look at the Stack Data Structure: First Come, Last Served
The Limitations of Ordinal Indexing
The System.Collections.Hashtable Class
Conclusion
** Introduction **
In Part 1: An Extensive Examination of Data Structures , we looked at what data structures are, how their performance can be evaluated, and how these performance considerations play into choosing which data structure to utilize for a particular algorithm. In addition to reviewing the basics of data structures and their analysis, we also looked at the most commonly used data structure, the array.
The array holds a set of homogeneous elements indexed by ordinal value. The actual contents of an array are laid-out in memory as a contiguous block, thereby making reading from or writing to a specific array element very fast.
Two downsides of arrays are their homogeneity and the fact that their size must be specified explicitly. To combat this, the .NET Framework Base Class Library offers an ArrayList data structure, which provides a heterogeneous collection of elements that needs not be explicitly resized. As discussed in the previous article, the ArrayList works by using an array of type object . Each call to the ArrayList's Add() method checks to see if an internal object array can hold any more elements, and if not, the array is automatically doubled in size.
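As a quick sketch of both points, the following short program (the class name ArrayListBasics is chosen only for illustration) adds a few elements of different types to an ArrayList and then inspects its Count and Capacity properties. Capacity reports the length of the internal object array, which grows automatically as Add() outpaces the current size; the starting capacity depends on the version of the .NET Framework, with the 1.x releases beginning at 16 elements.
using System;
using System.Collections;

public class ArrayListBasics
{
   public static void Main()
   {
      ArrayList list = new ArrayList();

      list.Add("a string");    // elements are stored as object,
      list.Add(42);            // so any mix of types is allowed
      list.Add(DateTime.Now);

      // Capacity is the size of the internal object array; it is doubled
      // automatically whenever Add() finds the array full.
      Console.WriteLine("Count: " + list.Count + ", Capacity: " + list.Capacity);
   }
}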
In this second installment of the article series, we'll continue our examination of array-like data structures by first examining the Queue and Stack. These two data structures, like the ArrayList, hold a contiguous block of heterogeneous elements. However, both the Queue and Stack place limitations on how their data can be accessed.
Following our look at the Queue and Stack, we'll spend the rest of this article digging into the Hashtable data structure. A Hashtable, which is sometimes referred to as an associative array, stores a collection of heterogeneous elements, but indexes these elements by an arbitrary object (such as a string), as opposed to an ordinal index.
** Providing First Come, First Served Job Processing **
If you are creating any kind of computer service—that is, a computer program that can receive multiple requests from multiple sources for some task to be completed—then part of the challenge of creating the service is deciding the order with which the incoming requests will be handled. The two most common approaches used are:
· First come, first served
· Priority-based processing
First come, first served is the job-scheduling task you'll find at your grocery store, the bank, and the DMV. Those waiting for service stand in a line. The people in front of you are served before you, while the people behind you will be served after. Priority-based processing serves those with a higher priority before those with a lesser priority. For example, a hospital emergency room uses this strategy, opting to help someone with a potentially fatal wound before someone with a less threatening wound, regardless of who arrived first.
Imagine that you need to build a computer service and that you want to handle requests in the order in which they were received. Because incoming requests may arrive faster than you can process them, you'll need to place the requests in some sort of buffer that preserves the order in which they arrived.
One option is to use an ArrayList and an integer variable called nextJobPos to indicate the array position of the next job to be completed. When each new job request comes in, simply use the ArrayList's Add() method to add it to the end of the ArrayList. Whenever you are ready to process a job in the buffer, grab the job at the nextJobPos position in the ArrayList and increment nextJobPos . The following program illustrates this algorithm:
using System;
using System.Collections;

public class JobProcessing
{
   private static ArrayList jobs = new ArrayList();
   private static int nextJobPos = 0;

   public static void AddJob(string jobName)
   {
      jobs.Add(jobName);
   }

   public static string GetNextJob()
   {
      if (nextJobPos > jobs.Count - 1)
         return "NO JOBS IN BUFFER";
      else
      {
         string jobName = (string) jobs[nextJobPos];
         nextJobPos++;
         return jobName;
      }
   }

   public static void Main()
   {
      AddJob("1");
      AddJob("2");
      Console.WriteLine(GetNextJob());
      AddJob("3");
      Console.WriteLine(GetNextJob());
      Console.WriteLine(GetNextJob());
      Console.WriteLine(GetNextJob());
      Console.WriteLine(GetNextJob());
      AddJob("4");
      AddJob("5");
      Console.WriteLine(GetNextJob());
   }
}
The output of this program is as follows:
1
2
3
NO JOBS IN BUFFER
NO JOBS IN BUFFER
4
While this approach is fairly simple and straightforward, it is horribly inefficient. For starters, the ArrayList continues to grow unabated with each job that's added to the buffer, even if the jobs are processed immediately after being added to the buffer. Consider the case where every second a new job is added to the buffer and a job is removed from the buffer. This means that once a second the AddJob() method is called, which calls the ArrayList's Add() method. As the Add() method is continually called, the ArrayList's internal array is repeatedly doubled as needed. After five minutes (300 seconds) the ArrayList's internal array is dimensioned for 512 elements, even though there has never been more than one job in the buffer at a time! This trend, of course, continues so long as the program continues to run and the jobs continue to come in.
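A small simulation makes the growth problem concrete. The following sketch (the class name GrowthDemo and the loop of 300 iterations are chosen only for illustration) mimics 300 seconds of one-job-in, one-job-out traffic against an ArrayList buffer, then reports how large the internal array has become even though there was never more than one unprocessed job at a time.
using System;
using System.Collections;

public class GrowthDemo
{
   public static void Main()
   {
      ArrayList jobs = new ArrayList();
      int nextJobPos = 0;

      for (int second = 1; second <= 300; second++)
      {
         jobs.Add("job " + second);   // a new job arrives...
         nextJobPos++;                // ...and is processed immediately
      }

      // The buffer never held more than one unprocessed job, yet the
      // internal array kept doubling (to 512 elements when the default
      // capacity is 16, as in the .NET Framework 1.x).
      Console.WriteLine("Jobs still waiting: " + (jobs.Count - nextJobPos));
      Console.WriteLine("Internal capacity:  " + jobs.Capacity);
   }
}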
The reason the ArrayList grows to such ridiculous proportions is that the buffer locations used by old jobs are never reclaimed. That is, when the first job is added to the buffer and then processed, the first spot in the ArrayList is ready to be reused. Consider the job schedule presented in the previous code sample. After the first two lines— AddJob("1") and AddJob("2") —the ArrayList will look like Figure 1.
** Figure 1. The ArrayList after the first two lines of code **
Note that there are 16 elements in the ArrayList at this point because, by default, the ArrayList, when initialized, creates its internal object array with 16 elements. Next, the GetNextJob() method is invoked, which removes the first job, resulting in something like Figure 2.
** Figure 2. Program after the GetNextJob() method is invoked **
When AddJob("3") executes, we need to add another job to the buffer. Clearly the first ArrayList element (index 0) is available for reuse. Initially it might make sense to put the third job in the 0 index. However, this approach can be forgotten by considering what would happen if after AddJob("3") we did AddJob("4") , followed by two calls to GetNextJob() . If we placed the third job in the 0 index and then the fourth job in the 2 index, we'd have the problem demonstrated in Figure 3.
** Figure 3. Issue created by placing jobs in the 0 index **
Now, when GetNextJob() was called, the second job would be removed from the buffer, and nextJobPos would be incremented to point to index 2. Therefore, when GetNextJob() was called again, the fourth job would be removed and processed prior to the third job, thereby violating the first come, first served order we need to maintain.
The crux of this problem arises because the ArrayList represents the list of jobs in a linear ordering. That is, we need to keep adding new jobs to the right of the old jobs to guarantee that the correct processing order is maintained. Whenever we hit the end of the ArrayList's internal array, that array is doubled, even if there are unused elements at the front due to calls to GetNextJob() .
To fix this problem, we need to make our ArrayList circular. A circular array is one that has no definite start or end. Rather, we have to use variables to maintain the beginning and end of the array. A graphical representation of a circular array is shown in Figure 4.
** Figure 4. Example of a circular array **
With a circular array, the AddJob() method adds the new job at index endPos and then "increments" endPos . The GetNextJob() method plucks the job from startPos , sets the element at the startPos index to null , and "increments" startPos . I put the word increments in quotation marks because here incrementing is a trifle more complex than simply adding one to the variable's current value. To see why we can't just add 1, consider the case when endPos equals 15. If we increment endPos by adding 1, endPos equals 16. On the next AddJob() call, an attempt will be made to access index 16, which will result in an IndexOutOfRangeException .
Rather, when endPos equals 15, we want to increment endPos by resetting it to 0. This can be done either by creating an increment(variable) function that checks to see whether the passed-in variable equals the array's size and, if so, resets it to 0, or by incrementing the variable by 1 and then taking the result modulo the size of the array. In the latter case, the code for increment() would look like:
int increment(int variable)
{
   return (variable + 1) % theArray.Length;
}
** Note ** The modulus operator, % , when used like x % y , calculates the remainder of x divided by y . The remainder is always between 0 and y – 1 .
This approach works well if our buffer never has more than 16 elements, but what happens if we want to add a new job to the buffer when there are already 16 jobs present? Like with the ArrayList's Add() method, we'll need to redimension the circular array appropriately, by, say, doubling the size of the array.
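To make the discussion concrete, here is a rough sketch of such a circular buffer. The class and member names ( CircularJobBuffer , startPos , endPos , and so on) are chosen only for illustration, and the growth step simply copies the buffered jobs, in their first come, first served order, into an array twice the size. As we'll see next, the .NET Framework already supplies a polished implementation of exactly this idea.
using System;

public class CircularJobBuffer
{
   private object[] theArray = new object[16];
   private int startPos = 0;   // index of the next job to process
   private int endPos = 0;     // index where the next job will be stored
   private int count = 0;      // number of jobs currently buffered

   public void AddJob(string jobName)
   {
      if (count == theArray.Length)
         Grow();                                   // buffer is full: double the array

      theArray[endPos] = jobName;
      endPos = (endPos + 1) % theArray.Length;     // "increment" with wrap-around
      count++;
   }

   public string GetNextJob()
   {
      if (count == 0)
         return "NO JOBS IN BUFFER";

      string jobName = (string) theArray[startPos];
      theArray[startPos] = null;                   // release the slot for reuse
      startPos = (startPos + 1) % theArray.Length; // "increment" with wrap-around
      count--;
      return jobName;
   }

   private void Grow()
   {
      object[] biggerArray = new object[theArray.Length * 2];

      // Copy the buffered jobs into the new array in their logical order,
      // starting from startPos and wrapping around as needed.
      for (int i = 0; i < count; i++)
         biggerArray[i] = theArray[(startPos + i) % theArray.Length];

      theArray = biggerArray;
      startPos = 0;
      endPos = count;
   }
}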
** The System.Collections.Queue Class **
The functionality we have just described—adding items to and removing items from a buffer in first come, first served order while maximizing space utilization—is provided in a standard data structure, the Queue. The .NET Framework Base Class Library includes such a built-in class, the System.Collections.Queue class. Whereas our earlier code provided AddJob() and GetNextJob() methods, the Queue class provides identical functionality with its Enqueue() and Dequeue() methods, respectively.
Behind the scenes, the Queue class maintains an internal circular object array and two variables that serve as markers for the beginning and ending of the circular array: head and tail . By default, the Queue's initial capacity is 32 elements, although this is customizable when calling the Queue's constructor. Since the Queue is maintained with an object array, variables of any type can be queued up.
The Enqueue() method determines whether there is sufficient capacity for adding the new item to the queue. If so, it simply adds the item to the circular array at the tail index and then "increments" tail using the modulus operator to ensure that tail does not exceed the internal array's length. If there is insufficient space, the array is increased by a specified growth factor. This growth factor has a default value of 2.0, thereby doubling the internal array's size, but you can optionally specify this factor in the Queue class's constructor.
The Dequeue() method returns the current element from the head index. It also sets the head index element to null and "increments" head . For those times where you may want to just look at the head element, but not actually dequeue it, the Queue class also provides a Peek() method.
What is important to realize is that the Queue, unlike the ArrayList, does not allow random access. That is, you cannot look at the third item in the queue without dequeuing the first two items. (However, the Queue class does have a Contains() method, so you can determine whether or not a specific item exists in the Queue.) If you know you will need random access, the Queue is not the data structure to use—the ArrayList is. The Queue is, however, ideal for situations where you are only interested in processing items in the precise order in which they were received.
** Note ** You may hear Queues referred to as FIFO data structures. FIFO stands for First In, First Out, and is synonymous with the processing order of first come, first served.
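To tie this back to the job-processing example, here is a brief sketch of the same job schedule expressed with the Queue class. The initial capacity (32) and growth factor (2.0) passed to the constructor are simply the documented defaults, shown explicitly only for illustration. Note also that, unlike our earlier GetNextJob() method, calling Dequeue() on an empty Queue throws an InvalidOperationException rather than returning a message.
using System;
using System.Collections;

public class QueueJobProcessing
{
   public static void Main()
   {
      Queue jobs = new Queue(32, 2.0f);      // default capacity and growth factor

      jobs.Enqueue("1");
      jobs.Enqueue("2");
      Console.WriteLine(jobs.Dequeue());     // 1
      jobs.Enqueue("3");

      Console.WriteLine(jobs.Peek());        // 2 - look at the head without removing it
      Console.WriteLine(jobs.Contains("3")); // True
      Console.WriteLine(jobs.Dequeue());     // 2
      Console.WriteLine(jobs.Dequeue());     // 3
      Console.WriteLine(jobs.Count);         // 0 - another Dequeue() would throw
   }
}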
** A Look at the Stack Data Structure: First Come, Last Served **
The Queue data structure provides first come, first served access by internally using a circular array of type object . The Queue provides such access by exposing Enqueue() and Dequeue() methods. First come, first served processing has a number of real-world applications, especially in service programs like Web servers, print queues, and other programs that handle multiple incoming requests.
Another common processing scheme in computer programs is first come, last served. The data structure that provides this form of access is known as a Stack. The .NET Framework Base Class Library includes the System.Collections.Stack class, which, like the Queue class, maintains its elements internally using a circular array of type object . The Stack class exposes its data through two methods— Push(item) , which adds the passed-in item to the stack, and Pop() , which removes and returns the item at the top of the stack.
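As a brief sketch (the class name StackDemo is chosen only for illustration), the following program pushes three items onto a Stack and then pops them back off, demonstrating the first come, last served ordering. Like the Queue, the Stack also offers a Peek() method for examining the top item without removing it.
using System;
using System.Collections;

public class StackDemo
{
   public static void Main()
   {
      Stack stack = new Stack();

      stack.Push("1");
      stack.Push("2");
      stack.Push("3");                  // "3" now sits on top of the stack

      Console.WriteLine(stack.Pop());   // 3 - last in, first out
      Console.WriteLine(stack.Peek());  // 2 - top item, left in place
      Console.WriteLine(stack.Pop());   // 2
      Console.WriteLine(stack.Pop());   // 1
   }
}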
A Stack can be visualized graphically as a vertical collection of items. When an item is pushed onto the stack, it is placed on top of all other items. Popping an item removes the item from the top of the stack. The following two figures graphically represent a stack first after items 1, 2, and 3 have been pushed onto the stack in that order, and then after a pop.
** Figure 5. Graphical representation of a stack with three items **