## Class Implementation of a Binary Tree

As with the other data structures we've been discussing up to now, we will encapsulate a standard binary tree as a class. Indeed, we've already made a start by showing various methods of the finished class. Ideally, as we did with linked lists, for example, we don't want to bore the user of the class with the structure of nodes (it allows us to alter their structure without inconveniencing the user of the class), but with these ordinary binary trees we have to assume some knowledge of the node...

## Queues Using Linked Lists

Essentially we have to mimic the standard supermarket line with a linked list pretty easy, since the linked list is a line anyway. We just have to add items to one end of the linked list and remove them from the other. If we want to use a singly linked list, we have the decision made for us we dequeue from the front of the list and enqueue at the end of the list. With a doubly linked list we can use either end for either purpose, but we'd use more memory in the process. And, again, as it...

## Hash Functions

The first algorithm we need to discuss is the hash function. This is the routine that will take the key for an item and magically transform it into an index value. If the hash table has room for n items, then the hash function obviously has to produce index values that lie in the range 0 to n-1 (we shall assume zero-based index values, as usual). Since I've not really stated what type an item key may be, it's fairly clear that we'll have different hash functions for different key types. The...

## Ttd ObjectList Class

What we shall do at this juncture is create a new list class that works like TList does, but with two differences. It will store instances of some class (or descendants of it), and it will destroy the objects it contains whenever necessary. In other words, a specialized list that circumvents the two problems mentioned above. This class is the TtdObjectList class. It differs from the TObjectList in Delphi 5 and later versions in that it is typesafe. This class will not be a descendant of TList....

## Double Hashing

The final open-addressing scheme we will consider is that of double hashing. This is actually the most successful of the alternative open-addressing schemes. Here, we hash the item's key to an index value call it h1. Probe that slot. If it is occupied, we hash the key with another, totally separate and independent hashing algorithm to give another index value call this one h2. We probe slot h1+h2. If this is occupied, we probe slot h1 + 2h2, then h1 + 3h2, and so on (obviously, all calculations...

## Linked Lists

Looking at the code in Listing 4.9, it seems doubtful that binary search would ever work for a linked list, unless we do a bunch of indexed accesses into the linked list, which we saw was a bad idea in Chapter 3. As it happens, it's not too difficult. Firstly, we should recognize that in general, following a link in the list is going to be much faster than calling a comparison routine. Hence, we can characterize following a link as good, but calling the comparison routine as bad. This means...

## Results of Applying Tests

On the book's CD is a test program that applies each of these tests to the standard Delphi random number generator and to the minimal standard random number generator. Figure 6.1 shows the result of one of these tests on the Delphi generator. Figure 6.1 Testing the Delphi generator Minimal Standard General Combined Generator Additive Generator Distribution over (0.0, 0.0) - (0.001,1.0) Minimal Standard General Combined Generator Additive Generator As you can see, this particular test shows that...

## Middle Square Method

The history of random number generators starts off with one of the most illustrious names in computing John von Neumann. He put forward the following scheme for calculating random number sequences in about 1946 take an N-digit number, square it and from the result (expressed as a 2N-digit number, padded on the left with zeros if required) take the middle N-digits as the next number in the sequence. If we take N as 4, for example, and have 1234 as our starting point, the next few numbers in the...

## Red Black Trees

Having looked at rotations, zig-zags, and zig-zigs, and familiarized ourselves with the reorganization of binary search trees with splay trees, we shall now investigate a proper balancing algorithm. What should a balancing algorithm do Ideally, it should ensure that the path from every leaf to the root is exactly the same length, plus or minus one. In practice, this rigorous requirement is somewhat difficult to enforce (AVL trees use this definition and their balancing algorithm enforces the...

## Arrays on Disk

One application of arrays that many books gloss over is arrays on disk, that is, files of fixed length records. This type of array has its own peculiarities that deserve some discussion, after which we will write a class embodying a record file (or data file). With persistent arrays, the array is generally known as a data file, or file of records, and the elements or items in the array are known as records. The index of an item in the array is known as the record number. Pascal has always...

## Insertion and Deletion with a Binary Tree

If we are to use a binary tree in earnest we shall have to consider how to add items (i.e., nodes) to the tree, how to delete items from the tree, and how to visit all the items in the tree. The latter operation will enable us to search for a particular item. Since we can't do the latter two operations without considering the first, let's start by discussing how to insert a node into a binary tree. To be able to insert a node in a binary tree, we must select a parent node to which we can attach...

## Linear Congruential Method

The next big step forward in random number generators came from D.H. Lehmer in 1949, in the process hammering several nails into the coffin of the middle-square method, if not all of them. What he proposed is known as the linear congruential method for generating random number sequences. Choose three numbers, m, a, and c, and a starting seed X0 and use the following formula to generate a sequence of numbers Xi The operation mod m is calculated as the remainder after dividing by m, for example,...

## Binary Search Trees

Although binary trees are interesting data structures in their own right, people generally use binary trees containing items in a sorted form. Such binary trees are known as binary search trees. In a binary search tree, each node has a key. (In the binary search trees we build in this chapter, the key is assumed to be part of the item we'll be inserting into the tree. We will use a TtdCompareFunc routine to compare two items, and therefore their keys.) An ordering is applied to all of the nodes...

## Arrays

Arrays are the easiest implementation of a set of items we can search sequentially. There are two cases one, the array's items are not in any particular order, and two, the items are sorted. Let's take the unsorted case first. If the array is unsorted, there is only one algorithm to use to find a particular item visit every item in the array, and compare it with the one we want. Usually this is coded as a For loop. As an example, let's search for the value 42 in an array of 100 integers MyArray...

## Random Number Generation

One of the first things we need to consider is the definition of random number. Without a good definition of such a beast, we'd be shooting blindfolded in trying to design or write a random number generator. Is the number 2 a random number Well, on its own, devoid of context, you can't really say one way or another. If you throw a die once, it may come up 2. But that singular event doesn't really tell us anything. It could be just pure luck that it came up 2, or it could be that all six sides...

## Analysis of Algorithms

Let's look at the two possible searches for John Smith in an array the sequential search and the binary search. We'll implement both algorithms and then play with them in order to ascertain their performance attributes. Listing 1.1 is the simple sequential search. Listing 1.1 Sequential search for a name in an array function SeqSearch(aStrs PStringArray aCount integer const aName string5) integer if CompareText(aStrs i , aName) 0 then begin Result i Exit end Result -1 end Listing 1.2 shows the...

## Stacks Using Arrays

Having seen the linked list version, let's consider how to implement the stack with an array. One reason we do this is that, many times, implementing a stack of some simple type (for example, characters or floating-point double values) is most efficiently implemented with an array. For simplicity's sake, we'll use a TList as our array in other words, we'll be implementing a stack of pointers. In the linked list version, we inserted the new node during a push operation to the front of the list,...

## Chi Squared Tests

Imagine that we have a pair of coins that we think someone has tampered with. How could we prove that they were biased Of course, one putative crook might be completely dumb and have weighted them to always show heads, but he'd have been caught long ago, leaving a master crook full rein. Let's throw the coins 100 times, say, and plot the number of times we toss individual scores in a table. Our table might look like Table 6.1. Table 6.1 The results of tossing a pair of biased coins 100 times...

## What Are These Bizarre IFDEFs in the Code

The code for this book has been written, with certain noted exceptions, to compile with Delphi 1, 2, 3, 4, 5, and 6, as well as with Kylix 1. (Later compilers will be supported as and when they come out please see http www.boyet.com dads for the latest information.) Even with my best efforts, there are sometimes going to be differences in my code between the different versions of Delphi and Kylix. The answer is, of course, to IFDEF the code, to have certain blocks compile with certain compilers...

## Dynamic Arrays

More often than not, when you are programming some routine that requires an array, you don't know how many elements you want in that array. It may be ten, one hundred, or a thousand, but certainly it's only at run time that you can come up with the answer. Furthermore, because you don't know, it's hard to declare the array as a local variable declaring the maximum size you may encounter might put strains on the stack, especially in Delphi 1 it certainly makes sense to allocate it on the heap....

## Deletion from a Red Black Tree

Compared with insertion, deletion from a red-black tree is filled with special cases, and can be difficult to follow. As usual with binary search trees, we start off by searching for the node to be deleted. Like before, we'll have three initial cases the node has no children or, in red-black tree terms, both of its children are external nodes the node has one actual child and one external child and, finally, the node has two real children. We delete the node in exactly the same way as we did...

## Binary Search Tree Rearrangements

When discussing the binary search tree, I stated that adding items to a binary tree could make it woefully unbalanced, sometimes even degenerating into a long spindly tree like a linked list. The problem with this degeneration is not that the binary search tree stops functioning properly items are still being stored in sorted order , it's that the tree's good efficiency takes a fatal knock if it happens. For a perfectly balanced tree on average, all parent nodes have two children, and all...

## Using State Machines Parsing

An example will help us understand the process. Let us suppose we wish to devise an algorithm that would extract the individual words in a string of text. We'll put the words we extract into a string list. Moreover, there's a wrinkle we would like quoted text inside the string to be considered as a single word. So, if we had the string He said, State machines the routine would ignore the punctuation and white space and return Notice that the white space and punctuation inside the quoted text...

## Deterministic and Nondeterministic State Machines

Now that we have seen a couple of fairly complex state machines and are more familiar with them, I will introduce a couple of new terms. The first is automaton plural automata . This is nothing more than another name for a state machine, but it is used extensively in computer science classes and textbooks. A finite state machine or finite automaton is merely a state machine whose number of states is not infinite. Both of our examples are finite automata the first has three states and the second...

## Insertion with a Binary Search Tree

For a user of a binary search tree, we can make the insert operation a lot easier to use all that has to be provided is the item itself. No longer does the user have to worry about which node becomes the parent and as which child the new node is added. Instead, the binary search tree can do it all, hiding all the details, by using the ordering of the items within the tree as a guide. In fact, it is relatively easy to insert a new item into a binary search tree, and we've seen the majority of...

## Collision Resolution through Bucketing

There is a variant of the chaining method for collision resolution called bucketing. Instead of having a linked list at each slot, there is a bucket, essentially a fixed size array of items. When the hash table is created, we have to allocate the bucket for each slot and mark all the elements in each bucket as empty. To insert an item, we hash the key for the item to find the slot number. We then look at each element in the bucket until we find one that is marked as empty and set it to the item...

## Simple Hash Function for Strings

The argument with the previous idea would seem to indicate that we should weight each character according to its position in the string to avoid collisions when using anagrams as keys. This results in the following implementation the source code can be found in TDHshBse.pas on the CD . Listing 7.1 Simple hash function for string keys function TDSimpleHash const aKey string i integer Hash longint begin Hash 0 Hash Hash 17 ord aKey i mod aTableSize Result Hash if Result lt 0 then inc Result,...

## Queues Using Arrays

Now, let's consider how to implement a queue with an array. Again, to simplify things, we'll use a TList at least we won't have to worry about memory allocation or array growing issues. Having seen the linked list version, your first impulse might be to call Add to append items to the end of a TList instance for the enqueue operation, and to call Delete to remove the first item in the TList for the dequeue operation or vice versa insert at the front of the list, delete from the end . However,...

## Searching through a Skip List

If you look again at Figure 6.3, you'll notice that you can characterize the list as being several singly and doubly linked lists merged together. There is a doubly linked list at level 0, a singly linked list at level 1 that skips over single nodes i.e., it links every second node , another singly linked list at level 2 that skips over three nodes i.e., it links every fourth node , and another singly linked list at level 3 that skips over seven nodes i.e., it links every eighth node . So, to...

## The Singly Linked List Class

Before delving into the design of the singly linked list class TtdSingleLinkList, a couple of notes are in order. First things, first. As I said earlier, it would be nice to be able to use the linked list without having to worry about nodes. Like the TList, we would like to have the class accept untyped pointers. To be able to access items in the linked list, we'd certainly like to use an index although, as I pointed out this can be slow , but better still would be to borrow some database...

## Class Implementation of a Splay Tree

The TtdSplayTree class is a simple decendant of the TtdBinarySearchTree class, overriding the Delete, Find and Insert methods and declaring new internal methods to splay and promote a node. Listing 8.18 shows the interface to this class. Listing 8.18 The interface to TtdSplayTree Listing 8.18 The interface to TtdSplayTree TtdSplayTree class TtdBinarySearchTree function stPromote aNode PtdBinTreeNode procedure stSplay aNode PtdBinTreeNode procedure Delete aItem pointer override function Find...

## Bubble Sort

The first sort that programmers come across when they're learning the tricks of the trade is the bubble sort. This is a shame, since, of all the sorts, it tends to be the slowest. It is perhaps the easiest to code though not by much . Bubble sort goes a bit like this. Fan your cards out. Look at the twelfth and thirteenth cards on the right side. If the twelfth is larger than the thirteenth, swap them. Now look at the eleventh and the twelfth cards. If the eleventh is bigger than the twelfth,...

## Completing Heapsort

Having ordered an array into a heap, what then Removing the items one by one still means we need somewhere to put them in sorted order, presumably some auxiliary array. Or does it Think about it for a moment. If we peel off the largest item, the heap size reduces by one, leaving space at the end of the array for the item we just removed. In fact, the algorithm to remove an item from a heap requires the lowest, rightmost node to be copied to the root before being trickled down, so all we need to...

## Insertion into a Skip List

Having seen how to search for an item in a pre-existing skip list, we should consider how to build one by inserting items. Looking back at Figure 6.3, it looks like an impossible task to build this extremely regular structure through a series of unknown item insertions and deletions. The cleverness of Pugh's insertion algorithm is that he realized that it was impossible or rather, much too long-winded and time-consuming to build the completely regular structure, so he proposed building a skip...

## Unit Testing

Unit testing is the process of testing parts of your program divorced from the program itself. One of the new development methodologies being discussed at the time of this writing is extreme programming 3 . This methodology espouses a number of recommendations, some of which are fairly contentious, but at least one of them makes excellent sense. The recommendation is to write a test at the time you write a method of a class. If the method seems to require more than one test, then you do so....

## Data Alignment

Another aspect of the hardware that we must take into account is that of data alignment. Current CPU hardware is built to always access data from the cache in 32-bit chunks. Not only that, but the chunks it requests are always aligned on a 32-bit boundary. This means that the memory addresses passed to the cache from the CPU are always evenly divisible by four 4 bytes being 32 bits . It's equivalent to the lower two bits of the address being clear. When 64-bit or larger CPUs become more...

## Efficiency Considerations

If that were all there was to say about linked lists, this would be a very short chapter. We'd just present a class encapsulation of a singly linked list and move on. However, there is more to be said before writing a linked list class, especially with regard to efficiency. Look again at the code for insertion and deletion. Doesn't it strike you as messy to have separate cases for both operations It does me. We have to have this special code in order to cope with processing the first node in...

## Virtual Memory and Paging

The first performance bottleneck we should understand is virtual memory paging. This is easier to understand with 32-bit applications, and, although 16-bit applications suffer from the same problems, the mechanics are slightly different. Note that I will only be talking in layman's terms in this section my intent is not to provide a complete discussion of the paging system used by your operating system, but just to provide enough information so that you conceptually understand what's going on....

## Inserting and Deleting from a Doubly Linked List

How do we insert a new node into a doubly linked list For a singly linked list we had to break a single link and then form two new links to insert a node, so for a doubly linked list we have to break two links and form four new ones. We can insert either before or after a node in the list because the Prior pointers make it easy to traverse the list in either direction. Indeed, an insert before operation can be coded as a move back one node, insert after operation, so we'll just consider the...

## Compare Routines

The very act of searching for an item in a set of items requires us to be able to differentiate one item from another. If we cannot tell the difference between any two items in general, there's no point in trying to look for one in particular. So the first hurdle we must overcome is how to tell the difference, how to compare two items that we may come across. There are two types of comparison we could make the first is for unsorted lists of items, where we only need to know whether one item is...

## The PJW Hash Functions

During the discussion on hash tables, the Dragon Book Compilers Principles, Techniques, and Tools by Aho, et al 2 shows a hash function by PJ. Weinberger this routine is also known as the Executable and Linking Format ELF hash . This follows a similar algorithm to the routine in Listing 7.1, except it throws in a randomizing effect where the topmost nibble of the longint hash work variable the nibble that is going to disappear due to the overflow from the next multiplication , if it is...

## Stacks Using Linked Lists

With a linked list implementation of a stack, the push operation is coded as inserting a new node at the front of the list. The pop operation is coded as deleting the node at the front of the list and returning the data. Neither operation depends on the number of items in the list so we can categorize both as O 1 operations. That's it, we're done with the design. Of course, implementing this design involves a little more decision-making. We could code a stack class either to descend from the...

## The Linear Probe Hash Table Class

Listing 7.3 shows the interface for our linear probe hash table the entire source code for this class can be found in TDHshLnPpas on the CD . There are a couple of things to note about this implementation. First, we make the convention that the key for an item is a string, separate from the item itself. This makes the underlying code a lot easier to understand, and also makes designing and using the hash table easier. In the vast majority of cases, keys will be strings anyway, and converting...

## Extendible Hashing

The algorithm we need to use is called extendible hashing, and to use it we need to go back to square one with our hash function. Originally, we knew the size of our hash table and so, when we hashed a key, we would then immediately mod it with the table size and use the result as an index into our hash table. With extendible hashing, on the other hand, we do not know how big our hash table is, since it will grow whenever required to avoid a bucket overflow. In previous versions of our hash...

## Implementation of the Extended Priority Queue

As far as the user of the priority queue is concerned, this new interface is only slightly more complicated than before. Listing 9.9 shows the class interface to TtdPriorityQueueEx, the extended priority queue. Listing 9.9 The TtdPriorityQueueEx class interface Listing 9.9 The TtdPriorityQueueEx class interface procedure pqTrickleDown aHandle TtdPQHandle procedure ChangePriority aHandle TtdPQHandle function Enqueue aItem pointer TtdPQHandle function Remove aHandle TtdPQHandle pointer property...

## Class Implementation of a Binary Search Tree

As usual, we shall encapsulate a binary search tree as a class, although I would warn you again that you should only use it if you can be sure that the items being inserted are sufficiently random or small in number that the tree doesn't degenerate into a spindly affair. What we are trying to achieve with a binary search tree class is hiding the mechanics of the tree from the user. This means that the user should be able to use the class to keep a set of items in sorted order, and traverse...

## Hashing and Hash Tables

In Chapter 4, we looked at searching algorithms for finding an item in an array for example, a TList or in a linked list. The fastest general method we discussed was binary search, which required a sorted container. Binary search is a O log n algorithm. Thus, for 1,000 items, we'd need approximately 10 comparisons to either find a given item in a list, or to discover that it wasn't actually there since 210 1024 . Can we do better If we had to rely on a comparison function to help us identify an...

## The Doubly Linked List Class

The interface to the TtdDoubleLinkList class is as follows Listing 3.13 The TtdDoubleLinkList class Listing 3.13 The TtdDoubleLinkList class function dllGetItem aIndex longint pointer procedure dllSetItem aIndex longint altem pointer procedure dllError aErrorCode integer const aMethodName TtdNameString class procedure dllGetNodeManager procedure dllPositionAtNth aIndex longint public constructor Create aDispose TtdDisposeProc destructor Destroy override procedure dllSetItem aIndex longint altem...

## Advantages and Disadvantages of Linear Probing

In general, if there are few occupied slots in the hash table, we'd expect most searches, whether successful or unsuccessful, to take just one or two probes. Once the table gets pretty loaded with items, the number of empty slots would be very few, so we'd expect unsuccessful searches to take very many probes, even as much as n-1 probes if there was only one empty slot. In fact, if we are using an open addressing scheme like linear probing, it makes sense to ensure that the hash table doesn't...

## Deleting Items from a Linear Probe Hash Table

Before we look at some code, let us discuss deleting items from our hash table. It seems easy enough hash the key for the item to delete, find it using as many linear probes as required , and then mark the slot as empty. Unfortunately, this simplistic method causes some problems. Suppose that our hash function hashes the keys Smith, Jones, and Brown to 42, 42, and 43 respectively. We add them in that order to an empty hash table, to result in the following situation In other words, Smith...

## Other Random Number Distributions

If we are using random numbers to simulate a process, we'll find that the generators we've discussed so far are probably not up to the task. This is because they all produce a uniform distribution of random numbers each random number is equally likely to appear as any other. If we were performing some kind of simulation, we'd probably require another probability distribution altogether. It turns out that we can use the random number generators we've been discussing up to now to calculate...

## Inserting into Sorted Containers

If we wish to have a sorted array or linked list, we have a choice as to how to maintain the order. We can either add all the items we want to the container and sort them, and resort the container every time we add a new item, or we can perform some extra work when we insert an item to ensure that it is added in the correct place to maintain the order. If we are going to use the container in a sorted fashion more often than not, it makes sense to try and maintain the order during insertion of a...

## Implementation of a Priority Queue with a Heap

Listing 9.3 shows the interface for our final priority queue which uses a heap implemented by a TList. Listing 9.3 TtdPriorityQueue class interface property Count integer read pqGetCount property Name TtdNameString read FName write FName The Create constructor and Destroy destructor are both fairly simple to implement the former has to create the internal TList instance, and the latter just needs to free the internal TList. Like a standard queue, Create requires an item disposal routine so that...

## Overview of the TList Class

The TList class stores pointers in an array format. The pointers can be anything pointers to records, strings, or objects. Methods are provided to add and delete items, to find items in the list, to rearrange the items in the list, and, in later compiler versions, to sort the items in the list. Like an array a TList object can be used with the operator. Since the Items property is marked default you can code MyList i instead of MyList.Items i to access pointer number i in the list. TList...

## Hash Tables on Disk

The controllers for permanent storage devices such as disks, floppies, Zip drives, and tapes are designed to read and write data a block at a time. These blocks are usually some power-of-two bytes in size, like 512 bytes, 1,024 bytes, or 4,096 bytes. Since the controller must read a complete block even when it only needs a few bytes, it makes sense to capitalize on this behavior. Suppose you wish to write an application that uses a large number of records stored on disk. The records are to be...

## Preorder Inorder and Postorder Traversals

Before describing the other three traversal algorithms, which are all inter-related, let's define a binary tree in a different fashion. A binary tree consists of a root node with pointers to the root nodes of two other binary trees, known as the children. The pointers to either or both children could be nil. This definition describes a binary tree very succinctly, albeit recursively, yet it provides an ideal way to define the other three traversals. A pre-order traversal visits the root node,...

## Quicksort

The final algorithm we shall consider in this chapter is quicksort. There is one other in-memory sort we shall consider in this book, heapsort, but this requires some extra knowledge about a data structure called a heap before we can successfully discuss it. We ll defer heapsort to Chapter 9. Quicksort was invented by C.A.R. Hoare in 1960, and is probably even more famous than bubble sort. It is the most widely used sorting algorithm in programming these days, mainly because of some desirable...

## The Chained Hash Table Class

The public interface to the TtdHashTableChained class is roughly the same as that for the TtdHashTableLinear class, the differences between the classes manifesting themselves in the private and protected sections. Listing 7.11 The TtdHashTableChained class hcuFirst, ..insert at the beginning -a hash table that uses chaining to resolve collisions procedure htcSetMaxLoadFactor aMLF integer procedure htcAllocHeads aTable TList procedure htcAlterTableSize aNewTableSize...

## Merge Sort with Linked Lists

The final sort we shall consider in this chapter is merge sort again, but this time with linked lists. Recall that, although it s a speedy sort O nlog n using merge sort with arrays suffered from a requirement for a auxiliary array, at least half the size of the array being sorted. The reason for this is that the merge phase of the sort needed somewhere to put the items without having to perform some kind of insert operation. With linked lists, merge sort does not require this auxiliary array...

## First Simple Implementation

In designing a priority queue, the first attribute storing an arbitrary number of items would indicate the use of some extensible data structure like a linked list or an extensible array, such as a TList. Let s use, for now at least, a TList. The next attribute adding an item to the queue is easy with a TList we just call the TList s Add method. We ll make the assumption that the items we add to the priority queue will be objects of some description, with their priority as a property of the...