Homework Assignment #4
Due Thursday, December 6, 1:00 PM (Before Class)
65 points total

1. (10 pts) Insertion sort on small arrays in merge sort:

Although merge sort runs in O(nlgn) time, and insertion sort runs in O(n^2) time, the constant factors in insertion sort make it faster for small n. Thus, it makes sense to use insertion sort within merge sort when subproblems become sufficiently small. Consider a modification to merge sort in which n/k sublists of length k are sorted using insertion sort and then merged using the standard merging mechanism, where k is a value to be determined.

Show that the n/k sublists, each of length k, can be sorted by insertion sort in O(nk) worst-case time.
Show that the sublists can be merged in O(n lg(n/k)) worst-case time.
Given that the modified algorithm runs in O(nk + nlg(n/k)) worst case time, what is the largest asymptotic (O notation) value of k as a function of n for which the modified algorithm has the same asymptotic running time as standard merge sort?
How should k be chosen in practice?

2. (10 pts). Mystery Sort.

Analyze the following sorting algorithm (in Java) and determine its runtime. Hint: Output the array in the isSorted method. What would be an appropriate name for this sorting algorithm?

public class WeirdSort
{
    public static void main(String[] args)    // Sort a sample array
    {
		int a[] = {5, 3, 2, 6, 4};
		sort(a, a.length);
		// Output the sorted array
		for (int i = 0; i < a.length; i++)
			System.out.print(a[i] + " ");
		System.out.println();
	}

    // Returns true if the input is sorted.
    private static boolean isSorted(int a[])
    {
		for (int j = a.length-1; j > 0; j--)
		{
	      	if (a[j] < a[j-1])
				return false;
 	      }
		return true;
    }

    // Sorts the array a where n is an index into a starting at the end
    private static boolean sort(int[] a, int n)
    {
        if (n == 1)
            return (isSorted(a));

        for (int i = 0; i < n; i++) {
            swap(a, i, n-1);
            if (sort(a, n-1))
            	return true;
            swap(a, i, n-1);
        }
        return false;
    }

    private static void swap(int[] a, int i, int j)
    {
        int t;
        t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

3. (10 pts). Sorting Implementation.

Using any language write an implementation of both merge sort, quicksort, and count sort that sorts an array of 1,000,000 random integers with values ranging from 0 to 100,000. Based on the observed runtime, which algorithm appears to run faster? If the random values range from 0 to 5 instead of 0 to 100,000 what is the impact on the runtime? Justify your answer experimentally in your program.

4. (15 pts). Dictionary Hashing.

Use C++ to write a simple spell checker using a hash table. You are free to choose your own hash function but effort should be made to use a good hash function. Several are discussed in the book. Your program should:

Read in this file of 45407 words: words.txt. Insert each word, converted to lowercase, into a hash table. (You may want to start with a smaller word file for testing)
Output the hash table size, percentage of entries that are used (the load), the number of insertions that resulted in a collision, and the length of the largest collision chain.
Allow the user to type in a word and output whether or not the word is in the dictionary.

Repeat the above to generate the statistics in step 2 using the following variants:

Hash table size = 60,331 with chaining
Hash table size = 60,331 with linear probing
Hash table size = 120,721 with chaining
Hash table size = 120,721 with linear probing

Your results will vary depending on the hash function used. If you have a large chain of collisions (say 15 or more) then your hash function needs improvement and you may lose some points.

5. (15 pts). Mind Reader.

The theme of this program is to write a program in C++ that will "read" the mind of the player. You should be able to re-use your hash table code from the previous problem with some minor revisions. The program interaction is as follows:

The user needs to pick either heads or tails. Before the user makes his/her pick the program will make a prediction as to whether the user is going to pick head or tail. The programs prediction is hidden from the user until he makes his actual pick. Once the user makes his pick the programs prediction is revealed. If the program correctly predicted the users pick it gets a point otherwise if the program misses the user gets a point. Who ever (program or user) gets 25 points first is deemed the winner. Example of play:

Welcome to MindReader

Guess heads or tails and I'll predict your guess.
What is your pick [h/t] ? t
No. I predicted h
Score = Comp: 0 | Player: 1

Guess heads or tails and I'll predict your guess.
What is your pick [h/t] ? h
Yes!. I too predicted h
Score = Comp: 1 | Player: 1

Guess heads or tails and I'll predict your guess.
What is your pick [h/t] ? t
No. I predicted h
Score = Comp: 1 | Player: 2

Guess heads or tails and I'll predict your guess.
What is your pick [h/t] ?

...

The "AI" of the program should use a hash table. The lookup string should consist of the last 1-4 picks made by the player. For example, if the first pick is "t" then the lookup in the hash table is "" because there is no previous pick. If the second pick is "h" then the lookup string is "t" which is the past pick. If the third pick is "h" then the lookup in the hash table is "th" which is the first "t" and the second pick of "h". If the fourth pick is "t" then the lookup in the hash table is "thh" which is the first-third picks. If the fifth pick is "t" then the lookup in the hash table is "thht". If the sixth pick is "t" then the leftmost "t" is lost and the lookup becomes "hhtt". From this point on the lookup key is always 4 characters.

Rather than store the key in the hash table, for a given key the hash table should store two integers (or an object with two integers). One of the integers should count the number of times the next pick in the sequence for that key was heads, and how many times the next pick for that key was tails. For each pick made by the player you should update the counters for up to 4 entries in the hash table: For the last 4, last 3, last 2, and last 1 picks made. In the example above:

First pick of "t" past key = "". Nothing updated since no previous key.
Second pick of "h" past key = "t". Update hash table entry for "t" with 1 heads, 0 tails.
Third pick of "t" past key = "th". Update hash table entry for "h" with 0 heads, 1 tails. Update hash table entry for "th" is updated to indicate 0 heads, 1 tails.
Fourth pick of "t" past key = "tht". Update hash table entry for "t" with 1 heads, 1 tails. Update hash table entry for "ht" with 0 heads, 1 tails. Update hash table entry for "tht" with 0 heads, 1 tails.
Fifth pick of "t" past key = "thtt". Update hash table entry for "t" with 1 heads, 2 tails. Update hash table entry for "tt" with 0 heads, 1 tails. Update hash table entry for "htt" with 0 heads, 1 tails. Update hash table entry for "thtt" with 0 heads, 1 tails.
Sixth pick of "h" past key = "httt". Update hash table entry for "t" with 2 heads, 2 tails. Update hash table entry for "tt" with 1 heads, 1 tails. Update hash table entry for "ttt" with 1 heads, 0 tails. Update hash table entry for "httt" with 1 heads, 0 tails.

To make a prediction the program picks whichever option appeared more times. If there is a tie, randomly pick heads or tails. If there is no hash table entry for the key, try with a key of length 3 (use only the previous three picks). If there is still no hash table entry, try a key of length 2, then a key of length 1. If there is no entry for a key of length 1 then randomly pick heads or tails.

6. (5 pts). Mind Reader Again.

Re-do the same program in C++, Java, or Python, but use either a C++ map, Java HashMap, or python Dictionary to implement the program. The behaviour should otherwise be the same.

Homework Assignment #4 Due Thursday, December 6, 1:00 PM (Before Class) 65 points total

Homework Assignment #4
Due Thursday, December 6, 1:00 PM (Before Class)
65 points total