Programming Projects (subject to change: not finalized until program is assigned in class and a due date is announced) Turn-in: e-mail me your source code (as an attachment), instructions to compile/execute, known bug list, and a 2-5 page report summarizing your experimental results, if any (either in the text of the e-mail or as a separate attachment). DO NOT send compressed, tarred, or shar'ed files. DO NOT send viruses. Your report should discuss the problem, your solution, provide a summary of the data, and explain the results. Include references as necessary. You are encouraged to sumbit a draft of your report prior to final submission. Do your own work, no collaboration allowed with classmates or outside sources (other than the text or the notes). Programs that do not compile will not be graded. Project 1. Write a program (any language) to input an integer n and compute how many prime numbers there are whose value is less than n. Note: be prepared for fairly large integers (use the largest built-in int data type, such as "long") Note: "1" is not prime, "2" is prime. How large an n can your program solve in 1 second? How large an n can your program solve in 5 seconds? in 10 seconds? in 1 minute? Means of solving problem (you must do two to recieve full credit): a) To test if x is prime, consider whether x mod i = 0 for all values of i in an appropriate range b) Use the sieve of Eratosthenes (dates from ~200 B.C.): in range from 2 to x, mark all numbers divisible by 2 that are greater than 2, (so you would mark 4, 6, 8, 10, ... in increasing order); then mark all numbers divisible by 3 that are greater than 3. Each subsequent iteration, mark all numbers divisible by smallest unmarked number that has not been used before. Numbers left unmarked at the end are prime. c) Use the Miller-Rabin method (available in Java library, or else code it yourself). Graduate students must do a, b, c. (other methods will be, if only briefly, discussed in class). Submit your source code, the answers for various n's and the answers to the questions above (indicating what type of machine you used). Compare the running times of different methods for each input. You should test your program on several inputs, including one or more near the upper limit of the largest integer available to you. Question: How can you modify your program to handle HUGE integers, those much much larger than "maxint"?? Project 2. Input two N by N matrices, A, B from a text file, and compute the product. You should take the filename as a command-line argument. The input file will be such the the first line will contain N, the next N lines will contain A and the next N lines will contain B. Use the "naive" matrix multiplication algorithm that computes Product[i, j] as follows. Sum = 0; For a = 1 to N Sum = Sum + A[i, a] * B[a, j] end Product[i, j] = Sum What is the largest N you can compute in 1 seconds? 10 seconds? 1 minute? How might you solve the problem of multiplying two 10,000 by 10,000 matrices? Can you declare/allocate an array that large? How long do you estimate it would take to do this multiplication (estimate based on your answers above and the running time of the algorithm and assume the 10,000 by 10,000 arrays would fit into memory)? What if all elements of the 10,000 by 10,000 array were 0 or 1? Project 3. Implement the recursive matrix multiplcation algorithm. You may assume the arrays are N by N and that N is a power of 2. Compare the running time with that of the naive algorithm. What is the smallest N for which the recursive is faster? Project 4. Sorting and Median Finding: We want to compare several different median-finding algorithms in practice. Each of a) - d) will be worth 20% of your grade and your comparisons/report will be worth 20%. a. Implement HeapSort. Use this to find the median by simply sorting the array first and then choosing the middle element. b. Implement at least one 'bad' sorting algorithm (e.g. bubble sort, selection sort, insertion sort). Then find the median as described in a). c. Implement the O(n) worst-case median finding algorithm given in class (p. 189-192 in text) d. Implement the O(n) expected time algorithm from p. 186 Now compare the running times of these median-finding algorithms as follows: 1. Randomly generate 100 arrays with 100 integer elements and run each algorithm above on each array. Keep track of the time each algorithm takes (in total) on the 100 arrays. 2. Randomly generate 100 arrays with 1000 elements each and run each algorithm above on each array. Keep track of the time each algorithm takes (in total) on the 100 arrays. 3. Randomly generate 100 arrays with 5,000 elements each and run each algorithm above on each array. Keep track of the time each algorithm takes (in total) on the 100 arrays. Note: for an array with N elements, indicate the domain from which you drew the elements. For example, you may choose elements between 0 and 32,767. Present your results in a simple tabular format and give a 1-2 page summary and interpretation of the results. Project 4B. Implement the O(n) worst-case median-finding algorithm and the O(n) expected time algorithm (see p. 186 of Cormen). Compare the actual running times on various data sizes with various (a) choices of size of base case for recursion (b) various sorting algorithms for base case (at least two of insertion sort, bubble sort, heap sort, quicksort). Project 5. Union Find. 1. Implement both Union-Find algorithms described in class. 2. Generate 100 random sequences of 10,000 unions/finds -- each over a domain of 500 integers. That is, generate 10,000 operations: each either "union(a, b)" or "find(a)" Elements a and b should be chosen randomly and with uniform probability. Unions of two elements in the same set should obviously do nothing. I suggest you ``rig'' the sequence so that finds are more likely than unions (maybe 75% finds and 25% unions). 3. Keep track of the number of comparisons made by each algorithm -- both the total over all sequences and the totals for individual sequences. Which algorithm had the best total? Which won the most individual races? What were the largest differentials? What would happen if a and b were not chosen from a uniform distribution, but some values were more likely than others? What if all the unions came before all the finds? Summarize your results in 1-3 pages. Project 6. Using the Java graph routines from the course web page (you may write in another language, but if so, you must adhere to the graph format described on those pages, using adjacency matrix), write a program to find a smallest dominating set in a graph. See graph.java file. Project 7. Using the Java graph routines from the course web page (you may write in another language, but if so, you must adhere to the graph format described on those pages, using adjacency matrix), write a program to find the shortest path in a weighted graph such that the path contains at most 3 edges. Compare the running time of this algorithm with that of Disjktra's algorithm for a number of large graphs (random graphs suggested: in which case you need to modify randomgraph.java to generate random edge weights). Project 8. Implement Floyd-Warshall All-Pairs Shortest Path using Java and the graph routines from the course web page. Your program must use the graph format and style of the existing programs. Your program should use adjacency matrices (see graph.java or randomgraph.java) and output the all-pairs shortest path matrix. See graph.txt for sample input format (1st line is # of vertices, followed by adjacency matrix). Note: Since the graph is weighted, in the input matrix, you may wish to use a non-numeric character to represent an edge which is not present. Alternatively, you might use a 0 and assume there are no weight 0 edges. Or, you can let the input be in the form of the file weighted.txt (adjacency list format, and then translate that after reading it into an adjacency matrix). Note2: You may wish to modify the existing class random graph to generate random edge weights also. Project 9. Write a program to input a DNA sequence from the keyboard (at most 50 characters) and find the closest matching sequence that occurs in a file "samples.txt" (each sequence in the sample file has at most 60 characters and each appears on a seperate line). Define a score function so that spaces at the beginning or end of an alignment are penalized 1, mis-matched characters have a penalty of 2, matching a space with a character has a penalty of three 3. Matching characters get a +1 score. The output of your program should be the sequence from the file that best matched the input sequence, as well as the score that that sequence attained. [Note: it is acceptable if you penalize spaces at beginning/end of sequence with 3 rather than 1, but I will give full credit only if you do 1 ... doing so will require you do modify the algorithm a bit. This is worth 10% of the grade.] Extra Credit (10%): Forbid there to be two spaces in a row except at the beginning or end of an alignment. So A--BC AAABB is illegal, but the following is legal AABB-- --BBCC Project 10. You have 3 choices: 1) Implement Miller-Rabin primality testing from the book and compare its performance with the trial-divsion method (Java strongly recommended so that you can use its BigInteger class and test the programs on very big integers) 2) Implement any of the Max-Flow algoritms (Must be done in Java using the graph.java class shown on the class web page) 3) Write a program to input a graph (adjacency matrix ... you can use the graph.java program) and count the number of dominating sets of size i that the graph has for all i between 1 and n. A dominating set is s subset, D, of the vertex set, V, such that every vertex in V is either in D or has at least one neighbor in D [Note that a vertx can be in D and also have neighbors that are in D]. Project 11. Let A be a square, symmetric 0-1 matrix with 1's on the diagonal. We wish to determine if A has a non-zero determinant or not. We will use two different algorithms and compare the running times of each. Assume A has at most 25 rows adn at least 2 rows. Method 1. Test if there exists a non-empty subset of colums X such that each row has an even number of 1's in the columns corresponding to X. You will need to consider all possible subsets of the columns! If such a set exists, the determinant is equal to 0. Method 2. Define the determinant of a 2 by 2 matrix as a b = ad - bc (mod 2) c d We can recursively define the determinant of an n by n matrix M as follows. For each entry a_1, j on row 1, consider the matrix M_1, j obtained from M by deleting row 1 and column j. Then the determinant of M is defined as a_1, 1 * det(M_1, 1) - a_1, 2 * det(M_1, 2) + a_1, 3 * det(M_1, 3) -a_1, 4 * det(M_1, 4) + a_1, 5 * det(M_1, 5) - ... (mod 2) Run each algorithm on several matrices of each size and compare the running times in practice. Determine the O() running time of each as well and report your finings in a short report. 12. Input two integers and compute their greatest common divisor. This is to be done two ways: a) Use Euclid's GCD algorithm (page 858) b) Factor each integer into its prime factors and determine if there are any factors in common. Factoring should be done by modification of the "trial division" algorithm. Run each program for a variety of inputs (including integers as large as practical) and report the running times for each (you may wish to compute running times with library function or simply by using a watch with a secondhand). E-mail me your source code and a brief text summary of your experimental results. 13. Input integers x, n, p and computer x^n (mod p) two ways (you are allowed to only use +, -, *, /, mod functions in the computation) a) iteratively using n-1 multiplcations b) using O(log n) multiplications 14. Input n integers (one per line from a text file) and determine if they can be divided into two sets each having the same sum. Run the program on some test data and report the running times (of varying sizes and numbers of integers). For full credit, if there are two sets, print them out. Ideally, you should try the dynamic programming approach and a brute force approach whereby you consider all possible subsets and compare the two running times. Bonus: design, analyze, and implement an algorithm to determine if n integers can be partition into three sets, each having the same sum.