CSE5311 Design and Analysis of Algorithms

Dr. Mohan Kumar

FALL 2009

Project Problems

 

Home Page

 

The project has two parts: Part I and Part II.

For each problem, you are required to submit (by email) the algorithm, the code, sample data, results, a one page report, and instructions for executing the code, to the TA.

Presentation of Results: Measure CPU times, compare results and use tables or plots.

 

PLEASE NOTE THAT your PROGRAMS WILL BE TESTED WITH DIFFERENT DATA SETS by the TA.

 

The submissions should be made on or before 9 AM December 1, 2009.

 

  • Listed below are the programming projects for Fall 2009
  • The problems can be attempted individually or in groups
  • Group discussions are encouraged. However, each participant is expected to know the details of the project work done.
  • You will have to write and execute the code for each problem and demonstrate the working with a sample input.
  • All references must be listed
  • Read the problem selection section below carefully to understand the number and weights of problems for individual or group (2 or 3) teams
  • After completion of the project, each team member will be required to complete a feedback form summarizing the contributions of teammates.
  • Sample feedback form
  • All questions regarding the project should be emailed to the TA with CC's to Dr. Kumar and all the teammates.

 

 

Project Problem Assignments

 

3-Person Teams
Part I: Any Four Problems

PLUS

Part II: All Problems

 

            2-Person Teams

Part I: Any three Problems

PLUS

Part II: Problem 1 AND (Problem 2 or Problem 3.)

 

1-Person Team
Part I: Any two Problems

PLUS

Part II: Any one Problem

 

PART I

(for Problems 1 through 5 Please refer Text book: Algorithm Design by Kleinberg and Tardos)

 

 

1.      Chapter 5, Pg 247, Problem 4.

2.      Chapter 6, Pg 319, Problem 8.

3.      Chapter 6, Page 320, Problem 9.

4.      Chapter 7, Page 427, Problem 21.

5.      Chapter 7, Pg 439, Problem 38.


 

Part I, Problem 6:

 

The Quicksort algorithm is an efficient and popular sorting technique that sorts a list of keys S[1],S[2], . . ., S[n], recursively by choosing a pivot key. The best-case running time of Quicksort is O(n log2n) and its worst-case running time is O(n2 ). Several improvements and modifications have been proposed to improve Quicksort’s worst-case behavior. For example, the paper by Wainwright [1] presents Bsort, a variation of Quicksort that combines Bubble-sorting techniques with the Quicksort algorithm. Other methods include, Quickersort[2], qsort[3],  CKsort[4]. You can choose ONE improvisation of Quicksort (of your choice) – let’s call it MY_CHOICE_QSORT. Alternatively, you can choose a method not listed above, but available in the literature, please include the reference in your report.

Write programs to implement sorting algorithms that employ MY_CHOICE_QSORT, Quicksort, Mergesort and Heapsort for sorting keys. Execute your sorting programs for the following sets of data:

a. Set_1:Ordered List

b. Set_2:Reverse order List

c. Set_3:A list containing the same value through out

d. Set_4:Random List

e. Set_5:25% of the List sorted

 

 

Presentation of Results: Measure CPU time, number of partitions (only for Quicksort and, MY_CHOICE_QSORT) and number of comparisons for data sizes 1000, 10K, and 1M. Present your results using tables or graphs and write a 1-page report. The report should have a psuedocode for MY_CHOICE_QSORT and summarize the behavior of all Sorting algorithms tested and their suitability.

 

References

[1] R.L. Wainwright, A Class of Sorting Algorithms based on Quicksort, Communications of the ACM, Vol. 28, No. 4, April 1985, pgs. 396-402.

[2] R.S. Scowen, Algorithm 271: Quicksort, Communications of the ACM, Vol. 8, No. 11, Nov. 1965, pgs. 669-670.

[3] M.N. vanEmden, Algorithm 402: Increasing the efficiency of Quicksort, Communications of the ACM, Vol. 13, No. 11, Nov. 1970, pgs. 693-694.

[4] C.R. Cook, and Kim D.J, Best sorting algorithm for nearly sorted lists, Communications of the ACM, Vol. 23, No. 11, Nov. 1980, pgs. 620-624.

[5] C.A.R. Hoare, Algorithm 64:Quicksort, Communications of the ACM,Vol. 4, No. 7, July 1961, pg. 321.

 

Part I, Problem 7

 

Write a program to extend the Rabin-Karp method to handle the problem of looking for a given  m _ m pattern in an n _ n array of characters. (The pattern may be shifted vertically or horizontally, but it may not be rotated). Use another method, for example, a variation of the KMP algorithm, a different method from a research paper, or your own method and compare the results with those of the Rabin-Karp extension.

 

a. Set_1: n=64, m=4,8, and16

b. Set_2: n=256, m=4, 16, and64

c. Set_3: n = 1024, m= 4, 16, and 64

Presentation of Results: Measure CPU time, compare results and present using tables or plots.

Part II, Problem 1:

 

The list in http://www.fec.gov/pages/elecvote.htm gives the number of electoral votes for each state for the Presidential election (the candidate receiving the majority of the votes in a state collects all the electoral votes for that state.) There are altogether 538 electoral votes. Write a heuristic algorithm (e.g., Branch and Bound, Backtracking, or other) to determine if it is possible for the election to end up in a tie. If so what are the possible configurations. Implement your algorithm.

 

Part II, Problem 2:

 

Consider a communication network that can be modeled as a weighted undirected connected graph G = (V, E). Each site in the network is represented as a vertex and each line of communication is bidirectional and has a cost associated with it. The cost may correspond to the expected delay on the line, or to the tariff for using each line. Each site has online local information; i.e., it knows online the edges (and vertices) adjacent to it. An MCST of the network can be used to broadcast messages to all sites. If we broadcast the messages by using only the edges of the MCST, then the total cost is minimized. Assume that such an MCST is computed by some method and that each site knows which of the edges adjacent to it belongs to the MCST. Assume now that the sites in a certain subset U subset of V share between them the information that is known to all of them. In other words, every site in U knows not only about the edges and vertices adjacent to itself, but it also knows all the edges and vertices adjacent to all vertices of U.

Furthermore, assure that the partial MCST restricted to U is connected (i.e. it is a tree). Consider an edge e _ U, which belongs to the MCST, whose cost has just changed.

 

a.       Find the conditions under the change in e’s cost is guaranteed not to affect the MCST. Consider only conditions that can be checked with the information known to the sires in U. In other words, how can sites in U determine that no action needs to be taken top modify the MCST after the change?

b.       Find the conditions under which the modified MCST is different from the original one only in edges that belong to U (hence, the change can be handled locally). Consider only conditions that can be checked with the information known to the sites in U.

c.       Design and implement an algorithm to check for the conditions in parts a and b (again only within U), and to modify the MCST accordingly. The algorithm does not need to handle the case where the change to the MCST may be outside U. Generate a random graph with uniform distribution of  edges to be the underlying structure.

 

Part II, Problem 3:

 

Write a program that constructs Huffman code for a given English text and encode it. Write a program for decoding an English text that has been encoded with Huffman code. Experiment with your encoding and decoding programs for at least 5, 10, 25, 50 and 100 different texts, each text having a minimum of 50 words and at least one page in length. What range of compression ratios do you get? Can you improve the compression ratios by using some estimates of frequencies instead of actual frequencies? You are encouraged to look into published literature for ideas to improve the compression ratio.

 

Part II, Problem 4:

Crawler and Analyzer

Task: Write a crawler which starts from cse.uta.edu and follows all hyperlinks on the given page when these hyperlinks point to other pages in the cse.uta.edu and crystal.uta.edu domains. Eg: While starting from cse.uta.edu, if cse.uta.edu/faculty is encountered, it is followed. If www.uta.edu is found on cse.uta.edu, it is ignored since it is outside the cse or crystal domain. A crawler is a tool which starts at a page and follows some algorithm to traverse to pages that can be reached by following hyperlinks. In the real world, crawlers are used primarily for traversing the structure of the domain and creating indices on the content of the underlying pages. For the purpose of this project the focus should be to only find the underlying structure of the document graph where each document represents a single vertex. Directed edges exist between vertices when there are hyperlinks connecting one document to the other. Thus, after the crawling procedure is complete, a directed graph representing the underlying structure should be the output of the crawl.

 

The first task consists of writing a Breadth First Search crawler to crawl cse.uta.edu as mentioned above. This task requires proper and adequate data structures to plot the graph. Parsing the right information from every page needs to be done carefully.

After the crawl is complete, the next task is to find the following information:

a.       Diameter of the underlying graph

b.       Shortest path between cse.uta.edu and the class webpage on crystal.uta.edu

 

For basic literature on crawlers and web crawling refer to: Arvind Arasu, Junghoo Cho, Hector Garcia-Molina, Andreas Paepcke, Sriram Raghavan: Searching the Web. ACM Trans. Internet Techn. 1(1): 2-43 (2001)

M. Najork and A. Heydon. High-performance web crawling. SRC Research Report 173, Compaq Systems Research Center, Palo Alto, CA, Sep. 2001.

 

SUBMISSION FORMAT

 

o     Any platform or language can be used.

o     During the project demo, each team is requested to bring a laptop where they have built and tested their code. Another option is to run the code on OMEGA or GAMMA (no laptops are required in this case).

o     All problems are to be run on an input set of data (can be conceptualized by the team members) to generate an output. Varied test data sets will be used to test the correctness of the code.

o     Each team has to submit detailed documentation for each part (steps to run the code, description of each module, etc.)

o     All code and documentation (.ZIP format please) is to be emailed to the TA (CC’ Dr. Kumar) before the submission deadline with “CSE5311 SPRING 2007 PROJECT + <Team member last names (comma separated)>” as the email Subject.

o     Teams selecting Problem III.7 need to submit their solution in writing (a scanned copy in the email submission will do too) to the TA before the submission deadline.

o     Late Submissions will not be encouraged.  The penalty for late submission is 20% per day. If you miss the deadline due to unavoidable circumstances (e.g., health), email the instructor for an appointment or meet with him during office hours.

 

Submission Deadline: December 1, 2009. (09:00 AM CST)