CSE5311 Design and
Analysis of Algorithms
Dr. Mohan Kumar
FALL 2009
Project
Problems
The project has two parts: Part I and
Part II. For each problem, you are required to
submit (by email) the algorithm, the code, sample data, results, a one page report, and
instructions for executing the code, to the TA. Presentation
of Results: Measure CPU times, compare results and use tables or plots. PLEASE NOTE THAT your PROGRAMS WILL
BE TESTED WITH DIFFERENT DATA SETS by the TA. The submissions should be made on or
before 9 AM December 1, 2009.
Project Problem Assignments 3-Person Teams PLUS Part II: All Problems
2-Person Teams Part I: Any three Problems PLUS Part II: Problem 1 AND (Problem 2 or Problem 3.) 1-Person Team PLUS Part II: Any one Problem PART
I (for Problems 1 through 5 Please refer Text book:
Algorithm Design by Kleinberg and Tardos) 1.
Chapter 5, Pg
247, Problem 4. 2.
Chapter 6, Pg
319, Problem 8. 3.
Chapter 6,
Page 320, Problem 9. 4.
Chapter 7,
Page 427, Problem 21. 5.
Chapter 7, Pg
439, Problem 38. Part I,
Problem 6: The Quicksort algorithm is an efficient and popular sorting
technique that sorts a list of keys S[1],S[2], . . ., S[n], recursively by
choosing a pivot key. The best-case running time of Quicksort is O(n log2n)
and its worst-case running time is O(n2 ). Several improvements and
modifications have been proposed to improve Quicksort’s worst-case behavior.
For example, the paper by Wainwright [1] presents Bsort, a variation of
Quicksort that combines Bubble-sorting techniques with the Quicksort
algorithm. Other methods include, Quickersort[2], qsort[3], CKsort[4].
You can choose ONE improvisation of Quicksort (of your choice) – let’s call
it MY_CHOICE_QSORT. Alternatively, you can choose a method not listed above,
but available in the literature, please include the reference in your report.
Write programs to implement sorting algorithms that employ
MY_CHOICE_QSORT, Quicksort, Mergesort and Heapsort for sorting keys. Execute
your sorting programs for the following sets of data: a. Set_1:Ordered List b. Set_2:Reverse order List c. Set_3:A list containing the same value through out d. Set_4:Random List e. Set_5:25% of the List sorted Presentation of Results: Measure CPU time, number of
partitions (only for Quicksort and, MY_CHOICE_QSORT) and number of
comparisons for data sizes 1000, 10K, and 1M. Present your results using
tables or graphs and write a 1-page report. The report should have a
psuedocode for MY_CHOICE_QSORT and summarize the behavior of all Sorting
algorithms tested and their suitability. References [1] R.L. Wainwright, A Class of Sorting Algorithms based
on Quicksort, Communications of the ACM, Vol. 28, No. 4, April 1985, pgs.
396-402. [2] R.S. Scowen, Algorithm 271: Quicksort, Communications
of the ACM, Vol. 8, No. 11, Nov. 1965, pgs. 669-670. [3] M.N. vanEmden, Algorithm 402: Increasing the
efficiency of Quicksort, Communications of the ACM, Vol. 13, No. 11, Nov.
1970, pgs. 693-694. [4] C.R. Cook, and Kim D.J, Best sorting algorithm for
nearly sorted lists, Communications of the ACM, Vol. 23, No. 11, Nov. 1980,
pgs. 620-624. [5] C.A.R. Hoare, Algorithm
64:Quicksort, Communications of the ACM,Vol. 4, No. 7, July 1961, pg. 321. Part I, Problem 7 Write a program to extend the Rabin-Karp method to handle
the problem of looking for a given m _ m pattern in an n _ n array of characters. (The pattern
may be shifted vertically or horizontally, but it may not be rotated). Use
another method, for example, a variation of the KMP algorithm, a different
method from a research paper, or your own method and compare the results with
those of the Rabin-Karp extension. a. Set_1: n=64, m=4,8, and16 b. Set_2: n=256, m=4, 16, and64 c. Set_3: n = 1024, m= 4, 16, and 64 Presentation of Results: Measure CPU
time, compare results and present using tables or plots. Part II, Problem 1: The list in http://www.fec.gov/pages/elecvote.htm
gives the number of electoral votes for each state for the Presidential
election (the candidate receiving the majority of the votes in a state
collects all the electoral votes for that state.) There are altogether 538
electoral votes. Write a heuristic algorithm (e.g., Branch and Bound,
Backtracking, or other) to determine if it is possible for the election to
end up in a tie. If so what are the possible configurations. Implement your
algorithm. Part II, Problem 2: Consider a communication network that can be modeled as a
weighted undirected connected graph G = (V, E). Each site in the network is represented
as a vertex and each line of communication is bidirectional and has a cost
associated with it. The cost may correspond to the expected delay on the
line, or to the tariff for using each line. Each site has online local
information; i.e., it knows online the edges (and vertices) adjacent to it.
An MCST of the network can be used to broadcast messages to all sites. If we
broadcast the messages by using only the edges of the MCST, then the total
cost is minimized. Assume that such an MCST is computed by some method and
that each site knows which of the edges adjacent to it belongs to the MCST.
Assume now that the sites in a certain subset U subset of V share between them the information
that is known to all of them. In other words, every site in U knows not
only about the edges and vertices adjacent to itself, but it also knows all
the edges and vertices adjacent to all vertices of U. Furthermore, assure that the partial MCST restricted to U is
connected (i.e. it is a tree). Consider an edge e _ U, which belongs to the MCST, whose cost has just changed. a. Find the conditions under the change in e’s cost is
guaranteed not to affect the MCST. Consider only conditions that can be
checked with the information known to the sires in U. In other words, how can sites in U determine
that no action needs to be taken top modify the MCST after the change? b. Find the conditions under which the modified MCST is
different from the original one only in edges that belong to U (hence,
the change can be handled locally). Consider only conditions that can be
checked with the information known to the sites in U. c. Design and implement an algorithm to check for the
conditions in parts a and b (again only within U), and to modify the MCST accordingly. The
algorithm does not need to handle the case where the change to the MCST may
be outside U.
Generate a random graph with uniform distribution of edges to be the
underlying structure. Part II, Problem 3: Write a program that constructs Huffman code for a given
English text and encode it. Write a program for decoding an English text that
has been encoded with Huffman code. Experiment with your encoding and
decoding programs for at least 5, 10, 25, 50 and 100 different texts, each
text having a minimum of 50 words and at least one page in length. What range
of compression ratios do you get? Can you improve the compression ratios by
using some estimates of frequencies instead of actual frequencies? You are
encouraged to look into published literature for ideas to improve the
compression ratio. Part II, Problem 4: Crawler and Analyzer Task: Write a crawler which starts
from cse.uta.edu and follows all hyperlinks
on the given page when these hyperlinks point to other pages in the cse.uta.edu and crystal.uta.edu domains. Eg: While
starting from cse.uta.edu, if cse.uta.edu/faculty is encountered, it
is followed. If www.uta.edu is found on cse.uta.edu, it is ignored since it is outside
the cse or crystal domain. A crawler is a tool which starts at a page and
follows some algorithm to traverse to pages that can be reached by following
hyperlinks. In the real world, crawlers are used primarily for traversing the
structure of the domain and creating indices on the content of the underlying
pages. For the purpose of this project the focus should be to only find the
underlying structure of the document graph where each document represents a
single vertex. Directed edges exist between vertices when there are
hyperlinks connecting one document to the other. Thus, after the crawling
procedure is complete, a directed graph representing the underlying structure
should be the output of the crawl. The first task consists of writing a
Breadth First Search crawler to crawl cse.uta.edu
as mentioned above. This task requires proper and adequate data structures to
plot the graph. Parsing the right information from every page needs to be
done carefully. After the crawl is complete, the next
task is to find the following information: a.
Diameter of the underlying graph b.
Shortest path between cse.uta.edu and the
class webpage on crystal.uta.edu For basic literature on crawlers and
web crawling refer to: Arvind Arasu, Junghoo Cho, Hector Garcia-Molina,
Andreas Paepcke, Sriram Raghavan: Searching the Web. ACM Trans. Internet
Techn. 1(1): 2-43 (2001) M. Najork and A. Heydon.
High-performance web crawling. SRC Research Report 173, Compaq Systems
Research Center, Palo Alto, CA, Sep. 2001. SUBMISSION
FORMAT o Any platform or language can be used. o During the project demo, each team is requested to bring a
laptop where they have built and tested their code. Another option is to run
the code on OMEGA or GAMMA (no laptops are required in this case). o All problems are to be run on an input set of data (can be
conceptualized by the team members) to generate an output. Varied test data
sets will be used to test the correctness of the code. o Each team has to submit detailed documentation for each
part (steps to run the code, description of each module, etc.) o All code and documentation (.ZIP format please) is to be
emailed to the TA (CC’ Dr. Kumar) before the submission deadline with
“CSE5311 SPRING 2007 PROJECT + <Team member last names (comma
separated)>” as the email Subject. o Teams selecting Problem III.7 need to submit their
solution in writing (a scanned copy in the email submission will do too) to
the TA before the submission deadline. o Late Submissions will not be encouraged. The penalty
for late submission is 20% per day. If you miss the deadline due to unavoidable
circumstances (e.g., health), email the instructor for an appointment or meet
with him during office hours. Submission
Deadline: December 1, 2009. (09:00 AM CST) |