CSE 6339 SEC 001




Fall 2009

Class Time and Place: Tu/Th 12:30-1:50 pm (WH 221)

Instructor: Dr. Gautam Das


Office: Nedderman Hall, Room# 302
Phone: 817 272 7595

Office Hours:  Wed 1:00-2:00 pm and Thu 2:00-3:00 pm

Teaching Assistant: Ning Yan


Office: GeoScience Building, Room# 237

Office Hours:  Thu 3:00-5:00 pm and by appointment

About the Course

Much of the world’s recorded data is locked up in structured sources such as databases, which are often the propriety information of private corporations and government agencies. Searching and exploring for information within databases is currently very cumbersome - often the data explorer has to know comprehensive query languages (such as SQL), as well as important information on how the data is structured into different tables and columns (the database schema). In recent years, researchers have pondered on the problems of improving the search and exploration capabilities for relational databases. This includes adapting probabilistic and approximate querying methods to improve the scalability of query answering, as well as information retrieval techniques such as relevance ranking and keyword search. This class will explore the recent efforts by researchers in these extremely important and challenging fields. We will read and discuss latest research literature gleaned from premier conferences in databases and information retrieval. It is hoped that this class will spur students to pursuing further research in these areas.

The following is a tentative list of topics which we will attempt to cover:

1. Probabilistic Methods in Databases

            Sampling Methods in Databases: Basics

            Approximate Query Processing

            Processing of Fuzzy/Uncertain Data

2. Unstructured Search in Databases Keyword Queries in Databases

            Ranking of Database Query Results 

3. DB and IR integration

            Top-K algorithms

We will cover various topics in breadth, understand the central contributions of these efforts and try and predict future research directions.


Advanced Algorithms and Database II are the prerequisite courses. However, exceptions will be made on a case by case basis, especially if the student has prior exposure or demonstrates initiative to quickly learn these concepts on his/her own.


The actual reading list, consisting of recent research papers, will be selected and finalized as the course progresses. Each student will present one or more papers (depending on the enrollment) during the semester. Students will participate in class discussions during and after each presentation. Attendance is required.


There is going to be two homework which are used for students to prepare for the exams.


The grade will be based on the paper presentations, class attendance and participation, performance in the projects, one midterm exam and one final exam.




Midterm exam


Final exam




Class participation



Tentative Schedule (Presentation, Lectures and Exams)

Each paper will be presented by a group of two/three students based on the enrollment. Each group will present two papers. The papers will be presented in two cycles, the first one before the mid term exam and the second one after the midterm exam. Given below is a tentative schedule for class presentations over the course of this semester.





Sep 01 - Sep 10

Dr. Gautam Das


Sep 15 Tuesday

Gopinath,Shruti P

Ganti V., Lee M. L., Ramakrishnan R. ICICLES: Self-tuning Samples for Approximate Query Answering, VLDB, 2000. ppt slides

Sep 15 Thursday

Shah,Nishit Haresh

Chaudhuri S., Das G., Datar M., Motwani R., Narasayya V. Overcoming Limitations of Sampling for Aggregation Queries, ICDE 2001. ppt slides

Sep 17  Tuesday

Hoskere,Vinay S

Acharya S., Gibbons P. B., Poosala V., Ramaswamy S. Join Synopses for Approximate Query Answering, ACM SIGMOD 1999. ppt slides

Sep 17  Thursday

Rao,Divya Ganesha Babu

Acharya S., Gibbons P. B., Poosala V. Congressional Samples for Approximate Answering of Group-By Queries, ACM SIGMOD 2000. ppt slides

Sep 22 Tuesday

Fernandes,Melissa J

Chaudhuri S., Das G., Narasayya V. A Robust, Optimization- Based Approach for Approximate Answering of Aggregation Queries, ACM SIGMOD 2001. ppt slides

Sep 24  Thursday

Kamble Gnanoba,Shashank

Babcock B., Chaudhuri C. and Das G. Dynamic Sample Selection for Approximate Query Processing, ACM SIGMOD 2003. ppt slides

Sep 24  Thursday


Hellerstein J., Haas P., Wang H. Online Aggregation, ACM SIGMOD 1997. ppt slides

Sep 29 Tuesday

Chandrashekar,Nag P

P. J. Haas and J. M. Hellerstein. Ripple Joins for Online Aggregation, ACM SIGMOD 1999. ppt slides

Oct 01 Tuesday


Kaushik Chakrabarti, Minos Garofalakis, Rajeev Rastogi, Kyuseok Shim. Approximate Query Processing Using Wavelets, VLDB 2000. ppt slides

Oct 06 Thursday

Nema,Srikantha Aswathanarayana

Chaudhuri S., Motwani R., Narasayya V. On Random Sampling Over Joins, ACM SIGMOD 1999.  ppt slides

Oct 08 Tuesday

Dr. Gautam Das

Review for Midterm exam

Oct 13 Tuesday



Oct 15 – Oct 27

Dr. Gautam Das


Oct 29 Tuesday


Jon M. Kleinberg. Authoritative sources in a hyperlinked environment, Journal of the  ACM 46(1999). ppt slides

Oct 29 Thursday

Byrappa Nagaraj,Chandrik

L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web, 1998. ppt slides

Nov 03 Tuesday

Bansal,Kushal N

Charuta Nakhe, Arvind Hulgeri, Gaurav Bhalotia,  Soumen Chakrabarti, S. Sudarshan. Keyword Searching and Browsing in Databases using BANKS, ICDE 2002. ppt slides

Nov 05 Thursday


Sanjay Agrawal, Surajit  Chaudhuri, Gautam Das. DBXplorer: A System For Keyword-Based Search Over Relational Databases, ICDE 2002. ppt slides

Nov 10 Tuesday


Sanjay Agrawal, Surajit Chaudhuri, Gautam Das, Aristides Gionis. Automated Ranking of Database Query Results, CIDR 2003. ppt slides

Nov 12 Thursday

Satyanarayana,Sindhu and Raju,Ranjan Alankar

Surajit Chaudhuri, Gautam Das, Vagelis Hristidis, Gerhard Weikum, Probabilistic Ranking of Database Query Results, VLDB 2004. ppt slides

Nov 17 Tuesday


Christopher Re, Nilesh Dalvi, Dan Suciu: Efficient Top-k Query Evaluation on Probabilistic Data, ICDE 2007. ppt slides

Nov 17 Tuesday


Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid. Supporting top-k join queries in relational databases, VLDB 2003. ppt slides

Nov 19 Thursday

Mangipudi,Aditya and Madiraju Keshavaraju,Pav

Zhen Zhang, Seung-won Hwang, Kevin Chen-Chuan Chang, Min Wang, Christian A. Lang, Yuan-Chi Chang: Boolean + ranking: querying a database by k-constrained optimization, ACM SIGMOD 2006. ppt slides

Nov 24 Tuesday

Ancha,Poornima and Buchi,Raju

Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis. Answering Top-k queries Using Views, VLDB 2006. ppt slides

Nov 26 Thursday


Thanksgiving Holidays

Dec 01 Tuesday

Dr. Gautam Das

Review for Final exam

Dec 08 Tuesday


FINAL EXAM (NH109 11am-1pm)



  • Please check this section regularly during the semester for updates and announcements on the course

  • Ethics statement is available here. Please print, sign and submit it to the instructor during class.

  • Group assignment for presentation might change in future. Any changes will be updated on the website.

  • Please send the TA your presentation slides one day before the scheduled presentation.

  • If you do not send you presentation slides before the scheduled presentation, you might loose some points.


Home Research Publications Professional | Teaching | Personal