About
the Course
Much of the world’s
recorded data is locked up in structured sources such as
databases, which are often the propriety information of
private corporations and government agencies. Searching
and exploring for information within databases is
currently very cumbersome - often the data explorer has to
know comprehensive query languages (such as SQL), as well
as important information on how the data is structured
into different tables and columns (the database schema).
In recent years, researchers have pondered on the problems
of improving the search and exploration capabilities for
relational databases. This includes adapting probabilistic
and approximate querying methods to improve the
scalability of query answering, as well as information
retrieval techniques such as relevance ranking and keyword
search. This class will explore the recent efforts by
researchers in these extremely important and challenging
fields. We will read and discuss latest research
literature gleaned from premier conferences in databases
and information retrieval. It is hoped that this class
will spur students to pursuing further research in these
areas.
The following is a
tentative list of topics which we will attempt to cover:
1. Probabilistic
Methods in Databases
Sampling Methods in Databases: Basics
Approximate Query Processing
Processing of Fuzzy/Uncertain Data
2. Unstructured
Search in Databases Keyword Queries in Databases
Ranking of Database Query Results
3. DB and IR
integration
Top-K algorithms
We will cover
various topics in breadth, understand the central
contributions of these efforts and try and predict future
research directions.
Prerequisites Advanced
Algorithms and Database II are the prerequisite courses.
However, exceptions will be made on a case by case basis,
especially if the student has prior exposure or
demonstrates initiative to quickly learn these concepts on
his/her own.
Presentations
The actual reading
list, consisting of recent research papers, will be
selected and finalized as the course progresses. Each
student will present one or more papers (depending on the
enrollment) during the semester. Students will participate
in class discussions during and after each presentation.
Attendance is required.
Homework
There is going to be
two homework which are used for students to prepare for
the exams.
Evaluation
The grade will be
based on the paper presentations, class attendance and
participation, performance in the projects, one midterm
exam and one final exam.
Homework
|
25%
|
Midterm exam
|
25%
|
Final exam
|
25%
|
Presentation
|
20%
|
Class
participation
|
05%
|
Tentative
Schedule (Presentation, Lectures and Exams)
Each paper will be
presented by a group of two/three students based on the
enrollment. Each group will present two papers. The papers
will be presented in two cycles, the first one before the
mid term exam and the second one after the midterm exam.
Given below is a tentative schedule for class
presentations over the course of this semester.
Date/Day
|
Presenter(s)
|
Paper
|
Sep 01 - Sep 10
|
Dr. Gautam Das
|
Lectures
|
Sep 15 Tuesday
|
Gopinath,Shruti P
|
Ganti V., Lee M.
L., Ramakrishnan R. ICICLES: Self-tuning Samples for
Approximate Query Answering, VLDB, 2000. ppt
slides
|
Sep 15 Thursday
|
Shah,Nishit Haresh
|
Chaudhuri
S., Das G., Datar M., Motwani R., Narasayya V.
Overcoming Limitations of Sampling for Aggregation
Queries, ICDE 2001. ppt
slides
|
Sep 17
Tuesday
|
Hoskere,Vinay S
|
Acharya
S., Gibbons P. B., Poosala V., Ramaswamy S. Join
Synopses for Approximate Query Answering, ACM SIGMOD
1999. ppt
slides
|
Sep 17
Thursday
|
Rao,Divya Ganesha Babu
|
Acharya
S., Gibbons P. B., Poosala V. Congressional Samples for
Approximate Answering of Group-By Queries, ACM SIGMOD
2000. ppt
slides
|
Sep 22 Tuesday
|
Fernandes,Melissa J
|
Chaudhuri
S., Das G., Narasayya V. A Robust, Optimization- Based
Approach for Approximate Answering of Aggregation
Queries, ACM SIGMOD 2001. ppt
slides
|
Sep 24
Thursday
|
Kamble Gnanoba,Shashank
|
Babcock
B., Chaudhuri C. and Das G. Dynamic Sample Selection
for Approximate Query Processing, ACM SIGMOD 2003. ppt
slides
|
Sep 24
Thursday
|
Dave,Arjav
|
Hellerstein
J., Haas P., Wang H. Online Aggregation, ACM SIGMOD
1997. ppt
slides
|
Sep 29 Tuesday
|
Chandrashekar,Nag P
|
P.
J. Haas and J. M. Hellerstein. Ripple Joins for Online
Aggregation, ACM SIGMOD 1999. ppt
slides
|
Oct 01 Tuesday
|
Sudheendra,Supriya
|
Kaushik
Chakrabarti, Minos Garofalakis, Rajeev Rastogi,
Kyuseok Shim. Approximate Query Processing Using
Wavelets, VLDB 2000. ppt
slides
|
Oct 06 Thursday
|
Nema,Srikantha Aswathanarayana
|
Chaudhuri
S., Motwani R., Narasayya V. On Random Sampling Over
Joins, ACM SIGMOD 1999. ppt
slides
|
Oct 08 Tuesday
|
Dr. Gautam Das
|
Review
for Midterm exam
|
Oct 13 Tuesday
|
-
|
MIDTERM
EXAM
|
Oct 15 – Oct
27
|
Dr. Gautam Das
|
Lectures
|
Oct
29 Tuesday
|
Krishnamoorthy,Pranav
|
Jon
M. Kleinberg. Authoritative sources in a hyperlinked
environment, Journal of the ACM 46(1999). ppt
slides
|
Oct 29 Thursday
|
Byrappa Nagaraj,Chandrik
|
L.
Page, S. Brin, R. Motwani, T. Winograd. The PageRank
Citation Ranking: Bringing Order to the Web, 1998. ppt
slides
|
Nov 03 Tuesday
|
Bansal,Kushal N
|
Charuta
Nakhe, Arvind Hulgeri, Gaurav Bhalotia, Soumen
Chakrabarti, S. Sudarshan. Keyword Searching and
Browsing in Databases using BANKS, ICDE 2002. ppt
slides
|
Nov 05 Thursday
|
Mannem,Anusha
|
Sanjay
Agrawal, Surajit Chaudhuri, Gautam Das.
DBXplorer: A System For Keyword-Based Search Over
Relational Databases, ICDE 2002. ppt
slides
|
Nov 10 Tuesday
|
Somuri,Ramya
|
Sanjay
Agrawal, Surajit Chaudhuri, Gautam Das, Aristides
Gionis. Automated Ranking of Database Query Results,
CIDR 2003. ppt
slides
|
Nov 12 Thursday
|
Satyanarayana,Sindhu and Raju,Ranjan
Alankar
|
Surajit
Chaudhuri, Gautam Das, Vagelis Hristidis,
Gerhard Weikum, Probabilistic Ranking of Database
Query Results, VLDB 2004. ppt
slides
|
Nov 17 Tuesday
|
Gupta,Jitendra
|
Christopher
Re, Nilesh
Dalvi, Dan
Suciu:
Efficient Top-k Query Evaluation on Probabilistic Data,
ICDE 2007. ppt
slides
|
Nov 17 Tuesday
|
Muniraju,Sowmya
|
Ihab
F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid.
Supporting top-k join queries in relational databases,
VLDB 2003. ppt
slides
|
Nov 19 Thursday
|
Mangipudi,Aditya and Madiraju
Keshavaraju,Pav
|
Zhen
Zhang,
Seung-won
Hwang, Kevin
Chen-Chuan Chang,
Min
Wang, Christian
A. Lang,
Yuan-Chi
Chang: Boolean
+ ranking: querying a database by k-constrained
optimization, ACM SIGMOD 2006. ppt
slides
|
Nov 24 Tuesday
|
Ancha,Poornima and Buchi,Raju
|
Gautam
Das, Dimitrios Gunopulos, Nick Koudas, Dimitris
Tsirogiannis. Answering Top-k queries Using Views, VLDB
2006. ppt
slides
|
Nov 26 Thursday
|
-
|
Thanksgiving
Holidays
|
Dec 01 Tuesday
|
Dr. Gautam Das
|
Review
for Final exam
|
Dec 08 Tuesday
|
-
|
FINAL
EXAM (NH109 11am-1pm)
|
Announcements
Please check this section regularly during the semester
for updates and announcements on the course
Ethics statement is available here.
Please print, sign and submit it to the instructor during
class.
Group assignment for presentation might change in future.
Any changes will be updated on the website.
Please send the TA your presentation slides one day
before the scheduled presentation.
If you do not send you presentation slides before the
scheduled presentation, you might loose some points.
|