About the Course
Much of the world's recorded data is locked up in structured
sources such as databases, which are often the proprietary
information of private corporations and government
agencies. Searching for and exploring information within
databases is currently very cumbersome: the data explorer
often has to know a comprehensive query language (such as
SQL), as well as how the data is structured into different
tables and columns (the database schema). In recent years,
researchers have worked on improving the search and
exploration capabilities of relational databases. This
includes adapting probabilistic and approximate querying
methods to improve the scalability of query answering, as
well as information retrieval techniques such as relevance
ranking and keyword search. This class will explore recent
efforts by researchers in these important and challenging
fields. We will read and discuss the latest research
literature from premier conferences in databases and
information retrieval. It is hoped that this class will
spur students to pursue further research in these areas.
The following is a tentative list of topics which we will attempt to cover:
1. Probabilistic Methods in Databases
   Sampling Methods in Databases: Basics
   Approximate Query Processing
   Processing of Fuzzy/Uncertain Data
2. DB and IR integration
   Ranking of Database Query Results
   Top-K algorithms
We will cover various topics in breadth, understand the
central contributions of these efforts, and try to predict
future research directions.
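To give a flavor of the first topic listed above, the following is a minimal sketch of sampling-based approximate query answering: a SUM aggregate is estimated from a uniform random sample instead of a full scan. This is a generic textbook illustration, not the method of any particular paper on the reading list; the function name `approximate_sum` and the synthetic `amounts` column are hypothetical and chosen only for this example.

```python
import random


def approximate_sum(values, sample_size, seed=0):
    """Estimate sum(values) from a uniform random sample drawn without replacement.

    The sample total is scaled up by N / n, where N is the number of rows
    and n is the sample size.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(values, sample_size)
    scale = len(values) / sample_size
    return scale * sum(sample)


# Hypothetical column of 100,000 values standing in for a database table.
amounts = [(i % 100) + 1 for i in range(100_000)]

exact = sum(amounts)                                    # full scan
estimate = approximate_sum(amounts, sample_size=1_000)  # reads 1% of the rows
relative_error = abs(estimate - exact) / exact

print(f"exact={exact}, estimate={estimate:.0f}, relative error={relative_error:.3%}")
```

The trade-off is the one the course studies in depth: the estimate touches far fewer rows than the exact answer, at the cost of an error that shrinks as the sample grows.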
Prerequisites
Advanced Algorithms and Database II are the prerequisite courses.
However, exceptions will be made on a case-by-case basis,
especially if the student has prior exposure or
demonstrates the initiative to quickly learn these concepts on
his or her own.
Evaluation
The grade will be based on
the paper presentations, class attendance and
participation, performance in the projects, one midterm
exam and one final exam.
Presentation | 25%
Midterm exam | 30%
Final exam | 30%
Final Project | 15%
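As a rough illustration of how the weights listed above combine, here is a minimal sketch. It assumes each component is scored on a 0-100 scale and that the overall score is a plain weighted sum; the page does not state either of these explicitly, and it gives no separate weight for attendance and participation. The component scores used below are made up.

```python
# Weights taken from the evaluation breakdown above (they sum to 100%).
WEIGHTS = {
    "presentation": 0.25,
    "midterm": 0.30,
    "final_exam": 0.30,
    "final_project": 0.15,
}


def overall_score(scores):
    """Combine component scores (assumed to be on a 0-100 scale) into a weighted total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores, for illustration only.
example = {"presentation": 90, "midterm": 80, "final_exam": 70, "final_project": 100}
print(round(overall_score(example), 2))  # prints 82.5
```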
Presentations
The actual reading list,
consisting of recent research papers, will be selected and
finalized as the course progresses. Each student will
present one or more papers (depending on the enrollment)
during the semester. Students will participate in class
discussions during and after each presentation. Attendance
is required. The presentation slides must be
emailed to the TA at least one day before the
presentation. Failure to do so might result in a deduction
of a few points.
Final Project
There will be a final project, which will involve implementing the systems
described in the papers. You can work individually or in a team of at most
two students. Potential topics for projects will be announced after the
first midterm. The dates for the demo and other details will be
posted as the course progresses.
List of Projects
Tentative Schedule (Presentation, Lectures and Exams)
Given below is a tentative schedule for class presentations over
the course of this semester.
Date/Day | Presenter(s) | Paper
Aug 26 - Sep 09 | Dr. Gautam Das | Lectures
Sep 14 Tuesday | Raghavendra Rao Madala | Ganti V., Lee M. L., Ramakrishnan R. ICICLES: Self-tuning Samples for Approximate Query Answering, VLDB 2000. [PPT]
Sep 14 Tuesday | Amr Elkhatib | Acharya S., Gibbons P. B., Poosala V., Ramaswamy S. Join Synopses for Approximate Query Answering, ACM SIGMOD 1999. [PPT]
Sep 16 Thursday | Dr. Gautam Das | Optional Readings:
    1. Chaudhuri S., Motwani R., Narasayya V. On Random Sampling Over Joins, ACM SIGMOD 1999.
    2. Acharya S., Gibbons P. B., Poosala V. Congressional Samples for Approximate Answering of Group-By Queries, ACM SIGMOD 2000.
Sep 21 Tuesday | Ashubenghan Shyamsin Raj | Mingxi Wu, Chris Jermaine. A Bayesian Method for Guessing the Extreme Values in a Data Set, VLDB 2007. [PPT]
    Optional Readings:
    1. Guessing the extreme values in a data set: a Bayesian method and its applications, VLDBJ 18(2), 2009 (special issue: Best Papers of VLDB 2007). This is the journal version of the paper and can be consulted for additional details.
    2. EM Algorithm: "The Expectation Maximization Algorithm: A short tutorial" or "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models"
    3. Monte Carlo Methods: tutorial
Sep 21 Tuesday | Sayed Muchallil | Chaudhuri S., Das G., Narasayya V. A Robust, Optimization-Based Approach for Approximate Answering of Aggregation Queries, ACM SIGMOD 2001. [PPT]
Sep 23 Thursday | Srikanth Vadada | Babcock B., Chaudhuri S., Das G. Dynamic Sample Selection for Approximate Query Processing, ACM SIGMOD 2003. [PPT]
Sep 28 Tuesday | Sthuti Kripanidhi | P. J. Haas and J. M. Hellerstein. Ripple Joins for Online Aggregation, ACM SIGMOD 1999. [PPT]
Sep 30 Thursday | Harsh Singh | Arjun Dasgupta, Gautam Das, Heikki Mannila. A Random Walk Approach to Sampling Hidden Databases, ACM SIGMOD 2007. [PPT]
    Optional Readings:
    1. Google's Deep-Web Crawl, VLDB 2008.
    2. http://www.deeppeep.org/
    3. Random sampling from a search engine's index, WWW 2006.
Oct 05 Tuesday | Dr. Gautam Das | Review for Midterm exam
Oct 07 Thursday | - | MIDTERM EXAM
Oct 12 - Oct 21 | Dr. Gautam Das | Lectures
Oct 26 Tuesday | Lokesh Chikkakempanna | Jon M. Kleinberg. Authoritative sources in a hyperlinked environment, Journal of the ACM 46 (1999). [PPT]
Oct 28 Thursday | Aishwarya Rengamannan | L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web, 1998. [PPT]
Oct 28 Thursday | Sruthi Gungidi | Sanjay Agrawal, Surajit Chaudhuri, Gautam Das. DBXplorer: A System For Keyword-Based Search Over Relational Databases, ICDE 2002. [PPT]
Nov 2 Tuesday | Suvigya Jaiswal | Sanjay Agrawal, Surajit Chaudhuri, Gautam Das, Aristides Gionis. Automated Ranking of Database Query Results, CIDR 2003. [PPT]
Nov 4 Thursday | Kiran Karnam | Surajit Chaudhuri, Gautam Das, Vagelis Hristidis, Gerhard Weikum. Probabilistic Ranking of Database Query Results, VLDB 2004. [PPT]
Nov 9 Tuesday | Gaurav Dutta | Quang Hieu Vu, Beng Chin Ooi, Dimitris Papadias, Anthony K. H. Tung. A Graph Method for Keyword-based Selection of the top-K Databases, VLDB 2008. [PPT]
    Optional Readings:
    Charuta Nakhe, Arvind Hulgeri, Gaurav Bhalotia, Soumen Chakrabarti, S. Sudarshan. Keyword Searching and Browsing in Databases using BANKS, ICDE 2002.
Nov 9 Tuesday | Niharika Pamu | Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid. Supporting top-k join queries in relational databases, VLDB 2003. [PPT]
Nov 16 Tuesday | Mahashweta Das | Ming Hua, Jian Pei, Wenjie Zhang, Xuemin Lin. Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach, SIGMOD 2008. [PPT1] [PPT2]
    Optional Readings:
    1. Probabilistic Databases: Tutorial
    2. Christopher Re, Nilesh Dalvi, Dan Suciu. Efficient Top-k Query Evaluation on Probabilistic Data, ICDE 2007.
Nov 18 Thursday | Shankarkumar Kumar | Sihem Amer-Yahia, Michael Benedikt, Laks V. S. Lakshmanan, Julia Stoyanovich. Efficient Network-Aware Search in Collaborative Tagging Sites, VLDB 2008. [PPT]
    Optional Readings:
    Presentation on Social tagging systems.
Nov 18 Thursday | Sandeep Chittal | Senjuti Basu Roy, Haidong Wang, Ullas Nambiar, Gautam Das, Mukesh Mohania. Minimum-Effort Driven Dynamic Faceted Search in Structured Databases, CIKM 2008. [PPT]
    Optional Readings:
    Senjuti Basu Roy, Haidong Wang, Ullas Nambiar, Gautam Das, Mukesh Mohania. DynaCet: Building Dynamic Faceted Search Systems over Databases, ICDE 2009.
Nov 23 Tuesday | Yesha Gupta | Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis. Answering Top-k queries Using Views, VLDB 2006. [PPT]
Nov 23 Tuesday | Anitha Royappan | Gautam Das, Dimitrios Gunopulos, Nick Koudas, Nikos Sarkas. Ad-hoc Top-k Query Answering for Data Streams, VLDB 2007. [PPT]
    Optional Readings:
    Data Streams: tutorial 1 and tutorial 2.
Nov 25 Thursday | - | Thanksgiving Holidays
Nov 30 Tuesday | Dr. Gautam Das | Review for Final exam
Dec 2 Thursday | - | FINAL EXAM
Dec 3 Friday | - | Project Demo
Announcements
Please check this section regularly during the semester for updates and
announcements on the course.
The ethics statement is available here. Please print, sign and submit it to
the instructor during class.
Group assignments for presentations might change in the future. Any changes
will be posted on the website.
Please send the TA your presentation slides one day before the scheduled
presentation.
If you do not send your presentation slides before the scheduled
presentation, you might lose some points.