About the Course
Much of the world’s recorded data is locked up in structured
sources such as databases, which are often the proprietary information of
private corporations and government agencies. Searching for and exploring
information within databases is currently very cumbersome: the user often
has to know complex query languages (such as SQL), as well as details of
how the data is structured into tables and columns (the database schema).
In recent years, researchers have studied the problem of improving the
search and exploration capabilities of relational databases. This includes
adapting probabilistic and approximate querying methods to improve the
scalability of query answering, as well as information retrieval techniques
such as relevance ranking and keyword search. This class will explore
recent research efforts in these important and challenging fields. We will
read and discuss the latest research literature from the premier
conferences in databases and information retrieval. It is hoped that this
class will spur students to pursue further research in these areas.
The following is a tentative list of topics which we will attempt to cover:
1. Probabilistic Methods in Databases
   - Sampling Methods in Databases: Basics
   - Approximate Query Processing
   - Processing of Fuzzy/Uncertain Data
2. Unstructured Search in Databases
   - Keyword Queries in Databases
   - Ranking of Database Query Results
3. DB and IR Integration
   - Top-K Algorithms
We will cover these topics in breadth, understand the central
contributions of these efforts, and try to predict future research
directions.
Prerequisites
Advanced Algorithms and Database II are the prerequisite courses.
However, exceptions will be made on a case-by-case basis, especially if the
student has prior exposure to the material or demonstrates the initiative
to learn these concepts quickly on his or her own.
Presentations
The actual reading list, consisting of recent research papers, will
be selected and finalized as the course progresses. Each student will
present one or more papers (depending on the enrollment) during the
semester. Students will participate in class discussions during and after
each presentation. Attendance is required.
Project
In addition to reading papers and presenting them in class, students
will complete a project during the semester. The project will be either
a programming project or a research project. The projects will involve
developing portions of information retrieval systems for structured
databases based on the techniques proposed in the papers. The projects
will also be tested on real data that the students are expected to obtain
access to. A long-term objective is that the more promising projects will
serve as infrastructure/test-beds for students to continue their research
in these areas beyond the course.
Evaluation
The grade will be based on the paper presentations, class
attendance and participation, performance in the projects, one midterm
exam and one final exam.
Component | Weight
Midterm exam | 30%
Final exam | 25%
Presentation | 20%
Project | 20%
Class participation | 5%
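As a worked example of how the weights above combine, here is a minimal Python sketch. It assumes that every component is scored on a 0-100 scale and that the overall grade is a plain weighted average; the page does not state either, and the function name `course_grade` and the sample scores are hypothetical.

```python
# Weights from the evaluation table (percent of the overall grade).
WEIGHTS = {
    "midterm": 30,
    "final": 25,
    "presentation": 20,
    "project": 20,
    "participation": 5,
}


def course_grade(scores: dict[str, float]) -> float:
    """Hypothetical helper: weighted course score, assuming each component is scored 0-100."""
    # Sanity check: the weights should account for 100% of the grade.
    assert sum(WEIGHTS.values()) == 100
    # Every graded component must have a score.
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Weighted sum, divided by 100 to convert percentages to fractions.
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {"midterm": 80, "final": 90, "presentation": 85, "project": 95, "participation": 100}
print(course_grade(example))  # 87.5
```

Here the midterm alone contributes 30 × 80 / 100 = 24 points, so a weak midterm costs more than an equally weak project.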
Tentative Schedule (Presentations, Lectures and Exams)
Each paper will be presented by a group of two or three students,
depending on enrollment. Each group will present two papers. The papers
will be presented in two cycles: the first before the midterm exam and the
second after it. The tentative schedule of class presentations for this
semester is given below.
Date/Day | Presenter(s) | Paper
Aug 26 - Sep 11 | Dr. Gautam Das | Lectures
Sep 16, Tuesday | Harikrishnan Karunakaran, Sulabha Balan | Ganti V., Lee M. L., Ramakrishnan R. ICICLES: Self-tuning Samples for Approximate Query Answering, VLDB 2000.
Sep 16, Tuesday | Deep Pancholi, Vinit Asher | Chaudhuri S., Das G., Datar M., Motwani R., Narasayya V. Overcoming Limitations of Sampling for Aggregation Queries, ICDE 2001.
Sep 18, Thursday | Saranya Gottipati, Jeevan Gogineni | Acharya S., Gibbons P. B., Poosala V., Ramaswamy S. Join Synopses for Approximate Query Answering, ACM SIGMOD 1999.
Sep 18, Thursday | Anusha Reddy Rachapalli Muni, Agasthya Padisala | Acharya S., Gibbons P. B., Poosala V. Congressional Samples for Approximate Answering of Group-By Queries, ACM SIGMOD 2000.
Sep 23, Tuesday | Venkata Jammula, Vivek Tanneeru | Chaudhuri S., Das G., Narasayya V. A Robust, Optimization-Based Approach for Approximate Answering of Aggregation Queries, ACM SIGMOD 2001.
Sep 25, Thursday | Chandrashekar Vijayarenu, Anirban Maiti | Babcock B., Chaudhuri S., Das G. Dynamic Sample Selection for Approximate Query Processing, ACM SIGMOD 2003.
Sep 25, Thursday | Calvin Noronha, Deepak Anand | Hellerstein J., Haas P., Wang H. Online Aggregation, ACM SIGMOD 1997.
Sep 30, Tuesday | Ronda Hilton | Haas P. J., Hellerstein J. M. Ripple Joins for Online Aggregation, ACM SIGMOD 1999.
Sep 30, Tuesday | Charanmai Koorapati Ramesh, Harika Guniganti | Chakrabarti K., Garofalakis M., Rastogi R., Shim K. Approximate Query Processing Using Wavelets, VLDB 2000.
Oct 02, Thursday | Saranya Gottipati, Jeevan Gogineni | Chaudhuri S., Motwani R., Narasayya V. On Random Sampling Over Joins, ACM SIGMOD 1999.
Oct 07, Tuesday | Dr. Gautam Das | Review for midterm exam
Oct 09, Thursday | - | MIDTERM EXAM
Oct 14, 16, 21, 23 | Dr. Gautam Das | Lectures
Oct 28, Tuesday | Vikrant Khosla, Sridhar Kameswara Nemani | Kleinberg J. M. Authoritative Sources in a Hyperlinked Environment, Journal of the ACM 46 (1999).
Oct 30, Thursday | Ashish Chawla, Vinit Asher | Page L., Brin S., Motwani R., Winograd T. The PageRank Citation Ranking: Bringing Order to the Web, 1998.
Nov 04, Tuesday | Harikrishnan Karunakaran, Sulabha Balan | Nakhe C., Hulgeri A., Bhalotia G., Chakrabarti S., Sudarshan S. Keyword Searching and Browsing in Databases using BANKS, ICDE 2002.
Nov 06, Thursday | Deep Pancholi | Agrawal S., Chaudhuri S., Das G. DBXplorer: A System for Keyword-Based Search over Relational Databases, ICDE 2002.
Nov 11, Tuesday | Mahadevkirthi Mahadevraj, Sameer Gupta | Agrawal S., Chaudhuri S., Das G., Gionis A. Automated Ranking of Database Query Results, CIDR 2003.
Nov 13, Thursday | Raghunath Ravi, Sivaramakrishnan Subramani | Chaudhuri S., Das G., Hristidis V., Weikum G. Probabilistic Ranking of Database Query Results, VLDB 2004.
Nov 18, Tuesday | Chandrashekar Vijayarenu, Anirban Maiti | Re C., Dalvi N., Suciu D. Efficient Top-k Query Evaluation on Probabilistic Data, ICDE 2007.
Nov 18, Tuesday | Calvin Noronha, Deepak Anand | Ilyas I. F., Aref W. G., Elmagarmid A. K. Supporting Top-k Join Queries in Relational Databases, VLDB 2003.
Nov 20, Thursday | Rashmi Pagadala, Swetha Bhaskar | Zhang Z., Hwang S., Chang K. C.-C., Wang M., Lang C. A., Chang Y.-C. Boolean + Ranking: Querying a Database by k-Constrained Optimization, ACM SIGMOD 2006.
Nov 25, Tuesday | Lipsa Patel, Kushal Shah | Das G., Gunopulos D., Koudas N., Tsirogiannis D. Answering Top-k Queries Using Views, VLDB 2006.
Nov 27-30 | - | Thanksgiving holidays
Dec 01, Monday | Students | Project presentations at DBXLAB (GS 237)
Dec 02, Tuesday | Students | Project presentations at DBXLAB (GS 237)
Dec 02, Tuesday | Dr. Gautam Das | Review for final exam
Dec 04, Thursday | - | FINAL EXAM
Project Presentation Schedule
Project presentations will be held in Room 237 of the GeoScience
Building (the TA's office). In addition to the presentation, students must
submit their code and a 1-2 page user manual by midnight on Dec 02. The
schedule is as follows:
Group No. | Student Names | Project Title | Presentation Date | Presentation Time
1 | Mahadevkirthi Mahadevraj, Sameer Gupta | Outlier Indexing Method for Approximate Query Answering | Dec 01, Monday | 10:30 - 11:00
2 | Deepak Anand, Calvin Richard Noronha | Rank Join Algorithm | Dec 01, Monday | 11:00 - 11:30
3 | Vikrant Khosla, Sridhar Kameswara Nemani | DBXplorer: A System for Keyword-Based Search over Relational Databases | Dec 01, Monday | 11:30 - 12:00
4 | Anusha Reddy Rachapalli Muni, Agasthya Padisala | Congressional Samples for Approximate Answering of Group-By Queries | Dec 01, Monday | 12:30 - 01:00
5 | Charanmai Koorapati Ramesh, Harika Guniganti | ICICLES: Self-tuning Samples for Approximate Query Answering | Dec 01, Monday | 02:30 - 03:00
6 | Rashmi Pagadala, Swetha Bhaskar | ICICLES: Self-tuning Samples for Approximate Query Answering | Dec 01, Monday | 03:00 - 03:30
7 | Vivek Tanneeru, Venkata Jammula | Overcoming Limitations of Sampling for Aggregation Queries | Dec 01, Monday | 03:30 - 04:00
8 | Jeevan Gogineni, Saranya Gottipati | Join Synopses | Dec 01, Monday | 04:00 - 04:30
9 | Deep Pancholi, Vinit Asher | DBXplorer: A System for Keyword-Based Search over Relational Databases | Dec 02, Tuesday | 11:00 - 11:30
10 | Chandrashekar Vijayarenu, Anirban Maiti | Efficient Top-k Query Evaluation on Probabilistic Data | Dec 02, Tuesday | 11:30 - 12:00
11 | Ronda Hilton | Top-K Queries Using Views | Dec 02, Tuesday | 12:00 - 12:30
12 | Harikrishnan Karunakaran, Sulabha Balan | Online Aggregation | Dec 02, Tuesday | 02:30 - 03:00
13 | Sivaramakrishnan Subramani, Raghunath Ravi | Fagin's Threshold Algorithm | Dec 02, Tuesday | 03:00 - 03:30
14 | Lipsa Patel, Kushal Shah | Implementing Threshold Algorithm for Top-k Query Processing | Dec 02, Tuesday | 03:30 - 04:00
15 | Ashish Chawla | Research project (topic to be determined) | Dec 02, Tuesday | 04:00 - 04:30
Announcements
- Please check this section regularly during the semester for updates and
announcements on the course.
- The ethics statement is available on the course website. Please print,
sign and submit it to the instructor during class.
- Please let the TA
know the names of your group members by Sep 09, 2008.
- Group assignments for presentations might change in the future. Any
changes will be posted on the website.
- Please send the
TA your presentation slides before the scheduled presentation.
- If you do not send your presentation slides before the scheduled
presentation (or at least on the same day), you may lose points.
- All project presentations will be held on Dec 1 and 2 at the TA's
office. Specific slots will be announced later. You must inform the TA of
your team members, your specific project topic, and your preferred
presentation date and time (if any) by Nov 20. Make sure you finalize your
project topic with the TA before you start implementing your project, as
each group must work on a different topic.
- The project presentation schedule has been posted (see above).
- Please submit the code and user manual in a single zip file, named after
the group members.