About the Course
Much of the world’s recorded data is locked
up in structured sources such as databases, which are often the
proprietary information of private corporations and government
agencies. Searching and exploring information within
databases is currently very cumbersome: the data explorer often
has to know a comprehensive query language (such as SQL), as well
as how the data is structured into
different tables and columns (the database schema). In recent
years, researchers have worked on improving
the search and exploration capabilities of relational
databases. This work includes adapting probabilistic and approximate
querying methods to improve the scalability of query answering,
as well as information retrieval techniques such as relevance
ranking and keyword search. This class will explore recent
research efforts in these important and
challenging fields. We will read and discuss the latest research
literature from premier conferences in databases and
information retrieval. We hope that this class will spur
students to pursue further research in these areas.
The following is a tentative list of topics
which we will attempt to cover:
1. Probabilistic Methods in Databases
   - Sampling Methods in Databases: Basics
   - Approximate Query Processing
   - Processing of Fuzzy/Uncertain Data
2. Unstructured Search in Databases
   - Keyword Queries in Databases
   - Ranking of Database Query Results
3. DB and IR Integration
   - Top-K Algorithms
We will cover these topics in breadth,
examine the central contributions of each effort, and try
to predict future research directions.
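To give a flavor of the first topic, the following is a minimal sketch of
sampling-based approximate query answering. It is not taken from any paper
on the reading list. It estimates a SUM aggregate from a uniform random
sample and attaches an approximate 95% confidence interval based on the
Central Limit Theorem. The function name approximate_sum, the "sales"
column, and the sample size are all assumptions made for illustration, and
the finite-population correction is ignored for simplicity.

```python
import math
import random


def approximate_sum(values, sample_size, seed=0):
    """Estimate sum(values) from a uniform random sample without replacement.

    Returns (estimate, half_width), where half_width is an approximate 95%
    confidence half-width derived from the Central Limit Theorem.
    """
    if sample_size < 2 or sample_size > len(values):
        raise ValueError("sample_size must be between 2 and len(values)")
    rng = random.Random(seed)
    n = len(values)
    sample = rng.sample(values, sample_size)
    mean = sum(sample) / sample_size
    # Unbiased sample variance of the sampled values.
    variance = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)
    # Scale the sample mean up to the size of the full table.
    estimate = n * mean
    # 1.96 is the normal quantile for a two-sided 95% interval.
    half_width = 1.96 * n * math.sqrt(variance / sample_size)
    return estimate, half_width


if __name__ == "__main__":
    # Hypothetical "sales" column with 100,000 rows of synthetic data.
    data_rng = random.Random(42)
    sales = [data_rng.uniform(0, 100) for _ in range(100_000)]
    estimate, half_width = approximate_sum(sales, sample_size=1_000)
    print(f"estimate = {estimate:.0f} +/- {half_width:.0f}")
    print(f"exact    = {sum(sales):.0f}")
```

The sketch reads only 1% of the rows, which is the basic trade-off behind
approximate query processing: a small, quantified loss of accuracy in
exchange for a much cheaper scan. Uniform samples handle skewed data and
small groups poorly, which is the limitation that several of the sampling
papers in the schedule set out to address.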
Prerequisites
Advanced Algorithms and Database II are the
prerequisite courses. However, exceptions will be made on a
case-by-case basis, especially if the student has prior exposure to
the material or demonstrates the initiative to learn these concepts
quickly on their own.
Presentations
The actual reading list, consisting of
recent research papers, will be selected and finalized by the
first week of classes. Each student will present one or more
papers (depending on the enrollment) during the semester.
Students will participate in class discussions during and after
each presentation. Attendance is required.
Project
In addition to reading papers, students
will have the option of attempting a programming project during
the semester. The projects will involve developing portions of
information retrieval systems for structured databases based on
the techniques suggested in the papers. The projects will also
be tested on real data that the students should have
access to. A long-term objective is for the more promising
projects to serve as infrastructure and test beds for students
continuing their research in these areas beyond the course.
Evaluation
The grade will be based on the paper
presentations, class attendance and participation, and
performance in the projects.
Schedule
Day & Date |
Presented By |
Topics/Papers |
Tuesday
Jan 17 |
|
- Introduction to
Approximate Query Processing, Information Retrieval,
Sampling
|
Thursday
Jan 19 |
|
|
Tuesday
Jan 24 |
|
|
Thursday
Jan 26 |
|
|
Tuesday
Jan 31 |
|
- Data Exploration and Analysis in Relational
Databases
ppt slides
|
Thursday
Feb 02 |
|
- Central Limit Theorem, Stratified Sampling,
Acquiring Random Sample
ppt slides
|
Tuesday
Feb 07 |
|
- Ganti V., Lee M. L., Ramakrishnan R. ICICLES: Self-tuning
Samples for Approximate Query Answering. Proc. of VLDB, 2000
ppt slides
|
Tuesday
Feb 07 |
|
- Chaudhuri S., Das G., Datar M., Motwani R., Narasayya V.
Overcoming Limitations of Sampling for Aggregation Queries.
Proc. of IEEE Conf. on Data Engineering, 2001
ppt slides
|
Thursday
Feb 09 |
|
- P. B. Gibbons and Y. Matias. New
Sampling-Based Summary Statistics for Improving Approximate
Query Answers. ACM SIGMOD 1998
ppt slides
|
Thursday
Feb 09 |
|
- Acharya S., Gibbons P. B., Poosala V., Ramaswamy S. Join
Synopses for Approximate Query Answering. Proc. of ACM
SIGMOD, 1999
ppt slides
|
Thursday
Feb 09 |
|
- Chaudhuri S., Motwani R.,
Narasayya V. Random Sampling Over Joins. Proc. of ACM
SIGMOD, 1999
ppt slides
|
Tuesday
Feb 14 |
|
- P. B. Gibbons, Y. Matias, and V. Poosala. Fast Incremental
Maintenance of Approximate Histograms. VLDB 1997
ppt slides
|
Tuesday
Feb 14 |
|
- Chaudhuri S., Das G., Narasayya V. A Robust, Optimization-
Based Approach for Approximate Answering of Aggregation
Queries. Proc. of ACM SIGMOD, 2001
ppt slides
|
Tuesday
Feb 14 |
|
- Babcock B., Chaudhuri S., Das G. Dynamic Sample
Selection for Approximate Query Processing. SIGMOD
2003: 539-550
ppt slides
|
Tuesday
Feb 14 |
|
- Acharya S., Gibbons P. B., Poosala V. Congressional
Samples for Approximate Answering of Group-By Queries. Proc.
of ACM SIGMOD, 2000
ppt slides
|
Thursday
Feb 16 |
|
- Jermaine C. Robust Estimation With Sampling and
Approximate Pre-Aggregation. VLDB 2003: 886-897
ppt slides
|
Thursday
Feb 16 |
|
- P. J. Haas and J. M. Hellerstein. Ripple Joins for Online
Aggregation. ACM SIGMOD 1999
ppt slides
|
Thursday
Feb 16 |
|
- Hellerstein J., Haas P., Wang H. Online Aggregation. Proc.
of ACM SIGMOD, 1997
ppt slides
|
Tuesday
Feb 21 |
|
- Wavelets and Ranking of
Database Query Results
ppt slides
|
Thursday
Feb 23 |
|
|
Tuesday
Feb 28 |
|
- FA, TA, Query Specific
Ranking, and Adding Selection
ppt slides
|
Thursday
Mar 02 |
|
- Ron
Fagin: Combining fuzzy information, an overview
ppt slides
|
Thursday
Mar 02 |
- Arjun Dasgupta and Sushruth Puttaswamy
|
- Ronald
Fagin, Amnon Lotem, Moni Naor: Optimal Aggregation
Algorithms for Middleware. PODS 2001
ppt slides
|
Tuesday
Mar 07 |
|
|
Thursday
Mar 09 |
- Muhammed Miah and Archana Vijayalakshmanan
|
- Amélie
Marian, Nicolas Bruno, Luis Gravano: Evaluating top-k
queries over web-accessible databases.
ACM Trans. Database Syst. 29(2): 319-362 (2004)
ppt slides
|
Thursday
Mar 09 |
|
- Vagelis
Hristidis, Nick Koudas, Yannis Papakonstantinou: PREFER:
A System for the Efficient Execution of Multi-parametric
Ranked Queries. SIGMOD, 2001
ppt slides
|
Thursday
Mar 09 |
|
- Ihab F.
Ilyas, Walid G. Aref, Ahmed K. Elmagarmid: Supporting
top-k join queries in relational databases. VLDB J.
13(3): 207-221 (2004)
ppt slides
|
Tuesday
Mar 21 |
|
-
Chengkai Li, Mohamed A. Soliman, Kevin Chen-Chuan Chang,
Ihab F. Ilyas: RankSQL: Supporting Ranking Queries in
Relational Database Management Systems. VLDB 2005:
1342-1345
ppt slides
|
Tuesday
Mar 21 |
|
|
Thursday
Mar 23 |
|
- Ranking in Information
Retrieval Systems
ppt slides
|
Tuesday
Mar 27 |
|
|
Thursday
Mar 29 |
|
|
Tuesday
Apr 04 |
|
-
Modern Information Retrieval: A Brief Overview
ppt slides
|
Tuesday
Apr 04 |
|
-
DBExplorer: A System For Keyword Based Search Over
Relational Databases
ppt slides
|
Tuesday
Apr 04 |
|
- Keyword
Searching and Browsing in Databases using BANKS
ppt slides
|
Thursday
Apr 06 |
|
- L.
Page, S. Brin, R. Motwani, T. Winograd. The PageRank
Citation Ranking: Bringing Order to the Web
ppt slides
|
Thursday
Apr 06 |
|
- J.
Kleinberg. Authoritative sources in a hyperlinked
environment. Journal of the ACM 46(1999)
ppt slides
|
Tuesday
Apr 11 |
|
- Sanjay
Agrawal, Surajit Chaudhuri, Gautam Das, Aristides Gionis:
Automated Ranking of Database Query Results. CIDR 2003
ppt slides
|
Tuesday
Apr 11 |
|
- Surajit
Chaudhuri, Gautam Das, Vagelis Hristidis, Gerhard Weikum:
Probabilistic Ranking of Database Query Results. VLDB
2004
ppt slides
|
Tuesday
Apr 11 |
|
-
Answering Imprecise Queries over Autonomous Web
Databases. U. Nambiar and S. Kambhampati. 22nd
International Conference on Data Engineering (ICDE)
ppt slides
|
Thursday
Apr 13 |
|
- Martin
Theobald, Gerhard Weikum, Ralf Schenkel: Top-k Query
Evaluation with Probabilistic Guarantees. VLDB 2004:
648-659
ppt slides
|
Tuesday
Apr 18 |
|
- Kaushik
Chakrabarti, Surajit Chaudhuri, Seung-won Hwang:
Automatic Categorization of Query Results. SIGMOD
Conference 2004: 755-766
ppt slides
|
Tuesday
Apr 18 |
|
- Page Quality: In Search of an Unbiased Web Ranking.
Junghoo Cho, Sourashis Roy, Robert Adams (UCLA). SIGMOD 2005
ppt slides
|
Thursday
Apr 20 |
|
- Data Exploration and
Analysis in Relational Databases
ppt slides
|
Tuesday
Apr 25 |
|
|
Tuesday
Apr 25 |
|
-
Counting at Large: Efficient Cardinality Estimation in
Internet-Scale Data Networks - Nikos Ntarmos, Peter
Triantafillou, Gerhard Weikum
ppt slides
|
Tuesday
Apr 25 |
|
- Benjamin Arai, Gautam Das, Dimitrios Gunopulos, Vana
Kalogeraki: Approximating Aggregation Queries in
Peer-to-Peer Networks. ICDE 2006
|
Thursday
Apr 27 |
|
- Information Retrieval Techniques for Peer-to-Peer Networks.
Demetrios Zeinalipour-Yazti, Vana Kalogeraki, Dimitrios
Gunopulos (University of California, Riverside)
ppt slides
|
Thursday
Apr 27 |
|
- KLEE: A Framework for
Distributed Top-k Query Algorithms
ppt slides
|
Thursday
Apr 27 |
|
- P2P Content Search: Give the
web back to the people
ppt slides
|
Tuesday
May 02 |
|
|
Tuesday
May 02 |
|
- Gossip-Based Computation of
Aggregate Information: David Kempe, Alin Dobra, Johannes
Gehrke
ppt slides
|
Tuesday
May 02 |
|
- Samuel Madden, Michael J.
Franklin, Joseph M. Hellerstein, Wei Hong: The Design of
an Acquisitional Query Processor For Sensor Networks
ppt slides
|
Thursday
May 04 |
|
- D. Zeinalipour-Yazti, Z.
Vagena, D. Gunopulos, V. Kalogeraki, V. Tsotras, M.
Vlachos, N. Koudas, D. Srivastava "The Threshold Join
Algorithm for Top-k Queries in Distributed Sensor
Networks", Proceedings of the 2nd international workshop
on Data management for sensor networks DMSN (VLDB'2005),
Trondheim, Norway, ACM International Conference
Proceeding Series; Vol. 96, Pages: 61-66, 2005
ppt slides
|
|
|
|
- Garofalakis M. N. and Gibbons P. B. Approximate Query
Processing: Taming the TeraBytes. VLDB 2001 (survey)
- Chakrabarti K., Garofalakis M., Rastogi R., Shim K.
Approximate Query Processing Using Wavelets. Proc. of VLDB
2000
|