About the Course
Much of the world’s recorded data is locked up in structured
sources such as databases, which are often the proprietary information of
private corporations and government agencies. Searching and exploring
information within databases is currently very cumbersome: the data
explorer often has to know a comprehensive query language (such as SQL), as
well as how the data is structured into different tables and columns (the
database schema). In recent years, researchers have worked on improving the
search and exploration capabilities of relational databases. This includes adapting
probabilistic and approximate querying methods to improve the scalability
of query answering, as well as information retrieval techniques such as
relevance ranking and keyword search. This class will explore the recent
efforts by researchers in these extremely important and challenging
fields. We will read and discuss the latest research literature drawn from
premier conferences in databases and information retrieval. We hope
that this class will spur students to pursue further research in these
areas.
The following is a tentative list of topics which we will attempt
to cover:
1. Probabilistic Methods in Databases
- Sampling Methods in Databases: Basics
- Approximate Query Processing
- Processing of Fuzzy/Uncertain Data
2. Unstructured Search in Databases
- Keyword Queries in Databases
- Ranking of Database Query Results
3. DB and IR integration
- Top-K algorithms
We will cover these topics in breadth, understand the central
contributions of these efforts, and try to predict future research
directions.
Prerequisites
Advanced Algorithms and Database II are the prerequisite courses.
However, exceptions will be made on a case-by-case basis, especially if the
student has prior exposure to these concepts or demonstrates the initiative
to learn them quickly on their own.
Presentations
The actual reading list, consisting of recent research papers, will
be selected and finalized as the course progresses. Each student will
present one or more papers (depending on the enrollment) during the
semester. Students will participate in class discussions during and after
each presentation. Attendance is required.
Project
In addition to reading papers and presenting them in class, students
will have the option of attempting a programming project during the
semester. The projects will involve developing portions of information
retrieval systems for structured databases based on the techniques
suggested in the papers. The projects will also be tested on real
data that the students should get access to. A long-term objective is
that the more promising projects will serve as infrastructure/test-beds
for students to continue with their research in these areas beyond the
course.
Evaluation
The grade will be based on the paper presentations, class
attendance and participation, performance in the projects, and possibly
1-2 take-home examinations.
Schedule
Given below is a tentative schedule for class presentations over
the course of this semester.
Announcements
- Please check this section regularly during the semester for updates and announcements on the course.
- Ethics statement is available here. Please print, sign and submit it to the instructor during class.
- Presentation allotments have been put up!