About the Course
Much of the world's recorded data is locked up in structured
sources such as databases, which are often the proprietary
information of private corporations and government
agencies. Searching for and exploring information within
databases is currently very cumbersome: the data explorer
often has to know comprehensive query languages (such
as SQL), as well as important information on how the data
is structured into different tables and columns (the
database schema). In recent years, researchers have
studied the problem of improving the search and
exploration capabilities of relational databases. This
includes adapting probabilistic and approximate querying
methods to improve the scalability of query answering, as
well as information retrieval techniques such as relevance
ranking and keyword search. With the rising popularity of
social networks, we will also look at techniques to
effectively explore the data produced by them. This class
will explore recent efforts by researchers in these
extremely important and challenging fields. We will read
and discuss the latest research literature gleaned from
premier conferences in databases and information
retrieval. It is hoped that this class will spur students
to pursue further research in these areas.
The following is a tentative
list of topics which we will attempt to cover:
1. Probabilistic Methods in Databases
   Sampling Methods in Databases: Basics
   Approximate Query Processing
   Processing of Fuzzy/Uncertain Data
2. DB and IR Integration
   Ranking of Database Query Results
   Top-K Algorithms
3. Exploration of Social Media Data
   Collaborative Filtering
   Recommendations
We will cover various topics
in breadth, understand the central contributions of these
efforts, and try to predict future research directions.
Prerequisites
Advanced Algorithms and Database II are the prerequisite courses.
However, exceptions will be made on a case-by-case basis,
especially if the student has prior exposure or
demonstrates the initiative to quickly learn these concepts on
his/her own.
Evaluation
The grade will be based on
the paper presentations, class attendance and
participation, performance in the projects, one midterm
exam and one final exam.
Presentation | 25%
Midterm exam | 25%
Final exam | 25%
Final Project | 25%
Presentations
The actual reading list,
consisting of recent research papers, will be selected and
finalized as the course progresses. Each student will
present one or more papers (depending on the enrollment)
during the semester. Students will participate in class
discussions during and after each presentation. Attendance
is required. The slides for the presentation must be
emailed to the TA at least one day before the
presentation. Failure to do so may result in a deduction
of a few points.
Final Project
There will be a final project
which will involve implementing the systems described
in the papers. You can work individually or form a team of at most two students.
Potential topics for projects will be announced after the
first midterm. The projects will have to be demoed in the
lab. The dates for the demo and other details will be
updated as the course progresses.
List of Sample Projects
Lecture Notes/Presentations
1. Sampling for Approximate Query Processing
2. Introduction to Social Network Analysis
3. Interesting Problems in Social Networks
Tentative Schedule (Presentations, Lectures and Exams)
Given below is a tentative schedule for class presentations over
the course of this semester.
Date/Day | Presenter(s) | Paper
Aug 26 - Sep 06 | Dr. Gautam Das | Lectures
Sep 08 Thursday | Satyakanth Kagitala | Ganti V., Lee M. L., Ramakrishnan R. ICICLES: Self-tuning Samples for Approximate Query Answering, VLDB 2000. PPT
Sep 08 Thursday | Anvitha Banakal Sadananda | Acharya S., Gibbons P. B., Poosala V., Ramaswamy S. Join Synopses for Approximate Query Answering, ACM SIGMOD 1999. PPT
Sep 13 Tuesday | Harshit Shah | Chaudhuri S., Das G., Narasayya V. A Robust, Optimization-Based Approach for Approximate Answering of Aggregation Queries, ACM SIGMOD 2001. PPT
Sep 13 Tuesday | Nickolas Bielik | Babcock B., Chaudhuri S., Das G. Dynamic Sample Selection for Approximate Query Processing, ACM SIGMOD 2003. PPT
Sep 15 Thursday | - | P. J. Haas and J. M. Hellerstein. Ripple Joins for Online Aggregation, ACM SIGMOD 1999.
Sep 15 Thursday | Rohan Ramekar | Arjun Dasgupta, Gautam Das, Heikki Mannila. A Random Walk Approach to Sampling Hidden Databases, ACM SIGMOD 2007. PPT
Sep 20 Tuesday | Sarker Ahmed Rumee | Ziv Bar-Yossef, Maxim Gurevich. Random sampling from a search engine's index, WWW 2006. PPT
Sep 22 Thursday | Dr. Gautam Das | Review for Mid-Term
Sep 27 Tuesday | - | MIDTERM EXAM
Sep 29 - Oct 6 | Dr. Gautam Das | Lectures
Oct 11 Tuesday | Neha Matroja | Jon M. Kleinberg. Authoritative sources in a hyperlinked environment, Journal of the ACM 46 (1999). PPT
Oct 11 Tuesday | - | L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web, 1998.
Oct 13 Thursday | Ganesh Viswanathan | Sanjay Agrawal, Surajit Chaudhuri, Gautam Das. DBXplorer: A System For Keyword-Based Search Over Relational Databases, ICDE 2002. PPT
Oct 13 Thursday | Upa Gupta | Sanjay Agrawal, Surajit Chaudhuri, Gautam Das, Aristides Gionis. Automated Ranking of Database Query Results, CIDR 2003. PPT
Oct 18 Tuesday | Shrikant Desai | Surajit Chaudhuri, Gautam Das, Vagelis Hristidis, Gerhard Weikum. Probabilistic Ranking of Database Query Results, VLDB 2004. PPT
Oct 18 Tuesday | Mainul Islam | Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid. Supporting top-k join queries in relational databases, VLDB 2003. PPT
Oct 20 Thursday | Nandish Jayaram | Christopher Re, Nilesh Dalvi, Dan Suciu. Efficient Top-k Query Evaluation on Probabilistic Data, ICDE 2007. PDF
Oct 25 Tuesday | Na Li | Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis. Answering Top-k queries Using Views, VLDB 2006. PPT
Oct 25 Tuesday | - | Senjuti Basu Roy, Haidong Wang, Ullas Nambiar, Gautam Das and Mukesh Mohania. Minimum-Effort Driven Dynamic Faceted Search in Structured Databases, CIKM 2008.
Oct 27 Thursday | Dr. Gautam Das | Review for Mid-Term
Nov 01 Tuesday | - | MIDTERM EXAM
Nov 3-10 | Dr. Gautam Das | Lectures. Recommended readings: M. E. J. Newman. Models of the Small World: A Review, Journal of Statistical Physics; Lada Adamic and Eytan Adar. How to search a social network, Social Networks, 27(3):187-203, July 2005; Abhinandan Das, Mayur Datar, Ashutosh Garg and Shyam Rajaram. Google News Personalization: Scalable Online Collaborative Filtering, WWW 2007.
Nov 15 Tuesday | Sujoy Bhattacharya | D. Liben-Nowell, J. Kleinberg. The Link Prediction Problem for Social Networks, CIKM 2003. PPT
Nov 17 Thursday | Manisha Kundoor | Sihem Amer-Yahia, Michael Benedikt, Laks V. S. Lakshmanan, Julia Stoyanovich. Efficient Network-Aware Search in Collaborative Tagging Sites, VLDB 2008. PPT
Nov 22 Tuesday | Ajith Ajjaranichandrappa | Mahashweta Das, Gautam Das, and Vagelis Hristidis. Leveraging Collaborative Tagging for Web Item Design. Full paper, in Proc. of ACM SIGKDD 2011. PPT
Nov 22 Tuesday | Shreya Mamadapur | Sihem Amer-Yahia, Senjuti Basu Roy, Ashish Chawla, Gautam Das, Cong Yu. Group Recommendation: Semantics and Efficiency. Full paper, in VLDB 2009. PPT
Nov 24 Thursday | - | Thanksgiving Holidays
Nov 29 Tuesday | Ismat Jahan | Senjuti Basu Roy, Sihem Amer-Yahia, Gautam Das and Cong Yu. Interactive Itinerary Planning, in Proc. of ICDE 2011. PPT
Dec 01 Thursday | Habibur Rahman | Arvind Narayanan and Vitaly Shmatikov. Robust De-anonymization of Large Sparse Datasets, IEEE Symposium on Security and Privacy 2008.
Dec 6, 8 | - | Project Presentations
Announcements
Please check this section regularly during the semester for updates and announcements on the course.
The ethics statement is available here. Please print, sign and submit it to the instructor during class.
Group assignments for presentations might change in the future. Any changes will be updated on the website.
Please send the TA your presentation slides one day before the scheduled presentation.
If you do not send your presentation slides before the scheduled presentation, you might lose some points.