CSE 6339 SEC 001

DATA EXPLORATION AND ANALYSIS IN RELATIONAL DATABASES

 

Fall 2010


Class Time and Place: Tue/Thu 12:30-1:50 pm (NH 110)


Instructor: Dr. Gautam Das

Office: Nedderman Hall, Room# 302
Phone: 817 272 7595
Email: gdas AT uta DOT edu

Office Hours:   Tue/Wed 2:30-3:30 PM and by appointment


Teaching Assistant: Saravanan Thirumuruganathan

Office: GeoScience Building, Room# 237
Email: saravanan DOT thirumuruganathan AT mavs DOT uta DOT edu

Office Hours: Mon/Tue 11:00 AM-12:00 PM and by appointment


About the Course

Much of the world's recorded data is locked up in structured sources such as databases, which are often the proprietary information of private corporations and government agencies. Searching and exploring for information within databases is currently very cumbersome: the data explorer often has to know comprehensive query languages (such as SQL), as well as how the data is structured into different tables and columns (the database schema). In recent years, researchers have studied the problem of improving the search and exploration capabilities of relational databases. This includes adapting probabilistic and approximate querying methods to improve the scalability of query answering, as well as information retrieval techniques such as relevance ranking and keyword search. This class will explore recent efforts by researchers in these important and challenging fields. We will read and discuss the latest research literature from premier conferences in databases and information retrieval. It is hoped that this class will spur students to pursue further research in these areas.

The following is a tentative list of topics which we will attempt to cover:

1. Probabilistic Methods in Databases

            Sampling Methods in Databases: Basics

            Approximate Query Processing

            Processing of Fuzzy/Uncertain Data

2. DB and IR integration

            Ranking of Database Query Results

            Top-K algorithms

We will cover these topics in breadth, understand the central contributions of these efforts, and try to predict future research directions.


Prerequisites
Advanced Algorithms and Database II are the prerequisite courses. However, exceptions will be made on a case-by-case basis, especially if the student has prior exposure to these concepts or demonstrates the initiative to learn them quickly on his/her own.


Evaluation

The grade will be based on the paper presentations, class attendance and participation, performance in the projects, one midterm exam and one final exam.

Presentation: 25%
Midterm exam: 30%
Final exam: 30%
Final Project: 15%

 

Presentations

The actual reading list, consisting of recent research papers, will be selected and finalized as the course progresses. Each student will present one or more papers (depending on the enrollment) during the semester. Students will participate in class discussions during and after each presentation. Attendance is required. The presentation slides must be emailed to the TA at least one day before the presentation. Failure to do so may result in a deduction of a few points.

Final Project

There will be a final project that involves implementing the systems described in the papers. You can work individually or in a team of at most two students. Potential project topics will be announced after the midterm exam. The demo dates and other details will be posted as the course progresses.

List of Projects

Tentative Schedule (Presentation, Lectures and Exams)

Given below is a tentative schedule for class presentations over the course of this semester.

 

Date/Day | Presenter(s) | Paper

Aug 26 - Sep 09 | Dr. Gautam Das | Lectures

Sep 14 Tuesday | Raghavendra Rao Madala | Ganti V., Lee M. L., Ramakrishnan R. ICICLES: Self-tuning Samples for Approximate Query Answering, VLDB 2000. [PPT]

Sep 14 Tuesday | Amr Elkhatib | Acharya S., Gibbons P. B., Poosala V., Ramaswamy S. Join Synopses for Approximate Query Answering, ACM SIGMOD 1999. [PPT]

Sep 16 Thursday | Dr. Gautam Das | Optional Readings:
1. Chaudhuri S., Motwani R., Narasayya V. On Random Sampling Over Joins, ACM SIGMOD 1999.
2. Acharya S., Gibbons P. B., Poosala V. Congressional Samples for Approximate Answering of Group-By Queries, ACM SIGMOD 2000.

Sep 21 Tuesday | Ashubenghan Shyamsin Raj | Mingxi Wu, Chris Jermaine. A Bayesian Method for Guessing the Extreme Values in a Data Set, VLDB 2007. [PPT]
Optional Readings:
1. Guessing the extreme values in a data set: a Bayesian method and its applications, VLDBJ 18(2): 2009 (special issue: Best Papers of VLDB 2007). This is the journal version of the paper and can be consulted for additional details.
2. EM Algorithm: "The Expectation Maximization Algorithm: A Short Tutorial" or "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models"
3. Monte Carlo Methods: tutorial

Sep 21 Tuesday | Sayed Muchallil | Chaudhuri S., Das G., Narasayya V. A Robust, Optimization-Based Approach for Approximate Answering of Aggregation Queries, ACM SIGMOD 2001. [PPT]

Sep 23 Thursday | Srikanth Vadada | Babcock B., Chaudhuri S., Das G. Dynamic Sample Selection for Approximate Query Processing, ACM SIGMOD 2003. [PPT]

Sep 28 Tuesday | Sthuti Kripanidhi | P. J. Haas and J. M. Hellerstein. Ripple Joins for Online Aggregation, ACM SIGMOD 1999. [PPT]

Sep 30 Thursday | Harsh Singh | Arjun Dasgupta, Gautam Das, Heikki Mannila. A Random Walk Approach to Sampling Hidden Databases, ACM SIGMOD 2007. [PPT]
Optional Readings:
1. Google's Deep-Web Crawl, VLDB 2008.
2. http://www.deeppeep.org/
3. Random Sampling from a Search Engine's Index, WWW 2006.

Oct 05 Tuesday | Dr. Gautam Das | Review for Midterm exam

Oct 07 Thursday | - | MIDTERM EXAM

Oct 12 - Oct 21 | Dr. Gautam Das | Lectures

Oct 26 Tuesday | Lokesh Chikkakempanna | Jon M. Kleinberg. Authoritative Sources in a Hyperlinked Environment, Journal of the ACM 46 (1999). [PPT]

Oct 28 Thursday | Aishwarya Rengamannan | L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web, 1998. [PPT]

Oct 28 Thursday | Sruthi Gungidi | Sanjay Agrawal, Surajit Chaudhuri, Gautam Das. DBXplorer: A System For Keyword-Based Search Over Relational Databases, ICDE 2002. [PPT]

Nov 2 Tuesday | Suvigya Jaiswal | Sanjay Agrawal, Surajit Chaudhuri, Gautam Das, Aristides Gionis. Automated Ranking of Database Query Results, CIDR 2003. [PPT]

Nov 4 Thursday | Kiran Karnam | Surajit Chaudhuri, Gautam Das, Vagelis Hristidis, Gerhard Weikum. Probabilistic Ranking of Database Query Results, VLDB 2004. [PPT]

Nov 9 Tuesday | Gaurav Dutta | Quang Hieu Vu, Beng Chin Ooi, Dimitris Papadias, Anthony K. H. Tung. A Graph Method for Keyword-based Selection of the top-K Databases, VLDB 2008. [PPT]
Optional Readings:
Charuta Nakhe, Arvind Hulgeri, Gaurav Bhalotia, Soumen Chakrabarti, S. Sudarshan. Keyword Searching and Browsing in Databases using BANKS, ICDE 2002.

Nov 9 Tuesday | Niharika Pamu | Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid. Supporting top-k join queries in relational databases, VLDB 2003. [PPT]

Nov 16 Tuesday | Mahashweta Das | Ming Hua, Jian Pei, Wenjie Zhang, Xuemin Lin. Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach, SIGMOD 2008. [PPT1] [PPT2]
Optional Readings:
1. Probabilistic Databases: Tutorial
2. Christopher Re, Nilesh Dalvi, Dan Suciu. Efficient Top-k Query Evaluation on Probabilistic Data, ICDE 2007.

Nov 18 Thursday | Shankarkumar Kumar | Sihem Amer-Yahia, Michael Benedikt, Laks V. S. Lakshmanan, Julia Stoyanovich. Efficient Network-Aware Search in Collaborative Tagging Sites, VLDB 2008. [PPT]
Optional Readings:
Presentation on social tagging systems.

Nov 18 Thursday | Sandeep Chittal | Senjuti Basu Roy, Haidong Wang, Ullas Nambiar, Gautam Das, Mukesh Mohania. Minimum-Effort Driven Dynamic Faceted Search in Structured Databases, CIKM 2008. [PPT]
Optional Readings:
Senjuti Basu Roy, Haidong Wang, Ullas Nambiar, Gautam Das, Mukesh Mohania. DynaCet: Building Dynamic Faceted Search Systems over Databases, ICDE 2009.

Nov 23 Tuesday | Yesha Gupta | Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis. Answering Top-k Queries Using Views, VLDB 2006. [PPT]

Nov 23 Tuesday | Anitha Royappan | Gautam Das, Dimitrios Gunopulos, Nick Koudas, Nikos Sarkas. Ad-hoc Top-k Query Answering for Data Streams, VLDB 2007. [PPT]
Optional Readings:
Data Streams: tutorial 1 and tutorial 2.

Nov 25 Thursday | - | Thanksgiving Holidays

Nov 30 Tuesday | Dr. Gautam Das | Review for Final exam

Dec 2 Thursday | - | FINAL EXAM

Dec 3 Friday | - | Project Demo

  

Announcements

  • Please check this section regularly during the semester for updates and announcements on the course.

  • The ethics statement is available here. Please print, sign and submit it to the instructor during class.

  • Group assignments for presentations might change in the future. Any changes will be posted on the website.

  • Please send the TA your presentation slides one day before the scheduled presentation.

  • If you do not send your presentation slides before the scheduled presentation, you might lose some points.




 

