Research

Publications

Professional

Teaching

Personal

Home

 

 

 

 

CSE 6339 SEC 001

DATA EXPLORATION AND ANALYSIS IN

RELATIONAL DATABASES

 

Fall 2009


Class Time and Place: Tu/Th 12:30-1:50 pm (WH 221)


Instructor: Dr. Gautam Das

 

Office: Nedderman Hall, Room# 302
Phone: 817 272 7595
Email: gdas@uta.edu

Office Hours:  Wed 1:00-2:00 pm and Thu 2:00-3:00 pm


Teaching Assistant: Ning Yan

 

Office: GeoScience Building, Room# 237
Email: ning.yan@mavs.uta.edu

Office Hours:  Thu 3:00-5:00 pm and by appointment


About the Course

Much of the world’s recorded data is locked up in structured sources such as databases, which are often the propriety information of private corporations and government agencies. Searching and exploring for information within databases is currently very cumbersome - often the data explorer has to know comprehensive query languages (such as SQL), as well as important information on how the data is structured into different tables and columns (the database schema). In recent years, researchers have pondered on the problems of improving the search and exploration capabilities for relational databases. This includes adapting probabilistic and approximate querying methods to improve the scalability of query answering, as well as information retrieval techniques such as relevance ranking and keyword search. This class will explore the recent efforts by researchers in these extremely important and challenging fields. We will read and discuss latest research literature gleaned from premier conferences in databases and information retrieval. It is hoped that this class will spur students to pursuing further research in these areas.

The following is a tentative list of topics which we will attempt to cover:

1. Probabilistic Methods in Databases

            Sampling Methods in Databases: Basics

            Approximate Query Processing

            Processing of Fuzzy/Uncertain Data

2. Unstructured Search in Databases Keyword Queries in Databases

            Ranking of Database Query Results 

3. DB and IR integration

            Top-K algorithms

We will cover various topics in breadth, understand the central contributions of these efforts and try and predict future research directions.


Prerequisites

Advanced Algorithms and Database II are the prerequisite courses. However, exceptions will be made on a case by case basis, especially if the student has prior exposure or demonstrates initiative to quickly learn these concepts on his/her own.


Presentations

The actual reading list, consisting of recent research papers, will be selected and finalized as the course progresses. Each student will present one or more papers (depending on the enrollment) during the semester. Students will participate in class discussions during and after each presentation. Attendance is required.


Homework

There is going to be two homework which are used for students to prepare for the exams.


Evaluation

The grade will be based on the paper presentations, class attendance and participation, performance in the projects, one midterm exam and one final exam.

 

Homework

25%

Midterm exam

25%

Final exam

25%

Presentation

20%

Class participation

05%

 

Tentative Schedule (Presentation, Lectures and Exams)

Each paper will be presented by a group of two/three students based on the enrollment. Each group will present two papers. The papers will be presented in two cycles, the first one before the mid term exam and the second one after the midterm exam. Given below is a tentative schedule for class presentations over the course of this semester.

 

Date/Day

Presenter(s)

Paper

Sep 01 - Sep 10

Dr. Gautam Das

Lectures

Sep 15 Tuesday

Gopinath,Shruti P

Ganti V., Lee M. L., Ramakrishnan R. ICICLES: Self-tuning Samples for Approximate Query Answering, VLDB, 2000. ppt slides

Sep 15 Thursday

Shah,Nishit Haresh

Chaudhuri S., Das G., Datar M., Motwani R., Narasayya V. Overcoming Limitations of Sampling for Aggregation Queries, ICDE 2001. ppt slides

Sep 17  Tuesday

Hoskere,Vinay S

Acharya S., Gibbons P. B., Poosala V., Ramaswamy S. Join Synopses for Approximate Query Answering, ACM SIGMOD 1999. ppt slides

Sep 17  Thursday

Rao,Divya Ganesha Babu

Acharya S., Gibbons P. B., Poosala V. Congressional Samples for Approximate Answering of Group-By Queries, ACM SIGMOD 2000. ppt slides

Sep 22 Tuesday

Fernandes,Melissa J

Chaudhuri S., Das G., Narasayya V. A Robust, Optimization- Based Approach for Approximate Answering of Aggregation Queries, ACM SIGMOD 2001. ppt slides

Sep 24  Thursday

Kamble Gnanoba,Shashank

Babcock B., Chaudhuri C. and Das G. Dynamic Sample Selection for Approximate Query Processing, ACM SIGMOD 2003. ppt slides

Sep 24  Thursday

Dave,Arjav

Hellerstein J., Haas P., Wang H. Online Aggregation, ACM SIGMOD 1997. ppt slides

Sep 29 Tuesday

Chandrashekar,Nag P

P. J. Haas and J. M. Hellerstein. Ripple Joins for Online Aggregation, ACM SIGMOD 1999. ppt slides

Oct 01 Tuesday

Sudheendra,Supriya

Kaushik Chakrabarti, Minos Garofalakis, Rajeev Rastogi, Kyuseok Shim. Approximate Query Processing Using Wavelets, VLDB 2000. ppt slides

Oct 06 Thursday

Nema,Srikantha Aswathanarayana

Chaudhuri S., Motwani R., Narasayya V. On Random Sampling Over Joins, ACM SIGMOD 1999.  ppt slides

Oct 08 Tuesday

Dr. Gautam Das

Review for Midterm exam

Oct 13 Tuesday

-

MIDTERM EXAM

Oct 15 – Oct 27

Dr. Gautam Das

Lectures

Oct 29 Tuesday

Krishnamoorthy,Pranav

Jon M. Kleinberg. Authoritative sources in a hyperlinked environment, Journal of the  ACM 46(1999). ppt slides

Oct 29 Thursday

Byrappa Nagaraj,Chandrik

L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web, 1998. ppt slides

Nov 03 Tuesday

Bansal,Kushal N

Charuta Nakhe, Arvind Hulgeri, Gaurav Bhalotia,  Soumen Chakrabarti, S. Sudarshan. Keyword Searching and Browsing in Databases using BANKS, ICDE 2002. ppt slides

Nov 05 Thursday

Mannem,Anusha

Sanjay Agrawal, Surajit  Chaudhuri, Gautam Das. DBXplorer: A System For Keyword-Based Search Over Relational Databases, ICDE 2002. ppt slides

Nov 10 Tuesday

Somuri,Ramya

Sanjay Agrawal, Surajit Chaudhuri, Gautam Das, Aristides Gionis. Automated Ranking of Database Query Results, CIDR 2003. ppt slides

Nov 12 Thursday

Satyanarayana,Sindhu and Raju,Ranjan Alankar

Surajit Chaudhuri, Gautam Das, Vagelis Hristidis, Gerhard Weikum, Probabilistic Ranking of Database Query Results, VLDB 2004. ppt slides

Nov 17 Tuesday

Gupta,Jitendra

Christopher Re, Nilesh Dalvi, Dan Suciu: Efficient Top-k Query Evaluation on Probabilistic Data, ICDE 2007. ppt slides

Nov 17 Tuesday

Muniraju,Sowmya

Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid. Supporting top-k join queries in relational databases, VLDB 2003. ppt slides

Nov 19 Thursday

Mangipudi,Aditya and Madiraju Keshavaraju,Pav

Zhen Zhang, Seung-won Hwang, Kevin Chen-Chuan Chang, Min Wang, Christian A. Lang, Yuan-Chi Chang: Boolean + ranking: querying a database by k-constrained optimization, ACM SIGMOD 2006. ppt slides

Nov 24 Tuesday

Ancha,Poornima and Buchi,Raju

Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis. Answering Top-k queries Using Views, VLDB 2006. ppt slides

Nov 26 Thursday

-

Thanksgiving Holidays

Dec 01 Tuesday

Dr. Gautam Das

Review for Final exam

Dec 08 Tuesday

-

FINAL EXAM (NH109 11am-1pm)

  

Announcements

  • Please check this section regularly during the semester for updates and announcements on the course

  • Ethics statement is available here. Please print, sign and submit it to the instructor during class.

  • Group assignment for presentation might change in future. Any changes will be updated on the website.

  • Please send the TA your presentation slides one day before the scheduled presentation.

  • If you do not send you presentation slides before the scheduled presentation, you might loose some points.




 


Home Research Publications Professional | Teaching | Personal