CSE 6339 SEC 001

DATA EXPLORATION AND ANALYSIS IN RELATIONAL DATABASES

Fall 2008


Class Time and Place: Tu/Th 12:30-1:50 pm (NH 203)


Instructor: Dr. Gautam Das

Office: Nedderman Hall, Room# 302
Phone: 817 272 7595
Email: gdas@uta.edu

Office Hours: Tue 4:30-5:30 pm and Wed 2:00-3:00 pm


Teaching Assistant: Muhammed Miah

Office: GeoScience Building, Room# 237
Email: mzmiah@uta.edu

Office Hours: Tue 3:00-5:00 pm and by appointment


About the Course

Much of the world’s recorded data is locked up in structured sources such as databases, which are often the proprietary information of private corporations and government agencies. Searching for and exploring information within databases is currently very cumbersome: the user often has to know a complex query language (such as SQL), as well as details of how the data is organized into tables and columns (the database schema). In recent years, researchers have studied how to improve the search and exploration capabilities of relational databases. This work includes adapting probabilistic and approximate querying methods to improve the scalability of query answering, as well as information retrieval techniques such as relevance ranking and keyword search. This class will explore recent research efforts in these important and challenging fields. We will read and discuss the latest research literature from premier conferences in databases and information retrieval. We hope this class will encourage students to pursue further research in these areas.

The following is a tentative list of topics which we will attempt to cover:

1. Probabilistic Methods in Databases

            Sampling Methods in Databases: Basics

            Approximate Query Processing

            Processing of Fuzzy/Uncertain Data

2. Unstructured Search in Databases

            Keyword Queries in Databases

            Ranking of Database Query Results

3. DB and IR integration

            Top-K algorithms

We will cover these topics in breadth, understand the central contributions of each line of work, and try to predict future research directions.
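To make the idea behind "Sampling Methods in Databases" and "Approximate Query Processing" concrete, here is a minimal sketch of the most basic technique, not drawn from any particular assigned paper: estimating a SUM aggregate from a uniform random sample and attaching a normal-approximation confidence interval. The helper name `estimate_sum`, the 95% level, and the synthetic column are illustrative assumptions; the papers on the reading list refine this baseline for cases such as skewed data, joins, and group-by queries.

```python
import random
import statistics


def estimate_sum(values, sample_size, seed=None):
    """Estimate SUM(values) from a uniform random sample drawn without replacement.

    Hypothetical helper for illustration. Returns (estimate, half_width), where
    half_width is an approximate 95% confidence half-width (normal approximation).
    """
    n_total = len(values)
    if not 1 <= sample_size <= n_total:
        raise ValueError("sample_size must be between 1 and len(values)")
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)

    # Scale up by the table size: N * sample mean == (N / n) * sample sum.
    estimate = n_total * statistics.fmean(sample)
    if sample_size < 2:
        return estimate, float("inf")  # variance cannot be estimated from one row

    # Standard error of N * mean, with the finite population correction.
    s = statistics.stdev(sample)
    fpc = ((n_total - sample_size) / (n_total - 1)) ** 0.5
    half_width = 1.96 * n_total * s / sample_size ** 0.5 * fpc
    return estimate, half_width


if __name__ == "__main__":
    column = list(range(1, 10_001))  # synthetic column; exact SUM is 50_005_000
    est, hw = estimate_sum(column, 500, seed=42)
    print(f"approximate SUM = {est:.0f} +/- {hw:.0f} (exact {sum(column)})")
```

Uniform sampling is the simplest design choice; its error grows with the variance of the column being aggregated, which is one motivation for the skew-aware and stratified sampling schemes that appear later in the schedule.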


Prerequisites

Advanced Algorithms and Database II are the prerequisite courses. However, exceptions will be made on a case-by-case basis, especially if the student has prior exposure to the material or demonstrates the initiative to learn these concepts quickly on his/her own.


Presentations

The actual reading list, consisting of recent research papers, will be selected and finalized as the course progresses. Each student will present one or more papers (depending on enrollment) during the semester. Students will participate in class discussions during and after each presentation. Attendance is required.


Project

In addition to reading papers and presenting them in class, students will complete a project during the semester. The project will be either a programming project or a research project. The projects will involve developing portions of information retrieval systems for structured databases based on the techniques proposed in the papers. The projects will also be tested on real data to which the students should obtain access. A long-term objective is that the more promising projects will serve as infrastructure or test-beds for students who continue their research in these areas beyond the course.


Evaluation

The grade will be based on the paper presentations, class attendance and participation, performance in the projects, one midterm exam and one final exam.

Midterm exam: 30%
Final exam: 25%
Presentation: 20%
Project: 20%
Class participation: 5%
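The evaluation weights above add up to 100%. As a small illustration, the sketch below combines component scores into a course total, assuming each component is scored out of 100 (the syllabus does not state a scale). `weighted_total` and the component keys are hypothetical names, not an official grading tool.

```python
# Weights from the Evaluation section, in percent of the final grade.
WEIGHTS = {
    "midterm": 30,
    "final": 25,
    "presentation": 20,
    "project": 20,
    "participation": 5,
}


def weighted_total(scores):
    """Combine component scores (each assumed to be on a 0-100 scale) into a 0-100 total."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS) / 100


if __name__ == "__main__":
    example = {"midterm": 80, "final": 90, "presentation": 85,
               "project": 95, "participation": 100}
    print(weighted_total(example))
```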

 

Tentative Schedule (Presentations, Lectures and Exams)

Each paper will be presented by a group of two or three students, depending on enrollment. Each group will present two papers. The papers will be presented in two cycles, the first before the midterm exam and the second after it. Below is a tentative schedule of class presentations for this semester.

Date/Day | Presenter(s) | Paper
Aug 26 - Sep 11 | Dr. Gautam Das | Lectures
Sep 16 (Tuesday) | Harikrishnan Karunakaran, Sulabha Balan | Ganti V., Lee M. L., Ramakrishnan R. ICICLES: Self-tuning Samples for Approximate Query Answering, VLDB 2000.
Sep 16 (Tuesday) | Deep Pancholi, Vinit Asher | Chaudhuri S., Das G., Datar M., Motwani R., Narasayya V. Overcoming Limitations of Sampling for Aggregation Queries, ICDE 2001.
Sep 18 (Thursday) | Saranya Gottipati, Jeevan Gogineni | Acharya S., Gibbons P. B., Poosala V., Ramaswamy S. Join Synopses for Approximate Query Answering, ACM SIGMOD 1999.
Sep 18 (Thursday) | Anusha Reddy Rachapalli Muni, Agasthya Padisala | Acharya S., Gibbons P. B., Poosala V. Congressional Samples for Approximate Answering of Group-By Queries, ACM SIGMOD 2000.
Sep 23 (Tuesday) | Venkata Jammula, Vivek Tanneeru | Chaudhuri S., Das G., Narasayya V. A Robust, Optimization-Based Approach for Approximate Answering of Aggregation Queries, ACM SIGMOD 2001.
Sep 25 (Thursday) | Chandrashekar Vijayarenu, Anirban Maiti | Babcock B., Chaudhuri S., Das G. Dynamic Sample Selection for Approximate Query Processing, ACM SIGMOD 2003.
Sep 25 (Thursday) | Calvin Noronha, Deepak Anand | Hellerstein J., Haas P., Wang H. Online Aggregation, ACM SIGMOD 1997.
Sep 30 (Tuesday) | Ronda Hilton | P. J. Haas and J. M. Hellerstein. Ripple Joins for Online Aggregation, ACM SIGMOD 1999.
Sep 30 (Tuesday) | Charanmai Koorapati Ramesh, Harika Guniganti | Kaushik Chakrabarti, Minos Garofalakis, Rajeev Rastogi, Kyuseok Shim. Approximate Query Processing Using Wavelets, VLDB 2000.
Oct 02 (Thursday) | Saranya Gottipati, Jeevan Gogineni | Chaudhuri S., Motwani R., Narasayya V. On Random Sampling Over Joins, ACM SIGMOD 1999.
Oct 07 (Tuesday) | Dr. Gautam Das | Review for midterm exam
Oct 09 (Thursday) | - | MIDTERM EXAM
Oct 14, 16, 21, 23 | Dr. Gautam Das | Lectures
Oct 28 (Tuesday) | Vikrant Khosla, Sridhar Kameswara Nemani | Jon M. Kleinberg. Authoritative sources in a hyperlinked environment, Journal of the ACM 46 (1999).
Oct 30 (Thursday) | Ashish Chawla, Vinit Asher | L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web, 1998.
Nov 04 (Tuesday) | Harikrishnan Karunakaran, Sulabha Balan | Charuta Nakhe, Arvind Hulgeri, Gaurav Bhalotia, Soumen Chakrabarti, S. Sudarshan. Keyword Searching and Browsing in Databases using BANKS, ICDE 2002.
Nov 06 (Thursday) | Deep Pancholi | Sanjay Agrawal, Surajit Chaudhuri, Gautam Das. DBXplorer: A System For Keyword-Based Search Over Relational Databases, ICDE 2002.
Nov 11 (Tuesday) | Mahadevkirthi Mahadevraj, Sameer Gupta | Sanjay Agrawal, Surajit Chaudhuri, Gautam Das, Aristides Gionis. Automated Ranking of Database Query Results, CIDR 2003.
Nov 13 (Thursday) | Raghunath Ravi, Sivaramakrishnan Subramani | Surajit Chaudhuri, Gautam Das, Vagelis Hristidis, Gerhard Weikum. Probabilistic Ranking of Database Query Results, VLDB 2004.
Nov 18 (Tuesday) | Chandrashekar Vijayarenu, Anirban Maiti | Christopher Re, Nilesh Dalvi, Dan Suciu. Efficient Top-k Query Evaluation on Probabilistic Data, ICDE 2007.
Nov 18 (Tuesday) | Calvin Noronha, Deepak Anand | Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid. Supporting top-k join queries in relational databases, VLDB 2003.
Nov 20 (Thursday) | Rashmi Pagadala, Swetha Bhaskar | Zhen Zhang, Seung-won Hwang, Kevin Chen-Chuan Chang, Min Wang, Christian A. Lang, Yuan-Chi Chang. Boolean + ranking: querying a database by k-constrained optimization, ACM SIGMOD 2006.
Nov 25 (Tuesday) | Lipsa Patel, Kushal Shah | Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis. Answering Top-k Queries Using Views, VLDB 2006.
Nov 27-30 | - | Thanksgiving holidays
Dec 01 (Monday) | Students | Project presentations at DBXLAB (GS 237)
Dec 02 (Tuesday) | Students | Project presentations at DBXLAB (GS 237)
Dec 02 (Tuesday) | Dr. Gautam Das | Review for final exam
Dec 04 (Thursday) | - | FINAL EXAM

Project Presentation Schedule

Project presentations will be held in Room# 237 of the GeoScience Building (the TA's office). In addition to the presentation, each group must submit its code and a 1-2 page user manual by midnight on Dec 02. The schedule is as follows:

Group No. | Student Name(s) | Project Title | Presentation Date | Presentation Time
1 | Mahadevkirthi Mahadevraj, Sameer Gupta | Outlier Indexing Method for Approximate Query Answering | Dec 01 (Monday) | 10:30-11:00 am
2 | Deepak Anand, Calvin Richard Noronha | Rank Join Algorithm | Dec 01 (Monday) | 11:00-11:30 am
3 | Vikrant Khosla, Sridhar Kameswara Nemani | DBXplorer: A System For Keyword-Based Search Over Relational Databases | Dec 01 (Monday) | 11:30 am-12:00 pm
4 | Anusha Reddy Rachapalli Muni, Agasthya Padisala | Congressional Samples for Approximate Answering of Group-By Queries | Dec 01 (Monday) | 12:30-1:00 pm
5 | Charanmai Koorapati Ramesh, Harika Guniganti | ICICLES: Self-tuning Samples for Approximate Query Answering | Dec 01 (Monday) | 2:30-3:00 pm
6 | Rashmi Pagadala, Swetha Bhaskar | ICICLES: Self-tuning Samples for Approximate Query Answering | Dec 01 (Monday) | 3:00-3:30 pm
7 | Vivek Tanneeru, Venkata Jammula | Overcoming Limitations of Sampling for Aggregation Queries | Dec 01 (Monday) | 3:30-4:00 pm
8 | Jeevan Gogineni, Saranya Gottipati | Join Synopses | Dec 01 (Monday) | 4:00-4:30 pm
9 | Deep Pancholi, Vinit Asher | DBXplorer: A System For Keyword-Based Search Over Relational Databases | Dec 02 (Tuesday) | 11:00-11:30 am
10 | Chandrashekar Vijayarenu, Anirban Maiti | Efficient Top-k Query Evaluation on Probabilistic Data | Dec 02 (Tuesday) | 11:30 am-12:00 pm
11 | Ronda Hilton | Top-K Queries Using Views | Dec 02 (Tuesday) | 12:00-12:30 pm
12 | Harikrishnan Karunakaran, Sulabha Balan | Online Aggregation | Dec 02 (Tuesday) | 2:30-3:00 pm
13 | Sivaramakrishnan Subramani, Raghunath Ravi | Fagin's Threshold Algorithm | Dec 02 (Tuesday) | 3:00-3:30 pm
14 | Lipsa Patel, Kushal Shah | Implementing Threshold Algorithm for Top-k Query Processing | Dec 02 (Tuesday) | 3:30-4:00 pm
15 | Ashish Chawla | Research Project (topic TBD) | Dec 02 (Tuesday) | 4:00-4:30 pm


Announcements

  • Please check this section regularly during the semester for course updates and announcements.
  • The ethics statement is available here. Please print, sign, and submit it to the instructor in class.
  • Please let the TA know the names of your group members by Sep 09, 2008.
  • Presentation group assignments may change; any changes will be posted on the website.
  • Please send the TA your presentation slides before the scheduled presentation.
  • If you do not send your presentation slides before the scheduled presentation (or at least on the same day), you may lose points.
  • All project presentations will be held on Dec 1 and 2 in the TA's office. Specific slots will be announced later. You must inform the TA of your team members, your specific project topic, and your preferred presentation date and time (if any) by Nov 20. Make sure you finalize your project topic with the TA before you start implementing your project, as each group should work on a different topic.
  • The project presentation schedule has been posted (see above).
  • Please submit your code and user manual in a single zip file named after the group members.


 

