CSE 6392 SEC 003

DATA EXPLORATION AND ANALYSIS IN RELATIONAL DATABASES

 
Spring 2007
 
Instructor: Dr. Gautam Das
Office: 302 Nedderman Hall
Phone: 817 272 7595
Email: gdas@cse.uta.edu

Office Hours: Tue: 1:00 - 2:00 pm
              Wed: 2:00 - 3:00 pm (or by appointment)

 


About the Course

Much of the world’s recorded data is locked up in structured sources such as databases, which are often the proprietary information of private corporations and government agencies. Searching for and exploring information within databases is currently very cumbersome: the data explorer often has to know comprehensive query languages (such as SQL), as well as how the data is structured into different tables and columns (the database schema). In recent years, researchers have studied how to improve the search and exploration capabilities of relational databases. This work includes adapting probabilistic and approximate querying methods to improve the scalability of query answering, as well as information retrieval techniques such as relevance ranking and keyword search. This class will explore recent research efforts in these important and challenging fields. We will read and discuss the latest research literature from premier conferences in databases and information retrieval. It is hoped that this class will spur students to pursue further research in these areas.

The following is a tentative list of topics which we will attempt to cover:

1. Probabilistic Methods in Databases

            Sampling Methods in Databases: Basics

            Approximate Query Processing

            Processing of Fuzzy/Uncertain Data

2. Unstructured Search in Databases

            Keyword Queries in Databases

            Ranking of Database Query Results 

3. DB and IR integration

            Top-K algorithms

We will cover these topics in breadth, understand the central contributions of these efforts, and try to predict future research directions.


Prerequisites


Advanced Algorithms and Database II are the prerequisite courses. However, exceptions will be made on a case-by-case basis, especially if the student has prior exposure to these concepts or demonstrates the initiative to learn them quickly on his/her own.


Presentations

The actual reading list, consisting of recent research papers, will be selected and finalized as the course progresses. Each student will present one or more papers (depending on the enrollment) during the semester. Students will participate in class discussions during and after each presentation. Attendance is required.


Project

In addition to reading papers and presenting them in class, students will have the option of attempting a programming project during the semester. The projects will involve developing portions of information retrieval systems for structured databases based on the techniques suggested in the papers. The projects will also be tested on real data that the students should obtain access to. A long-term objective is for the more promising projects to serve as infrastructure/test-beds for students continuing their research in these areas beyond the course.


Evaluation

The grade will be based on the paper presentations, class attendance and participation, performance in the projects, and possibly 1-2 take-home examinations.


Schedule
 

Sl No. Date Presenter Paper
1. 13th Feb
Tuesday
Haidong Wang
  • Ganti V., Lee M. L., Ramakrishnan R. ICICLES: Self-tuning Samples for Approximate Query Answering. Proc. of VLDB, 2000. (ppt slides)

2. 15th Feb
Thursday
Daniel Kuang
  • Chaudhuri S., Das G., Datar M., Motwani R., Narasayya V. Overcoming Limitations of Sampling for Aggregation Queries. Proc. of IEEE Conf. on Data Engineering, 2001. (ppt slides)

3. 20th Feb
Tuesday
Rongfang Li
  • P. B. Gibbons and Y. Matias. New Sampling-Based Summary Statistics for Improving Approximate Query Answers. ACM SIGMOD 1998. (ppt slides)
4. 20th Feb
Tuesday
Bhushan Pachpande
  • Acharya S., Gibbons P. B., Poosala V., Ramaswamy S. Join Synopses for Approximate Query Answering. Proc. of ACM SIGMOD, 1999. (ppt slides)
5. 27th Feb
Tuesday
Lekhendro
  • Chaudhuri S., Motwani R., Narasayya V. Random Sampling Over Joins. Proc. of ACM SIGMOD, 1999. (ppt slides)
6. 6th March
Tuesday
Rebecca Atchley
  • Chaudhuri S., Das G., Narasayya V. A Robust, Optimization-Based Approach for Approximate Answering of Aggregation Queries. Proc. of ACM SIGMOD, 2001. (ppt slides)
7. 6th March
Tuesday
Daniel Kuang
  • Acharya S., Gibbons P. B., Poosala V. Congressional Samples for Approximate Answering of Group-By Queries. Proc. of ACM SIGMOD, 2000. (ppt slides)  
8. 8th March
Thursday
Senjuti Basu Roy
  • Babcock B., Chaudhuri S. and Das G. Dynamic Sample Selection for Approximate Query Processing. SIGMOD 2003: 539-550.
9. 20th March
Tuesday
Haidong Wang
  • P. B. Gibbons, Y. Matias, and V. Poosala. Fast Incremental Maintenance of Approximate Histograms.  VLDB 1997.
10. 20th March
Tuesday
Rongfang Li
  • P. J. Haas and J. M. Hellerstein. Ripple Joins for Online Aggregation.
    ACM SIGMOD 1999.
11. 22nd March
Thursday
Bhushan Pachpande
  • Hellerstein J., Haas P., Wang H. Online Aggregation. Proc. of ACM
    SIGMOD, 1997.
    (ppt slides). 
12. 12th April
Thursday
Lekhendro
  • NRA (ppt slides)
  • Answering Top-k Queries Using Views - Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis (VLDB 2006). (ppt slides)
13. 17th April
Tuesday
Senjuti Basu Roy
  • Approximate Query Processing Using Wavelets - Kaushik Chakrabarti, Minos Garofalakis
14. 19th April
Thursday
Rebecca Atchley
  • Supporting Top-k Join Queries in Relational Databases - Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid (ppt slides)
15. 26th April Thursday Bhushan Pachpande
  • DBXplorer: A System for Keyword-Based Search over Relational Databases - Sanjay Agrawal, Surajit Chaudhuri, Gautam Das. (ppt slides)
16. 26th April Thursday Daniel Kuang
  • Keyword Searching and Browsing in Databases using BANKS - Charuta Nakhe, Arvind Hulgeri, Gaurav Bhalotia, Soumen Chakrabarti, S. Sudarshan.
17. 1st May
Tuesday
Rebecca Atchley
  • Automated Ranking of Database Query Results - Sanjay Agrawal, Surajit Chaudhuri, Gautam Das, Aristides Gionis (CIDR 2003).
18. 1st May
Tuesday
Senjuti Basu Roy
  • Probabilistic Ranking of Database Query Results - Surajit Chaudhuri, Gautam Das, Vagelis Hristidis, Gerhard Weikum (VLDB 2004).
19. 3rd May
Thursday
Lekhendro
  • Authoritative Sources in a Hyperlinked Environment - Kleinberg. Journal of the ACM 46 (1999).
    (ppt slides).
20. 3rd May
Thursday
Haidong Wang
  • The PageRank Citation Ranking: Bringing Order to the Web - L. Page, S. Brin, R. Motwani, T. Winograd.
