CSE 6392 SEC 013

DATA EXPLORATION AND ANALYSIS IN RELATIONAL DATABASES

 
Spring 2006
 
 
Dr. Gautam Das
Office: 302 Nedderman Hall
Phone: 817 272 7595
Email: gdas@cse.uta.edu

Office Hours: Tue-Thu 1:00-2:00pm (or by appt)

 

GTA: Muhammed Z Miah
Tel: 817-272-0896 (DBXLab GS237)
Email: mzmiah@yahoo.com


About the Course

Much of the world’s recorded data is locked up in structured sources such as databases, which are often the proprietary information of private corporations and government agencies. Searching for and exploring information within databases is currently very cumbersome: the data explorer often has to know comprehensive query languages (such as SQL), as well as how the data is structured into tables and columns (the database schema). In recent years, researchers have studied how to improve the search and exploration capabilities of relational databases. This work includes adapting probabilistic and approximate querying methods to improve the scalability of query answering, as well as information retrieval techniques such as relevance ranking and keyword search. This class will explore recent research efforts in these important and challenging fields. We will read and discuss the latest research literature from premier conferences in databases and information retrieval. It is hoped that this class will spur students to pursue further research in these areas.

The following is a tentative list of topics which we will attempt to cover:

1. Probabilistic Methods in Databases

            Sampling Methods in Databases: Basics

            Approximate Query Processing

            Processing of Fuzzy/Uncertain Data

2. Unstructured Search in Databases

            Keyword Queries in Databases

            Ranking of Database Query Results 

3. DB and IR integration

            Top-K algorithms

We will cover these topics in breadth, understand the central contributions of these efforts, and try to predict future research directions.

Prerequisites

Advanced Algorithms and Database II are the prerequisite courses. However, exceptions will be made on a case-by-case basis, especially if the student has prior exposure to these concepts or demonstrates the initiative to learn them quickly on their own.

Presentations

The actual reading list, consisting of recent research papers, will be selected and finalized by the first week of classes. Each student will present one or more papers (depending on the enrollment) during the semester. Students will participate in class discussions during and after each presentation. Attendance is required.

Project

In addition to reading papers, students will have the option of attempting a programming project during the semester. The projects will involve developing portions of information retrieval systems for structured databases, based on the techniques suggested in the papers. The projects will be tested on real data, which students are expected to obtain access to. A long-term objective is for the more promising projects to serve as infrastructure and test beds for students continuing their research in these areas beyond the course.

Evaluation

 The grade will be based on the paper presentations, class attendance and participation, and performance in the projects.

Schedule

 

   

Day & Date | Presented By | Topics/Papers

Tuesday

Jan 17

  • Dr. Gautam Das
  • Introduction to Approximate Query Processing, Information Retrieval, Sampling
Thursday

Jan 19

  • Dr. Gautam Das
Tuesday

Jan 24

  • Dr. Gautam Das

Thursday

Jan 26

  • Dr. Gautam Das
Tuesday

Jan 31

  • Dr. Gautam Das
  • Data Exploration and Analysis in Relational Databases  ppt slides
Thursday

Feb 02

  • Dr. Gautam Das
  • Central Limit Theorem, Stratified Sampling, Acquiring Random Sample  ppt slides
Tuesday

Feb 07

  • Anthony Okorodudu
  • Ganti V., Lee M. L., Ramakrishnan R. ICICLES: Self-tuning Samples for Approximate Query Answering. Proc. of VLDB, 2000  ppt slides
Tuesday

Feb 07

  • Ranjan Dash
  • Chaudhuri S., Das G., Datar M., Motwani R., Narasayya V. Overcoming Limitations of Sampling for Aggregation Queries. Proc. of IEEE Conf. on Data Engineering, 2001  ppt slides
Thursday

Feb 09

  • Yinghui (Cathy) Wang
  • P. B. Gibbons and Y. Matias. New Sampling-Based Summary Statistics for Improving Approximate Query Answers. ACM SIGMOD 1998  ppt slides
Thursday

Feb 09

  • Sushruth Puttaswamy
  • Acharya S., Gibbons P. B., Poosala V., Ramaswamy S. Join Synopses for Approximate Query Answering. Proc. of ACM SIGMOD, 1999  ppt slides
Thursday

Feb 09

  • Arjun Dasgupta
  • Chaudhuri S., Motwani R., Narasayya V. Random Sampling Over Joins. Proc. of  ACM SIGMOD, 1999  ppt slides
Tuesday

Feb 14

  • Amrita Tamrakar
  • P. B. Gibbons, Y. Matias, and V. Poosala. Fast Incremental Maintenance of Approximate Histograms.  VLDB 1997  ppt slides
Tuesday

Feb 14

  • Sushanth Sivaram Vallath
  • Chaudhuri S., Das G., Narasayya V. A Robust, Optimization-Based Approach for Approximate Answering of Aggregation Queries. Proc. of ACM SIGMOD, 2001  ppt slides
Tuesday

Feb 14

  • Mariam John
  • Babcock B., Chaudhuri S. and Das G. Dynamic Sample Selection for Approximate Query Processing. SIGMOD 2003: 539-550  ppt slides
Tuesday

Feb 14

  • Muhammed Miah
  • Acharya S., Gibbons P. B., Poosala V. Congressional Samples for Approximate Answering of Group-By Queries. Proc. of ACM SIGMOD, 2000  ppt slides
Thursday

Feb 16

  • William Fred Eberle
  • Jermaine C. Robust Estimation With Sampling and Approximate Pre-Aggregation. VLDB 2003: 886-897  ppt slides
Thursday

Feb 16

  • Zubin Mathew Joseph
  • P. J. Haas and J. M. Hellerstein. Ripple Joins for Online Aggregation. ACM SIGMOD 1999  ppt slides
Thursday

Feb 16

  • Archana Vijayalakshmanan
  • Hellerstein J., Haas P., Wang H. Online Aggregation. Proc. of ACM SIGMOD, 1997  ppt slides
Tuesday

Feb 21

  • Dr. Gautam Das
  • Wavelets and Ranking of Database Query Results  ppt slides
Thursday

Feb 23

  • Dr. Gautam Das
Tuesday

Feb 28

  • Dr. Gautam Das
  • FA, TA, Query Specific Ranking, and Adding Selection  ppt slides
Thursday

Mar 02

  • William Fred Eberle
  • Ronald Fagin: Combining Fuzzy Information: An Overview  ppt slides
Thursday

Mar 02

  • Arjun Dasgupta and Sushruth Puttaswamy
  • Ronald Fagin, Amnon Lotem, Moni Naor: Optimal Aggregation Algorithms for Middleware. PODS 2001  ppt slides
Tuesday

Mar 07

  • Dr. Gautam Das
Thursday

Mar 09

  • Muhammed Miah and Archana Vijayalakshmanan
  • Amélie Marian, Nicolas Bruno, Luis Gravano: Evaluating top-k queries over web-accessible databases. ACM Trans. Database Syst. 29(2): 319-362 (2004)  ppt slides
Thursday

Mar 09

  • Amrita Tamrakar
  • Vagelis Hristidis, Nick Koudas, Yannis Papakonstantinou: PREFER: A System for the Efficient Execution of Multi-parametric Ranked Queries. SIGMOD, 2001  ppt slides
Thursday

Mar 09

  • Zubin Joseph
  • Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid: Supporting top-k join queries in relational databases. VLDB J. 13(3): 207-221 (2004)  ppt slides
Tuesday

Mar 21

  • Mariam John
  • Chengkai Li, Mohamed A. Soliman, Kevin Chen-Chuan Chang, Ihab F. Ilyas: RankSQL: Supporting Ranking Queries in Relational Database Management Systems. VLDB 2005: 1342-1345  ppt slides
Tuesday

Mar 21

  • Dr. Gautam Das
Thursday

Mar 23

  • Dr. Gautam Das
Tuesday

Mar 27

  • Dr. Gautam Das
  • Implementation of Vector Space Model  ppt slides
Thursday

Mar 29

  • Dr. Gautam Das
  • Probabilistic Information Retrieval  ppt slides
Tuesday

Apr 04

  • Ranjan Dash
  • Modern Information Retrieval: A Brief Overview  ppt slides
Tuesday

Apr 04

  • Yinghui Wang
  • DBXplorer: A System for Keyword-Based Search over Relational Databases  ppt slides
Tuesday

Apr 04

  • Sushanth Vallath
  • Keyword Searching and Browsing in Databases using BANKS  ppt slides
Thursday

Apr 06

  • William Eberle
  • L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web  ppt slides
Thursday

Apr 06

  • William Eberle
  • J. Kleinberg. Authoritative Sources in a Hyperlinked Environment. Journal of the ACM 46(5), 1999  ppt slides
Tuesday

Apr 11

  • Archana Vijayalakshmanan
  • Sanjay Agrawal, Surajit Chaudhuri, Gautam Das, Aristides Gionis: Automated Ranking of Database Query Results. CIDR 2003  ppt slides
Tuesday

Apr 11

  • Zubin Joseph
  • Surajit Chaudhuri, Gautam Das, Vagelis Hristidis, Gerhard Weikum: Probabilistic Ranking of Database Query Results. VLDB 2004  ppt slides
Tuesday

Apr 11

  • Anthony Okorodudu
  • U. Nambiar and S. Kambhampati. Answering Imprecise Queries over Autonomous Web Databases. 22nd International Conference on Data Engineering (ICDE)  ppt slides
Thursday

Apr 13

  • Bhushan Chaudhury
  • Martin Theobald, Gerhard Weikum, Ralf Schenkel: Top-k Query Evaluation with Probabilistic Guarantees. VLDB 2004: 648-659  ppt slides
Tuesday

Apr 18

  • Sushruth  Puttaswamy
  • Kaushik Chakrabarti, Surajit Chaudhuri, Seung-won Hwang: Automatic Categorization of Query Results. SIGMOD Conference 2004: 755-766  ppt slides
Tuesday

Apr 18

  • Arjun Dasgupta
  • Junghoo Cho, Sourashis Roy, Robert Adams (UCLA): Page Quality: In Search of an Unbiased Web Ranking. SIGMOD 2005  ppt slides
Thursday

Apr 20

  • Dr. Gautam Das
  • Data Exploration and Analysis in Relational Databases  ppt slides
Tuesday

Apr 25

  • Zubin Joseph
  • Introduction on Peer-to-Peer Networks  ppt slides
Tuesday

Apr 25

  • Anthony Okorodudu
  • Nikos Ntarmos, Peter Triantafillou, Gerhard Weikum: Counting at Large: Efficient Cardinality Estimation in Internet-Scale Data Networks  ppt slides
Tuesday

Apr 25

  • Dr. Gautam Das
  • Benjamin Arai, Gautam Das, Dimitrios Gunopulos, Vana Kalogeraki: Approximating Aggregation Queries in Peer-to-Peer Networks. ICDE 2006
Thursday

Apr 27

  • Ranjan Dash
  • Demetrios Zeinalipour-Yazti, Vana Kalogeraki, Dimitrios Gunopulos (University of California, Riverside): Information Retrieval Techniques for Peer-to-Peer Networks  ppt slides
Thursday

Apr 27

  • Amrita Tamrakar
  • KLEE: A Framework for Distributed Top-k Query Algorithms  ppt slides
Thursday

Apr 27

  • Mariam John
  • P2P Content Search: Give the web back to the people  ppt slides
Tuesday

May 02

  • Sushanth Vallath
Tuesday

May 02

  • Hao Zhou
  • David Kempe, Alin Dobra, Johannes Gehrke: Gossip-Based Computation of Aggregate Information  ppt slides
Tuesday

May 02

  • Muhammed Miah
  • Samuel Madden, Michael J. Franklin, Joseph M. Hellerstein, Wei Hong: The Design of an Acquisitional Query Processor For Sensor Networks  ppt slides
Thursday

May 04

  • Yinghui Wang
  • D. Zeinalipour-Yazti, Z. Vagena, D. Gunopulos, V. Kalogeraki, V. Tsotras, M. Vlachos, N. Koudas, D. Srivastava: The Threshold Join Algorithm for Top-k Queries in Distributed Sensor Networks. Proceedings of the 2nd International Workshop on Data Management for Sensor Networks (DMSN), VLDB 2005, Trondheim, Norway. ACM International Conference Proceeding Series, Vol. 96, pp. 61-66, 2005  ppt slides

Additional Papers

- Garofalakis M. N. and Gibbons P. B. Approximate Query Processing: Taming the TeraBytes. VLDB 2001
   -- This is a survey.

- Chakrabarti K., Garofalakis M., Rastogi R., Shim K. Approximate Query Processing Using Wavelets. Proc. of VLDB 2000

 
