CSE 6339 SEC 001

DATA EXPLORATION AND ANALYSIS IN RELATIONAL DATABASES

 

Fall 2011


Class Time and Place: Tue/Thu 12:30-1:50 pm (GACB 105)


Instructor: Dr. Gautam Das

Office: Engineering Research Building, Room# 626
Phone: 817 272 7595
Email: gdas AT uta DOT edu

Office Hours: Wed 3-4, Thu 2:30-3:30


Teaching Assistant: Saravanan Thirumuruganathan

Office: Engineering Research Building, Room# 504
Email: saravanan DOT thirumuruganathan AT mavs DOT uta DOT edu

Office Hours: By appointment, any day (previously Tue/Wed 11-12, Fri 10-12).


About the Course

Much of the world's recorded data is locked up in structured sources such as databases, which are often the proprietary information of private corporations and government agencies. Searching for and exploring information within databases is currently very cumbersome: the data explorer often has to know a complex query language (such as SQL), as well as details of how the data is structured into tables and columns (the database schema). In recent years, researchers have studied how to improve the search and exploration capabilities of relational databases. This work includes adapting probabilistic and approximate querying methods to improve the scalability of query answering, as well as information retrieval techniques such as relevance ranking and keyword search. With the rising popularity of social networks, we will also look at techniques to effectively explore the data they produce. This class will explore recent research efforts in these important and challenging fields. We will read and discuss the latest research literature from premier conferences in databases and information retrieval. It is hoped that this class will spur students to pursue further research in these areas.

The following is a tentative list of topics which we will attempt to cover:

1. Probabilistic Methods in Databases

            Sampling Methods in Databases: Basics

            Approximate Query Processing

            Processing of Fuzzy/Uncertain Data

2. DB and IR Integration

            Ranking of Database Query Results

            Top-k Algorithms

3. Exploration of Social Media Data

            Collaborative Filtering

            Recommendations

We will cover these topics in breadth, understand the central contributions of these efforts, and try to predict future research directions.


Prerequisites
Advanced Algorithms and Database II are the prerequisite courses. However, exceptions will be made on a case-by-case basis, especially if the student has prior exposure to this material or demonstrates the initiative to learn these concepts quickly on his/her own.


Evaluation

The grade will be based on paper presentations, class attendance and participation, performance in the final project, one midterm exam, and one final exam.

Presentation: 25%
Midterm exam: 25%
Final exam: 25%
Final project: 25%

Presentations

The actual reading list, consisting of recent research papers, will be selected and finalized as the course progresses. Each student will present one or more papers (depending on enrollment) during the semester. Students will participate in class discussions during and after each presentation. Attendance is required. The presentation slides must be emailed to the TA at least one day before the presentation. Failure to do so may result in a deduction of a few points.

Final Project

There will be a final project that involves implementing systems described in the papers. You can work individually or in a team of at most two students. Potential project topics will be announced after the first midterm. Projects will have to be demoed in the lab. The demo dates and other details will be updated as the course progresses.

List of Sample Projects

Lecture Notes/Presentations

1. Sampling for Approximate Query Processing
2. Introduction to Social Network Analysis
3. Interesting Problems in Social Networks

Tentative Schedule (Presentations, Lectures, and Exams)

Given below is a tentative schedule for class presentations over the course of this semester.

 

Date/Day

Presenter(s)

Paper

Aug 26 - Sep 06

Dr. Gautam Das

Lectures

Sep 08 Thursday

Satyakanth Kagitala

Ganti V., Lee M. L., Ramakrishnan R. ICICLES: Self-tuning Samples for Approximate Query Answering, VLDB 2000. PPT

Sep 08 Thursday

Anvitha Banakal Sadananda

Acharya S., Gibbons P. B., Poosala V., Ramaswamy S. Join Synopses for Approximate Query Answering, ACM SIGMOD 1999. PPT

Sep 13 Tuesday

Harshit Shah

Chaudhuri S., Das G., Narasayya V. A Robust, Optimization-Based Approach for Approximate Answering of Aggregate Queries, ACM SIGMOD 2001. PPT

Sep 13 Tuesday

Nickolas Bielik

Babcock B., Chaudhuri S., Das G. Dynamic Sample Selection for Approximate Query Processing, ACM SIGMOD 2003. PPT

Sep 15 Thursday

-

P. J. Haas and J. M. Hellerstein. Ripple Joins for Online Aggregation, ACM SIGMOD 1999.

Sep 15 Thursday

Rohan Ramekar

Arjun Dasgupta, Gautam Das, Heikki Mannila. A Random Walk Approach to Sampling Hidden Databases, ACM SIGMOD 2007. PPT

Sep 20 Tuesday

Sarker Ahmed Rumee

Ziv Bar-Yossef, Maxim Gurevich. Random sampling from a search engine's index, WWW 2006. PPT

Sep 22 Thursday

Dr. Gautam Das

Review for Mid-Term

Sep 27 Tuesday

-

MIDTERM EXAM

Sep 29 - Oct 6

Dr. Gautam Das

Lectures

Oct 11 Tuesday

Neha Matroja

Jon M. Kleinberg. Authoritative Sources in a Hyperlinked Environment, Journal of the ACM 46(5), 1999. PPT

Oct 11 Tuesday

-

L. Page, S. Brin, R. Motwani, T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web, 1998.

Oct 13 Thursday

Ganesh Viswanathan

Sanjay Agrawal, Surajit Chaudhuri, Gautam Das. DBXplorer: A System for Keyword-Based Search over Relational Databases, ICDE 2002. PPT

Oct 13 Thursday

Upa Gupta

Sanjay Agrawal, Surajit Chaudhuri, Gautam Das, Aristides Gionis. Automated Ranking of Database Query Results, CIDR 2003. PPT

Oct 18 Tuesday

Shrikant Desai

Surajit Chaudhuri, Gautam Das, Vagelis Hristidis, Gerhard Weikum. Probabilistic Ranking of Database Query Results, VLDB 2004. PPT

Oct 18 Tuesday

Mainul Islam

Ihab F. Ilyas, Walid G. Aref, Ahmed K. Elmagarmid. Supporting top-k join queries in relational databases, VLDB 2003. PPT

Oct 20 Thursday

Nandish Jayaram

Christopher Re, Nilesh Dalvi, Dan Suciu. Efficient Top-k Query Evaluation on Probabilistic Data, ICDE 2007. PDF

Oct 25 Tuesday

Na Li

Gautam Das, Dimitrios Gunopulos, Nick Koudas, Dimitris Tsirogiannis. Answering Top-k Queries Using Views, VLDB 2006. PPT

Oct 25 Tuesday

-

Senjuti Basu Roy, Haidong Wang, Ullas Nambiar, Gautam Das, Mukesh Mohania. Minimum-Effort Driven Dynamic Faceted Search in Structured Databases, CIKM 2008.

Oct 27 Thursday

Dr. Gautam Das

Review for Mid-Term

Nov 01 Tuesday

-

MIDTERM EXAM

Nov 3-10

Dr. Gautam Das

Lectures

Recommended Readings:
M. E. J. Newman. Models of the Small World: A Review, Journal of Statistical Physics.
Lada Adamic and Eytan Adar. How to Search a Social Network, Social Networks, 27(3):187-203, July 2005.
Abhinandan Das, Mayur Datar, Ashutosh Garg, Shyam Rajaram. Google News Personalization: Scalable Online Collaborative Filtering, WWW 2007.

Nov 15 Tuesday

Sujoy Bhattacharya

D. Liben-Nowell, J. Kleinberg. The Link Prediction Problem for Social Networks, CIKM 2003. PPT

Nov 17 Thursday

Manisha Kundoor

Sihem Amer-Yahia, Michael Benedikt, Laks V. S. Lakshmanan, Julia Stoyanovich. Efficient Network-Aware Search in Collaborative Tagging Sites, VLDB 2008. PPT

Nov 22 Tuesday

Ajith Ajjaranichandrappa

Mahashweta Das, Gautam Das, Vagelis Hristidis. Leveraging Collaborative Tagging for Web Item Design, ACM SIGKDD 2011. PPT

Nov 22 Tuesday

Shreya Mamadapur

Sihem Amer-Yahia, Senjuti Basu Roy, Ashish Chawla, Gautam Das, Cong Yu. Group Recommendation: Semantics and Efficiency, VLDB 2009. PPT

Nov 24 Thursday

-

Thanksgiving Holidays

Nov 29 Tuesday

Ismat Jahan

Senjuti Basu Roy, Sihem Amer-Yahia, Gautam Das, Cong Yu. Interactive Itinerary Planning, ICDE 2011. PPT

Dec 01 Thursday

Habibur Rahman

Arvind Narayanan, Vitaly Shmatikov. Robust De-anonymization of Large Sparse Datasets, IEEE Symposium on Security and Privacy 2008.

Dec 6, 8

-

Project Presentations

  

Announcements

  • Please check this section regularly during the semester for course updates and announcements.

  • The ethics statement is available here. Please print, sign, and submit it to the instructor during class.

  • Presentation assignments may change. Any changes will be posted on this website.

  • Please send the TA your presentation slides one day before the scheduled presentation.

  • If you do not send your presentation slides before the scheduled presentation, you may lose some points.




 

