Exploratory Data Analysis

Jesus A. Gonzalez

August 10, 2019

Introduction to Exploratory Data Analysis

What is Exploratory Data Analysis?

*http://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm*

Focus

Philosophy

History

Show the way a data set should be analyzed

Techniques

Exploratory Data Analysis with R

Exploratory Data Analysis Checklist

  1. Formulate your question
  2. Read your data
  3. Check the packaging
  4. Run str()
  5. Look at the top and bottom of your data
  6. Check your “n”s
  7. Validate with at least one external data source
  8. Try the easy solution first
  9. Challenge your solution
  10. Follow up

This checklist is by Roger D. Peng.
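Steps 2 through 6 of the checklist can be sketched in base R. This is a minimal illustration rather than the workflow from Peng's book: it assumes the built-in `airquality` data set as a stand-in for data you would normally read from a file, and the variable names (`aq`, `obs_per_month`, `missing_per_col`) are chosen only for this example.

```r
# Step 2: read your data. Assumption: the built-in airquality data set
# stands in for a file you would load with read.csv() or similar.
aq <- datasets::airquality

# Step 3: check the packaging (do the dimensions match what you expect?).
n_rows <- nrow(aq)
n_cols <- ncol(aq)
cat("rows:", n_rows, "columns:", n_cols, "\n")

# Step 4: run str() to see column types and a preview of the values.
str(aq)

# Step 5: look at the top and the bottom of the data.
print(head(aq))
print(tail(aq))

# Step 6: check your "n"s, here the number of observations per month
# and the number of missing values in each column.
obs_per_month <- table(aq$Month)
missing_per_col <- colSums(is.na(aq))
print(obs_per_month)
print(missing_per_col)
```

If the counts in step 6 do not match what the data source promises (for example, a month with too few days), that is worth resolving before any further analysis.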

Formulate your question

Some common questions from the Engineering Statistics Handbook (http://www.itl.nist.gov/div898/handbook/eda/section3/eda32.htm)

  1. What is a typical value?
  2. What is the uncertainty for a typical value?
  3. What is a good distributional fit for a set of numbers?
  4. What is a percentile?
  5. Does an engineering modification have an effect?
  6. Does a factor have an effect?
  7. What are the most important factors?
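Questions 1, 2, and 4 above map onto a few base R functions. The sketch below is only illustrative: the vector `x` is made-up data, and the t-based confidence interval is one common way to express uncertainty, assumed here rather than taken from the handbook.

```r
# Assumption: a small made-up sample of measurements, for illustration only.
x <- c(9.8, 10.1, 10.0, 9.9, 10.3, 10.2, 9.7, 10.0)

# Question 1, typical value: the mean and the median are two common summaries.
typical_mean <- mean(x)
typical_median <- median(x)

# Question 2, uncertainty of the typical value: standard error of the mean
# and a 95% t-based confidence interval around the mean.
se_mean <- sd(x) / sqrt(length(x))
ci_95 <- typical_mean + c(-1, 1) * qt(0.975, df = length(x) - 1) * se_mean

# Question 4, percentiles: the 5th, 50th, and 95th.
pct <- quantile(x, probs = c(0.05, 0.50, 0.95))

print(c(mean = typical_mean, median = typical_median))
print(ci_95)
print(pct)
```

When the mean and the median disagree noticeably, the distribution is probably skewed or contains outliers, which is itself a useful exploratory finding.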

Some common questions from the Engineering Statistics Handbook (http://www.itl.nist.gov/div898/handbook/eda/section3/eda32.htm), continued:

  8. Are measurements coming from different laboratories equivalent?
  9. What is the best function for relating a response variable to a set of factor variables?
  10. What are the best settings for factors?
  11. Can we separate signal from noise in time dependent data?
  12. Can we extract any structure from multivariate data?
  13. Does the data have outliers?
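The last question, whether the data have outliers, can be given a quick first answer with the boxplot rule. This is a hedged sketch on made-up data (`y` is hypothetical); the 1.5 times interquartile range rule used by `boxplot.stats()` is a common rule of thumb, not a formal test, and is not necessarily the method the handbook recommends.

```r
# Assumption: made-up measurements with one suspicious value (25.0).
y <- c(10.2, 9.9, 10.1, 10.0, 9.8, 10.3, 25.0, 10.1)

# boxplot.stats() returns in $out the points lying more than 1.5 times
# the hinge spread (roughly the interquartile range) beyond the hinges.
flagged <- boxplot.stats(y)$out
print(flagged)

# Compare summaries with and without the flagged points to see how much
# they influence the typical value.
print(mean(y))
print(mean(y[!(y %in% flagged)]))
```

A flagged point is a prompt to go back to the data source and check it, not an automatic reason to delete it.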

References

  1. Engineering Statistics Handbook, “Exploratory Data Analysis” (http://www.itl.nist.gov/div898/handbook/eda/eda.htm), last visited: January 27, 2016.
  2. Roger D. Peng, Exploratory Data Analysis (https://leanpub.com/exdata), Leanpub, 2015.