Association Rules
Machine Learning
Jesus A. Gonzalez
February 1, 2020
Data Mining with Association Rules
- Find associations or correlations between the elements or objects of transactional or relational databases, or data warehouses
Applications
- Support for decision making
- Alarm diagnosis and prediction in telecommunications
- Sales information analysis
- Catalog design
- Merchandise distribution in stores
- Customer segmentation based on buying patterns
Association Rules
- Similar to classification rules
- Also found with a covering procedure
- Main difference
- On the right-hand side of the rule we can find any attribute-value pair (or several pairs)
- In order to find this type of rule
- Consider each possible combination of attribute-value pairs in the RHS
Association Rules
- After that we prune
- Using coverage
- The number of instances the rule correctly predicts
- Using precision
- The proportion of instances to which the rule applies that it predicts correctly
Example
- Find the association rules \(X \& Y \rightarrow Z\) with the restrictions of having a minimum confidence and support
- Rules with
- Minimum support of 50%
- Minimum confidence of 50%
- \(A \rightarrow C\) (50%, 66%)
- \(C \rightarrow A\) (50%, 100%)
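These two rules can be checked numerically. The transaction table from the original slide is not reproduced here, so the sketch below uses a hypothetical 4-transaction database constructed to yield exactly the stated support and confidence values; `support` and `confidence` are illustrative helpers, not library functions.

```python
# Hypothetical transactions chosen so that the two rules above get
# exactly the stated values: A -> C (50%, 66%), C -> A (50%, 100%).
transactions = [
    {"A", "C"},
    {"A", "C"},
    {"A", "B"},
    {"B", "D"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every element of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """P(rhs | lhs): support of lhs ∪ rhs divided by support of lhs."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

print(support({"A", "C"}, transactions))       # 0.5   -> 50% support
print(confidence({"A"}, {"C"}, transactions))  # 0.666... -> ~66% confidence
print(confidence({"C"}, {"A"}, transactions))  # 1.0   -> 100% confidence
```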
Association Rules
- An association rule is an expression of the form \(X \rightarrow Z\) where \(X\) and \(Z\) are sets of elements
- Intuitive meaning
- Transactions in the database that contain \(X\) tend to contain \(Z\)
- A minimum level of confidence and support is required
Definitions
- \(I = \{i_1, i_2, i_3, \dots, i_m\} \rightarrow\) a set of literals (attributes, also called items)
- \(D \rightarrow\) a set of transactions \(T\), where each \(T \subseteq I\)
- \(TID \rightarrow\) an identifier associated with each transaction
- \(X \rightarrow\) a set of elements, \(X \subset I\)
- An association rule is an implication:
- \(X \rightarrow Z\)
- \(X \subset I\)
- \(Z \subset I\)
- \(X \cap Z = \emptyset\)
Definitions
- Support (or coverage), \(s\), is the probability that a transaction contains \(X \cup Z\)
- Confidence (or accuracy), \(c\), is the conditional probability that a transaction containing \(X\) also contains \(Z\)
Rule Evaluation
- We evaluate the rules according to their support and confidence
- In association rules, coverage is known as support
- Precision is known as confidence
Rule Evaluation
- Can be read as
- \(support(X \rightarrow Z) = P(X \cup Z) =\)
- \(\frac{\#\text{transactions containing both } X \text{ and } Z}{\#\text{total transactions}}\)
Rule Evaluation
- Can be read as
- \(confidence(X \rightarrow Z) = P(Z | X)\)
Rule Evaluation
- Can be read as
- \(confidence(X \rightarrow Z) =\)
- \(\frac{support(X \cup Z)}{support(X)} =\)
- \(\frac{\#\text{transactions containing both } X \text{ and } Z}{\#\text{transactions containing } X}\)
Rule Evaluation
- We want rules with a minimum support and confidence
- support \(\geq\) min_supp
- confidence \(\geq\) min_conf
- We look for (independently of the side on which they appear)
- Attribute-value pairs that cover a large number of instances
- The sets of attribute-value pairs are called itemsets
- Each attribute-value pair is called an item
Example
- A typical example of association rules
- Market basket analysis
- Find associations among the products that customers buy
- May impact marketing strategies
- After generating all frequent itemsets
- Transform them into rules
- With the minimum confidence required
- Some itemsets produce more than one rule and others produce none
Example
- Following the weather example
- Itemset: \(humidity = normal\), \(wind = no\), \(class = P\)
- Produces the following rules
- If \(humidity = normal\) & \(wind = no\) Then \(Class = P\) 4/4
- If \(humidity = normal\) & \(class = P\) Then \(Wind = no\) 4/6
- If \(wind = no\) & \(class = P\) Then \(humidity = normal\) 4/6
- If \(humidity = normal\) Then \(wind = no\) & \(class = P\) 4/7
- If \(wind = no\) Then \(class = P\) & \(humidity = normal\) 4/8
- If \(class = P\) Then \(wind = no\) & \(humidity = normal\) 4/9
- If \(true\) Then \(humidity = normal\) & \(wind = no\) & \(class = P\) 4/12
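The enumeration above (every split of an itemset into antecedent and consequent) can be sketched in Python. Since the weather table itself is not reproduced here, the example below uses a small hypothetical transaction list, and `rules_from_itemset` is an illustrative helper, not a library function.

```python
from itertools import combinations

def rules_from_itemset(itemset, transactions):
    """Enumerate every rule LHS -> RHS obtained by splitting the itemset.
    Accuracy is count(itemset) / count(LHS); an empty LHS gives 'If true'."""
    itemset = frozenset(itemset)
    hits = sum(itemset <= t for t in transactions)  # transactions with the whole itemset
    rules = []
    for k in range(len(itemset)):                   # antecedent sizes 0 .. n-1
        for lhs in combinations(sorted(itemset), k):
            lhs = frozenset(lhs)
            covered = sum(lhs <= t for t in transactions)
            rules.append((set(lhs), set(itemset - lhs), hits, covered))
    return rules

# Hypothetical transactions standing in for rows of the weather table
transactions = [
    {"humidity=normal", "wind=no", "class=P"},
    {"humidity=normal", "wind=no", "class=P"},
    {"humidity=normal", "class=P"},
    {"wind=no"},
]
for lhs, rhs, hits, covered in rules_from_itemset(
        {"humidity=normal", "wind=no", "class=P"}, transactions):
    print(f"If {sorted(lhs) or 'true'} Then {sorted(rhs)}  {hits}/{covered}")
```

A 3-element itemset yields \(2^3 - 1 = 7\) rules, matching the seven rules listed above.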
Example
- If we require 100% accuracy, only the first rule complies
- Considering the whole table, there are 58 rules that cover 2 examples with 100% accuracy
Algorithm
- The algorithm follows 2 steps
- Apriori (Agrawal et al., 1994)
- Generate all itemsets with one element
- Use these to generate the itemsets with 2 elements, and so on
- Keep all candidates that reach the minimum support
- This allows pruning many possible combinations
- Generate rules, verifying that they have the minimum confidence
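The level-wise first step can be sketched as follows. This is a minimal, hypothetical version: it uses a naive join (unions of frequent itemsets) for candidate generation and re-scans the transaction list to count support, on a toy basket database.

```python
def apriori(transactions, min_supp):
    """Minimal level-wise frequent-itemset search (step 1 of the algorithm).
    Candidates of size k are built from frequent itemsets of size k-1."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    # Level 1: frequent single items
    level = {frozenset([i]) for i in items
             if sum(i in t for t in transactions) / n >= min_supp}
    frequent = set(level)
    k = 2
    while level:
        # Naive join: unions of frequent (k-1)-itemsets that form a k-itemset
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Keep only candidates with minimum support (re-scan to count)
        level = {c for c in candidates
                 if sum(c <= t for t in transactions) / n >= min_supp}
        frequent |= level
        k += 1
    return frequent

transactions = [{"bread", "milk"}, {"bread", "beer", "eggs"},
                {"milk", "beer", "bread"}, {"bread", "milk", "beer"}]
print(apriori(transactions, min_supp=0.5))
```

With min_supp = 0.5, `eggs` is infrequent (1/4), while every combination of `bread`, `milk`, and `beer` survives.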
Algorithm
- AprioriGen generates candidates \(C_k\) (of size \(k\)) from the frequent itemsets of size \(k-1\)
- Subset determines which of the candidate itemsets are actually frequent in each pass (re-scans the database and counts support)
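The join-and-prune step of AprioriGen can be sketched as below. This is a simplified, hypothetical version: it joins by taking unions of size \(k\) (the usual formulation joins itemsets sharing a \(k-2\) prefix for efficiency), then applies the Apriori property: every \((k-1)\)-subset of a surviving candidate must itself be frequent.

```python
def apriori_gen(frequent_prev, k):
    """Candidate generation: join frequent (k-1)-itemsets, then prune any
    candidate that has an infrequent (k-1)-subset (the Apriori property)."""
    candidates = set()
    for a in frequent_prev:
        for b in frequent_prev:
            c = a | b
            if len(c) == k:
                # Prune: every (k-1)-subset must itself be frequent
                if all(c - {item} in frequent_prev for item in c):
                    candidates.add(c)
    return candidates

# Hypothetical frequent 2-itemsets: {A,B}, {A,C}, {B,C}, {B,D}
f2 = {frozenset(s) for s in ({"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"})}
print(apriori_gen(f2, 3))  # only {A,B,C}; {A,B,D} and {B,C,D} are pruned
```

`{A,B,D}` is pruned because its subset `{A,D}` is not frequent, and `{B,C,D}` because `{C,D}` is not.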
Example
- Consider a table with shopping lists
- Process with the data of the previous table
- Candidate generation by levels
Enhancements (for efficiency)
- Some enhancements to the basic Apriori algorithm make it more efficient
- Use hash tables to reduce the number of itemsets when generating candidates
- An Effective Hash-based Algorithm for Mining Association Rules
- Eliminate transactions (elements of the database) that cannot contribute to any of the supersets still under consideration
- Divide the transactions into disjoint partitions, evaluate the local itemsets, and then, based on those results, estimate the global ones
Enhancements (for efficiency)
- Create approximations by taking samples of the transactions, avoiding reading all the data
- Avoid generating candidates using alternative data structures
- FP-trees (Frequent Pattern tree)
Extensions
- Finding association rules at different levels of abstraction
- Usually starts with upper level classes
- Results can be used to filter lower level classes
- e.g., association rules over computers and printers
- Then over laptops and workstations on one side and laser and inkjet printers on the other, and so on
- When proceeding to the subclasses we may consider
- A uniform support criterion
- Reducing the criterion for the subclasses
- Consider all subclasses independently of the support criterion
- When finding association rules at different levels of abstraction
- It’s common to generate redundant rules
- Rules that don’t add knowledge (i.e., a more general rule already conveys the same information)
- It’s necessary to incorporate filtering mechanisms
Extensions
- Finding association rules by combining information from multiple tables or multi-dimensional association rules
- Datacubes may be used to find multi-dimensional association rules
Extensions
- Originally, association rules work with discrete attributes
- Mechanisms to deal with continuous attributes have been created
- Discretize before mining, using ranges from possible predefined hierarchies
- Discretize dynamically during the process, trying to maximize some confidence criterion or reduction of the size of the rules
- e.g., ARCS (Association Rule Clustering System)
- Maps quantitative attributes into a mesh and then uses clustering
- First assign data to containers delimited by ranges (could be changed at a later time)
- More common schemes: containers of the same size, with the same number of elements, with elements uniformly distributed
- After that, find association rules using the containers
- Then, rules may be grouped if they form larger rectangles in the mesh
- Discretize using semantic information
- i.e. create groups with elements that are close to each other (i.e. with clustering over the attributes)
- Once clusters have been found, find association rules from those clusters
- Based on similarity measures
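The simplest of the binning schemes mentioned above, containers of the same size (equal-width ranges), can be sketched as follows. `equal_width_bins` and the age values are hypothetical, for illustration only.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a range label, using n_bins
    containers of equal width over the observed range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        # Clamp so max(values) falls inside the last bin
        b = min(int((v - lo) / width), n_bins - 1)
        left, right = lo + b * width, lo + (b + 1) * width
        labels.append(f"[{left:g}, {right:g})")
    return labels

# Hypothetical ages, discretized before mining rules such as
# age=[22,37) -> buys=yes
ages = [22, 25, 31, 38, 45, 52, 58, 67]
print(equal_width_bins(ages, 3))
```

Equal-frequency containers (same number of elements per bin) would instead split at quantiles of the sorted values.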
Association versus Correlation
- Finding an association rule doesn’t necessarily mean it will be useful
- e.g., we analyze 10,000 shopping tickets
- 6,000 of them include videogames
- 7,500 include DVDs
- 4,000 include both
- We might generate the rule: buy videogames \(\rightarrow\) buy DVDs
- [supp = 4,000/10,000 = 40%, conf = 4,000/6,000 = 66.6%]
- But since 75% of all tickets include DVDs, buying videogames actually reduces the probability of buying DVDs (66.6% < 75%)
Association versus Correlation
- The occurrence of an itemset \(A\) is independent of another itemset \(B\) if \(P(AB) = P(A)P(B)\)
- Otherwise there exists a dependence or correlation
- Correlation between two events:
- \(corr_{A,B} = \frac{P(AB)}{P(A)P(B)}\)
- A value smaller than 1 indicates that the occurrence of one decreases the probability of occurrence of the other
- A value larger than 1 indicates that the occurrence of one increases the probability of occurrence of the other
- A value of exactly 1 indicates no relation at all (they are independent)
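The shopping-ticket example above can be checked with this measure; the computation below uses the counts given in the text.

```python
# Ticket example: 10,000 tickets, 6,000 with videogames,
# 7,500 with DVDs, 4,000 with both.
n, n_games, n_dvds, n_both = 10_000, 6_000, 7_500, 4_000

p_games, p_dvds, p_both = n_games / n, n_dvds / n, n_both / n
corr = p_both / (p_games * p_dvds)  # corr of videogames and DVDs
print(corr)  # 0.888... < 1: buying videogames lowers the chance of buying DVDs
```

Since \(corr \approx 0.89 < 1\), the rule has negative correlation despite its 40% support and 66.6% confidence.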
Association versus Correlation
- Now we can find correlated association rules
- We may estimate whether that correlation is significant with a statistical \(\chi^2\) test
- If a set of elements is correlated, any superset of this will be as well
- This may help to find the minimum correlated sets and build their supersets from them
Use of Restrictions
- We may use restrictions over the types of data, the hierarchies, or the possible forms of the rules in order to reduce the search space
- Restrictions may be
- Antimonotonic
- If a set doesn’t satisfy a condition, then its supersets won’t satisfy it either
- Monotone
- If a set satisfies a restriction, then all its supersets will also satisfy it
- Succinct
- We can enumerate all the sets that satisfy a restriction
- Convertible
- We can convert a restriction to one of the previous classes
- Non-convertible
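The antimonotonic case is exactly the property Apriori exploits with minimum support: support can only shrink (or stay equal) as an itemset grows, so once an itemset fails the support threshold, every superset fails it too. A quick check on a hypothetical toy database:

```python
def support_count(itemset, transactions):
    """Number of transactions containing every element of the itemset."""
    return sum(set(itemset) <= t for t in transactions)

transactions = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]

# Adding items never increases the support count, which is why
# "support >= min_supp" is an antimonotonic restriction.
assert support_count({"A", "B"}, transactions) <= support_count({"A"}, transactions)
assert support_count({"A", "B", "C"}, transactions) <= support_count({"A", "B"}, transactions)
print("antimonotonicity holds on this toy database")
```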
Association Rules vs Classification Rules vs Decision Trees
- Association rules versus Classification rules
- Dependencies exploration vs Focused prediction
- Different combinations of dependent or independent attributes vs Prediction of an attribute (class) from other attributes
- Complete search (all rules found) vs Heuristic search (finds a subset of rules)
Association Rules vs Classification Rules vs Decision Trees
- Trees use an evaluation heuristic over an attribute
- Based on splitting
- Usually overfit first and then prune
- Classification rules use an evaluation heuristic over an attribute-value pair condition
- Based on covering
- Using a stop criterion (sometimes overfitting and then pruning)
- Association rules are based on support and confidence measures
- Consider any set of attributes with any other set of attributes